They are the rounds of SHA-256... not a random number generator. I am now completely sure you are completely ignorant. You also talk about #defines like they are a completely new concept to you. Are you even a programmer?
Defines are not a new concept to me. My line of thinking was that having that section of code unrolled was precisely the problem. Yes, flat code can be nice when it executes in a single stack, but my assumption was that if you call out to a separate device with a single instruction, it's faster in most cases than executing a bunch of things on the stack.
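For anyone who finds this thread later, here is roughly the kind of #define unrolling we were arguing about. This is only a minimal sketch in plain C, not the actual mining kernel: it runs just the first four SHA-256 rounds, and the names `ROUND`, `rounds_looped`, `rounds_unrolled` and `rounds_agree` are made up for illustration. The macro version rotates the register names between rounds instead of shuffling values, and it should end up with the same state as the loop.

```c
#include <stdint.h>

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define CH(e, f, g)  (((e) & (f)) ^ (~(e) & (g)))
#define MAJ(a, b, c) (((a) & (b)) ^ ((a) & (c)) ^ ((b) & (c)))
#define BSIG0(a)     (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22))
#define BSIG1(e)     (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25))

/* First four SHA-256 round constants (the full table has 64). */
const uint32_t K[4] = {0x428a2f98u, 0x71374491u, 0xb5c0fbcfu, 0xe9b5dba5u};

/* One round. Instead of moving values between registers, the caller
   rotates the argument names on each successive invocation. */
#define ROUND(a, b, c, d, e, f, g, h, k, w) do {              \
    uint32_t t1 = (h) + BSIG1(e) + CH(e, f, g) + (k) + (w);   \
    uint32_t t2 = BSIG0(a) + MAJ(a, b, c);                    \
    (d) += t1;       /* becomes the next round's e */         \
    (h) = t1 + t2;   /* becomes the next round's a */         \
} while (0)

/* Looped form: one round body, state shuffled every iteration. */
void rounds_looped(uint32_t s[8], const uint32_t w[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t t1 = s[7] + BSIG1(s[4]) + CH(s[4], s[5], s[6]) + K[i] + w[i];
        uint32_t t2 = BSIG0(s[0]) + MAJ(s[0], s[1], s[2]);
        s[7] = s[6]; s[6] = s[5]; s[5] = s[4]; s[4] = s[3] + t1;
        s[3] = s[2]; s[2] = s[1]; s[1] = s[0]; s[0] = t1 + t2;
    }
}

/* Unrolled form: the preprocessor pastes the round body four times. */
void rounds_unrolled(uint32_t s[8], const uint32_t w[4]) {
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    ROUND(a, b, c, d, e, f, g, h, K[0], w[0]);
    ROUND(h, a, b, c, d, e, f, g, K[1], w[1]);
    ROUND(g, h, a, b, c, d, e, f, K[2], w[2]);
    ROUND(f, g, h, a, b, c, d, e, K[3], w[3]);
    /* After four rounds the names have rotated by four positions. */
    s[0] = e; s[1] = f; s[2] = g; s[3] = h;
    s[4] = a; s[5] = b; s[6] = c; s[7] = d;
}

/* Returns 1 if both forms produce the same state. The starting state is
   the SHA-256 initial hash value; the message words are arbitrary. */
int rounds_agree(void) {
    const uint32_t init[8] = {0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au,
                              0x510e527fu, 0x9b05688cu, 0x1f83d9abu, 0x5be0cd19u};
    const uint32_t w[4] = {0x61626380u, 0u, 0u, 0u};
    uint32_t s1[8], s2[8];
    for (int i = 0; i < 8; i++) { s1[i] = init[i]; s2[i] = init[i]; }
    rounds_looped(s1, w);
    rounds_unrolled(s2, w);
    for (int i = 0; i < 8; i++) {
        if (s1[i] != s2[i]) return 0;
    }
    return 1;
}
```

The point of the sketch is only that the two forms compute the same thing; which one suits a given device better is exactly the kind of assumption that has to be checked rather than guessed.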
This is not my first FPGA project, but it is my second. My first is one I can't go into depth about, but the gist of it was: "Here is an OpenCL FPGA. We already use OpenCL on GPU farms in our datacenter. We are adopting the technology so that we can port our software to hardware and yield better performance for lower cost. Here's a manual, here's a devboard, we're going golfing. Yours truly, management".
That project went well; most of what we had ported over cleanly. In most cases it was almost a straight-across compile. When I left that job I was able to take my devboard with me. I played with it and decided to try it at mining LTC.
The rest has been explained here.
As for your comment about whether I'm actually a programmer or not: yes, I am, but this experience is making me wonder if maybe I've started to age out. Now that I look, it's pretty obvious where I made my mistake. I didn't check a fundamental assumption, I just assumed I knew. There is no excuse. That leaves me looking like a fool, and I plan to leave this thread up and check it every time before I post something "I just know will work".

Thanks for the info.