May I ask you, d3m0n1q_733rz, what do you do for a living? I only know a little NASM and CISCA, and this post already seems so fancy that it intrigued me into trying to help, but I can hardly understand any of it.

I'm presently disabled. I started out programming in assembly as well. I have a degree in Network Systems Administration and I'd like to find work along those lines, but so far I haven't been able to.
OpenCL isn't TOO difficult to learn, but I have trouble with the syntax of some commands, like prefetch. I'm thinking about tossing a prefetch or two into the code to see if it increases the speed by much. In particular, one just before sharoundC to prepare K and ConstW if they aren't already loaded, and then another to pull in the parts of H when they're needed for Vals. I don't know whether this will shave off a few cycles or not, but I plan to find out.
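
For reference on the prefetch syntax mentioned above, here is a minimal sketch in OpenCL C. The built-in has the form `prefetch(const __global gentype *p, size_t num_gentypes)`: a pointer into global memory plus a count of elements (not bytes). The kernel name, its arguments, and the loop are hypothetical and only there to show the call; they are not taken from the miner kernel being discussed.

```c
// Sketch only: prefetch_demo, table, out, and count are made-up names
// for illustration, not part of the kernel discussed in this thread.
__kernel void prefetch_demo(__global const uint *table, // must be __global
                            __global uint *out,
                            const uint count)
{
    const size_t gid = get_global_id(0);

    // Hint to the device: bring `count` uint elements starting at `table`
    // into the global cache. The call does not block, and an
    // implementation is free to ignore it.
    prefetch(table, count);

    // Work that reads the prefetched data afterwards.
    uint acc = 0;
    for (uint i = 0; i < count; ++i) {
        acc += table[i] ^ (uint)gid;
    }
    out[gid] = acc;
}
```

One thing to check before trying this: prefetch only accepts `__global` pointers. If K or ConstW is declared `__constant` or lives in private memory in your kernel, it can't be passed to prefetch as-is. And since the call is only a hint, whether it saves any cycles depends on the device and driver, so timing it before and after is the only way to know.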
