Download version 2011-07-11:
http://www.mediafire.com/?k404b6lqn8vu6z6This could be the last version, because there seems no more room for big jumps. I thought I could remove some more additions, but the OpenCL compiler does a better job than I

. This version is faster than all previous kernels (uses the least ALU OPs for 69XX and 58XX). Should also work with SDK 2.1. If it throws an error with 2.1, please post here and include the error message!
Thanks to all donators and your feedback!
Dia
Awesome work. Btw, could you explain why Vals[2] = C1; is needed in the beginning when it is reassigned in round 5? I know removing this line breaks the hashing but I'm not seeing why since it isn't used prior to round 5.