Yup, the new kernel doesn't work.
BAH! I'll look through it, but tonight I am going to a Sublime with Rome and 311 concert, so it will be this weekend.
1st question: how is 0x2004000U in line 170 computed? Currently I don't get it.

Dia
Basically, the 2 nonces W[3].x and W[3].y differ only in the last bit (bit 0), and the first calculation done on those values is P2:
P2(18) = rot(W[3],25)^rot(W[3],14)^((W[3])>>3U);
So instead of flipping bit 0 on W[3] and calculating P2(18) in full for both .x and .y, we can calculate it once for .x; the .y result is the same except that bits 25 and 14 are flipped:
P2(18).x = rot(W[3].x,25)^rot(W[3].x,14)^((W[3].x)>>3U);
W[3].y = W[3].x ^ 1, and rotates and shifts distribute over XOR, therefore:
P2(18).y = P2(18).x ^ (rot(1,25)^rot(1,14)^((1)>>3U));
rot(1,25) = 0x2000000, rot(1,14) = 0x4000 and (1)>>3U = 0, so P2(18).y = P2(18).x ^ 0x2004000U;
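If you want to convince yourself of that on the CPU, here is a rough sketch in plain C (not the kernel code). The names rotl32, p2 and count_mismatches are made up for this example, and I am assuming rot() in the kernel is a 32-bit left rotate. It compares the full P2 of W[3] ^ 1 against the shortcut for a bunch of pseudo-random values:

```c
#include <stdint.h>

/* Rotate left by n bits (0 < n < 32); assumed to match rot() in the kernel. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32U - n));
}

/* The P2 term from the post, applied to a single 32-bit word. */
static uint32_t p2(uint32_t w) {
    return rotl32(w, 25) ^ rotl32(w, 14) ^ (w >> 3U);
}

/* Counts sampled inputs where the shortcut p2(w) ^ 0x2004000U
 * disagrees with computing p2(w ^ 1) directly. */
static uint32_t count_mismatches(uint32_t samples) {
    uint32_t bad = 0;
    uint32_t w = 0x12345678U;                /* arbitrary start value */
    for (uint32_t i = 0; i < samples; i++) {
        w = w * 1664525U + 1013904223U;      /* simple LCG for test inputs */
        if (p2(w ^ 1U) != (p2(w) ^ 0x2004000U)) {
            bad++;
        }
    }
    return bad;
}
```

p2(1) is exactly the constant 0x2004000U, and count_mismatches() should come back as 0 for any sample count, because every operation in P2 is linear over XOR. Note that this only covers the P2 term itself, not anything that is later added on top of it.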
This is the first change that I implemented in my kernel, but it seems that only 69XX cards benefit from it. Will investigate further ...