neoscrypt_vliw.cl v3
HD6970: 19.5KH/s to 50.5KH/s and no HW errors!
Thanks to much better global memory management with burst writes. I also tried a trick with copying a workgroup buffer to local memory and writing it back asynchronously to global memory through async_work_group_copy(), but it didn't work out (36KH/s and 15% HW errors). A roughly 2.6x speed increase is something to celebrate anyway. Anyone care to donate towards drinks, whores and blackjack?
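
For anyone curious what the two write patterns look like, here is a minimal sketch in OpenCL C. It is not the code from neoscrypt_vliw.cl: the kernel names, WG_SIZE, the placeholder data and the zero global offset are all assumptions made for illustration. Variant A has adjacent work-items store adjacent uint4 vectors straight to global memory, which is one way to get burst writes. Variant B stages the data in local memory and lets async_work_group_copy() push it out.

```c
/* Illustration only. All names and sizes here are hypothetical. */
#define WG_SIZE 64   /* assumed work-group size */

/* Variant A: direct burst write. Work-item i stores the i-th 16-byte
 * vector, so one work-group covers a single contiguous region. */
__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void burst_write_direct(__global uint4 *out)
{
    const uint gid = (uint)get_global_id(0);
    out[gid] = (uint4)(gid, gid + 1, gid + 2, gid + 3); /* placeholder data */
}

/* Variant B: stage in local memory, then copy out asynchronously.
 * Assumes a global work offset of zero so that base + lid == gid. */
__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void burst_write_async(__global uint4 *out)
{
    __local uint4 stage[WG_SIZE];
    const uint lid = (uint)get_local_id(0);
    const uint gid = (uint)get_global_id(0);
    const size_t base = get_group_id(0) * WG_SIZE;

    stage[lid] = (uint4)(gid, gid + 1, gid + 2, gid + 3); /* placeholder data */

    /* Every work-item must finish its local store before the copy starts. */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Must be reached by all work-items in the group with the same arguments. */
    event_t ev = async_work_group_copy(out + base, stage, WG_SIZE, 0);

    /* The local buffer must not be reused until the copy has completed. */
    wait_group_events(1, &ev);
}
```

Both kernels write the same data, so their output buffers can be compared on the host. Which one is faster depends on the hardware and driver. On the HD6970 the direct approach was the one that paid off.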
