Post
Topic
Board Announcements (Altcoins)
Re: [ANN] [SKC] Skeincoin | Skein-SHA2 | CPU mining | GPU miner available
by reorder on 02/01/2014, 23:10:34 UTC
Yes, the compiler moves that W[] array into registers on GCN, but apparently on VLIW it does not, and the array ends up in global memory, which is slow. This can of course be improved (first of all, the array does not have to be 62 elements long; 16 elements are enough if you reuse them). I just wonder how you managed to compile sha256_res(sha256_res()): it takes a uint16 vector as its parameter but returns only one uint.
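To illustrate the "16 elements is enough" remark: each new schedule word in SHA-256 depends only on the words 2, 7, 15 and 16 positions back, so the schedule can live in a 16-word window indexed modulo 16. Below is a minimal sketch in plain C rather than OpenCL C, so it can be checked on a CPU. The function name sha256_compress16 is made up for this example and is not taken from the miner's kernel.

```c
#include <stdint.h>

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Hypothetical helper: compress one 512-bit block into state[8].
 * w[16] holds the block as big-endian words and is overwritten in place,
 * serving as a rolling window instead of a full 64-word schedule. */
static void sha256_compress16(uint32_t state[8], uint32_t w[16])
{
    uint32_t a = state[0], b = state[1], c = state[2], d = state[3];
    uint32_t e = state[4], f = state[5], g = state[6], h = state[7];

    for (int i = 0; i < 64; i++) {
        if (i >= 16) {
            uint32_t w15 = w[(i + 1) & 15];   /* W[i-15] */
            uint32_t w2  = w[(i + 14) & 15];  /* W[i-2]  */
            uint32_t s0 = rotr(w15, 7) ^ rotr(w15, 18) ^ (w15 >> 3);
            uint32_t s1 = rotr(w2, 17) ^ rotr(w2, 19) ^ (w2 >> 10);
            /* The slot i & 15 still holds W[i-16]; add W[i-7] and sigmas. */
            w[i & 15] += s0 + w[(i + 9) & 15] + s1;
        }
        uint32_t S1  = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25);
        uint32_t ch  = (e & f) ^ (~e & g);
        uint32_t t1  = h + S1 + ch + K[i] + w[i & 15];
        uint32_t S0  = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22);
        uint32_t maj = (a & b) ^ (a & c) ^ (b & c);
        uint32_t t2  = S0 + maj;
        h = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }

    state[0] += a; state[1] += b; state[2] += c; state[3] += d;
    state[4] += e; state[5] += f; state[6] += g; state[7] += h;
}
```

With the loop fully unrolled, every index into w[] becomes a compile-time constant, which should make it easier for a compiler to keep the 16 words in registers. Whether the VLIW compiler actually does so is something that would have to be measured.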

I've tried both
Code:
(sha256_res((uint16)sha256_res(as_uint16(skein512_mid_impl(state, msg)))) & 0xf0ffffff)
and
Code:
(sha256_res(sha256_res(as_uint16(skein512_mid_impl(state, msg)))) & 0xf0ffffff)

And it compiles, though it probably just gives wrong results. But it is still enough for a test, as sha256_res runs twice, maybe only with wrong input on the second run :)

Besides, double Skein runs at 780 MH/s on a 5870, so SHA256 is the current bottleneck for sure. With a good SHA implementation we will be able to reach even better performance than SHA256D :D


Casting uint to uint16 compiles on your system? I guess something is awfully wrong with it then; it is against the OpenCL spec (and common sense :) ). Anyway, it would be great if you managed to optimize sha256. I only quickly threw together something that worked for me and feel somewhat embarrassed now that it is public.