Thanks for publishing your repo! Appreciated.
I'm not a C programmer (or an OpenCL one, for that matter), but I'm a fan of DRY. While reading input.cl I found the get_row() function, and I think we can make it a little DRYer by doing something like this:
uint get_row(uint round, uint xi0)
{
    uint row;
    uint swp;   /* extra shift, i.e. NR_ROWS_LOG - 14 */
    uint num;
#if NR_ROWS_LOG == 14
    swp = 0;
#elif NR_ROWS_LOG == 15
    swp = 1;
#elif NR_ROWS_LOG == 16
    swp = 2;
#else
#error "unsupported NR_ROWS_LOG"
#endif
    /* 0x3f, 0x7f or 0xff: the row-index bits above the low byte */
    num = (0x40 << swp) - 1;
    if (!(round % 2))
        row = xi0 & (num << 8 | 0xff);
    else
        row = ((xi0 & (num << 16 | 0xf00)) >> 8) | ((xi0 & 0xf0000000) >> 24);
    return row;
}
So, what do you think, @zawawa?
I don't know if this can be useful at all, but if you like it I can make a PR so you can merge the changes later.
I appreciate your enthusiasm and willingness to help, but I will keep the current code. With GPGPU, and especially with AMD OpenCL drivers, repeated code is often better because it lets you keep register usage low, which is crucially important. My general approach to GPGPU is to sacrifice everything for performance, including readability.
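For reference, here is a rough sketch of a possible middle ground: derive the masks from NR_ROWS_LOG with preprocessor macros, so every mask is a compile-time constant and no `swp` or `num` variable exists at run time. The macro names `ROW_NUM`, `ROW_MASK_EVEN` and `ROW_MASK_ODD` are made up for this example and are not part of input.cl, and the default of 14 is only there so the snippet compiles on its own as plain C. Whether this matches the repeated version's register usage on a given driver is an assumption that would have to be measured.

```c
/* Sketch only: NR_ROWS_LOG normally comes from the build; 14 is assumed here. */
#ifndef NR_ROWS_LOG
#define NR_ROWS_LOG 14
#endif

#if NR_ROWS_LOG < 14 || NR_ROWS_LOG > 16
#error "unsupported NR_ROWS_LOG"
#endif

/* OpenCL C has uint built in; the typedef is only for compiling as host C. */
#ifndef __OPENCL_VERSION__
typedef unsigned int uint;
#endif

/* Hypothetical helper macros (not in input.cl); all are constant expressions. */
#define ROW_NUM       ((1u << (NR_ROWS_LOG - 8)) - 1u)  /* 0x3f, 0x7f or 0xff */
#define ROW_MASK_EVEN ((ROW_NUM << 8) | 0xffu)          /* low NR_ROWS_LOG bits */
#define ROW_MASK_ODD  ((ROW_NUM << 16) | 0xf00u)

uint get_row(uint round, uint xi0)
{
    if (!(round % 2))
        return xi0 & ROW_MASK_EVEN;
    return ((xi0 & ROW_MASK_ODD) >> 8) | ((xi0 & 0xf0000000u) >> 24);
}
```

With NR_ROWS_LOG set to 14 the two masks expand to 0x3fff and 0x3f0f00, so the function body reduces to one branch over literal constants.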