#define SPH_COMPACT_BLAKE_64 1 seems to give just a tiny bit more hashrate.
Edited all the .cl files; running an experiment for uptime and stability; will report back.
Appears to add around 50 kH/s on X11.
Seems like the approximate gains are:
#define SPH_LUFFA_PARALLEL 1 = 2%
#define SPH_COMPACT_BLAKE_64 1 = 1%
#define SPH_KECCAK_UNROLL 6 = 1%
substituted loops in groestl.cl = ~5%
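
For anyone wanting to try the same three switches, this is roughly what the edit looks like in each kernel file. It is only a sketch: I'm assuming the defines sit near the top of the .cl file, before the hash code that reads them, and that your copy currently has them set to something else. The groestl.cl loop change is not shown since it is a manual edit, not a define.

```c
/* Assumed placement: near the top of each X11 kernel (.cl) file,
 * before the hash implementations that read these switches. */
#define SPH_COMPACT_BLAKE_64 1   /* reported gain: about 1% */
#define SPH_LUFFA_PARALLEL   1   /* reported gain: about 2% */
#define SPH_KECCAK_UNROLL    6   /* reported gain: about 1% */
```

The miner may cache compiled kernels as .bin files; if yours does, delete them after editing so the kernel gets rebuilt, otherwise the change may not take effect.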