Just to let you guys know, it is entirely possible to improve the OpenCL kernel of this miner by eliminating scratch registers (register spills). I am currently getting 795 KH/s, a 30% increase in speed, with an RX 480 and a heavily modified kernel. It is also possible to optimize the generated GCN code and make the kernel even faster.
I am currently focused on Ethereum mining because it is more profitable, but I might release my custom NeoScrypt kernel when I have a chance.
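For anyone wondering what "eliminating register spills" can look like in practice, here is a minimal sketch. It is not the modified kernel mentioned above, just an illustration of one common cause: a private array indexed by a runtime value, which the compiler often cannot keep in registers and may place in scratch memory instead. The sketch is written in plain C as a stand-in for OpenCL C, and the function names `pick_dynamic` and `pick_static` are hypothetical. Whether this pattern actually causes spills in any given kernel is an assumption you would need to check against the compiler's reported scratch usage.

```c
#include <stdint.h>

/* Variant A: data-dependent index into a small private array.
 * In a GPU kernel, this kind of access is a typical candidate for
 * being placed in scratch memory (assumption: compiler-dependent). */
uint32_t pick_dynamic(const uint32_t x[4], uint32_t idx) {
    uint32_t t[4] = { x[0], x[1], x[2], x[3] };
    return t[idx & 3u];
}

/* Variant B: same result, but every array index is a compile-time
 * constant. The runtime choice is made with selects, so each element
 * can live in an ordinary register. */
uint32_t pick_static(const uint32_t x[4], uint32_t idx) {
    uint32_t i = idx & 3u;
    uint32_t r = x[0];
    r = (i == 1u) ? x[1] : r;
    r = (i == 2u) ? x[2] : r;
    r = (i == 3u) ? x[3] : r;
    return r;
}
```

Both functions return the same value for any input. The second trades a few extra select instructions for avoiding the indexed array, which is usually a good trade only when the array is small.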
cool