I'd like you to take a look at my miner, designed to be understandable and easy to use.
It's the first miner with GPU-specific kernels; overspilling is solved. These are proper OpenCL kernels written for the GPU (instead of just running CPU code on the GPU). MIT license. Sort-of-C++11.
If you can add kernels for the coin of your choice, that would rock!
Or you could optimize the existing kernels: I speculate there's another 70% of performance to squeeze out.