Damn, that's slow. It seems to scale almost perfectly with hardware memory bandwidth when compared with Claymore's AMD miner. The R9 290X has 3.7x the theoretical memory bandwidth of the 750 Ti and does around 600 H/s. Surprise, 600 / 3.7 comes to around 162. Same story with the 270X and its roughly 2x memory bandwidth. I guess that's not entirely unexpected, since there's a whole lot of global memory access going on with the cryptonight algo. Still poking at it, but I doubt it'll improve much without C&C level voodoo magic, and that's well beyond my skillset.
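For anyone who wants to redo the back-of-the-envelope math, here is a minimal Python sketch. It assumes hash rate scales linearly with theoretical memory bandwidth, which is only the rough pattern described above and not a measured law. The function and variable names are made up for illustration; the 600 H/s and 3.7x figures are the ones quoted in this post.

```python
def scaled_hashrate(reference_hs, bandwidth_ratio):
    """Estimate the hash rate of a card whose memory bandwidth is
    1/bandwidth_ratio of the reference card's, assuming purely
    bandwidth-bound (linear) scaling. This is an assumption, not a rule."""
    return reference_hs / bandwidth_ratio


# Figures quoted in the post: R9 290X at ~600 H/s with ~3.7x the
# theoretical memory bandwidth of the 750 Ti.
r9_290x_hs = 600.0
ratio_290x_to_750ti = 3.7

estimate_750ti = scaled_hashrate(r9_290x_hs, ratio_290x_to_750ti)
print(f"{estimate_750ti:.0f} H/s")  # prints "162 H/s"
```

Plugging in a different ratio (say, for the 270X) works the same way, as long as you divide the faster card's rate by how many times more bandwidth it has.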

Out of curiosity, what does a CPU miner push through for Cryptonight? If you are down in the hashes-per-second rather than kilohashes-per-second range, then I'm left wondering whether we have an algorithm that is truly better on CPU than GPU.
This kind of ties back to my Proof of Blockchain concept (linked below), which aims to make it too hard for GPU (and especially ASIC) miners to outrun a basic CPU miner.