>As I said before, with those low temps, I know this could also go a lot faster. Energy usage confirmed that.
The bottleneck of the CryptoNight PoW algo is that it has to load small data blocks from random memory addresses, a huge number of times. Because the addresses are random, the memory cache is useless. You see low temps because the GPU core is not busy all the time; most of the time it is waiting for data from GPU memory because of its latency. And this cannot be optimized away, because the algo was designed to be bound by memory latency in order to be ASIC-resistant. The Fury X has a 4096-bit memory bus, but that is useless for this algo because high bandwidth does nothing for memory latency.
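To make the access pattern concrete, here is a rough toy sketch in Python. It is not CryptoNight: the function names are made up, and SHA-256/SHAKE-256 are just stand-ins for the real mixing. The only things borrowed from the real algo are the 2 MB scratchpad and the small 16-byte blocks. The point is that each address is computed from the result of the previous read, so the next load cannot start until the current one has finished.

```python
import hashlib

SCRATCHPAD_BYTES = 2 * 1024 * 1024  # CryptoNight works on a 2 MB scratchpad per hash
BLOCK = 16                          # small 16-byte blocks are read at each step


def fill_scratchpad(seed: bytes, size: int = SCRATCHPAD_BYTES) -> bytearray:
    """Expand a seed into a pseudo-random buffer (toy stand-in for the real init)."""
    return bytearray(hashlib.shake_256(seed).digest(size))


def toy_random_walk(seed: bytes, steps: int = 1000, size: int = SCRATCHPAD_BYTES) -> bytes:
    """Hypothetical latency-bound loop: every address depends on the previous read."""
    pad = fill_scratchpad(seed, size)
    n_blocks = size // BLOCK
    state = hashlib.sha256(seed).digest()[:BLOCK]
    for _ in range(steps):
        # The next address comes from the current state, so it is effectively
        # random and cannot be prefetched or served from a small cache.
        offset = (int.from_bytes(state[:8], "little") % n_blocks) * BLOCK
        block = bytes(pad[offset:offset + BLOCK])
        mixed = bytes(a ^ b for a, b in zip(state, block))
        pad[offset:offset + BLOCK] = mixed  # write back, so the buffer keeps changing
        state = hashlib.sha256(mixed).digest()[:BLOCK]
    return state


if __name__ == "__main__":
    print(toy_random_walk(b"example").hex())
```

Because every step is one dependent 16-byte read, a wider memory bus does not help here; only lower latency (or more independent hashes in flight) would.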
Same for Lyra2RE (it looks like Fury X performance is worse than the 280X), but they are changing the algo (at least for Vertcoin) to make it more GPU-friendly, so that might change.
Did a quick Lyra2RE test with stock settings and the open source kernel.
Dunno where that cryptoblog got their results, but they're way off.
Here are my results.

Almost 1 MH/s. That's a serious difference from the 430 kH/s Cryptoblog wrote...
Energy usage

@Pallas, I tried to run your binary, but it froze my rig each time.
The rig doesn't freeze up when I do the same for Quark or any other algo.
Greetings