The paper says:
The project webpage [37] claims Andersen's optimizations have been integrated into the miner, but the performance numbers are mostly unchanged from before the cryptanalysis appeared.
All my code is right there to inspect, and ready to run if the authors don't trust my stated numbers.
What are the performance per watt and the performance per hardware dollar for the reference miners now, comparing CPU and GPU?
In another thread I commented:
I have very limited data. A GTX 980 with a TDP of around 160 W was 5x faster than an i7-4790K with a TDP of around 80 W. So the GPU appears to be between 2x and 3x more efficient in this case.
But since Cuckoo Cycle is memory bound, neither CPU nor GPU is going to be near their TDP.
So we really need to bench it with power measuring tools, which I'm lacking...
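The back-of-envelope efficiency comparison above can be sketched out explicitly. Note the assumption: TDP is used as a proxy for actual power draw, which overstates consumption for memory-bound workloads, so the real ratio needs power-meter measurements.

```python
# Rough perf-per-watt comparison from the figures quoted above.
# Assumption: TDP stands in for real power draw (an overestimate for
# memory-bound code, per the caveat in the thread).
gpu_speedup = 5.0    # GTX 980 measured ~5x faster than the i7-4790K
gpu_tdp_w = 160.0    # GTX 980 TDP
cpu_tdp_w = 80.0     # i7-4790K TDP

power_ratio = gpu_tdp_w / cpu_tdp_w              # GPU draws ~2x the power
perf_per_watt_ratio = gpu_speedup / power_ratio  # 5x speed / 2x power

print(f"GPU is ~{perf_per_watt_ratio:.1f}x more efficient per watt")
# → GPU is ~2.5x more efficient per watt
```

That 2.5x figure is consistent with the "between 2x and 3x" estimate, under the TDP assumption.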
I don't know how to measure power usage of CPU and GPU. What I can measure is that the GPU is about 5x faster.
Was that running all threads of the i7 versus all compute units of the GPU? Did you maximize the number of instances each could run on its available compute units, i.e. normalize for whether the GPU has 8 or 16 GB, as needed to max out its FLOPS?
That's with maxing out either cores (CPU) or memory bandwidth (GPU).
I see you said you maxed out memory bandwidth, but what about trading some memory for 10x more computation, until the memory-bandwidth bound and the computation (FLOPS) bound are matched?
The only known trade-off uses k times less memory but 15k times more computation *and* memory accesses.
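To illustrate why that trade-off doesn't help, here is a small sketch of the scaling stated above (the 1x baseline memory figure is illustrative, not from the thread):

```python
# Sketch of the known time-memory trade-off described above:
# k times less memory costs 15k times more computation AND memory accesses.
# Baseline memory of 1.0 (arbitrary units) is an illustrative assumption.
def tradeoff(k, base_memory=1.0):
    """Relative cost of running with memory reduced by a factor k (k > 1)."""
    return {
        "memory": base_memory / k,
        "relative_computation": 15 * k,
        "relative_mem_accesses": 15 * k,
    }

for k in (2, 4, 8):
    print(k, tradeoff(k))
```

Since the number of memory accesses grows by the same 15k factor as the computation, the workload stays memory-bound: shrinking memory only multiplies the bandwidth demand rather than converting it into spare FLOPS.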