Board Beginners & Help
Re: CUDA Optimalized BTC miner for NVIDIA cards
by
icedev576
on 12/06/2013, 07:26:35 UTC
Can you explain the increase in hash rate? I thought the original miner already got everything that could be achieved on sm30...
I'll start digging into the code, but I'd like to read a high-level explanation first.

It is all about the register count. If CUDA code uses too many registers, two things can happen: you can run the code on fewer threads, or you can tell the nvcc compiler to use fewer registers. There is no magic here: if your code needs 72 registers but you force the compiler to use only 63, then 9 registers' worth of values will live in local memory (accessed via spill loads and stores), and that is slower than registers.
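If you want to check this on your own card, a rough sketch like the one below may help. It is not code from the miner: `mix_kernel`, `rotl32` and the file name `regs.cu` are made up for illustration. It asks the CUDA runtime how many registers the compiled kernel uses and how much local memory each thread gets. Note that a nonzero local memory size can also come from local arrays, not only from spills, so treat it as a hint.

```cuda
// Possible build lines (adjust -arch to your card; regs.cu is a hypothetical name):
//   nvcc -arch=sm_30 -Xptxas -v regs.cu -o regs
//   nvcc -arch=sm_30 -maxrregcount=63 -Xptxas -v regs.cu -o regs
// -Xptxas -v makes ptxas print register usage and spill loads/stores per kernel.
#include <cstdio>
#include <cuda_runtime.h>

// Rotate left by n bits, valid for 1 <= n <= 31.
__host__ __device__ inline unsigned int rotl32(unsigned int x, int n)
{
    return (x << n) | (x >> (32 - n));
}

// Hypothetical kernel, for illustration only.
// __launch_bounds__(256) promises the compiler that the kernel is never
// launched with more than 256 threads per block, so it can budget
// registers for that block size.
__global__ void __launch_bounds__(256) mix_kernel(unsigned int *out, unsigned int seed)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int v = seed + idx;
    for (int i = 0; i < 64; ++i) {
        v = rotl32(v, 5) ^ (v + 0x9e3779b9u);
    }
    out[idx] = v;
}

int main()
{
    // Query the compiled kernel; nothing is launched here.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, mix_kernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("registers per thread:    %d\n", attr.numRegs);
    printf("local memory per thread: %zu bytes\n", attr.localSizeBytes);
    return 0;
}
```

Comparing the two build lines on a register-heavy kernel should show the trade-off: the register count goes down, and the spill loads/stores reported by ptxas go up.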

So optimizing CUDA code usually means writing code that can run with fewer registers. On the other hand, you can use a lot of special functions and types (I moved the input storage to __constant__ memory). Memory reads and writes are slow, so avoid unnecessary memory ops. And there are a lot of other tricks...
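For anyone who has not used __constant__ memory before, here is a minimal sketch of the general idea. It is not the miner's actual code: the 16-word `c_input` layout, the kernel `sum_with_nonce` and the helper `host_reference` are assumptions made up for this example. The host copies the shared input once with `cudaMemcpyToSymbol`, and every thread then reads it through the constant cache instead of receiving it through global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical layout: 16 input words that are identical for all threads.
__constant__ unsigned int c_input[16];

// Each thread combines its own nonce with the shared input.
// All threads read the same c_input[i] at the same time, which is the
// access pattern the constant cache handles well.
__global__ void sum_with_nonce(unsigned int *out, unsigned int base_nonce)
{
    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int acc = base_nonce + idx;
    for (int i = 0; i < 16; ++i) {
        acc += c_input[i];
    }
    out[idx] = acc;
}

// CPU reference for the same computation, used to check the GPU result.
unsigned int host_reference(const unsigned int *input, unsigned int base_nonce, unsigned int idx)
{
    unsigned int acc = base_nonce + idx;
    for (int i = 0; i < 16; ++i) {
        acc += input[i];
    }
    return acc;
}

int main()
{
    const int n = 256;
    unsigned int h_input[16];
    for (int i = 0; i < 16; ++i) h_input[i] = i;

    // Upload the shared input once into constant memory.
    if (cudaMemcpyToSymbol(c_input, h_input, sizeof(h_input)) != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol failed\n");
        return 1;
    }

    unsigned int *d_out = NULL;
    if (cudaMalloc(&d_out, n * sizeof(unsigned int)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    sum_with_nonce<<<n / 64, 64>>>(d_out, 1000u);

    unsigned int h_out[n];
    // cudaMemcpy waits for the kernel to finish before copying.
    if (cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) != cudaSuccess) {
        fprintf(stderr, "kernel launch or copy back failed\n");
        cudaFree(d_out);
        return 1;
    }
    cudaFree(d_out);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != host_reference(h_input, 1000u, (unsigned int)i)) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

The input never has to be passed as a kernel argument array or fetched from global memory per thread, which is one way to cut down on memory ops.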