I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel because, according to nvprof, it accounts for the majority of the runtime (due to stream synchronization after the hefty and sha256 kernels are launched). I based the number of blocks (grid size) on a multiple of the SM count per card (e.g. 110 * 5 SMs on a 750ti == 550 blocks).
Each launch config was tested 5 times over 5-minute intervals (25 minutes of total sampling) on the hvc.1gh.com pool, and the results were averaged. I did see CPU validation failures; however, the gains in both average hashrate and accepted shares outweighed them, as confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (the average khash/s rate is per 750ti; share metrics are for all six cards):
The "default" row is the stock launch config, included as a baseline.
+---------+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| config  | blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========+========+=========+===================+==================+=================+=================+==================+
| best    | 550    | 768     | 16781             | 32               | 28              | 4               | 87               |
+---------+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default | 683    | 768     | 13987             | 17               | 16              | 1               | 94               |
+---------+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    | -133   | -       | +2794             | +15              | +12             | +3              | -7               |
+---------+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
- Using CUDA 6 RC
- Compiled with relocatable device code support (--relocatable-device-code=true --compile), which requires manually linking both the host and device objects
- Removed maxrregcount to let the compiler choose the register count
The full data for all block configs can be found here:
https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs:
https://gist.github.com/danryan/7c8762fda4d9783a58ae

Edits:
- added default block size baseline for comparison
- clarified block size calculation
- added ± diff comparison