Post
Topic
Board Mining (Altcoins)
Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by
scriptfu
on 09/04/2014, 19:00:33 UTC
I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak because, according to nvprof, the majority of the runtime is spent on it (due to stream synchronization after the hefty and sha256 kernels are launched). I based the block size (the number of blocks per launch) on a multiple of the SM count per card (e.g. 110 * 5 SMs on 750ti == 550).
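To make the block size rule concrete, here is a minimal sketch of the arithmetic in Python. The helper names (`blocks_for`, `is_sm_multiple`) are made up for illustration and are not part of ccminer; the only inputs taken from the post are the 5 SMs per 750ti, the multiplier 110, and the default of 683 blocks.

```python
# Sketch only: pick a block count that is a whole multiple of the SM count.
# SM_COUNT_750TI comes from the post (5 SMs per 750ti); the helper names
# are hypothetical and do not exist in ccminer.
SM_COUNT_750TI = 5


def blocks_for(multiplier, sm_count=SM_COUNT_750TI):
    """Return a block count that is an exact multiple of the SM count."""
    return multiplier * sm_count


def is_sm_multiple(blocks, sm_count=SM_COUNT_750TI):
    """True if the blocks divide evenly across the SMs."""
    return blocks % sm_count == 0


print(blocks_for(110))      # 550, the value used in the post
print(is_sm_multiple(550))  # True
print(is_sm_multiple(683))  # False: the default does not divide evenly by 5
```

Whether an even split across SMs is what actually drives the hashrate difference is not something this arithmetic can show; it only illustrates how the candidate block counts were generated.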

Each launch config was tested 5 times over 5-minute intervals (25 minutes of samples in total) at the hvc.1gh.com pool, and the results were averaged. Note that I did see CPU validation failures; however, the gains in both average hashrate and accepted shares outweighed them, as confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750ti; share metrics are for all six cards):

Code:
‡ is default launch config.
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
|         || blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========++========+=========+===================+==================+=================+=================+==================+
| best    ||   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default || ‡ 683  |   768   |       13987       |        17        |       16        |        1        |       94         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    ||  -133  |    -    |       +2794       |       +15        |      +12        |       +3        |       -7         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
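The diff row and the success percentages can be re-derived from the two data rows. The sketch below does that in Python using only the numbers from the table; the dictionary layout and function name are my own, and the percentages are truncated rather than rounded because that is what reproduces the 87 shown above (28/32 is 87.5).

```python
# Sketch only: recompute the derived columns of the table from its raw rows.
# All numbers are copied from the table; names and structure are illustrative.
best = {"blocks": 550, "khash": 16781, "attempted": 32, "accepted": 28, "rejected": 4}
default = {"blocks": 683, "khash": 13987, "attempted": 17, "accepted": 16, "rejected": 1}


def success_pct(row):
    """Accepted shares as a whole-number percentage, truncated like the table."""
    return 100 * row["accepted"] // row["attempted"]


# Per-column difference (best minus default), matching the "diff" row.
diff = {key: best[key] - default[key] for key in best}

# Relative hashrate gain of the best config over the default, in percent.
gain_pct = 100 * diff["khash"] / default["khash"]

print(diff)
print(success_pct(best), success_pct(default))
print(f"{gain_pct:.1f}% faster than default")
```

Keep in mind that the share counts are small (32 and 17 attempts), so the success percentages are much noisier than the hashrate averages.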

Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
  • Using CUDA 6 RC
  • Compiled with relocatable device code support (--relocatable-device-code=true --compile; requires manual linking for both host and device objects)
  • Removed maxrregcount to let the compiler choose the register count

The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae

edits:
  • added default block size baseline for comparison
  • clarified block size calculation
  • added ± diff comparison