Showing 19 of 19 results by scriptfu
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 23/04/2014, 18:48:59 UTC
Quote:
Is the new cudaminer available? I can't find it. Can someone share a link? I'm looking forward to seeing the issue with x1 PCIe risers fixed; right now there is a 20 kh/s difference between cards with the same settings on x16 and x1.

I was able to close the gap between x16 and 1x risers a little more by adjusting the batch size parameter. By choosing a smaller batch size than the default (1024), I gained an additional 10.86 khash/s (about +4.0%) from each card. Batch sizes that were factors of the scrypt kernel thread count (768) seemed to achieve the highest speedup. I also ran tests with different launch configs but found little difference. YMMV :)

Here are some benchmarks I ran against a single EVGA Superclocked 750 Ti (factory default) on a 1x USB riser, using 10-minute averages:

Code:
cudaminer --algo scrypt --benchmark -b $BATCH_SIZE --launch-config T10x24 --interactive 0 --texture-cache 1 --single-memory 1 --hash-parallel 2 --lookup-gap 1 --time-limit 600

Code:
+-------+----------+
| batch |  khash/s |
+=======+==========+
|  192  |  284.475 |
+-------+----------+
|  384  |  284.153 |
+-------+----------+
|  128  |  282.400 |
+-------+----------+
|  768  |  281.336 |
+-------+----------+
|  256  |  280.745 |
+-------+----------+
|  512  |  279.143 |
+-------+----------+
|  1280 |  276.976 |
+-------+----------+
|  3072 |  276.763 |
+-------+----------+
|  2048 |  275.420 |
+-------+----------+
|  1024 |  273.614 |
+-------+----------+
|  1408 |  273.241 |
+-------+----------+
|  1536 |  272.916 |
+-------+----------+
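For anyone who wants to check the "factor of 768" observation against the numbers, here is a small C++ sketch. It hard-codes the 10-minute averages from the table above, picks the fastest batch size, computes its gain over the default of 1024, and tests whether a batch size divides 768. The struct and function names are illustrative, not part of cudaMiner.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One row of the benchmark table.
struct BatchResult {
    int batch;     // batch size passed with -b
    double khash;  // 10-minute average, khash/s
};

// Data copied from the table above (single 750 Ti on a 1x riser).
const std::vector<BatchResult> kResults = {
    {192, 284.475},  {384, 284.153},  {128, 282.400},  {768, 281.336},
    {256, 280.745},  {512, 279.143},  {1280, 276.976}, {3072, 276.763},
    {2048, 275.420}, {1024, 273.614}, {1408, 273.241}, {1536, 272.916}};

// True if `batch` divides the scrypt kernel thread count evenly.
bool dividesThreadCount(int batch, int threadCount = 768) {
    assert(batch > 0);
    return threadCount % batch == 0;
}

// Percentage gain of `rate` over the default batch size's rate.
double gainOverDefault(double rate, double defaultRate) {
    assert(defaultRate > 0.0);
    return (rate - defaultRate) / defaultRate * 100.0;
}

// Row with the highest average hash rate.
BatchResult best(const std::vector<BatchResult>& rows) {
    assert(!rows.empty());
    return *std::max_element(rows.begin(), rows.end(),
                             [](const BatchResult& a, const BatchResult& b) {
                                 return a.khash < b.khash;
                             });
}
```

With this data the fastest setting is 192 (about +3.97% over 1024), and the five fastest batch sizes (192, 384, 128, 768, 256) all divide 768; 512 is the first one that doesn't.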
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 11/04/2014, 20:54:31 UTC
After a lengthy round of trial and error trying to improve ccminer's HVC hashrate, I have concluded that the default configuration is optimal. The customized configuration did produce a higher hashrate and more attempted shares, but over a sufficiently long sample time its increase in errors (see the rejected-shares column) meant it earned fewer accepted shares than the default configuration.

Here are the results of running both a customized config and the stock one for 24 hours:
 
Code:
‡ is default launch config.
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
|         || blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========++========+=========+===================+==================+=================+=================+==================+
| best    ||   550  |   768   |       16724       |        7812      |       6796      |      1016       |       87         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default || ‡ 683  |   768   |       13994       |        7584      |       7432      |       152       |       98         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    ||  -133  |    -    |       +2730       |        +228      |       -636      |      +864       |       -11        |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
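The takeaway from the 24-hour table is that accepted shares, not raw hashrate, are what get credited. A minimal C++ sketch of that comparison using the numbers above (the `ShareStats` type and helpers are just for illustration):

```cpp
#include <cassert>

// Share counts for one 24-hour run.
struct ShareStats {
    int attempted;
    int accepted;
    int rejected;
};

// Accepted shares as a percentage of attempted shares.
double successPercent(const ShareStats& s) {
    assert(s.attempted > 0);
    return 100.0 * s.accepted / s.attempted;
}

// Sanity check: every attempted share was either accepted or rejected.
bool consistent(const ShareStats& s) {
    return s.accepted + s.rejected == s.attempted;
}
```

Both rows add up, the customized config lands at about 87% success versus about 98% for stock, and stock ends the day 636 accepted shares ahead despite its lower hashrate.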

@cbuchner1, you've got an incoming HVC transaction. Had I not mucked around with the settings, I would have earned myself that much extra HVC from all my rigs yesterday. Consider it my penance for ever doubting the Christians ;) Thanks for the hard work making the CUDA miners so awesome!
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 10/04/2014, 16:59:28 UTC
Quote:
How on earth did you manage that? We haven't been able to get over 13 Mh/s.

Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788

This change is potentially dangerous, as the total number of threads run on the GPU is not aligned with the "throughput" variable used by the heavycoin scanhash function (passed in as the variable "threads" to the function you modified). This could lead to overlapping shares being found (the same nonce leading to rejects), part of the nonce space being skipped (not actually a problem), or buffers being overrun (potentially serious).

You need to add some code to compute the throughput variable (=total number of GPU threads) based on device properties, e.g. in an early function call to the cuda_hefty1.cu module.
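A sketch of that suggestion, under assumptions: it derives both the grid and the host-side throughput from the same numbers, so the two can't drift apart. In real CUDA code the SM count would come from `cudaGetDeviceProperties` (the `multiProcessorCount` field of `cudaDeviceProp`); here it is passed in as a plain integer so the logic runs on the CPU. `LaunchPlan`, `planFromDevice`, and `nextStartNonce` are hypothetical names, not ccminer's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical launch plan: grid size and throughput come from one source.
struct LaunchPlan {
    int blocks;
    int threadsPerBlock;
    int throughput;  // total GPU threads = nonces tried per kernel call
};

// smCount would be read from the device properties; blocksPerSm is a tuning knob.
LaunchPlan planFromDevice(int smCount, int blocksPerSm, int threadsPerBlock) {
    assert(smCount > 0 && blocksPerSm > 0 && threadsPerBlock > 0);
    LaunchPlan p;
    p.blocks = smCount * blocksPerSm;
    p.threadsPerBlock = threadsPerBlock;
    p.throughput = p.blocks * threadsPerBlock;
    return p;
}

// The scan loop advances the start nonce by exactly `throughput`, so
// consecutive kernel calls cover adjacent, non-overlapping nonce ranges.
std::uint32_t nextStartNonce(std::uint32_t startNonce, const LaunchPlan& p) {
    return startNonce + static_cast<std::uint32_t>(p.throughput);
}
```

For the 550-block config on a 5-SM 750 Ti this gives 550 * 768 = 422400 nonces per kernel call, which is the value the scanhash loop would then have to use in place of a fixed 524288.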

Christian


Oh geez, yep, you're absolutely right. I had altered the loop pattern of the hefty_gpu_hash kernel to handle grids of various widths: https://gist.github.com/danryan/fd2a078d33391f2179f7

It appears I neglected to commit this code when running my benchmarks. With the loop changed, I no longer get validation errors from the alignment issue, but I also no longer get the performance increase I saw previously.
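The gist isn't reproduced here, so as an assumption about what "grids of various widths" looks like: the usual CUDA pattern is a grid-stride loop, where each thread starts at its global index and advances by the total number of launched threads. This C++ sketch simulates that loop on the CPU and checks that every work item is processed exactly once, whatever the grid width. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU simulation of a grid-stride loop over `threads` work items.
std::vector<int> coverage(int threads, int blocks, int threadsPerBlock) {
    assert(threads >= 0 && blocks > 0 && threadsPerBlock > 0);
    std::vector<int> hits(static_cast<std::size_t>(threads), 0);
    const int gridSize = blocks * threadsPerBlock;
    for (int b = 0; b < blocks; ++b) {
        for (int t = 0; t < threadsPerBlock; ++t) {
            // In CUDA the start index would be blockIdx.x * blockDim.x + threadIdx.x.
            for (int i = b * threadsPerBlock + t; i < threads; i += gridSize) {
                ++hits[i];
            }
        }
    }
    return hits;
}

// True if every work item was processed exactly once.
bool exactlyOnce(const std::vector<int>& hits) {
    for (int h : hits) {
        if (h != 1) return false;
    }
    return true;
}
```

Both 550 x 768 and 683 x 768 cover all 524288 items exactly once; a narrower grid simply gives some threads extra iterations instead of launching more threads.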
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 10/04/2014, 03:12:31 UTC
Quote:
On the original version the program is using 683 blocks and 768 threads per block.

threads = 524288
threadsperblock = 768
blocks: dim3 grid((threads + threadsperblock - 1) / threadsperblock); // = 683

http://runnable.com/U0YK9Jzak4RoTzpU/ccminer-grid-dimensions-example-for-c%2B%2B
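Integer division alone truncates 524288 / 768 to 682, which would leave the last 512 threads without a block; adding `threadsperblock - 1` first rounds up to 683. The same arithmetic as a small C++ sketch (function names are illustrative):

```cpp
#include <cassert>

// Blocks needed to cover `threads` work items, rounding up so the last
// partial block is still launched.
int blocksFor(int threads, int threadsPerBlock) {
    assert(threads >= 0 && threadsPerBlock > 0);
    return (threads + threadsPerBlock - 1) / threadsPerBlock;
}

// Threads launched beyond `threads`; a kernel launched this way needs a
// bounds check on its thread index so these do no work.
int idleThreads(int threads, int threadsPerBlock) {
    return blocksFor(threads, threadsPerBlock) * threadsPerBlock - threads;
}
```

The 683rd block is only partly used: 256 of its threads have no work to do.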
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 10/04/2014, 03:00:21 UTC
Quote:
On the original version the program uses 683 blocks and 768 threads per block.
With your modification it uses 32x15=480 blocks and 768 threads/block.
However, the number of threads is 524288, which in my opinion is the reason why I get
"does not validate on CPU", and why 683 was chosen, since it is just threads/threads_per_block.
This gives me around 36 MHash/s.

Yes, your numbers are correct, though it is not as simple as dividing the total number of threads by the desired threads per block. Not all of the 524288 threads can execute simultaneously; the maximum number of resident threads on compute capability 3.x-5.x devices is 2048 per SM (10240 on a 750 Ti, for example). However, the remaining threads can still be scheduled, and they are processed once resources become available as previous blocks complete.
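A rough C++ sketch of the residency arithmetic above. It only models the per-SM thread limit (2048 for compute capability 3.x-5.x) and ignores the register, shared-memory, and blocks-per-SM limits that can lower occupancy further, so treat the results as upper bounds. Function names are illustrative.

```cpp
#include <cassert>

// Upper bound on simultaneously resident threads for a device.
int maxResidentThreads(int smCount, int threadsPerSm = 2048) {
    assert(smCount > 0 && threadsPerSm > 0);
    return smCount * threadsPerSm;
}

// Blocks that fit on one SM when thread count is the only limit.
int residentBlocksPerSm(int threadsPerBlock, int threadsPerSm = 2048) {
    assert(threadsPerBlock > 0);
    return threadsPerSm / threadsPerBlock;
}

// Rough number of "waves" needed to work through the whole grid.
int waves(int blocks, int smCount, int threadsPerBlock) {
    const int perWave = smCount * residentBlocksPerSm(threadsPerBlock);
    assert(perWave > 0);
    return (blocks + perWave - 1) / perWave;
}
```

With 768 threads per block only 2 blocks fit per SM by thread count, so a 5-SM 750 Ti runs 10 blocks at a time: about 69 waves for a 683-block grid, 55 for 550 blocks.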
 
Quote:
I have the feeling it is faster because it throws away a lot of things...

This is indeed what happens when you get the "does not validate" error. The CPU tries to recreate the hash one last time before submitting it as proof, and it gets dropped if it fails validation. Work in this case is simply trashed. I have not finished instrumenting the code fully to provide exact details. What I do have is verification from pools through higher reported hashrate (calculated from rate of valid shares) and in particular a correlated increase in valid share counts.
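The validate-then-submit step can be sketched like this in C++. It is a simplified model, not ccminer's code: the real check compares a full 256-bit hash against the share target, while here the hash is a caller-supplied 32-bit function so the control flow is easy to see. `validateOnCpu` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Re-hash every candidate nonce reported by the GPU on the CPU and keep
// only those whose hash meets the target; the rest are dropped.
std::vector<std::uint32_t> validateOnCpu(
        const std::vector<std::uint32_t>& candidates, std::uint32_t target,
        const std::function<std::uint32_t(std::uint32_t)>& cpuHash) {
    assert(cpuHash);
    std::vector<std::uint32_t> valid;
    for (std::uint32_t nonce : candidates) {
        if (cpuHash(nonce) <= target) {
            valid.push_back(nonce);
        }
        // else: the "does not validate" case; the GPU work for it is lost
    }
    return valid;
}
```

Anything that fails the CPU check never reaches the pool, so it inflates the local hashrate without adding to the share counts.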

Quote:
It would be interesting to have Christian's opinion on that.
Is there a way to decrease the number of threads (assuming it works)?

Agreed, I will 100% defer to Christian on this subject :)

Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 10/04/2014, 01:23:15 UTC
Quote:
What does this 768 correspond to? Is it the number of CUDA cores of the 750 Ti? (I need to see how this can be updated for the 780 Ti.)

Launching a CUDA kernel uses the following syntax (ignoring optional parameters for now):

Code:
kernel_name<<<blocks, threads_per_block>>>(kernel_function_args...)

768 is the number of threads launched per block. The 750 Ti has 640 cores (128 per SM (streaming multiprocessor), 5 SMs per card). The 780 Ti has 2880 cores (192 per SM, 15 SMs per card). I used a very basic calculation, essentially choosing a block count that is some multiple of the number of SMs. In the case of the 780 Ti, that's 100 * SM count, or 100 * 15 == 1500. I haven't looked closely at the 780 Ti's specs, so one might run into a limit on how many blocks per grid the card supports. You should be able to glean additional information from the following references:
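The launch-config arithmetic for both cards as a C++ sketch. On the grid-size question: the CUDA programming guide lists the maximum grid x-dimension as 65535 for compute capability 2.x and 2^31 - 1 for 3.0 and later, so 1500 blocks is well within either limit. The `Card` struct and helper names are illustrative.

```cpp
#include <cassert>

// Per-card figures quoted above.
struct Card {
    int smCount;
    int coresPerSm;
};

int totalCores(const Card& c) { return c.smCount * c.coresPerSm; }

// Block count chosen as a multiple of the SM count.
int blockCount(const Card& c, int multiplier) {
    assert(multiplier > 0);
    return c.smCount * multiplier;
}

// Maximum grid x-dimension: 65535 on compute capability 2.x,
// 2^31 - 1 on 3.0 and later.
bool fitsGridX(int blocks, bool cc3OrLater) {
    const long long limit = cc3OrLater ? 2147483647LL : 65535LL;
    return blocks > 0 && blocks <= limit;
}
```

The same helper reproduces the 750 Ti figure used elsewhere in this thread: 110 * 5 SMs == 550 blocks.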


Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 09/04/2014, 21:33:45 UTC

Quote:
Wow, I think at over 16 Mh/s I would accept 1 validation error for every 12 accepted lmao

O.o   o.O Impressive! What was done to achieve these numbers?
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 09/04/2014, 20:03:11 UTC
Quote:
Can you add to your table what the changes are between each line?

Sure! I have updated my post accordingly.

Quote:
Did you try it on Windows?

I haven't, because I do not have a Windows rig, and I likely will not test this because I do not want to reimage or deal with Windows taking over my boot record :) See my diff in a previous post above for the changes I made. If you are able to compile this, I'd be very curious to see the results.

Quote:
I must say I am a bit surprised by the 23 MHash/s. You should run it a little longer to make sure everything is stable.

Configurations with the highest hashrates were stable in the sense that the program would not crash; however, they were not stable enough to produce valid shares. For instance, 384 blocks x 768 threads @ 23213 khash/s attempted 27 shares, but only 16 were valid (less than half that of the 550x768 config).
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 09/04/2014, 19:54:44 UTC
Quote:
How on earth did you manage that? We haven't been able to get over 13 Mh/s.

Just by benchmarking various launch configs until I found one that worked well, in addition to the other changes I listed in my original post. I modified the hefty_cpu_hash function in cuda_hefty1.cu. Changes made are expressed in this diff: https://gist.github.com/danryan/6a631e0ece773e5f6788

Quote:
Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Is this with or without the failed hashes included?

Could you clarify what you mean by failed hashes? If you're referring to ones that didn't pass CPU validation, yes they are included in the hashrate average, but they are not included in the share metrics (I care more about these, as these are the canonical numbers by which one gets credited for work).
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 09/04/2014, 19:10:07 UTC
Quote:
Code:
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+========+=========+===================+==================+=================+=================+==================+
|   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

almost 17MH/s for 1 750Ti?

Correct. I should have been more clear about that. Fixing the original post. Thanks for pointing that out!
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 09/04/2014, 19:00:33 UTC
I did some more HVC benchmarking of ccminer, varying the launch parameters of the hefty_gpu_hash kernel. I chose this kernel to tweak because, according to nvprof, the majority of the runtime is spent in it (due to stream synchronization after the hefty and sha256 kernels are launched). I based the block count on a multiple of the SM count per card (e.g. 110 * 5 SMs on the 750 Ti == 550).

Each launch config was tested 5 times over 5-minute intervals (25-minute total sample) at the hvc.1gh.com pool, and the results were averaged. Note that I did see CPU validation failures; however, the gains in average hashrate and accepted shares outweighed them, as confirmed by the 1gh dashboard. My best configuration was 550 blocks x 768 threads per block (average khash/s rate is per 750 Ti; share metrics are for all six cards):

Code:
‡ is default launch config.
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
|         || blocks | threads | avg. khash/s rate | shares attempted | shares accepted | shares rejected | shares success % |
+=========++========+=========+===================+==================+=================+=================+==================+
| best    ||   550  |   768   |       16781       |        32        |       28        |        4        |       87         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| default || ‡ 683  |   768   |       13987       |        17        |       16        |        1        |       94         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+
| diff    ||  -133  |    -    |       +2794       |       +15        |      +12        |       +3        |       -7         |
+---------++--------+---------+-------------------+------------------+-----------------+-----------------+------------------+

Other than the launch parameter change, the miner code under test has no local modifications. I have, however, made a few changes to how the code is compiled:
  • Using CUDA 6 RC
  • Compiled with relocatable device code support (--relocatable-device-code=true --compile; requires manual linking of both host and device objects)
  • Removed maxrregcount to let compiler choose register count

The full data for all block configs can be found here: https://docs.google.com/spreadsheets/d/1C6fSk0pkDXBFIzXselXDE8IJP26dj6grWAJxnRrHO3Y/edit?usp=sharing

Tests run on a system with the following specs: https://gist.github.com/danryan/7c8762fda4d9783a58ae

edits:
  • added default block size baseline for comparison
  • clarified block size calculation
  • added ± diff comparison
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 06/04/2014, 02:15:48 UTC
I did some benchmarking and tweaking of the ccminer code and was able to squeeze out a roughly 6% hashrate increase mining HVC on my 750 Ti rig (+800 khash/s per card). Not a big advancement, but I think it's pretty good for a guy whose name is not Christian :D

I had initially removed support for all compute capabilities except CC 5.0, but I was able to get CC 2.0+ compiled. Alas, I have no way to test whether this fork will work with CC 3.5 and below, or on Windows for that matter, so I can make no guarantee of success if your rig uses either or both.

If you're brave, you can check out my fork or view the full diff of changes.

Summary of changes:

* Compiled with CUDA 6 RC
* Made modest changes to the hefty1 kernel. Honestly, I'm not sure these even made a difference; the original code from the C+C hash factory was already damn near perfect :)
* Changed code compilation
  * relocatable device code support
  * explicit linking via nvlink
* Removed maxrregcount to let compiler choose register count

My testbed specs:

OS: Ubuntu 13.10 x64, 3.13.6 kernel, NVIDIA 334.21 driver
CPU: Intel Pentium G3220 @ 3.00GHz (2 cores)
Motherboard: MSI Z87-GD65
RAM: 4GB DDR3 PC1333
GPUs: (6) PNY 750 Ti OC (stock, no mods, all on 1x risers)
Risers: (6) 1x PCIe via USB 3.0 risers (slim)

And some performance metrics:

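+----------------+---------------+---------------+
|                |    Before     |     After     |
+================+===============+===============+
| Hashrate/card  | 13400 khash/s | 14200 khash/s |
+----------------+---------------+---------------+
| GPU RAM usage  |    186 MiB    |    200 MiB    |
+----------------+---------------+---------------+
| GPU temp (avg) |     55°C      |     56°C      |
+----------------+---------------+---------------+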

edited: more specs
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 27/03/2014, 05:42:14 UTC
Quote:
Anyone having trouble with hvy.1gh.com?

The first 25 units work fine, but then everything after that is a 'boooooo'.

The same setup just with the different wallet settings works fine at heavycoin.com

Close and start again in short intervals until vardiff stops being drunk and goes home.
Takes me 2-3 attempts at worst but I've found that if you leave it alone and just wait through all the boooos, it fixes itself eventually.

^ This. 1gh's stratum implementation is fragile. Restarting ccminer a couple of times should do the trick. I've also found that it'll autocorrect after a few minutes.
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 27/03/2014, 05:28:22 UTC
Quote:
root@arren-mining:/home/arren/cuda-extract/cudaminer-2014-02-28/cudaminer-src-2014.02.28# ./autogen.sh && ./configure && make
./autogen.sh: line 1: $'autoconf\r': command not found
root@arren-mining:/home/arren/cuda-extract/cudaminer-2014-02-28/cudaminer-src-2014.02.28#

Execute this in a terminal:

    sudo apt-get install automake

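This should install automake and its dependencies, which include autoconf. It does not include gcc, the CUDA toolkit, or other libraries you may need. You did not specify whether you followed the README.txt guide that directs you to install those dependencies, so I have omitted them. Also note that the \r in $'autoconf\r' means autogen.sh has Windows (CRLF) line endings; if the error persists after installing automake, converting the source files' line endings (e.g. with dos2unix) should fix it.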
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 14/03/2014, 19:26:24 UTC
Quote:
Another concern (although it seems to work): was the USB 3.0 cable designed to carry a large current?
(I would assume data transfer requires only a small current, and USB devices usually don't use a lot of power.)

AFAICT, the USB cable is solely for data transfer and does not carry any power to the riser. Looking at the USB risers I have, I see traces for pins 11, 14, 15, & 17 (B side), and 1, 11, 13, 14, 16, & 17 (A side). 17/B and 1/A are connected and count as one, so that makes 9 used pins in total, a perfect fit for USB 3.0's 9-wire design, and it leaves no room for power. Power pins are not connected. [edit] This is why USB 3.0 cables are necessary and USB 2.0 cables won't work: the former has 9 pins, while the latter has only 4.
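The pin count above works out as follows; this C++ sketch counts distinct signals after merging the pins that are tied together on the riser. It assumes tied pins come in disjoint pairs, which is the only case in this post, and the function name is illustrative.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Each listed pin is one signal, except that a tied pair counts once.
int distinctSignals(const std::vector<std::string>& pins,
                    const std::vector<std::pair<std::string, std::string>>& tied) {
    std::set<std::string> signals(pins.begin(), pins.end());
    for (const auto& t : tied) {
        // Both pins carry the same signal, so drop one of them.
        if (signals.count(t.first) && signals.count(t.second)) {
            signals.erase(t.second);
        }
    }
    return static_cast<int>(signals.size());
}
```

Ten traced pins minus the one shared B17/A1 connection gives the 9 signals that match the USB 3.0 cable.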
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 14/03/2014, 17:23:24 UTC
Quote:
Even at the default TDP of 38.5w (4 cards @ 154w), that's still dangerously close to the limit. Try to connect as few cards as possible to each power lead, or use a higher rated connector designed for more power, like the 6-pin or 8-pin PEG connectors.

I don't get why some are voluntarily going from 38.5 W to 65 W for maybe 10-15% higher kHash/s rates.
This kills Maxwell's excellent power efficiency... and apparently also PSU cable connectors.

After unlocking the TDP (and before catching fire), I was able to overclock a card to 1450MHz core/3900MHz memory, resulting in a hashrate for vanilla scrypt of 340kh/s, and still at a core temp of < 65°C.

65w is still exceptionally efficient. For me, it was to see how far I could push the card. Sometimes that means you'll burn through a card or a cable, but it's all in the name of science (or something like that). It's an interesting experiment nonetheless.

That said, one could damage their rigs even without bumping the power thresholds, and that's the point I had hoped to get across.
Post
Board: Mining (Altcoins)
Topic: Re: [ANN] cudaMiner - a new litecoin mining application [Windows/Linux]
by scriptfu on 14/03/2014, 16:45:13 UTC
PSA

I got a nice refresher course in power usage today. I was running 4x 750 Ti cards flashed with an unlocked-TDP VBIOS, using powered USB risers connected to a single 4-connector Molex cable. One unlocked card can pull 65.5w from the PCI Express slot. Typical Molex connectors have a max power handling of 187w (11A). With 4 unlocked cards, I was shoving 262w@12v (roughly 22A) down one cable. The end result:

http://m.imgur.com/vDqGNmh,787qoci

Even at the default TDP of 38.5w (4 cards @ 154w), that's still dangerously close to the limit. Try to connect as few cards as possible to each power lead, or use a higher rated connector designed for more power, like the 6-pin or 8-pin PEG connectors.
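The power numbers above as a C++ sketch. It assumes, as the post does, that the whole load is drawn at 12 V. The 187 W figure is consistent with 11 A on both the 12 V and 5 V contacts (11 A x 17 V = 187 W); under that reading the 12 V contact alone is good for about 132 W. Function names are illustrative.

```cpp
#include <cassert>

// Total draw of `cards` identical cards, in watts.
double totalWatts(int cards, double wattsPerCard) {
    assert(cards >= 0 && wattsPerCard >= 0.0);
    return cards * wattsPerCard;
}

// Current on a single rail for a given load: I = P / V.
double amps(double watts, double volts) {
    assert(volts > 0.0);
    return watts / volts;
}
```

Under those assumptions the stock 154 W already works out to about 12.8 A at 12 V, past an 11 A contact rating, and the unlocked 262 W (about 21.8 A) is roughly double it.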
Post
Board: Computer hardware
Topic: Re: [WTB] 19" RackMount GPU Frame
by scriptfu on 05/03/2014, 04:27:58 UTC

Sweet science, I've been dreaming of a case like this. Please tell me you made them and are selling them. If so, I would like to place an order.
Post
Board: Announcements (Altcoins)
Topic: Re: [GPUC] GPU Coin
by scriptfu on 04/03/2014, 07:32:36 UTC
http://www.webcountdown.de/?a=J3RcLku
