Post
Topic
Board Bitcoin Discussion
Re: Bitcoin puzzle transaction ~32 BTC prize to who solves it
by
b0dre
on 04/09/2025, 20:26:43 UTC
Hello, guys!
The ultra-lightweight CUDACyclone is ready; speed is 1.3 Gkeys/s on an RTX 4060.
Key feature: extremely low VRAM usage, which suits rented GPUs. Less than 500 MB of VRAM on an RTX 4090.
It works even where Vanity or Keyhunt fail to start.
It is also a good sample to study and learn from (why not?): 7 small files in total.
Link: https://github.com/Dookoo2/CUDACyclone

Great work, thanks for sharing!
Do you have any idea why there’s such a big performance difference between the old and new versions? For example, the old one hits ~1048.8 Mkeys/s in 9s using only 512MB VRAM, while the new one runs ~933.5 Mkeys/s in 71s using 3GB VRAM on the same RTX 3060.

Code:
./CUDACyclone_old --range 2000000000:3FFFFFFFFF --address 1HBtApAFA9B2YZw3G2YKSMCtb3dVnjuNe2 --grid 256,512
======== PrePhase: GPU Information ====================
Device               : NVIDIA GeForce RTX 3060 (compute 8.6)
SM                   : 28
ThreadsPerBlock      : 256
Blocks               : 8192
Points batch size    : 256
Batches/SM           : 512
Memory utilization   : 4.3% (512.2 MB / 11.6 GB)
-------------------------------------------------------
Total threads        : 2097152

======== Phase-1: Brooteforce =========================
Time: 9.0 s | Speed: 1048.8 Mkeys/s | Count: 8897329760 | Progress: 6.47 %

======== FOUND MATCH! =================================
Private Key   : 00000000000000000000000000000000000000000000000000000022382FACD0
Public Key    : 03C060E1E3771CBECCB38E119C2414702F3F5181A89652538851D2E3886BDD70C6

Code:
./CUDACyclone --range 2000000000:3FFFFFFFFF --address 1HBtApAFA9B2YZw3G2YKSMCtb3dVnjuNe2 --grid 256,512
======== PrePhase: GPU Information ====================
Device               : NVIDIA GeForce RTX 3060 (compute 8.6)
SM                   : 28
ThreadsPerBlock      : 256
Blocks               : 8192
Points batch size    : 256
Batches/SM           : 512
Batches/launch       : 64 (per thread)
Memory utilization   : 26.5% (3.08 GB / 11.6 GB)
-------------------------------------------------------
Total threads        : 2097152

======== Phase-1: BruteForce (sliced) =================
Time: 71.2 s | Speed: 933.5 Mkeys/s | Count: 70322919104 | Progress: 51.17 %%

================================= FOUND MATCH! =================================
Private Key   : 00000000000000000000000000000000000000000000000000000022382FACD0
Public Key    : 03C060E1E3771CBECCB38E119C2414702F3F5181A89652538851D2E3886BDD70C6


Interesting that the new version is FASTER on a top-tier GPU like the 5090. I have seen 9 Gkeys/s on a 5090 with --grid 1024,512. Another 5090 variant is slower, at around 8.2-8.3 Gkeys/s; it depends on the power consumption limit.
I will check the speed differences between the versions, though. On the 4060 the speed is the same in both versions.

Thanks for sharing your observations! On my side, I’ve noticed that memory utilization has increased significantly while speed has dropped on a 3060 GPU. It seems performance really varies a lot depending on the version and hardware.
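
For anyone comparing the two logs above, here is a small cross-check of the printed figures. It is only a sketch: every input is copied from the logs, the range end is assumed to be inclusive, MB/GB are assumed to be binary units, and the helper names are made up for illustration. Nothing here is taken from CUDACyclone's source.

```python
# Cross-check of the figures printed in the two CUDACyclone logs.
# All inputs are copied from the logs; helper names are hypothetical.

RANGE_START = 0x2000000000
RANGE_END = 0x3FFFFFFFFF      # assumption: the end of --range is inclusive
TOTAL_THREADS = 2_097_152     # "Total threads" line, same in both runs


def range_size(start, end):
    """Number of keys in an inclusive range."""
    return end - start + 1


def progress_percent(count, start, end):
    """Share of the range covered after `count` keys, in percent."""
    return 100.0 * count / range_size(start, end)


def average_mkeys_per_s(count, seconds):
    """Average throughput over the whole run, in Mkeys/s."""
    return count / seconds / 1e6


def bytes_per_thread(used_bytes, threads):
    """Rough upper bound: reported device memory divided by thread count."""
    return used_bytes / threads


RUNS = {
    # assumption: "MB" and "GB" in the logs are binary units
    "old": {"count": 8_897_329_760, "seconds": 9.0, "used_bytes": 512.2 * 1024**2},
    "new": {"count": 70_322_919_104, "seconds": 71.2, "used_bytes": 3.08 * 1024**3},
}

if __name__ == "__main__":
    for name, run in RUNS.items():
        pct = progress_percent(run["count"], RANGE_START, RANGE_END)
        avg = average_mkeys_per_s(run["count"], run["seconds"])
        per_thread = bytes_per_thread(run["used_bytes"], TOTAL_THREADS)
        print(f"{name}: progress {pct:.2f} %, average {avg:.1f} Mkeys/s, "
              f"~{per_thread:.0f} bytes/thread")
```

With these inputs the progress values match the logs (6.47 % and 51.17 %), and Count divided by Time comes out at roughly 988 Mkeys/s for both runs. So the 9 s vs 71 s gap seems to come mostly from how much of the range was covered before the hit, which might point to a different traversal order in the sliced version rather than a much slower kernel. The memory figure works out to about 256 bytes per thread in the old run and about 1.5 KiB per thread in the new one, though the reported utilization may include other allocations on the device.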