Post
Topic
Board Development & Technical Discussion
Re: Cuda scripts for point addition , multiplication etc
by
brainless
on 28/09/2022, 20:37:15 UTC
...
Maybe you can tweak the Python for better speed, but in my view C++ or CUDA would do this in 2 seconds at most.
Post solutions here or in the GitHub issues area:
https://github.com/onetrader2022/python-secp-compare


I tested this python compare script. I only added the time output. Here are the results:
Code:
start = 2022-09-27 16:58:32.784093
1000 2022-09-27 17:01:12.165173
2000 2022-09-27 17:03:50.849176
3000 2022-09-27 17:06:31.387720
4000 2022-09-27 17:09:12.735062
5000 2022-09-27 17:11:53.893209
6000 2022-09-27 17:14:32.976817
7000 2022-09-27 17:17:07.546195
8000 2022-09-27 17:19:46.752069
9000 2022-09-27 17:22:27.578180
10000 2022-09-27 17:25:03.305788
11000 2022-09-27 17:27:38.671975
12000 2022-09-27 17:30:17.019777
13000 2022-09-27 17:32:57.495925
14000 2022-09-27 17:35:35.705045
15000 2022-09-27 17:38:13.914685
16000 2022-09-27 17:40:52.500355
17000 2022-09-27 17:43:34.293753
18000 2022-09-27 17:46:18.734021
19000 2022-09-27 17:48:56.165592
20000 2022-09-27 17:51:35.212564
21000 2022-09-27 17:54:26.738358
22000 2022-09-27 17:57:09.514139
23000 2022-09-27 17:59:47.929463
24000 2022-09-27 18:02:22.697234
25000 2022-09-27 18:05:00.128646
26000 2022-09-27 18:07:41.784672
27000 2022-09-27 18:10:22.883876
28000 2022-09-27 18:13:03.262512
29000 2022-09-27 18:15:41.925345
30000 2022-09-27 18:18:21.313147
31000 2022-09-27 18:21:06.744216
32000 2022-09-27 18:23:49.502670
33000 2022-09-27 18:26:32.710866
34000 2022-09-27 18:29:16.496636
35000 2022-09-27 18:32:00.074711
36000 2022-09-27 18:34:43.907887
37000 2022-09-27 18:37:28.137582
38000 2022-09-27 18:40:07.641310
39000 2022-09-27 18:42:52.397884
40000 2022-09-27 18:45:52.935812
41000 2022-09-27 18:48:32.099146
42000 2022-09-27 18:51:11.315534
43000 2022-09-27 18:53:49.433403
44000 2022-09-27 18:56:31.338416
45000 2022-09-27 18:59:13.230555
46000 2022-09-27 19:01:54.819965
47000 2022-09-27 19:04:34.469577
48000 2022-09-27 19:07:12.553220
49000 2022-09-27 19:09:56.293709
end = 2022-09-27 19:10:23.538917



About 2 hours 12 minutes (16:58:32 to 19:10:23) for the 50k pubkey comparison. The script generates 200 additions for each pubkey, so 10 million keys in total are generated from the 50k pubkeys and checked.
Remember, it is only simple addition, but it is repeated for each of the 50k pubkeys.
The problem here is Python, which works through the keys one by one on a single thread.
The same tool in CUDA would, I think, finish in 2 seconds,
and a tool in C might do it in less than 10 minutes.

Can any C or CUDA developer help us write such tools for further research?
Thanks
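
To put a number on the throughput of that run, here is a small sketch that only uses the start and end timestamps from the log above and the 50k x 200 figures from the post. The variable names are mine, not taken from compare.py. It should come out on the order of a thousand keys per second.

```python
from datetime import datetime

# Timestamps copied from the first and last lines of the log above.
start = datetime.fromisoformat("2022-09-27 16:58:32.784093")
end = datetime.fromisoformat("2022-09-27 19:10:23.538917")

pubkeys = 50_000         # input pubkeys, as stated in the post
additions_per_key = 200  # additions generated per pubkey, as stated in the post

elapsed = (end - start).total_seconds()
total_keys = pubkeys * additions_per_key
rate = total_keys / elapsed

print(f"elapsed: {elapsed:.0f} s ({elapsed / 3600:.2f} h)")
print(f"total keys generated: {total_keys}")
print(f"throughput: {rate:.0f} keys/s")
```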

Hello, I wrote a custom CUDA library for point addition, multiplication, etc. (adapted from Jean-Luc Pons's one). Using pycuda (Python) to launch the kernel is possible.
But can you explain the purpose of the script you want to make?
You are right that point addition will run at around 1 Gigakey/sec on an RTX 3070, but only if you work with pubkeys already loaded in RAM.


The main problem with what you want to do is the bottleneck of reading the input file and writing the results to the output file on the HDD (1 Gigakey takes more than 32 GB on the HDD).
So you can compute 1 Gigakey in one second on the GPU, but you will spend a long time converting your keys sequentially to hex ASCII and writing the results (work done by the CPU), because a GPU can't read or write a file.
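
The size figure can be checked with a few lines of arithmetic. This sketch assumes compressed pubkeys (33 bytes each); the post does not say which encoding is used, and with uncompressed keys (65 bytes) the files would be roughly twice as large.

```python
KEYS = 10**9           # "1 Gigakey"
COMPRESSED_BYTES = 33  # assumption: 1 prefix byte + 32-byte x coordinate

def size_gb(n_keys, bytes_per_key):
    """Size in decimal gigabytes (10**9 bytes)."""
    return n_keys * bytes_per_key / 1e9

raw_gb = size_gb(KEYS, COMPRESSED_BYTES)
# Hex text doubles every byte and adds one newline per key.
hex_gb = size_gb(KEYS, 2 * COMPRESSED_BYTES + 1)

print(f"raw binary: {raw_gb:.0f} GB")
print(f"hex text, one key per line: {hex_gb:.0f} GB")
```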

To summarize:
Convert the input file into chunks of blocks in a grid of threads (SLOW) -> launch the kernel on the GPU (FAST) -> return and convert the result (SLOW)
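
A rough sketch of that three-stage flow, CPU side only. `fake_kernel` is a hypothetical placeholder that returns its input unchanged; in a real tool it would be replaced by the host-to-device copy, the kernel launch and the copy back. The function names and the 33-byte compressed key size are assumptions for illustration, not part of any existing library.

```python
from itertools import islice

KEY_SIZE = 33  # assumption: compressed pubkeys, 33 bytes each

def read_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

def hex_to_bytes(chunk):
    """SLOW (CPU): pack hex lines into one contiguous byte buffer."""
    return b"".join(bytes.fromhex(line.strip()) for line in chunk)

def fake_kernel(buf):
    """FAST (GPU) stage stand-in: hypothetical, returns the buffer unchanged."""
    return buf

def bytes_to_hex(buf):
    """SLOW (CPU): split the result buffer back into hex strings."""
    return [buf[i:i + KEY_SIZE].hex() for i in range(0, len(buf), KEY_SIZE)]

def run_pipeline(lines, chunk_size=4096):
    out = []
    for chunk in read_chunks(lines, chunk_size):
        packed = hex_to_bytes(chunk)      # convert input
        result = fake_kernel(packed)      # launch kernel
        out.extend(bytes_to_hex(result))  # convert result
    return out

# Dummy 33-byte values, just to exercise the conversions (not real pubkeys).
sample = ["02" + f"{i:064x}" for i in range(1, 11)]
print(run_pipeline(sample, chunk_size=4) == sample)
```

The two conversion functions are where the CPU time goes; keeping the keys in binary on disk would remove most of that work.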


The purpose of the script is to filter duplicates (not identical keys, but duplicates within a series).
Example:
File B has 10 pubkeys (decimal 1 to 10). File A, loaded into memory for comparison, has 10 pubkeys (decimal 1000 to 1010).
The script loads every pubkey from file B as a point and starts adding 1 to it. The first match found is pubkey decimal 1000, which it prints.
So from both files the final point is pubkey decimal 1000.

In this process:
the 10 pubkeys keep adding until they reach 1000,
10*1000 = 10000 keys are processed,
but only 10 pubkeys are printed, the same as the number of pubkeys in file B.

If file B has 1m pubkeys,
1000m pubkeys are processed in total, but the output after the first match for each would be 1m pubkeys.

If you can port the compare.py script to pycuda, that would be great to work with.
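
For reference, here is a minimal single-threaded sketch of the filtering as I understand it from the example above. It is not the actual compare.py: the function names are made up for illustration, the curve arithmetic is plain affine secp256k1 in pure Python (needs Python 3.8+ for `pow(x, -1, p)`), and the keys are built from small known scalars only so the example can be run.

```python
# secp256k1 field prime and generator point (standard constants).
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a)
    if a == b:
        s = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P      # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P     # chord slope
    x3 = (s * s - x1 - x2) % P
    y3 = (s * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point=G):
    """Double-and-add; used here only to build the example files."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def first_series_match(start, targets, max_steps):
    """Add G to start repeatedly; return (additions, point) at the first hit."""
    current = start
    for step in range(max_steps + 1):
        if current in targets:
            return step, current
        current = point_add(current, G)
    return None

# Same numbers as the example: file B = keys 1..10, file A = keys 1000..1010.
file_a = {scalar_mult(k) for k in range(1000, 1011)}
file_b = [scalar_mult(k) for k in range(1, 11)]
matches = [first_series_match(pt, file_a, max_steps=1000) for pt in file_b]
total_additions = sum(steps for steps, _ in matches)
print(len(matches), "matches after", total_additions, "additions")
```

Each key from file B walks independently until its first hit, so one GPU thread per key from file B is the natural mapping for a pycuda port. The set lookup against file A is the part that needs a GPU-friendly replacement, for example a sorted array with binary search.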