Post
Topic
Board Development & Technical Discussion
Merits 3 from 1 user
Re: the fastest possible way to mass-generate addresses in Python
by
citb0in
on 01/01/2023, 18:01:47 UTC
⭐ Merited by ETFbitcoin (3)
but now you are comparing a library written in C against pure python, it seems not fair Smiley
The criticism is justified. I didn't know until now that the iceland2k14/secp256k1 library is implemented in native code and only wrapped for use from Python. I understand that the huge speed advantage of this library comes from that native implementation. Despite the justified criticism regarding security (iceland2k14/secp256k1 is effectively closed-source, since its source code is not publicly available), the library is nevertheless widely and successfully used by various developers and tools. I will add this note to my original post, as I consider it important.

I'm pretty sure that more c functions you introduce in your python script, more fast it becomes.
This is understandable. Nevertheless, I am very curious about running the admittedly simplistic code shown so far on the GPU using PyCUDA or similar. Based on everything I have read and understood so far on the PyCUDA project website, I suspect that PyCUDA could deliver very good results. So I am still asking for helpful ideas and code suggestions for reaching that goal on the GPU using PyCUDA or similar.

And how can we measure speed. We all have different hardware. And Linux will be faster than Windows.
That's right. It would be misleading to compare my results with arulbero's, because our hardware simply differs too much. A comparison in this context is only meaningful if everyone compares the different program versions on their own hardware. As you can clearly see from the results posted so far, arulbero uses faster hardware than I do, for example. That is not a problem; you just have to take care to compare your own results against each other on your own rig, not against the results of other users. The posted results merely illustrate the differences between the program variants.
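For anyone who wants to compare the program versions on their own rig, a minimal timing harness like the following keeps the comparison fair by running each variant several times on the same machine and taking the best run. This is a standalone sketch; the `bench` helper and the example workload are my own, not part of any of the scripts in this thread:

```python
import secrets
import time

def bench(label, fn, repeat=3):
    # Take the best of several runs to reduce scheduling noise;
    # the numbers are only comparable on one and the same machine.
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    print(f"{label}: {best:.3f} sec")
    return best

# example workload standing in for one batch of key generation
bench("secrets.randbelow x10k",
      lambda: [secrets.randbelow(2**256) for _ in range(10_000)])
```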

I see that you're using gen_private_key. This generates arbitrary secure random private keys, using urandom. urandom might be slow depending on your system - do you really need it? I assume whatever you're doing (brute force search?) will generate the private keys in some other fashion, so you can just set prvkey_dec to an integer representing the private key directly.
That's a good point as well, which I unfortunately overlooked and only became aware of when re-reading the thread. The fastecdsa library I originally used really does seem too slow for the purpose intended here, as it carries unnecessary overhead. Accordingly, I have started another optimization attempt: instead of fastecdsa, I now generate the random private keys with Python's built-in secrets module:
Code:
secrets.randbelow(2**256)
This resulted in a performance boost of approximately +30%

using fastecdsa, 1 thread ==> 29.681 sec
using secrets, 1 thread ==> 19.923 sec +32.9 %

using fastecdsa, 16 threads ==> 20.736 sec
using secrets, 16 threads ==> 13.612 sec +34.4 %

using fastecdsa, ProcessPoolExecutor, 1 core ==> 28.961 sec
using secrets, ProcessPoolExecutor, 1 core ==> 19.921 sec +31.3 %

using fastecdsa, ProcessPoolExecutor, 16 cores ==> 5.120 sec
using secrets, ProcessPoolExecutor, 16 cores ==> 3.744 sec +36.9 %

To achieve this further speed advantage, the following change is required:

- remove the fastecdsa library by deleting these two import lines from the top of the Python code (the shebang line stays):
Quote
#!/usr/bin/env python3
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve

- instead, we import Python's built-in "secrets" module:
Quote
import secrets

- we replace the line that generates the random 256-bit private keys:
Quote
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])


  # Generate a NumPy array of random private keys using "secrets" library
  private_keys = np.array([secrets.randbelow(2**256) for _ in range(start, end)])
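One caveat worth noting: secrets.randbelow(2**256) can in principle return 0 or a value at or above the secp256k1 group order n, neither of which is a valid private key. The chance is astronomically small (roughly 2^-128), but if you want to be strict about it, a sketch like this clamps the result into the valid range 1 .. n-1 (the `random_private_key` helper is my own naming, not part of any library used here):

```python
import secrets

# order of the secp256k1 group; valid private keys are 1 .. N-1
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def random_private_key() -> int:
    # randbelow(N - 1) yields 0 .. N-2; adding 1 shifts into 1 .. N-1
    return secrets.randbelow(N - 1) + 1

key = random_private_key()
assert 1 <= key < N
```

In the complete script below I keep the plain randbelow(2**256) call for speed, since the invalid range is negligible in practice.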

Here is the complete updated code variant, using multicore functionality and the secrets library for enhanced speed:

Code:
#!/usr/bin/env python3
# 2023/Jan/01, citb0in_multicore_secrets.py
import concurrent.futures
import os
import numpy as np
import secrets
import secp256k1 as ice

# how many cores to use
#num_cores = 1
num_cores = os.cpu_count()

# Set the number of addresses to generate
num_addresses = 1000000

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end):
  # Generate a NumPy array of random private keys using "secrets" library
  private_keys = np.array([secrets.randbelow(2**256) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    # give the last worker any remainder so no addresses are dropped
    end = num_addresses if i == num_cores - 1 else (i + 1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_multicore_secrets.txt', addresses, fmt='%s')
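As a possible further optimization (untested, treat it as a sketch): calling secrets.randbelow once per key spends a lot of time in pure-Python call overhead. Drawing all the randomness in a single secrets.token_bytes call and slicing the buffer into 32-byte chunks might shave some of that off. The `batch_keys` helper below is my own naming, not part of any library used above:

```python
import secrets

def batch_keys(count: int) -> list:
    # draw all randomness in one call, then slice the buffer
    # into 32-byte (256-bit) big-endian integers
    buf = secrets.token_bytes(32 * count)
    return [int.from_bytes(buf[i * 32:(i + 1) * 32], "big")
            for i in range(count)]

keys = batch_keys(1000)
```

Whether this actually beats randbelow has to be measured on your own hardware, and the same caveat about values outside the valid secp256k1 key range applies here as well.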