Board: Development & Technical Discussion
Re: the fastest possible way to mass-generate addresses in Python
by
citb0in
on 30/12/2022, 20:13:26 UTC
⭐ Merited by ETFbitcoin (1)
Thank you, arulbero, for your efforts. Very interesting. I was curious to see the comparison between randomly generated and sequentially generated addresses, as your code demonstrates.

The following was my Python program, which used multithreading with 16 threads, before I implemented the ProcessPoolExecutor:

Code:
#!/usr/bin/env python3
# 2022/Dec/26, citb0in_multithreaded.py
import threading
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# Set the number of addresses to generate and the number of threads to use
num_addresses = 1000000
num_threads = 16

# Divide the addresses evenly among the threads
addresses_per_thread = num_addresses // num_threads

# Create a list to store the generated addresses
addresses = []

# Define a worker function that generates a batch of addresses and stores them in the shared list
def worker(start, end):
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.secp256k1) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  # Add the addresses to the shared list
  addresses.extend(thread_addresses)

# Create a list of threads
threads = []

# Create and start the threads
for i in range(num_threads):
  start = i * addresses_per_thread
  end = (i+1) * addresses_per_thread
  t = threading.Thread(target=worker, args=(start, end))
  threads.append(t)
  t.start()

# Wait for the threads to finish
for t in threads:
  t.join()

# Write the addresses to a file
np.savetxt('addresses_1M_multithreaded.txt', addresses, fmt='%s')

Code:
$ time python3 citb0in_multithreaded.py
Quote
real   0m19,759s
user   0m36,834s
sys   0m6,259s

==> It took less than 20 seconds to generate 1 million addresses using 16 threads on my system.
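The modest gain from 16 threads (20 s vs. 26 s single-threaded, see below) is most likely the GIL: pure-Python work cannot run on more than one core at a time, and the C extension calls only release the GIL part of the time. A minimal stdlib-only sketch of the effect (illustrative, not the program above):

```python
import threading
import time

def cpu_work(n):
    # Pure-Python CPU-bound loop; holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_in_threads(num_threads, n_per_thread):
    # Run the work split across num_threads threads and time it
    threads = [threading.Thread(target=cpu_work, args=(n_per_thread,))
               for _ in range(num_threads)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

# With the GIL, splitting 2,000,000 iterations across 4 threads takes roughly
# as long as doing them all in 1 thread, because only one thread runs at a time
t_single = run_in_threads(1, 2_000_000)
t_multi = run_in_threads(4, 500_000)
```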

Then I modified that code to use only one single thread:
Code:
#!/usr/bin/env python3
# 2022/Dec/30, citb0in_singlethreaded.py
[...]
num_threads = 1
[...]

Code:
$ time python3 citb0in_singlethreaded.py
Quote
real   0m26,317s
user   0m26,309s
sys   0m1,568s

==> It took about 26 seconds to generate 1 million addresses using a single thread on my system.

Then I modified the program to use ProcessPoolExecutor, which distributes the work across all available cores using separate processes rather than threads. This gave me the best results:
Code:
#!/usr/bin/env python3
# 2022/Dec/30, citb0in_multicore.py
import concurrent.futures
import os
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# how many cores to use
#num_cores = 1
num_cores = os.cpu_count()

# Set the number of addresses to generate
num_addresses = 1000000

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end):
  # Generate a NumPy array of random private keys using fastecdsa
  private_keys = np.array([fkeys.gen_private_key(fcurve.secp256k1) for _ in range(start, end)])

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_multicore.txt', addresses, fmt='%s')

Quote
real   0m4,785s
user   0m54,832s
sys   0m1,992s

==> It took less than 5 seconds to generate 1 million addresses using all available cores (=16) on my system.
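One side note on the multicore version: as_completed yields batches in whatever order the workers finish, so the ordering of the output file varies between runs. If stable ordering matters, executor.map preserves submission order. A small sketch (shown with ThreadPoolExecutor so it runs anywhere; ProcessPoolExecutor has the same API, and the worker here is a stand-in, not the real address generator):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(chunk):
    # Stand-in for the real address-generation worker: just expand the range
    start, end = chunk
    return list(range(start, end))

chunks = [(i * 10, (i + 1) * 10) for i in range(4)]

with ThreadPoolExecutor() as executor:
    # map() yields results in submission order, unlike as_completed()
    batches = list(executor.map(worker, chunks))

ordered = [x for batch in batches for x in batch]
# ordered is [0, 1, 2, ..., 39], regardless of which worker finished first
```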

Now I change the settings to use only one single core...

Code:
#!/usr/bin/env python3
# 2022/Dec/30, citb0in_singlecore.py
[...]
num_cores = 1
#num_cores = os.cpu_count()
[...]
np.savetxt('addresses_1M_singlecore.txt', addresses, fmt='%s')

Quote
real   0m26,271s
user   0m26,077s
sys   0m1,618s

==> It takes about 26 seconds to generate 1 million addresses using one single core on my system.

Well, so far so good. I showed this just to summarize things up. Afterwards I modified my code according to your suggestion so the private keys are not generated randomly but instead they are sequentially generated from a pre-defined starting point. Here's the modified version:

Code:
#!/usr/bin/env python3
# 2022/Dec/26, citb0in_seq.py
import concurrent.futures
import os
import numpy as np
import secp256k1 as ice

# how many cores to use
num_cores = 1
#num_cores = os.cpu_count()

# Number of addresses to generate
num_addresses = 1000000

# Starting point (decimal/integer) for private key
starting_point = 123456789

# Define a worker function that generates a batch of addresses and returns them
def worker(start, end, starting_point):
  # Build this batch's private keys sequentially, offset by the batch's
  # own start index so that different workers never produce the same keys
  private_keys = [starting_point + i + 1 for i in range(start, end)]

  # Use secp256k1 to convert the private keys to addresses
  thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

  return thread_addresses

# Use a ProcessPoolExecutor to generate the addresses in parallel
with concurrent.futures.ProcessPoolExecutor() as executor:
  # Divide the addresses evenly among the available CPU cores
  addresses_per_core = num_addresses // num_cores

  # Submit a task for each batch of addresses to the executor
  tasks = []
  for i in range(num_cores):
    start = i * addresses_per_core
    end = (i+1) * addresses_per_core
    tasks.append(executor.submit(worker, start, end, starting_point))

  # Wait for the tasks to complete and retrieve the results
  addresses = []
  for task in concurrent.futures.as_completed(tasks):
    addresses.extend(task.result())

# Write the addresses to a file
np.savetxt('addresses_1M_seq_singlecore.txt', addresses, fmt='%s')

Quote
real   0m10,049s
user   0m10,012s
sys   0m1,442s

==> With this change it takes only 10 seconds to generate 1 million addresses using a single core on my system. Switching the private key generation from random to sequential gave me about 2.6x better performance.
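The gap is plausible: a random private key needs a fresh draw from a cryptographically secure RNG every time, while a sequential key is a single integer increment. A stdlib-only sketch of the cost difference (using secrets.randbelow as a stand-in for fastecdsa's key generator, which is an assumption on my part):

```python
import secrets
import timeit

# secp256k1 group order: valid private keys lie in [1, N-1]
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def random_keys(count):
    # Every key costs a fresh draw from the OS CSPRNG
    # (stand-in for fkeys.gen_private_key)
    return [secrets.randbelow(N - 1) + 1 for _ in range(count)]

def sequential_keys(start, count):
    # Every key is a single integer increment from the previous one
    return list(range(start + 1, start + count + 1))

t_random = timeit.timeit(lambda: random_keys(10_000), number=1)
t_sequential = timeit.timeit(lambda: sequential_keys(123456789, 10_000), number=1)
# On a typical machine t_random comes out noticeably larger than t_sequential
```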

Now let's activate all available cores...
Code:
[...]
#num_cores = 1
num_cores = os.cpu_count()
[...]
# Write the addresses to a file
np.savetxt('addresses_1M_seq_multicore.txt', addresses, fmt='%s')

Quote
real   0m2,481s
user   0m18,081s
sys   0m1,661s

==> It takes only 2.5 seconds to generate 1 million addresses using all 16 cores on my system. Switching the private key generation from random to sequential consistently resulted in higher performance.
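One subtlety in the parallel sequential version: each worker must offset its keys by its own start index, otherwise every core would produce the same key range and the output would be full of duplicates. A quick sketch of the partitioning logic (key_ranges is a hypothetical helper for illustration, not part of the program above):

```python
def key_ranges(starting_point, num_addresses, num_cores):
    # Split the interval (starting_point, starting_point + num_addresses]
    # into contiguous, non-overlapping per-core chunks
    # (any remainder from the integer division is dropped, as in the program)
    per_core = num_addresses // num_cores
    return [(starting_point + i * per_core + 1,
             starting_point + (i + 1) * per_core)
            for i in range(num_cores)]

ranges = key_ranges(123456789, 1_000_000, 16)
# The chunks tile the interval: each one starts right after the previous ends
```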

Now as final comparison to your program:
Quote
[...]
('1FZqHE6MndwVFKDD9vPYjHQqCNzf9f4V4G', '1BvzQLEoausQEjdQZnaC9TsHEd8CfL54yy')
('19UvEYkDhi7WzP2rFufkWEhbJhzh1DBKed', '1geYTWHYsgokQ2HuELJALYSaBw8an5q7r')
('1N8vnptoRf7HAmdLt2iV7Tr1n6BBff5Ltd', '18x8zZjHqP4d3nUaNPLz7TWfYdYLgiHPsx')
('1PTofsPjc2FUtKoM3nWfCfCQi8KfnNMJy2', '1GYfAZ7R6mnMcRFUgTaELUqLEjcdV9CYjX')
('169x29jPKW3ehUWZcWXRgG8tCwqjDhPAn7', '1LiTZGU8B6gK5zNxBSiRBDyAxZLYSfaHnZ')
('13Xm9hWcfgNfamDEhfuisnnnAD6nCxS9k2', '1NXKfq5eqFJmmCaarjhKoqQXGqfVJKJZYq')
('1NJzBnxVzYRGUSSBqVpV5tpxGMb25hroQj', '1FtgciBUc2mmioTpxgSMvANw7JB3nQEhFc')
('19Qkdthp9m8uqJys58zBgfp4kbp6bS8ZA4', '1KiAJjTY7BvsogAF1suffpbsmB1VB6dLHu')
You have just generated 1,007,862 keys (1 MKeys)

real   0m14,765s
user   0m14,187s
sys   0m0,448s


==> Your program generated 1,007,862 addresses (about 1 million keys) in about 15 seconds using a single core.
==> My program generated exactly 1 million addresses in 10 seconds using a single core.


Conclusion:

The originally enormous speed advantage is therefore, according to these tests, not really due to the code differences. Rather, randomly generated keys are simply more computationally intensive and cost more time than simple sequential key generation, just as you originally said, arulbero.

It always depends on the use case, of course, but I doubt that sequential generation of private keys is the best way. In terms of security for other applications, I think random private keys are always preferable to sequential ones, despite being more time-intensive.

Thank you very much for your effort and your valuable contribution dear arulbero.

My conclusion so far remains: secp256k1 seems to be pretty damn fast, and so far I haven't seen any example that can beat it. Of course, I would still be happy if other suggestions came in. In particular, I would be interested if someone were able to move the already working code onto the GPU via PyCUDA, Numba, JIT compilation, or similar solutions using CUDA to speed up the computation process.

I am looking forward to further feedback and thank you all in advance.