the fastest possible way to mass-generate addresses in Python

Hello everybody,

in the last days I am trying various approaches in Python to generate random private keys (hex) and the corresponding bitcoin address (optionally uncompressed, compressed, bech32 or mixed) as efficiently as possible and with good performance. There are several different approaches. I am actually very satisfied with the "bit" library from Ofek, because it is the fastest I experienced so far. However, if you also want to generate bech32 addresses, you have to do some manual extra work and the speed advantage of bit is then lost. So, as a very good and satisfying average I find the use of fastecdsa and iceland2k14/secp256k1 very fast.

In several of my python programs I am testing, I have integrated a performance counter, which shows me the total number of addresses and the calculated rate (addresses/sec) while the program is running. However, I have taken this out in this examples to keep the code as simple and clear as possible. For relative ease of comparison, I do create 1 million randomly generated keys from which the Bech32 Bitcoin address (prefix 'bc1q' is then calculated. I write the output into a file, so I can check it afterwards to ensure everything went correct and the data is correct. I "time" the Python program to see how long it took to create the 1 million addresses. Let's get into it...

Here is an example which generated bech32 addresses:

Code:

#!/usr/bin/env python3
# 2022/Dec/26, citb0in
from fastecdsa import keys, curve
import secp256k1 as ice

# Set the number of addresses to generate
num_addresses = 1000000

# Open a file for writing
with open('addresses_1M.out', 'w') as f:
# Generate and write each address to the file
for i in range(num_addresses):
prvkey_dec = keys.gen_private_key(curve.P256)
addr = ice.privatekey_to_address(2, True, prvkey_dec)
f.write(f'{addr}\n')

It took 82 seconds to generate those 1 million addresses. This is a rate about 12,195 addresses/sec which is way too slow.

Quote

real   1m22,192s
user   1m21,461s
sys   0m0,640s

I often read about numpy and from what I could find on the internet about it, it sounded interesting. So I tried to rewrite the code to use numpy and compare the results.

Code:

import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# how many addresses to generate
num_addresses = 1000000

# Generate a NumPy array of random private keys using fastecdsa
private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(num_addresses)])

# Use secp256k1 to convert the private keys to addresses
addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

# Write the addresses to a file
np.savetxt('addresses_numpy.out', addresses, fmt='%s')

Quote

real   1m19,636s
user   1m18,826s
sys   0m1,027s

As you can easily see, this did not bring any speed advantage, if you disregard the 1-2sec difference.

However, what caused significant speed boost was the attempt to distribute the program code into several threads and thus enable parallel processing. As I didn't see any disadvantage by using numpy I kept the numpy part in the code. Here is the extended python code:

Code:

import threading
import numpy as np
import fastecdsa.keys as fkeys
import fastecdsa.curve as fcurve
import secp256k1 as ice

# Set the number of addresses to generate and the number of threads to use
num_addresses = 1000000
num_threads = 16

# Divide the addresses evenly among the threads
addresses_per_thread = num_addresses // num_threads

# Create a list to store the generated addresses
addresses = []

# Define a worker function that generates a batch of addresses and stores them in the shared list
def worker(start, end):
# Generate a NumPy array of random private keys using fastecdsa
private_keys = np.array([fkeys.gen_private_key(fcurve.P256) for _ in range(start, end)])

# Use secp256k1 to convert the private keys to addresses
thread_addresses = np.array([ice.privatekey_to_address(2, True, dec) for dec in private_keys])

# Add the addresses to the shared list
addresses.extend(thread_addresses)

# Create a list of threads
threads = []

# Create and start the threads
for i in range(num_threads):
start = i * addresses_per_thread
end = (i+1) * addresses_per_thread
t = threading.Thread(target=worker, args=(start, end))
threads.append(t)
t.start()

# Wait for the threads to finish
for t in threads:
t.join()

# Write the addresses to a file
np.savetxt('addresses_1M_multithreaded.txt', addresses, fmt='%s')

My system has a total of 16 threads available, so I distributed the load across all threads for that test. Now look at this:

Code:

$ time python3 create_1M_addresses_using_fastecdsa_and_ice_numpy_multithreaded.py

Quote

real   0m19,764s
user   0m38,147s
sys   0m6,367s

It took only 20 seconds to generate 1 million addresses. That is equal to the rate of 50.000 keys/sec. Much better as I would say, but I think it's still too slow. How can we further improve the performance without using GPU ? Is there anything else you can suggest, that will speed up the process? Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 19.7 seconds for 1million bech32 addresses.