Re: the fastest possible way to mass-generate addresses in Python

Quote from: citb0in on December 27, 2022, 07:51:03 PM

Quote

real   0m4,768s
user   0m54,444s
sys   0m1,894s

Perfect! Now I'm under 5 seconds for 1 million addresses. That's a rate of roughly 208,000 addresses/sec
How can we further improve the performance without using GPU? Is there anything else you can suggest, that will speed up the process?
Let's make a contest-like game. Create a simple code like this and try to beat the high score of currently 4.7 seconds for 1million bech32 addresses without using GPU.

The next step would be ==> how can we shuffle this code to the GPU for ultra-fast performance ?

Long time ago I wrote a python script that computes 16.4 million of consecutive (non random) public keys (not addresses) in 12.3 s.

No numpy, pure python, only 1 thread.

ecc.py

Code:

#see http://www.secg.org/sec2-v2.pdf at page 9

#Gx=55066263022277343669578718895168534326250603453777594175500187360389116729240
#Gy=32670510020758816978083085130507043184471273380659243275938904335757337482424
#p=115792089237316195423570985008687907853269984665640564039457584007908834671663
#n=115792089237316195423570985008687907852837564279074904382605163141518161494337

Gx=0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy=0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
p=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEFFFFFC2F # Fp
n=0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141 # E(Fp)

#endomorphism
lambd=0x5363ad4cc05c30e0a5261c028812645a122e22ea20816678df02967c1b23bd72 # (lambda^3 = 1 mod n)
lambd2=0xac9c52b33fa3cf1f5ad9e3fd77ed9ba4a880b9fc8ec739c2e0cfc810b51283ce #(lambda^2)

beta=0x7ae96a2b657c07106e64479eac3434e99cf0497512f58995c1396c28719501ee # (beta^3 = 1 mod p)
beta2=0x851695d49a83f8ef919bb86153cbcb16630fb68aed0a766a3ec693d68e6afa40 # (beta^2)
############################################################################################
###Field operations

#a must be a number between 1 and p-1; input: a output: 1/a
#see http://www.springer.com/?SGWID=4-102-45-110359-0 page 40 alg. 2.20
def inv(a,p):
u, v = a%p, p
x1, x2 = 1, 0
while u != 1:
#q = v//u
#r = v-q*u
q, r = divmod(v,u)
x = x2-q*x1
v = u
u = r
x2 = x1
x1 = x
return x1%p

#inverse of a batch of x
#input: batch of x : [x1,x2,x3,x4,x5,x6,x7,x8,...,x2048]
#x is known (x of kG)
#output: batch of inverse: 1/(x1-x), 1/(x2-x), 1/(x3-x), ....
#3M for each inverse
def inv_batch(x,batchx,p):

a = (batchx[1]-x) % p
partial= [0]*2049
partial[1]=a

for i in range(2,2049):
a = (a*(batchx[i]-x)) % p #2047M
partial[i]=a

inverse=inv(partial[2048],p) # 1I
batch_inverse=[0]*2049

for i in range(2048,1,-1):
batch_inverse[i-1]=(partial[i-1]*inverse) % p #2047M
inverse=(inverse*(batchx[i]-x)) %p #2047M

batch_inverse[0]=inverse

return batch_inverse

############################################################################################
##Group operations

# https://en.wikipedia.org/wiki/Elliptic_curve_point_multiplication#Point_addition
#'double' addition
#add 2 points P and Q --> P+Q
#add 2 points P and Q --> P-Q
#
#(X1,Y1) + (X2,Y2) --> (X3,Y3) (the inverse of (x2-x1) must be known)
#(X1,Y1) + (X2,-Y2) --> (X4,Y4) (the inverse of (x2-x1) must be known)
#
#input: (X1,Y1), (X2,Y2), the inverse of (X2 - X1): 5 scalars
#output: (X3,Y3), (X4,Y4) : 4 scalars
#4M + 2S

def double_add_P_Q_inv(x1, y1, x2, y2, invx2x1):

dy = (y2-y1) #% p
a = dy*invx2x1 % p #1M
a2 = a**2 #1S
x3 = (a2 - x1 -x2) % p
y3 = (a*(x1-x3) -y1) % p #1M

dy = p-(y2+y1) #% p # only the sign of y2 changes, "y of -Q = -y of +Q"
a = dy*invx2x1 % p #1M x2 and then the inverse invx2x1 are the same
a2 = a**2 #1S
x4 = (a2 - x1 -x2) % p
y4 = (a*(x1-x4) -y1) % p #1M

return x3, y3, x4, y4

#https://en.wikibooks.org/wiki/Cryptography/Prime_Curve/Jacobian_Coordinates
#jacobian coordinates
def add_j_j(jax, jay, jaz, jbx, jby, jbz):

u1 = (jax*jbz**2) % p
u2 = (jbx*jaz**2) % p
s1 = (jay*jbz**3) % p
s2 = (jby*jaz**3) % p
h = (u2-u1) % p
r = (s2-s1) % p
jcx = (r**2 -h**3-2*u1*h**2) % p
jcy = (r*(u1*h**2-jcx)-s1*h**3) % p
jcz = (h*jaz*jbz) % p
return jcx, jcy, jcz

def double_j_j(jx, jy, jz):
s = (4*jx*jy**2) % p
m = (3*jx**2) % p
jx2 = (m**2 - 2*s) % p
jy2 = (m*(s-jx2) - 8*jy**4) % p
jz2 = (2*jy*jz) % p
return jx2, jy2, jz2

def mul(k,jx,jy,jz):

jkx, jky, jkz = 0, 0, 1
if (k%2 == 1):
jkx, jky, jkz = jx, jy, jz #only if d is odd
q=k//2
while q>0:
jx, jy, jz = double_j_j(jx,jy,jz)
a=q%2
if (a > jkx):
jkx, jky, jkz = jx, jy, jz #only if d is even
elif (a):
jkx, jky, jkz = add_j_j(jx, jy, jz, jkx, jky, jkz)
q=q//2

return jkx, jky, jkz

############################################################################################
#Coordinates

#from jacobian to affine
def jac_to_aff(jax, jay, jaz, invjaz):

ax, ay = (jax*invjaz**2) % p, (jay*invjaz**3) % p

return ax, ay

gen_batches.py

Code:

#!/usr/bin/env python

#this script computes 3x1334 batchs of 4097 points (1 single point + 2048 couples of points -> (kG+mG, kG-mg))
#16,4 millions of points
#3 : 1 batch + 2 endomorphism

from ecc import *

########################################################################
#in this section the script pre-computes 1G, 2G, 3G, ..., 2048G
mG = [(0,0)]*2049
mGx = [0]*2049
mGy = [0]*2049

mGx[1]=Gx
mGy[1]=Gy
mG[1]=(Gx,Gy) #list of multiples of G,
#you have to precompute and store these multiples somewhere

mGx[2]=0xC6047F9441ED7D6D3045406E95C07CD85C778E4B8CEF3CA7ABAC09B95C709EE5
mGy[2]=0x1AE168FEA63DC339A3C58419466CEAEEF7F632653266D0E1236431A950CFE52A
mG[2]=(mGx[2],mGy[2])

for i in range(3,2049):

jx,jy,jz=add_j_j(Gx, Gy, 1, mGx[i-1], mGy[i-1], 1) #this is not a efficient way, but it is not important
jzinv = inv(jz,p)
(x,y) = jac_to_aff(jx, jy, jz, jzinv)
mG[i] = (x,y)
mGx[i] = x
mGy[i] = y

###########################################################################
lowest_key=2**55+789079076 #put here the lowest key you want to generate
number_of_batches=1334 #put here how many consecutive batches you want to generate
number_of_keys=4097*3*number_of_batches #this is the number of consecutive keys you are going to generate
#this number is always a multiple of 3*4097=12291, because 12291 is the minimum size
#remember: you generate always 3 batches at once, 1 in (1,2^160) and 2 out of this range
#their name are: batch (the main), batch2, batch3 (that are derived from the main with endomorphism)

id_batch=lowest_key+2048 #id_batch is now the private key of the middle point of the first batch that you want compute
distance_between_successive_batches=4097 #this is the distance from the middle point of the first batch and the middle point of the successive batch in (1,2^160)
#4097 means that I'm generating only consecutive batches in (1,2^160), this is the default

start=id_batch #the first key to generate of the the first batch is the lowest key + 2048 --> the middle point of the first batch
stop=id_batch + number_of_batches*distance_between_successive_batches #the key of the middle point of the last batch

#k is always the id of the current main batch and the key of the middle point of the current main batch
for k in range(start,stop,distance_between_successive_batches):

jkx,jky,jkz = mul(k,Gx,Gy,1) #here we use the slow function mul to generate the middle point
invjkz = inv(jkz,p)
(kx,ky) = jac_to_aff(jkx, jky, jkz, invjkz)

kminverse = [0]*2048
kminverse = inv_batch(kx,mGx,p) #you need to compute 2048 inverse, 1 for each couple
#to perform this task you need only the x coordinates of the multiples of G and the x coordinate of kG: kx
kminverse = [invjkz]+kminverse
#the element number 0 of the list is the inverse invjkz that you have computed to get kG=(kx,ky)
#the element number 1 of the list is the inverse necessary to perform kG + 1G (and kG-1G) ---> 1/(x_of_1G - kx)
#the element number 2 of the list is the inverse necessary to perform kG + 2G (and kG-2G) ---> 1/(x_of_2G - kx)
#...
#the element number m of the list is the inverse necessary to perform kG + mG (and kG-mG) ---> 1/(x_of_mG - kx)
#then the name: "kminverse", the inverse to get the generic kG+mG
#...
#the element number 2048 of the list is the inverse necessary to perform kG+2048G and kG-2048G ---> 1/(x_of_2048G - kx)

kxl = [kx] * 2049 #this is a list of kx, all elements are the same, only to use the function map of python
kyl = [ky] * 2049 #this is a list of ky, all elements are the same, only to use the function map of python

batch=[(0,0,0,0)]*2048
#the element number 0 of the list will be the middle point kG=(kx,ky), you can see it as the couple ((k+0)G, (k-0)G)
#the element number 1 will be the couple (k+1)G, (k-1)G
#the element number 2 will be the couple (k+2)G, (k-2)G
#...
#the element number 2048 will be the couple (k+2048)G, (k-2048)G
#each element of this list is a couple of points -> 4 scalars: x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G,
#but the first element is only a fake couple
batch = list(map(double_add_P_Q_inv,kxl[1:],kyl[1:],mGx[1:],mGy[1:],kminverse[1:]))
#the function double_add_P_Q_inv computes kG +/-mG, it uses the list of the inverse and return a list of couple of points
#each element of this list is a couple of points, 4 scalars, x_(k+m)G,y_(k+m)G,x_(k-m)G,y_(k-m)G
#this list has 2049 elements

batch = [(kx,ky,kx,ky)]+batch #the element number 0 of the list is now the middle point kG=(kx,ky), you can see it as the fake couple (k+0)G, (k-0)G

batch2=[0] * 2049 #this is the batch2 with lambda*kG, ( lambda*(k+1)G, lambda*(k-1)G ), ( lambda*(k+2)G, lambda*(k-2)G ), ....,
# (lambda*(k+2048)G, lambda*(k-2048)G)
#this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch
#each element of this list is a couple of x -> 2 scalars: x_lambda*(k+m)G, x_lambda*(k-m)G
#this list has 2049 elements

batch3=[0] * 2049 #this is the batch3 with lambda^2*kG, ( lambda^2*(k+1)G, lambda^2*(k-1)G ), ( lambda^2*(k+2)G, lambda^2*(k-2)G ), ....,
# (lambda^2*(k+2048)G, lambda^2*(k-2048)G)
#this list actually has only the x-cooordinates of the points, because the y-coordinates are the same of main batch
#each element of this list is a couple of x -> 2 scalars: x_lambda^2*(k+m)G, x_lambda^2*(k-m)G
#this list has 2049 elements

for i in range(0,2049): #endomorphism
batch2[i] = [(batch[i][0]*beta % p), (batch[i][2]*beta % p)] #only x coordinates, the first and the third scalar of each element of the main batch
#private key: lambda*k mod n, coordinate x: beta*x mod p
batch3[i] = [(batch[i][0]*beta2 % p), (batch[i][2]*beta2 % p)] #only x coordinates, the first and the third scalar of each element of the main batch
#private key: lambda^2*k mod n, coordinate x: beta^2*x mod p

#k in this case is the id of the last batch (or if you prefer, k is the private key of the middle point of the last batch)
m=465 #a random number between 0 and 2048, you can pick up a couple of points in the last main batch (or in batch2 / batch3)
(kmx1,kmy1,kmx2,kmy2)=batch[m]
(kmx1,kmx2)=batch3[m] #in this case the points chosen are in batch3
#the y are from main batch, because batch3 (and batch2) have only x coordinates

print ('kG+mG')
print (hex(lambd2*(k+m)%n)) #private key
print (hex(kmx1), hex(kmy1)) #public key (first and second coordinate)
print ('*******')

print ('kG-mG')
print (hex(lambd2*(k-m)%n)) #private key
print (hex(kmx2), hex(kmy2)) #public key (first and second coordinate)
print ('*******')

print (' ')
a=number_of_keys//1000000
print ('You have just generated ' + str(a) + ' Millions of keys!!!')

Code:

time python3 gen_batches.py

You have just generated 16.4 Millions of keys!!!

real 0m12,303s
user 0m12,302s
sys 0m0,000s

Generating consecutive public keys is way faster than generate random public keys.