Hello,
Affine coordinates for search (faster):
Each group computes p = startP + i*G for i in [1..group_size], where i*G comes from a pre-computed table containing G, 2G, 3G, ... in affine coordinates. The inversion of the deltax terms (x1-x2) is done once per group (1 ModInv and 256*3 mult). group_size is 256 keys.
Projective coordinates for EC multiplication (computation of starting keys). The starting key is normalized after the multiplication.
Edit:
You may also have noticed that I have an innovative implementation of modular inversion (DRS62) which is almost 2 times faster than the Montgomery one. Some benchmarks and comments are available in IntMop.cpp.
Ok.
Two questions:
1) Why only 256 for the group size? Is there a memory problem? Fewer inversions would be better.
2) For the field multiplication a*b = c mod p: why do you use Montgomery? Are you sure it is worth it?