Post
Topic
Board Bitcoin Discussion
Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!
by
Telariust
on 10/09/2019, 16:55:56 UTC
The claimed avg "expected of 2w^(1/2) group operations" is not achievable with m = w^(1/2))/2 (mean jump size from acticles by Pollard)
Empirical tests have shown that 2w1/2 is achievable at m * 8.
I ask you to check on your implementation what speed will be if the MID/MAX step size is increased by 8 times, plz.
thx, no need
all norm after add
Code:
# get JmaxofSp
def getJmaxofSp(optimalmeanjumpsize, dS):
if flag_verbose > 0:
print('[optimal_mean_jumpsize] %s' % optimalmeanjumpsize)

sumjumpsize = 0

for i in range(1,len(dS)):

#sumjumpsize = (2**i)-1
#sumjumpsize += 2**(i-1)
sumjumpsize += dS[i-1]

now_meanjumpsize = int(round(1.0*(sumjumpsize)/(i)))

#next_meanjumpsize = int(round(1.0*(sumjumpsize+2**i)/(i+1)))
next_meanjumpsize = int(round(1.0*(sumjumpsize+dS[i])/(i+1)))

if flag_verbose > 1:
print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize ))


if  optimalmeanjumpsize - now_meanjumpsize <= next_meanjumpsize - optimalmeanjumpsize :
if flag_verbose > 0:
print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize ))

if flag_verbose > 0:
print('[JmaxofSp] Sp[%s]=%s nearer to optimal mean jumpsize of Sp set' % (i, now_meanjumpsize))

return i

print("\n[FATAL_ERROR] JmaxofSp not defined!\n"); exit(-1)

#################
about speed

We have updated the OP with our latest results.  Here is a summary of the results of runs for tests 0 through 49:...
That is already good result!
no, absolutely not

August 26, 2019
Code:
pkey    Standard OpenSSL library     secp256k1 (S 8X32, F 10X26)
bits   hops/second   test run time   hops/second   test run time
====   ===========================   ===========================
  31   1942.378857   28.4496 secs    5520.634781   10.0240 secs
Standard OpenSSL library averaged 1897 hops per second    
secp256k1 (S 8X32, F 10X26) averaged 5258 hops per second  

September 09, 2019 (secp256k1)
Code:
test       time         hps       hops
----  --------------  -------  ----------
31    9.2328 secs    11,884  109705
already better, but..

it very slow for 1core C/C++ openssl/secp256k1, it should be more!
(even if u built early classic 1 Wild algo with 3.28w^(1/2), this does not explain x10 speed drop)

Quote
We were just trying to quickly get everything to work so we have not optimized anything yet.
ok, we expect and hope

see for compare

Algo: 1 Tame + 1 Wild with distinguished points,  expected of 2w^(1/2) group operations
avg stat after 10tests

1core i5-2540, python37x64, win7x64
Code:
[i] 8.6K j/s; ...
...
[################################################]
[averages] expected of 2w^(1/2) group operations
-------|--------/--------|---------------------------------/---------------------------------|
   W   |jump avg/2w^(1/2)| time                         avg/2w^(1/2)                         |
-------|--------/--------|---------------------------------/---------------------------------|
>2^031 |   84.0K/  92.7K |      0y  0m  0d 00:00:10s 002ms /      0y  0m  0d 00:00:11s 042ms |
----------------------------------------------------------------------------------------------
(it raw python!)

1core i5-2540, python37x64 + gmpy2, win7x64
Code:
[i] 73.6K j/s; ...
...
[################################################]
[averages] expected of 2w^(1/2) group operations
-------|--------/--------|---------------------------------/---------------------------------|
   W   |jump avg/2w^(1/2)| time                         avg/2w^(1/2)                         |
-------|--------/--------|---------------------------------/---------------------------------|
>2^031 |   90.0K/  92.7K |      0y  0m  0d 00:00:01s 150ms /      0y  0m  0d 00:00:01s 185ms |
----------------------------------------------------------------------------------------------

#################
about secp256k1 library

Legendary main question from legendary master   Smiley  root of all ecc problems)
Are you using affine or jacobian coordinates for the points?
42: Answer to the Ultimate Question of Life, the Universe, and Everything"

secp256k1 have functions J+J->J , J+A->J , and no have functions A+A->A (Is this a precisely optimized library for bitcoin? Smiley)
(all simple, because in real often need multiply rand point instead adding, and jacobian is better for multiply)
J+J->J (12M, 4S) - bad choice
J+A->J (8M, 3S) - already better
(look eprint.iacr.org/2016/103.pdf)
you can take the code from break_short by Jochen Hoenicke, it fits perfectly.
this code uses J+A->J and convert only X to Affine xJ ->xA (its max what we have in secp256k1 as it is)
where A is S-table with pre-computed G,2G,4G,8G,16G..
..u used pre-computed, y? not say me what u calc every Si permanently used multiplication, nonono,plz,no! X%mask instead pow2(X%k)?
(..by the way, that would explain your terrible speed..)
..and use old alternatively functions without protection against side channel attacks, +10-20% speed
secp256k1_gej_add_ge_var() instead secp256k1_gej_add_ge()
(warn, _var() need normalize before use in some functions.. my early mistake, "leading zeros" github.com/bitcoin-core/secp256k1/issues/601)
in ideal - need write new functions:
A+A->A (1I, 2M, 1S) - best choice, because in the end we need X in affine for unique representation each point to adding in storage hashtable
..u used hashtable, y?)
after which do not need to spend 8M,3S, only nearly 20M to each 1I, it +35% speed-up
like that

########
I seem to know what the problem is.
This is a very obvious fact for me, I should have started with it.
(I should have guessed at a time when about ECMULT_WINDOW_SIZE)

No multiplies the points inside the loop!
I warn, multiplication points is very expensive, more expensive than the slowest addition points.
Shouldn't use it at all in multi-million integration cycles.
You must calculate all possible displacement points before the start of the cycle.
Then just add them, being inside cycle.

I think you are using a mask
https://bitcointalk.org/index.php?topic=5166284.msg52068840#msg52068840
..so you can’t store several million pre-calculated points in memory. it explains your choice.
Quote
   x = PRF(X)
      = 1 << (X % m)
...
ec_point_mul_and_add(r,a,n)
not mask, i see,.. then simple, u tried built classic pattern of a*b+c

Quote
..rewritten the following functions..
I tell you - take loop from break_short as it is (with standart functions), rewrite to kangaroos, and benchmark it.
if after you still want to rewrite what, then ok (but i think what not)
I think the "secret sauce" will not create, it will turn out an exotic slow implementation through multiplication.
I would like to be mistaken, but my experience says that you lose time.
..How do I know about multiplication?.. I checked it myself,.. and from the topic about why brainflayer is slow and why it cannot be done faster,.. and it was on one of the topics pages LBC.
This is not the most famous fact, it is not surprising that you did not know about it, no fault.

Its not random, not brainflayer,  its pseudo random walk, so we can pre-compute our jumps size (offset points) for using addition points instead multiplication points.