Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!

Quote from: Telariust

it very slow for 1core C/C++ openssl/secp256k1, it should be more!

talk less, work more... my C99 prototype ready!
everything is exactly as I described above

1core i7-6820, win7x64

// MODE: J + A -> J, xJ->xA ; 0.23Mk/s; secp256k1_gej_add_ge_var(), convert only X jacobian to affine

Code:

C:\cygwin64\home\User\pollard-kangaroo>pollard-kangaroo.exe

[###########################################################]
[# Pollard-kangaroo PrivKey Recovery Tool #]
[# C99, bitcoin-core/secp256k1 library #]
[# singlecore #]
[# ver 0.1 #]
[###########################################################]
[DATE(utc)] 16 Sep 2019 18:23:53
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[pow2bits] 2^40
[range] 2^39..2^40 ; W = U - L = 0x0000008000000000 (2^39)
[maxDP] 1024 (max elements in hashtable)
[DPmodule] 0x0000000000040000
[pow2Jmax] 23
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
--> TEST MODE
[pubkey(tests)] 03A2EFA402FD5268400C77C20E574BA86409EDEDEE7C4020E4B9F0EDBEE53DE0D4
--> pubkey valid!
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[+] Sp-table of pow2 points - ready
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[+] T1+W1 herds - ready
--> [0y 0m 0d 00:00:06s] [0.23Mk/s] [0y 0m 0d 00:00:11s]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[i] 1700051j; DP T+W=4+3=7; dp/kgr=3.5;
[prvkey#40] 000000000000000000000000000000000000000000000000000000E9AE4933D6
[#############################; passtime: 0y 0m 0d 00:00:06s]
[####################################; precision: 6s 38ms]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[DATE(utc)] 16 Sep 2019 18:23:59
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[x] EXIT

40bits, w=39bit, 2w^(1/2) at 17sec

// MODE: 1*J+k*G->J, xJ->xA ; 0.09Mk/s; secp256k1_ecmult(), convert only X jacobian to affine

Code:

--> [0y 0m 0d 00:00:16s] [0.09Mk/s] [0y 0m 0d 00:00:27s]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[i] 1700051j; DP T+W=4+3=7; dp/kgr=3.5;
[prvkey#40] 000000000000000000000000000000000000000000000000000000E9AE4933D6
[####################################; precision: 17s 18ms]

40bits, w=39bit, 2w^(1/2) at 43sec

// MODE: k*G->J, J+J->J, xJ->xA; 0.03Mk/s; secp256k1_ecmult_gen() + secp256k1_gej_add_var() , convert only X jacobian to affine

Code:

--> [0y 0m 0d 00:00:48s] [0.03Mk/s] [0y 0m 0d 00:01:19s]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[i] 1700051j; DP T+W=4+3=7; dp/kgr=3.5;
[prvkey#40] 000000000000000000000000000000000000000000000000000000E9AE4933D6
[####################################; precision: 51s 595ms]

40bits, w=39bit, 2w^(1/2) at 127sec

#################

Quote from: BurtW

Quote from: Telariust

so as not to create possible problems for you, I will publish the source code only after you.

We are doing this to learn. It would be no problem for us if you publish your source code. We already have the published python source code and have looked at it in order to improve our program. You do not need to wait for us to publish. The science fair is not until next year so we have a lot of time to get everything just right and then move the code on to a GPU.

Ok! ~~I will publish a little later, need to tidy the code.~~

github.com/Telariust/pollard-kangaroo-c99/releases/
Enjoy! Grin

#################

Quote from: Telariust

At this time, I will try to write addition in affine coordinates, which is the fastest way, +30% expected speed-up.

Quote from: Telariust

in ideal - need write new functions:
A+A->A (1I, 2M, 1S) - best choice, because in the end we need X in affine for unique representation each point to adding in storage hashtable..
after which do not need to spend 8M,3S, only nearly 20M to each 1I, it +35% speed-up

Affinity A + A-> A ready.
Real growth was about 6-8% (probably 20M per 1I from the article is overpriced)

runing test, task is 50
1core i7-6820, win7x64

Code:

[pow2bits] 2^50
[range] 2^49..2^50 ; W = U - L = 0x0002000000000000 (2^49)

[algo#1] J + A -> J, xJ->xA ; secp256k1_gej_add_ge_var()

Code:

--> [0y 0m 0d 00:09:16s] [0.27Mk/s] [0y 0m 0d 00:00:00s]
[i] 152944440j; DP T+W=4+6=10; dp/kgr=5.0;
[prvkey#50] 00000000000000000000000000000000000000000000000000022BD43C2E9354
[#############################; passtime: 0y 0m 0d 00:09:16s]
[####################################; precision: 556s 699ms]

[algo#0] A + A -> A, xA ready ; the fastest!

Code:

--> [0y 0m 0d 00:08:42s] [0.29Mk/s] [0y 0m 0d 00:00:00s]
[i] 152944440j; DP T+W=4+6=10; dp/kgr=5.0;
[prvkey#50] 00000000000000000000000000000000000000000000000000022BD43C2E9354
[#############################; passtime: 0y 0m 0d 00:08:42s]
[####################################; precision: 522s 492ms]

runtime: 556/522 = x0.065 (+6,5%)
hashrate: 0.29/0.27 = x0.074 (+7,4%)

miserable improvement, less statistical error... certainly better than nothing;
but checked the hypothesis A+A->A;

Short compare algoritms:
1core i7-6820, win7x64

Code:

// 0: A + A -> A, xA ready ; 0.291Mk/s; the fastest
// 1: J + A -> J, xJ->xA ; 0.274Mk/s; secp256k1_gej_add_ge_var()
// 2: J + A -> J, xJ->xA ; 0.267Mk/s; secp256k1_gej_add_ge()
// 3: 0*P0+k*G->J, xJ->xA ; 0.091Mk/s; secp256k1_ecmult()
// 4: k*G->J, J+J->J, xJ->xA; 0.031Mk/s; secp256k1_ecmult_gen() + secp256k1_gej_add_var()

ready-made, if someone needs it.
Affine points addition, custom functions, C99, bitcoin-core/secp256k1
2A -> A (1I, 2M, 2S)
A + A -> A (1I, 2M, 1S)

Code:

// 2A -> A (1I, 2M, 2S)
void secp256k1_ge_double_var(secp256k1_ge *r, const secp256k1_ge *a){

secp256k1_fe rX, rY;
secp256k1_fe tmp1, tmp2;

tmp1 = a->y;
if ( (a->infinity == 1) || secp256k1_fe_normalizes_to_zero_var(&tmp1) ) {
secp256k1_ge_set_infinity(r);
return;
}

// s = (3x^2 + A)/(2y) # 1I 2M 1S
// A=0 B=7 (secp256k1)
secp256k1_fe_sqr(&tmp2, &a->x);//1S
tmp1 = tmp2;
secp256k1_fe_add(&tmp1, &tmp2);
secp256k1_fe_add(&tmp1, &tmp2);
secp256k1_fe_normalize_weak(&tmp1);

tmp2 = a->y;
secp256k1_fe_add(&tmp2, &a->y);
secp256k1_fe_normalize_weak(&tmp2);
secp256k1_fe_inv_var(&tmp2, &tmp2);//1I

secp256k1_fe_mul(&tmp1, &tmp1, &tmp2);//1M

// x' = s^2-2x # 1S
secp256k1_fe_sqr(&rX, &tmp1);//1S
tmp2 = a->x;
secp256k1_fe_add(&tmp2, &a->x);
secp256k1_fe_negate(&tmp2, &tmp2, 1);
secp256k1_fe_add(&rX, &tmp2);
secp256k1_fe_normalize_weak(&rX);

// y' = s(x-x')-y # 1M
secp256k1_fe_negate(&tmp2, &rX, 1);
rY = a->x;
secp256k1_fe_add(&rY, &tmp2);
secp256k1_fe_normalize_weak(&rY);
secp256k1_fe_mul(&rY, &rY, &tmp1);//1M
secp256k1_fe_negate(&tmp2, &a->y, 1);
secp256k1_fe_add(&rY, &tmp2);
secp256k1_fe_normalize_weak(&rY);

r->x = rX;
r->y = rY;
}

// A + A -> A (1I, 2M, 1S)
void secp256k1_ge_add_ge_var(secp256k1_ge *r, const secp256k1_ge *a, const secp256k1_ge *b){

if ( (a->infinity == 1) || (b->infinity == 1) ) {
if ( (a->infinity == 1) ^ (b->infinity == 1) ) {
if (a->infinity == 1) { r->x = b->x; r->y = b->y; return; }
if (b->infinity == 1) { r->x = a->x; r->y = a->y; return; }
}else{
secp256k1_ge_set_infinity(r);
return;
}
}

secp256k1_fe rX, rY;
secp256k1_fe tmp1, tmp2;

secp256k1_fe aXneg;
secp256k1_fe_negate(&aXneg, &a->x, 1);
secp256k1_fe aYneg;
secp256k1_fe_negate(&aYneg, &a->y, 1);

//dx = B.x - A.x
tmp1 = b->x;
secp256k1_fe_add(&tmp1, &aXneg);

//dy = B.y - A.y
tmp2 = b->y;
secp256k1_fe_add(&tmp2, &aYneg);

if (secp256k1_fe_normalizes_to_zero_var(&tmp1)) {
if (secp256k1_fe_normalizes_to_zero_var(&tmp2)) {
secp256k1_ge_double_var(r, a);
return;
}else{
secp256k1_ge_set_infinity(r);
return;
}
}

secp256k1_fe_normalize_weak(&tmp1);
secp256k1_fe_normalize_weak(&tmp2);

secp256k1_fe_inv_var(&tmp1, &tmp1);//1I

//c = dy * invert(dx, p) % p # 1I,1M
secp256k1_fe c;
secp256k1_fe_mul(&tmp2, &tmp2, &tmp1);//1M

//R.x = (c**2 - A.x - B.x) % p # 1S
secp256k1_fe_sqr(&rX, &tmp2);//1S
secp256k1_fe_add(&rX, &aXneg);
secp256k1_fe_negate(&tmp1, &b->x, 1);
secp256k1_fe_add(&rX, &tmp1);
secp256k1_fe_normalize_weak(&rX);

//R.y = (c*(A.x - R.x) - A.y) % p # 1M
rY = a->x;
secp256k1_fe_negate(&tmp1, &rX, 1);
secp256k1_fe_add(&rY, &tmp1);
secp256k1_fe_normalize_weak(&rY);
secp256k1_fe_mul(&rY, &rY, &tmp2);//1M
secp256k1_fe_add(&rY, &aYneg);
secp256k1_fe_normalize_weak(&rY);

r->x = rX;
r->y = rY;
}

From https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

#################

Quote from: Telariust

Quote from: 57fe

..Tests with C++ (VS2013) leads to x2..x3 faster calculations only (single-thread), but much more complicated code..

..because this is the limit of computation on the cpu

and now we have the answer

ver0.1 C99, win7x64, algo A+A->A
1core i7-6820
0.291Mk/s
1core i5-2540
0.219Mk/s

ver0.9 python37x64+gmpy2, win7x64, algo A+A->A
1core i7-6820
174.3K j/s;
1core i5-2540
99.7K j/s;