
Showing 20 of 36 results by Telariust
Post
Topic
Board Development & Technical Discussion
Re: Pollard's kangaroo ECDLP solver
by
Telariust
on 31/05/2020, 22:40:51 UTC

Quote
at cinema for next show
Yes, and now, some show.

Quote from: zielar
You got zero because the work you did for me is unusable.
You have funny memory problems (like -d 42).
I remind you that I got zero because you did not fulfill your part of the contract.
It was a simple job and, at the same time, a check of your competence and honesty.
You failed miserably. Work with you? It's useless even to talk to you. Your word is worth zero; sad but true.

Read more about the events of September 2019.
VanitySearch1.17fix was successfully adapted for vanitypool.appspot.com.
We agreed: my fix, your GPU rigs, 50/50.
Then, right on the payout day, the reward leaves the account.
..And zielar disappears for a month! (wow, nice coincidence, yes?)
Upon returning, he declares that he has nothing to do with the transaction!
Ok, look at the blockchain:
#60: 1Kn5h2qpgw9mWE5jKpk8PP4qvvJ1QVy8su -> 14yX9jrgmtJByn5YSTdA2uZM6kFQ9z9BUu -> 35HzrskKgRekhB9RgFGNrfhNbdqC1PvEUL
#61: 1AVJKwzs9AskraJLGHAZPiaZcrpDr1U6AB -> 13aNcjjyQYgQZxnxz6Ar9eRqQENexDBPjh -> 35HzrskKgRekhB9RgFGNrfhNbdqC1PvEUL
#62: 1Me6EfpwZK5kQziBwBfvLiHjaPGxCKLoJi -> bc1q055ws5xnrnjtf767cxuklrjl8vw5sst2j4a9lm -> 35HzrskKgRekhB9RgFGNrfhNbdqC1PvEUL
#63: 1NpYjtLira16LfGbGwZJ5JbDPh3ai9bjf4 -> bc1qdqtkkmwtxk5ce0xvp57k46hnx8t43ngw97twmx -> 35HzrskKgRekhB9RgFGNrfhNbdqC1PvEUL
and..
1ThePiachu -> 18A3UeLrcJbi58jci1s7xQss8v8QK8YJPK -> 1JSVnxtzLxNpW6YkXQbV1guW6FdZAaKM4L -> 35HzrskKgRekhB9RgFGNrfhNbdqC1PvEUL
Tadaaa!
Even if 35HzrskKgRekhB9RgFGNrfhNbdqC1PvEUL does not belong to zielar..
..already by the third transfer, the money "randomly" passes through this wallet.
And here I am supposed to believe that zielar has nothing to do with what is happening)))
It's epic, really.

Open your eyes. This is either you or your partner (the one you trust).
And the events in January are probably connected with this as well.

The truth is somewhere nearby, but I still will not see my share of the money.
So, having verified it myself, I declare once again to everyone: be careful with zielar.

Some "unusable sticky things" by me:
https://github.com/Telariust/VanitySearch-appspot/releases
https://github.com/Telariust/VanitySearch-bitcrack/releases
Danger! Very unusable)


Quote from: zielar
Get to work better
Yes, how brutally insolent)


Quote from: zielar
You are an ordinary shit for me
This phrase perfectly expresses our hero's attitude toward the people around him.
If you google it, this is not the first time he has said it.
A lack (or ignoring) of arguments plus explosive emotionality. Love it.


Quote from: zielar
He renounced it twice
Apparently brichard19 also "refused" twice. Five times in a row.
(I recall him answering in the issues that he could not improve the program because he did not have enough money for a modern video card. But who reads the issues?)
Oh, it's simple: BitCrack is also a "sticky thing", and brichard19 also needs to "get to work better", yes?


Quote from: zielar
What hurts you, son? Do you have any complex?
Yes, I have a big complex: intolerance of irresponsible people.


Quote from: zielar
What do you say... that I should transfer some sum..
I say: how about a transfer to brichard19? Let it be 10% of #59-63, ~0.35 BTC.
Quote from: zielar
..to be burned to keep my honor?
Transfer it, if there is still something left to keep.

Gentlemen, place your bets: will the transfer take place, or will there be another excuse?


Quote from: zielar
you pour out your bitter regrets here because..
..When the world record is set, remember at what cost.
Post
Topic
Board Development & Technical Discussion
Re: Pollard's kangaroo ECDLP solver
by
Telariust
on 31/05/2020, 09:29:42 UTC
It is time for those present to take off their rose-colored glasses.

Jean Luc should get at least half, IMO. He spent a lot of time developing the software and fixing bugs and adding features that other people requested.
Yes, definitely. No less than you will for #59, #60, #61, #62 and #63.
Half. Of ZERO.

Quote from: Telariust
How much will you get for VSbitcrack and VSvanitypool.appspot?
Half. Of ZERO.

How much will jean_luc get  Cool?
Guess what it will be.

great to hear Smiley
how much power cost do you have ?
Guess the answer.
ZERO (it's not even his own equipment)

I helped him a bit yesterday to have a clean setup.
God, he can't even run someone else's program.

Congratulation and nice job
https://www.youtube.com/watch?v=q6mMh4pRki8

A babbler, nothing more. Be careful with him.


30 days.. 30 pages.. Want news? Doubling Ygrid gives +25% speed.
Post
Topic
Board Bitcoin Discussion
Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!
by
Telariust
on 09/10/2019, 04:05:30 UTC
Quote from: racminer
..where exactly do you store the tame and wild kangaroos as it is done in the "tame.txt" and "wild.txt" files for the python code..
In the hashtable.
OK, I will add output of distinguished points to a file in the next release.
There is only one practical use for saving points to disk: distributed search across different devices without a pool.
I plan to do exactly that, since writing a pool is much more difficult than collecting a save file once a day.

Quote from: st0ffl
The idea is of course to only make sqr(2^(bit-2)) operations.
A 2x speed-up; nice idea, I will add it as a boost option in the next release.

Quote from: racminer
The Hash table is only 1024 entries, is it enough for cases like #110 ?
The formula chooses the DP module so that each kangaroo produces about 10 distinguished points over 100% of the expected runtime, taking the size of the range into account.
As it stands, the table will overflow with ~50% probability if launched on 96 cores or more. That needs to be fixed.
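My own rough reconstruction of that sizing logic (not the program's exact formula): a full run takes about 2*sqrt(W) jumps in total, each jump becomes a distinguished point with probability 1/module, so the module is picked to leave roughly 10 points per kangaroo.

```python
import math

def pick_dp_module(range_width, n_kang, dps_per_kang=10):
    # expected total jumps for a full-length run: ~2*sqrt(W)
    total_jumps = 2 * math.isqrt(range_width)
    target = total_jumps // (dps_per_kang * n_kang)
    return 1 << max(0, target.bit_length() - 1)   # round down to a power of two

W = 1 << 41                                  # range width, as for a 42-bit puzzle
module = pick_dp_module(W, n_kang=4)
expected_total = 2 * math.isqrt(W) / module  # expected DPs in the hashtable
```

With a 1024-entry table and ~10 DPs per kangaroo, 96+ kangaroos put the expected fill near 960+, which is exactly the overflow risk described above.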

I understand that -t is the number of threads.
The question is why the speed increases but the time does not decrease; more often it even increases.
1) Pollard's algorithm is probabilistic by nature (birthday paradox, Kruskal's card trick, etc.).
The solution time is not stable; several attempts (the timeit loop in the python script) are needed to obtain the average runtime and jump count.
The runtime is heuristic rather than deterministic.

2) Threads should run at equal speed.
If -t N is set higher than the real number of cores, the threads compete: some run faster and some fall behind.

3) Maybe my code has a bug; it needs more tests.
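Point 1) is easy to see in a toy simulation (illustrative only, not the solver's code): draw random values until the first repeat; the run-to-run spread is large compared with the mean, which is why a timeit loop over several attempts is needed before quoting a runtime.

```python
import random

def steps_to_collision(space, seed):
    # draw random elements until one repeats (birthday paradox):
    # the mean is ~sqrt(pi/2 * space) draws, but single runs scatter widely
    rng = random.Random(seed)
    seen, n = set(), 0
    x = rng.randrange(space)
    while x not in seen:
        seen.add(x)
        x = rng.randrange(space)
        n += 1
    return n

space = 1 << 16    # toy search space; the mean here is ~321 draws
runs = [steps_to_collision(space, s) for s in range(30)]
avg = sum(runs) / len(runs)
```

Individual runs routinely land at a fraction or a multiple of the average; only the average over many runs is a meaningful benchmark number.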

#################
This post is dedicated to CUDA/OpenCL implementations.

We have several paths to a ready GPU implementation.
Good candidates to rewrite:

1) cuda, c++, VanitySearch, by Jean_Luc
https://github.com/JeanLucPons/VanitySearch

2) cuda/opencl, c++, BitCrack, by brichard19
https://github.com/brichard19/BitCrack

3) cuda, python, pollard-rho, by brichard19
https://github.com/brichard19/ecdl

4) cuda, ?, pollard-rho
github.com/beranm14/pollard_rho/

5) opencl, C#, pollard-rho
github.com/twjvdhorst/Parallel-Pollard-Rho/

I intend to evaluate their performance, as well as rewrite one or more to kangaroo.

#################

multicore, CPU only, c++, based on engine VanitySearch1.15
https://github.com/Telariust/vs-kangaroo/releases

The bignum lib by Jean_Luc is good, only 15% slower than the bitcoin-core/secp256k1 lib (with raw asm mult).

Code:
C:\Users\User\source\repos\VanitySearch-1.15_kangaroo\x64\Rel_SM52>vs-kangaroo -v 1 -t 4 -bits 42
[###########################################################]
[#          Pollard-kangaroo PrivKey Recovery Tool         #]
[#          (based on engine of VanitySearch 1.15)         #]
[#                 bitcoin ecdsa secp256k1                 #]
[#                         ver0.01                         #]
[###########################################################]
[DATE(utc)] 08 Oct 2019 23:07:47
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[pow2bits]      42
[rangeW]        2^41..2^42 ; W = U - L = 2^41
[DPsize]        1024 (hashtable size)
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[pubkey#42] loaded
[Xcoordinate] EEC88385BE9DA803A0D6579798D977A5D0C7F80917DAB49CB73C9E3927142CB6
[Ycoordinate] 28AFEA598588EA50A6B11E552F8574E0B93ABD5595F5AA17EA3BE5304103D255
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[+] Sp-table of pow2 points - ready
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[+] recalc Sp-table of multiply UV
[UV] U*V=1*3=3 (0x03)
[optimal_mean_jumpsize] 2097152
[meanjumpsize#24] 2097151(now) <= 2097152(optimal) <= 4026531(next)
[i] Sp[24]|J------------------------------------------------------------|Sp[25]
[JmaxofSp] Sp[24]=2097151 nearer to optimal mean jumpsize of Sp set
[DPmodule] 2^18 = 262144 (0x0000000000040000)
[+] 1T+3W kangaroos - ready
[CPU] threads: 4
[th][tame#1] run..
[th][wild#1] run..
[th][wild#2] run..
[th][wild#3] run..
[|][ 00:00:04 ;   1.1M j/s;   4.0Mj 107.4%; dp/kgr=10.0; 00:00:00 ]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[prvkey#42] 0x000000000000000000000000000000000000000000000000000002A221C58D8F
[i]   1.0M j/s;   5.0Mj of   4.0Mj 123.6%; DP 1T+3W=8+14=22; dp/kgr=11.0;
[runtime]  00:00:05
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[DATE(utc)] 08 Oct 2019 23:07:53
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[x] EXIT

#################

BitCrack runs at about 715 MKeys/s on a Tesla V100 (look here).
If we removed hash160, the rate would be about 1430 Mj/s (Mj here stands for million kangaroo jumps).
But not so fast:

BitCrack uses batch packet inversion (an x8 speed-up).
We can't (or can we?..) use it in the kangaroo method, so 1430/8 ~ 179M j/s.

But the screenshot shows 4xV100 = 6515M j/s, i.e. 1600M j/s each.

Unclear for now..
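For reference, the batch inversion in question is Montgomery's trick: n modular inverses for the price of one inversion plus 3*(n-1) multiplications. A plain-python sketch over the secp256k1 field prime (illustrative only, not BitCrack's kernel code):

```python
P = 2**256 - 2**32 - 977          # secp256k1 field prime

def batch_inverse(xs, p=P):
    # Montgomery's trick: one modular inversion + 3*(n-1) multiplications
    n = len(xs)
    prefix = [1] * (n + 1)        # prefix[i] = xs[0]*...*xs[i-1] mod p
    for i, x in enumerate(xs):
        prefix[i + 1] = prefix[i] * x % p
    inv = pow(prefix[n], -1, p)   # the single expensive inversion
    out = [0] * n
    for i in range(n - 1, -1, -1):
        out[i] = prefix[i] * inv % p   # = 1/xs[i]
        inv = inv * xs[i] % p          # now inv = 1/prefix[i]
    return out
```

The open question above is whether this amortization survives in a kangaroo walk, where every thread needs the affine X after every jump; batching the inversions across many kangaroos is one possible answer.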

#################

https://bitcointalk.org/index.php?topic=1306983.msg48224432#msg48224432
On November 25, 2018 this man already speaks as if he has a finished implementation.


O great master j2002ba2, I summon you!  Smiley Please come and explain to us miserable mortals what s2, s1, m3, m4 mean in your program!

#################
https://docs.nvidia.com/cuda/volta-tuning-guide/index.html
The high-priority recommendations from those guides are as follows:
 - Find ways to parallelize sequential code;
 - Minimize data transfers between the host and the device;
 - Adjust kernel launch configuration to maximize device utilization;
 - Ensure global memory accesses are coalesced;
 - Minimize redundant accesses to global memory whenever possible;
 - Avoid long sequences of diverged execution by threads within the same warp;


About the last one, "Avoid long sequences of diverged execution by threads within the same warp":
Pollard's (U+V) parallelization avoids collisions between kangaroos of the same herd;
this allows the correction block to be dropped entirely, because adding an if() inevitably breaks GPU parallelism.

#################

to be continued..
Post
Topic
Board Bitcoin Discussion
Merits 1 from 1 user
Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!
by
Telariust
on 16/09/2019, 18:58:06 UTC
⭐ Merited by A-Bolt (1)
Quote from: Telariust
it is very slow for 1-core C/C++ openssl/secp256k1; it should be more!
Talk less, work more... my C99 prototype is ready!
Everything is exactly as I described above.

1core i7-6820, win7x64

// MODE: J + A -> J,  xJ->xA   ; 0.23Mk/s; secp256k1_gej_add_ge_var(), convert only X jacobian to affine
Code:
C:\cygwin64\home\User\pollard-kangaroo>pollard-kangaroo.exe

[###########################################################]
[#          Pollard-kangaroo PrivKey Recovery Tool         #]
[#            C99, bitcoin-core/secp256k1 library          #]
[#                        singlecore                       #]
[#                          ver 0.1                        #]
[###########################################################]
[DATE(utc)] 16 Sep 2019 18:23:53
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[pow2bits] 2^40
[range] 2^39..2^40 ; W = U - L = 0x0000008000000000 (2^39)
[maxDP] 1024 (max elements in hashtable)
[DPmodule] 0x0000000000040000
[pow2Jmax] 23
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
 --> TEST MODE
[pubkey(tests)] 03A2EFA402FD5268400C77C20E574BA86409EDEDEE7C4020E4B9F0EDBEE53DE0D4
 --> pubkey valid!
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[+] Sp-table of pow2 points - ready
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[+] T1+W1 herds - ready
 --> [0y 0m 0d 00:00:06s] [0.23Mk/s] [0y 0m 0d 00:00:11s]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[i] 1700051j; DP T+W=4+3=7; dp/kgr=3.5;
[prvkey#40] 000000000000000000000000000000000000000000000000000000E9AE4933D6
[#############################; passtime: 0y 0m 0d 00:00:06s]
[####################################; precision:   6s  38ms]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[DATE(utc)] 16 Sep 2019 18:23:59
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[x] EXIT
40bits, w=39bit, 2w^(1/2) at 17sec


// MODE: 1*J+k*G->J, xJ->xA   ; 0.09Mk/s; secp256k1_ecmult(), convert only X jacobian to affine
Code:
--> [0y 0m 0d 00:00:16s] [0.09Mk/s] [0y 0m 0d 00:00:27s]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[i] 1700051j; DP T+W=4+3=7; dp/kgr=3.5;
[prvkey#40] 000000000000000000000000000000000000000000000000000000E9AE4933D6
[####################################; precision:  17s  18ms]
40bits, w=39bit, 2w^(1/2) at 43sec


// MODE: k*G->J, J+J->J, xJ->xA; 0.03Mk/s; secp256k1_ecmult_gen() + secp256k1_gej_add_var() , convert only X jacobian to affine
Code:
--> [0y 0m 0d 00:00:48s] [0.03Mk/s] [0y 0m 0d 00:01:19s]
[~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~]
[i] 1700051j; DP T+W=4+3=7; dp/kgr=3.5;
[prvkey#40] 000000000000000000000000000000000000000000000000000000E9AE4933D6
[####################################; precision:  51s 595ms]
40bits, w=39bit, 2w^(1/2) at 127sec

#################

Quote from: BurtW
Quote from: Telariust
so as not to create possible problems for you, I will publish the source code only after you.
We are doing this to learn.  It would be no problem for us if you publish your source code.  We already have the published python source code and have looked at it in order to improve our program.  You do not need to wait for us to publish.  The science fair is not until next year so we have a lot of time to get everything just right and then move the code on to a GPU.
Ok! I will publish a little later, need to tidy the code.

github.com/Telariust/pollard-kangaroo-c99/releases/
Enjoy!  Grin

#################

Quote from: Telariust
At this time, I will try to write addition in affine coordinates, which is the fastest way, +30% expected speed-up.

Quote from: Telariust
in the ideal case new functions are needed:
A+A->A (1I, 2M, 1S) - the best choice, because in the end we need X in affine form anyway, as the unique representation of each point for storage in the hashtable..
after which there is no need to spend 8M,3S per addition, only about 20M for each 1I; that's a +35% speed-up

Affine A+A->A is ready.
The real gain was about 6-8% (probably the 20M per 1I from the article is an overestimate).

Running the test; the task is #50.
1core i7-6820, win7x64
Code:
[pow2bits] 2^50
[range] 2^49..2^50 ; W = U - L = 0x0002000000000000 (2^49)

[algo#1]   J + A -> J,  xJ->xA   ; secp256k1_gej_add_ge_var()
Code:
--> [0y 0m 0d 00:09:16s] [0.27Mk/s] [0y 0m 0d 00:00:00s]
[i] 152944440j; DP T+W=4+6=10; dp/kgr=5.0;
[prvkey#50] 00000000000000000000000000000000000000000000000000022BD43C2E9354
[#############################; passtime: 0y 0m 0d 00:09:16s]
[####################################; precision: 556s 699ms]

[algo#0]   A + A -> A,  xA ready ; the fastest!
Code:
--> [0y 0m 0d 00:08:42s] [0.29Mk/s] [0y 0m 0d 00:00:00s]
[i] 152944440j; DP T+W=4+6=10; dp/kgr=5.0;
[prvkey#50] 00000000000000000000000000000000000000000000000000022BD43C2E9354
[#############################; passtime: 0y 0m 0d 00:08:42s]
[####################################; precision: 522s 492ms]

runtime:    556/522 = x1.065 (+6.5%)
hashrate: 0.29/0.27 = x1.074 (+7.4%)

A miserable improvement, barely above statistical error... but certainly better than nothing;
at least the A+A->A hypothesis has been checked.


A short comparison of the algorithms:
1core i7-6820, win7x64
Code:
// 0: A + A -> A,  xA ready ; 0.291Mk/s; the fastest
// 1: J + A -> J,  xJ->xA   ; 0.274Mk/s; secp256k1_gej_add_ge_var()
// 2: J + A -> J,  xJ->xA   ; 0.267Mk/s; secp256k1_gej_add_ge()
// 3: 0*P0+k*G->J, xJ->xA   ; 0.091Mk/s; secp256k1_ecmult()
// 4: k*G->J, J+J->J, xJ->xA; 0.031Mk/s; secp256k1_ecmult_gen() + secp256k1_gej_add_var()


Ready-made, if someone needs it:
affine point addition, custom functions, C99, bitcoin-core/secp256k1.
2A -> A (1I, 2M, 2S)
A + A -> A (1I, 2M, 1S)
Code:
// 2A -> A (1I, 2M, 2S)
void secp256k1_ge_double_var(secp256k1_ge *r, const secp256k1_ge *a){

    secp256k1_fe rX, rY;
    secp256k1_fe tmp1, tmp2;

    tmp1 = a->y;
    if ( (a->infinity == 1) || secp256k1_fe_normalizes_to_zero_var(&tmp1) ) {
        secp256k1_ge_set_infinity(r);
        return;
    }

    // s = (3*x^2 + A) / (2*y) ; A=0, B=7 (secp256k1)
    secp256k1_fe_sqr(&tmp2, &a->x); //1S
    tmp1 = tmp2;
    secp256k1_fe_add(&tmp1, &tmp2);
    secp256k1_fe_add(&tmp1, &tmp2); // tmp1 = 3*x^2
    secp256k1_fe_normalize_weak(&tmp1);

    tmp2 = a->y;
    secp256k1_fe_add(&tmp2, &a->y); // tmp2 = 2*y
    secp256k1_fe_normalize_weak(&tmp2);
    secp256k1_fe_inv_var(&tmp2, &tmp2); //1I

    secp256k1_fe_mul(&tmp1, &tmp1, &tmp2); //1M ; tmp1 = s

    // x' = s^2 - 2*x
    secp256k1_fe_sqr(&rX, &tmp1); //1S
    tmp2 = a->x;
    secp256k1_fe_add(&tmp2, &a->x);
    secp256k1_fe_negate(&tmp2, &tmp2, 1);
    secp256k1_fe_add(&rX, &tmp2);
    secp256k1_fe_normalize_weak(&rX);

    // y' = s*(x - x') - y
    secp256k1_fe_negate(&tmp2, &rX, 1);
    rY = a->x;
    secp256k1_fe_add(&rY, &tmp2);
    secp256k1_fe_normalize_weak(&rY);
    secp256k1_fe_mul(&rY, &rY, &tmp1); //1M
    secp256k1_fe_negate(&tmp2, &a->y, 1);
    secp256k1_fe_add(&rY, &tmp2);
    secp256k1_fe_normalize_weak(&rY);

    r->x = rX;
    r->y = rY;
    r->infinity = 0;
}


// A + A -> A (1I, 2M, 1S)
void secp256k1_ge_add_ge_var(secp256k1_ge *r, const secp256k1_ge *a, const secp256k1_ge *b){

    if ( (a->infinity == 1) || (b->infinity == 1) ) {
        if ( (a->infinity == 1) ^ (b->infinity == 1) ) {
            // exactly one operand is infinity: the result is the other one
            if (a->infinity == 1) { *r = *b; return; }
            if (b->infinity == 1) { *r = *a; return; }
        }else{
            secp256k1_ge_set_infinity(r);
            return;
        }
    }

    secp256k1_fe rX, rY;
    secp256k1_fe tmp1, tmp2;

    secp256k1_fe aXneg;
    secp256k1_fe_negate(&aXneg, &a->x, 1);
    secp256k1_fe aYneg;
    secp256k1_fe_negate(&aYneg, &a->y, 1);

    // dx = B.x - A.x
    tmp1 = b->x;
    secp256k1_fe_add(&tmp1, &aXneg);

    // dy = B.y - A.y
    tmp2 = b->y;
    secp256k1_fe_add(&tmp2, &aYneg);

    if (secp256k1_fe_normalizes_to_zero_var(&tmp1)) {
        if (secp256k1_fe_normalizes_to_zero_var(&tmp2)) {
            secp256k1_ge_double_var(r, a); // same point: fall back to doubling
            return;
        }else{
            secp256k1_ge_set_infinity(r); // A = -B
            return;
        }
    }

    secp256k1_fe_normalize_weak(&tmp1);
    secp256k1_fe_normalize_weak(&tmp2);

    secp256k1_fe_inv_var(&tmp1, &tmp1); //1I

    // c = dy * invert(dx, p) % p
    secp256k1_fe_mul(&tmp2, &tmp2, &tmp1); //1M ; tmp2 = c

    // R.x = (c^2 - A.x - B.x) % p
    secp256k1_fe_sqr(&rX, &tmp2); //1S
    secp256k1_fe_add(&rX, &aXneg);
    secp256k1_fe_negate(&tmp1, &b->x, 1);
    secp256k1_fe_add(&rX, &tmp1);
    secp256k1_fe_normalize_weak(&rX);

    // R.y = (c*(A.x - R.x) - A.y) % p
    rY = a->x;
    secp256k1_fe_negate(&tmp1, &rX, 1);
    secp256k1_fe_add(&rY, &tmp1);
    secp256k1_fe_normalize_weak(&rY);
    secp256k1_fe_mul(&rY, &rY, &tmp2); //1M
    secp256k1_fe_add(&rY, &aYneg);
    secp256k1_fe_normalize_weak(&rY);

    r->x = rX;
    r->y = rY;
    r->infinity = 0;
}

From https://www.nayuki.io/page/elliptic-curve-point-addition-in-projective-coordinates

#################

Quote from: Telariust
Quote from: 57fe
..Tests with C++ (VS2013) leads to x2..x3 faster calculations only (single-thread), but much more complicated code..
..because this is the limit of computation on the cpu
and now we have the answer

ver0.1 C99, win7x64, algo A+A->A
1core i7-6820
0.291Mk/s
1core i5-2540
0.219Mk/s

ver0.9 python37x64+gmpy2, win7x64, algo A+A->A
1core i7-6820
174.3K j/s;
1core i5-2540
99.7K j/s;
Post
Topic
Board Bitcoin Discussion
Merits 1 from 1 user
Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!
by
Telariust
on 14/09/2019, 20:34:32 UTC
⭐ Merited by A-Bolt (1)
static void secp256k1_ecmult_gen(const secp256k1_ecmult_gen_context *ctx, secp256k1_gej *r, const secp256k1_scalar *gn)
...
#define ec_point_mul_and_add(r,a,n)    {
..Why is the name of this function not familiar to me (even though I used multiplication)? ..And why did I never feel the need to write a*b+c?..
I looked at my source code; I always use
// R = na*A + ng*G
// secp256k1_ecmult(*ctx, gej *r, const gej *a, const scalar *na, const scalar *ng);
Quote
1899. Mr. Duell's most famous attributed utterance is that "everything that can be invented has been invented."
I bet your experiments with ECMULT_WINDOW_SIZE gave nothing, because secp256k1_ecmult_gen() doesn't use it  Cheesy
I used secp256k1_ecmult() precisely because it is accelerated by ECMULT_WINDOW_SIZE.
Moreover, secp256k1_ecmult() uses the endomorphism to accelerate multiplication (but only if you use both terms of na*A + ng*G, not just one of them).

Also, on why _ecmult() is faster than _gen():
https://github.com/bitcoin-core/secp256k1/pull/41
https://github.com/bitcoin-core/secp256k1/pull/210#issuecomment-96145070
https://github.com/bitcoin-core/secp256k1/pull/210#issuecomment-96166989

simple init for secp256k1_ecmult()
Code:
//secp256k1_context *ctx = secp256k1_context_create(SECP256K1_CONTEXT_NONE);
secp256k1_context *ctx = secp256k1_context_create(SECP256K1_CONTEXT_SIGN | SECP256K1_CONTEXT_VERIFY);

// ecmult vars
secp256k1_scalar scalarK; secp256k1_scalar_set_int(&scalarK, 123456);
secp256k1_scalar scalar0; secp256k1_scalar_set_int(&scalar0, 0);
secp256k1_gej point_0gej; secp256k1_gej_set_infinity(&point_0gej);

// jacobian base point G
secp256k1_gej secp256k1_gej_const_g;
secp256k1_gej_set_ge(&secp256k1_gej_const_g, &secp256k1_ge_const_g);

#################

I was interested to compare these functions.

1core i7-6820
secp256k1 (now, without endomorphism)
Code:
make clean
./autogen.sh
./configure --enable-ecmult-static-precomputation  --with-bignum=gmp --with-scalar=64bit --with-field=64bit --with-asm=x86_64
make

default ECMULT_WINDOW_SIZE = 15

########

Test #1: multiplication, sequential (incrementing) points, computing 1 million pubkeys in a loop:

Code:
for(uint64_t i = 1; i<1000000 ; ++i){
    secp256k1_scalar_set_uint64(&scalarK, i); // sequential points
    MULT_FUNCTION_HERE(&context, point_jacobian, &scalarK)
}

Benchmark results:

Code:
//// Multiply: R = q*A (in constant-time)
secp256k1_ecmult_const(&TestGej, &secp256k1_ge_const_g, &scalarK, 256);
1M at 58.0 sec; 0.017Mk/s
constant-time? hoho)

Code:
//// Multiply: R = a*G (with the generator)
//// To harden against timing attacks .. Break up the multiplicand into groups of..
secp256k1_ecmult_gen(&ctx->ecmult_gen_ctx, &TestGej, &scalarK);
1M at 26.5 sec; 0.037Mk/s
..Against timing attacks? Yes, it needs a rewrite; that's right.

Code:
//// R = na*A + ng*G
//// secp256k1_ecmult(*ctx, gej *r, const gej *a, const scalar *na, const scalar *ng);
//secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &secp256k1_gej_const_g, &scalarK, NULL); // NULL not recommended!
secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &secp256k1_gej_const_g, &scalarK, &scalar0); // k*G + 0*G
1M at 8.2 sec; 0.121Mk/s

The same function, another way:
Code:
//// R = na*A + ng*G
//secp256k1_ecmult(&ctx->ecmult_ctx ,&TestGej, NULL, &scalar0, &scalarK); // NULL not recommended!
secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &point_0gej, &scalar0, &scalarK); // 0*P0 + k*G
1M at 3.5 sec; 0.285Mk/s
Now you can see how effective ECMULT_WINDOW_SIZE is (because we compute sequential points).


secp256k1_ecmult() (option "0*P0 + k*G") - the best choice (x7.5 faster than secp256k1_ecmult_gen() for multiplying sequential points)
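A back-of-the-envelope model of why the window helps for sequential keys (my own rough numbers, not secp256k1's actual internals): the precomputed table is paid for once, so its cost amortizes to nothing when many scalars reuse it.

```python
def window_ops(nbits, w):
    # crude fixed-window cost model: a one-time table of 2^(w-1) points,
    # then per scalar ~nbits doublings plus ~nbits/w table additions
    table = 1 << (w - 1)
    per_scalar = nbits + nbits // w
    return table, per_scalar

def amortised_cost(nbits, w, nkeys):
    # average group operations per key when nkeys scalars share one table
    table, per_scalar = window_ops(nbits, w)
    return (table + nkeys * per_scalar) / nkeys
```

In this model, for 1M keys w=15 costs ~273 ops per key versus ~320 for w=4; it says nothing about the sequential-vs-random gap in Test #2, which looks more like a cache effect.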


########

Test #2: multiplication, pseudo-random points, computing 1 million pubkeys in a loop:

Code:
for(uint64_t i = 1; i<1000000 ; ++i){
    secp256k1_scalar_set_uint64(&scalarK, TestGej.x.n[0]); // pseudo random points
    MULT_FUNCTION_HERE(&context, point_jacobian, &scalarK)
}

Benchmark results:

Code:
secp256k1_ecmult_const(&TestGej, &secp256k1_ge_const_g, &scalarK, 256);
1M at 58.5 sec; 0.017Mk/s
same

Code:
secp256k1_ecmult_gen(&ctx->ecmult_gen_ctx, &TestGej, &scalarK);
1M at 26.5 sec; 0.037Mk/s
same

Code:
secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &secp256k1_gej_const_g, &scalarK, &scalar0); // k*G + 0*G
1M at 16.4 sec; 0.061Mk/s
Different! Slower.

The same function, another way:
Code:
secp256k1_ecmult(&ctx->ecmult_ctx, &TestGej, &point_0gej, &scalar0, &scalarK); // 0*P0 + k*G
1M at 9.5 sec; 0.105Mk/s
Different! Slower.
Efficiency is lower; ECMULT_WINDOW_SIZE does not work as well here (because we compute pseudo-random points).


secp256k1_ecmult() with "0*P0 + k*G" - the best choice (x2.8 faster than secp256k1_ecmult_gen() for multiplying pseudo-random points)


########

Quote from: Telariust
..multiplication points is very expensive, more expensive than the slowest addition points.

Test #3: point addition, computing 1 million pubkeys in a loop:

Code:
for(uint64_t i = 1; i<1000000 ; ++i){
    ADD_FUNCTION_HERE(&result_point_jacobian, 1point_jacobian, 2point_affine)
}

Benchmark results:

Code:
secp256k1_gej_add_ge(&TestGej, &TestGej, &secp256k1_ge_const_g);
1M at 0 sec 337 ms; 2.9Mk/s

Code:
secp256k1_gej_add_ge_var(&TestGej, &TestGej, &secp256k1_ge_const_g, NULL);
1M at 0 sec 285 ms; 3.5Mk/s
3.5Mk/s, Karl!

Code:
secp256k1_gej_double_var(&TestGej, &TestGej, NULL);
1M at 0 sec 183 ms; 5.4Mk/s
(that's doubling, rarely used)


secp256k1_gej_add_ge_var() - best choice (x33.3 faster than secp256k1_ecmult(), x92.9 faster than secp256k1_ecmult_gen())


https://github.com/Telariust/pollard-kangaroo-c99/blob/master/compare-test.c
Post
Topic
Board Bitcoin Discussion
Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!
by
Telariust
on 10/09/2019, 16:55:56 UTC
The claimed avg "expected 2w^(1/2) group operations" is not achievable with m = w^(1/2)/2 (the mean jump size from Pollard's articles).
Empirical tests have shown that 2w^(1/2) is achievable at m * 8.
Please check on your implementation what the speed will be if the MID/MAX step size is increased by 8 times.
Thanks, but no need;
everything is normal after adding:
Code:
# get JmaxofSp
def getJmaxofSp(optimalmeanjumpsize, dS):
    if flag_verbose > 0:
        print('[optimal_mean_jumpsize] %s' % optimalmeanjumpsize)

    sumjumpsize = 0

    for i in range(1, len(dS)):

        #sumjumpsize  = (2**i)-1
        #sumjumpsize += 2**(i-1)
        sumjumpsize += dS[i-1]

        now_meanjumpsize = int(round(1.0*(sumjumpsize)/(i)))

        #next_meanjumpsize = int(round(1.0*(sumjumpsize+2**i)/(i+1)))
        next_meanjumpsize = int(round(1.0*(sumjumpsize+dS[i])/(i+1)))

        if flag_verbose > 1:
            print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize))

        if optimalmeanjumpsize - now_meanjumpsize <= next_meanjumpsize - optimalmeanjumpsize:
            if flag_verbose > 0:
                print('[meanjumpsize#Sp[%d]] %s(now) <= %s(optimal) <= %s(next)' % (i, now_meanjumpsize, optimalmeanjumpsize, next_meanjumpsize))
            if flag_verbose > 0:
                print('[JmaxofSp] Sp[%s]=%s nearer to optimal mean jumpsize of Sp set' % (i, now_meanjumpsize))
            return i

    print("\n[FATAL_ERROR] JmaxofSp not defined!\n"); exit(-1)

#################
about speed

We have updated the OP with our latest results.  Here is a summary of the results of runs for tests 0 through 49:...
That is already a good result!
No, absolutely not.

August 26, 2019
Code:
pkey    Standard OpenSSL library     secp256k1 (S 8X32, F 10X26)
bits   hops/second   test run time   hops/second   test run time
====   ===========================   ===========================
  31   1942.378857   28.4496 secs    5520.634781   10.0240 secs
Standard OpenSSL library averaged 1897 hops per second    
secp256k1 (S 8X32, F 10X26) averaged 5258 hops per second  

September 09, 2019 (secp256k1)
Code:
test       time         hps       hops
----  --------------  -------  ----------
31    9.2328 secs    11,884  109705
Already better, but..

it is still very slow for 1-core C/C++ openssl/secp256k1; it should be more!
(Even if you built the early classic 1-Wild algorithm with 3.28w^(1/2), that does not explain a 10x speed drop.)

Quote
We were just trying to quickly get everything to work so we have not optimized anything yet.
OK, we wait and hope.

For comparison:

Algo: 1 Tame + 1 Wild with distinguished points, expected 2w^(1/2) group operations.
Average stats over 10 tests:

1core i5-2540, python37x64, win7x64
Code:
[i] 8.6K j/s; ...
...
[################################################]
[averages] expected of 2w^(1/2) group operations
-------|--------/--------|---------------------------------/---------------------------------|
   W   |jump avg/2w^(1/2)| time                         avg/2w^(1/2)                         |
-------|--------/--------|---------------------------------/---------------------------------|
>2^031 |   84.0K/  92.7K |      0y  0m  0d 00:00:10s 002ms /      0y  0m  0d 00:00:11s 042ms |
----------------------------------------------------------------------------------------------
(that's raw python!)

1core i5-2540, python37x64 + gmpy2, win7x64
Code:
[i] 73.6K j/s; ...
...
[################################################]
[averages] expected of 2w^(1/2) group operations
-------|--------/--------|---------------------------------/---------------------------------|
   W   |jump avg/2w^(1/2)| time                         avg/2w^(1/2)                         |
-------|--------/--------|---------------------------------/---------------------------------|
>2^031 |   90.0K/  92.7K |      0y  0m  0d 00:00:01s 150ms /      0y  0m  0d 00:00:01s 185ms |
----------------------------------------------------------------------------------------------

#################
about secp256k1 library

The legendary main question from the legendary master  Smiley  (the root of all ECC problems):
Are you using affine or jacobian coordinates for the points?
42: the Answer to the Ultimate Question of Life, the Universe, and Everything.

secp256k1 has the functions J+J->J and J+A->J, but no A+A->A (is this really a precisely optimized library for bitcoin? Smiley)
(It's simple: in practice you usually need to multiply a random point rather than add, and jacobian coordinates are better for multiplication.)
J+J->J (12M, 4S) - a bad choice
J+A->J (8M, 3S) - already better
(see eprint.iacr.org/2016/103.pdf)
You can take the code from break_short by Jochen Hoenicke; it fits perfectly.
That code uses J+A->J and converts only X to affine, xJ->xA (the best we can get from secp256k1 as it is),
where A is the S-table of pre-computed points G, 2G, 4G, 8G, 16G..
..You do use pre-computed points, yes? Don't tell me you recompute every Si with a multiplication each time, no-no-no, please, no! X%mask instead of pow2(X%k)?
(..by the way, that would explain your terrible speed..)
..Also use the older alternative functions without protection against side-channel attacks, for +10-20% speed:
secp256k1_gej_add_ge_var() instead of secp256k1_gej_add_ge()
(warning: _var() needs normalization before use in some functions.. my early mistake, "leading zeros", github.com/bitcoin-core/secp256k1/issues/601)
Ideally, new functions need to be written:
A+A->A (1I, 2M, 1S) - the best choice, because in the end we need X in affine form anyway, as the unique representation of each point for storage in the hashtable
(..you do use a hashtable, yes?)
After that there is no need to spend 8M,3S per addition, only about 20M for each 1I; that's a +35% speed-up.
Like that.
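The A+A->A formulas, sketched in plain python over the secp256k1 field (illustration only; in the C library the point of the exercise is pairing this with a fast or batched inverse):

```python
P  = 2**256 - 2**32 - 977   # secp256k1 field prime
Gx = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
Gy = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def affine_add(ax, ay, bx, by):
    # A + A -> A: 1 inversion, 2 mults, 1 square for distinct points
    # (the doubling branch costs one extra squaring)
    if (ax, ay) == (bx, by):
        s = 3 * ax * ax * pow(2 * ay, -1, P) % P   # tangent slope (a = 0)
    else:
        s = (by - ay) * pow(bx - ax, -1, P) % P    # chord slope, 1I 1M
    rx = (s * s - ax - bx) % P                     # 1S
    ry = (s * (ax - rx) - ay) % P                  # 1M
    return rx, ry

def on_curve(x, y):
    # y^2 = x^3 + 7 over the field
    return (y * y - x * x * x - 7) % P == 0
```

The result is already in affine form, so the X coordinate can go straight into the hashtable with no xJ->xA conversion.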

########
I think I know what the problem is.
It is a very obvious fact for me; I should have started with it.
(I should have guessed back at the time of the ECMULT_WINDOW_SIZE discussion.)

Do not multiply points inside the loop!
I warn you: point multiplication is very expensive, more expensive than the slowest point addition.
It shouldn't be used at all in multi-million-iteration loops.
Calculate all possible offset points before the loop starts.
Then, inside the loop, only add them.
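The whole point, in one toy example (my own illustrative sketch of the classic tame/wild lambda walk in the multiplicative group mod a small prime, not any of the programs discussed here): the jump elements g^(2^i) are computed once before the loop, and the loop itself only multiplies by precomputed elements, the group analogue of point addition, never a fresh exponentiation.

```python
import random

def kangaroo(g, h, p, lo, hi, seed=0):
    # find k in [lo, hi] with pow(g, k, p) == h; classic two-kangaroo walk
    W = hi - lo
    m = W.bit_length() // 2 + 2            # jump set: powers of two, mean ~ sqrt(W)/2
    jumps = [1 << i for i in range(m)]
    random.Random(seed).shuffle(jumps)     # vary the walk between attempts
    gpow = [pow(g, j, p) for j in jumps]   # precomputed BEFORE the loop
    f = lambda x: x % m                    # pseudo-random jump index

    xT, dT = pow(g, hi, p), 0              # tame starts at the top of the range
    for _ in range(4 * int(W ** 0.5) + 10):
        j = f(xT)
        dT += jumps[j]
        xT = xT * gpow[j] % p              # only "addition" inside the loop
    trap = xT

    xW, dW = h % p, 0                      # wild starts at g^k
    while dW <= W + dT:
        if xW == trap:
            return hi + dT - dW            # collision: k + dW == hi + dT
        j = f(xW)
        dW += jumps[j]
        xW = xW * gpow[j] % p
    return None                            # overshot the trap; retry another seed

def solve(g, h, p, lo, hi, tries=20):
    for seed in range(tries):
        k = kangaroo(g, h, p, lo, hi, seed)
        if k is not None and pow(g, k, p) == h:
            return k
    return None
```

A single walk can overshoot the trap, so `solve` retries with a reshuffled jump set; on secp256k1 the same structure applies with point additions from a precomputed jump table.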

I think you are using a mask
https://bitcointalk.org/index.php?topic=5166284.msg52068840#msg52068840
..so you can't store several million pre-calculated points in memory; that would explain your choice.
Quote
   x = PRF(X)
      = 1 << (X % m)
...
ec_point_mul_and_add(r,a,n)
Not a mask, I see.. then it's simple: you tried to build the classic a*b+c pattern.

Quote
..rewritten the following functions..
I tell you - take the loop from break_short as it is (with standard functions), rewrite it to kangaroos, and benchmark it.
if after that you still want to rewrite something, then ok (but i think you won't)
I think the "secret sauce" will not appear; it will turn out to be an exotic slow implementation via multiplication.
I would like to be mistaken, but my experience says that you are losing time.
..How do I know about multiplication?.. I checked it myself,.. and from the topic about why brainflayer is slow and why it cannot be done faster,.. and it was on one of the pages of the LBC topic.
This is not the most famous fact, it is not surprising that you did not know about it, no fault.

It's not random, not brainflayer - it's a pseudo-random walk, so we can pre-compute our jump sizes (offset points) and use point addition instead of point multiplication.
Post
Topic
Board Development & Technical Discussion
Re: BitCrack - A tool for brute-forcing private keys
by
Telariust
on 07/09/2019, 10:58:04 UTC
Quote
What about pollard-rho versions of this people talk in other threads, but there is no publicly available tool to try it (to my knowledge)?
https://bitcointalk.org/index.php?topic=5166284.msg52318676#msg52318676

You'd probably need to perform string mangling on the GPU to keep the cores working as much as possible, because the transfer of each and every candidate phrase from CPU/system memory to the GPU could end up being a severe bottleneck. Perhaps a deliberately slower algorithm like warp may work better, since the GPU will spend most of its time calculating rather than transferring to/from the host... but how common are non SHA256 brainwallets? As you state, any passphrase based search (which is essentially a set of random keys) will be slower than a sequential search.

I also thought so for a long time.
not really if the size does not exceed 49152
Code:
size_t stackSize = 49152;
err = cudaDeviceSetLimit(cudaLimitStackSize, stackSize);
..For VanitySearch, having a smaller group size is better (This is a reason why I worked a lot on this DRS62 ModInv implementation). I can double the size of the group (I will definitely do it) but not more. The GPU kernel performs one group per thread and send back hash160 to the CPU. If the group size is too large, memory transfer and allocation become a problem. Divide and rule Wink
VanitySearch restarts the kernel about 1000 times per second (!!!), and it works fine.
for opencl it is unknown

..and this is one of the main technical reasons why a gpu BF does not exist.
cpu BF checks huge dictionaries in a few hours; loading such volumes into the gpu is problematic.
therefore a gpu BF should work using a built-in generator, e.g. brute-forcing seeds.
Post
Topic
Board Development & Technical Discussion
Re: BitCrack - A tool for brute-forcing private keys
by
Telariust
on 06/09/2019, 17:09:37 UTC
Quote from: Angelo1710
..Ty for your explanation, I can't pm u back because of messaging limits. So basically there is no way to combine really fast random hex generator to work with the calculation process? So all we can do is randomly get starting points and from there work in increments.
True, according to my knowledge after 1 year of research.

In fact, you want a gpu BrainFlayer. Everyone wants it)
Ryan refuses to write it because he is a WhiteHat.

If successful, it will be slower than sequentially calculating points.

##########################
a heuristic estimate of the hashrate for a BrainFlayer cuda/opencl

BrainFlayer cpu sse
0,1 Mk/s - 1core i7-6820
8core - 4core real, 4core hyper-threading, so x6 instead x8
0,6 Mk/s - 8core i7-6820

VanityGen x64 opencl
60.7 Mk/s - gtx980
0,52 Mk/s - 8core i7-6820

60.7/0,52 = x116,7 (vg opencl gpu/cpu)

VanityGen x64 cpu sse
1,38 Mk/s - 8core i7-6820

1,38/0,52 = x2,6 (vg sse/opencl)

now if imagine
BrainFlayer opencl
0,6x(1/2,6) = 0,23 Mk/s - 8core i7-6820
0,23x116,7 = 26,8 Mk/s - gtx980

if
clBitCrack opencl
67.7 Mk/s - gtx980
cuBitCrack cuda
153.5 Mk/s - gtx980
241.3 Mk/s - gtx1070

153.5/67.7 = x2,26 (bc cuda/opencl)

then
BrainFlayer cuda
26,8x2,26 = 60,5 Mk/s  - gtx980
60,5x(241.3/153.5) = 94,9 Mk/s  - gtx1070
Post
Topic
Board Development & Technical Discussion
Re: BitCrack - A tool for brute-forcing private keys
by
Telariust
on 05/09/2019, 23:11:03 UTC
Quote
..random on every point across selected keyspace. So every jump/try is random. Not like in pikachunakapika version where it chooses random starting points and then starts doing them in 1 increment at a time..

The Engine of these programs is divided into two types:

1) optimized point addition (for quick calculation of sequential keys)
 sample programs: vanitygen, bitcrack, vanitysearch, break_short;
 methods:
 - calc only X coordinates (for compressed keys);
 - use of functions without protection against side channel attacks (openssl/secp256k1);
main problem - inversion calculation is very expensive, ~20mult per 1 point;
can use batch inversion (Simultaneous algorithm, 1 inversion per 1000 keys) + Affines coordinates + symmetry(only full range) + endomorphism(only full range);

2) optimized point multiplication (for quick calculation of random keys)
 sample programs: brainflayer;
 methods:
 - secp256k1 library with pre-calculated mult table in ram;
 - Jacobian coordinates;
main problem of random mult - inversion is needed too, but batch inversion cannot be used because each key is random.

BitCrack is optimized for sequentially calculating points in a limited range using addition.
VanitySearch is optimized for sequentially calculating points in the full range using addition (and 4 multiplication algorithms that work only for the full range).
BrainFlayer (cpu only) is optimized for random calculating points in a full range using multiplication.
You want to get random points in a limited range using multiplication.
These are fundamentally different tasks.

If you delve into the study of random generators, you will find that they have a speed ceiling.
Typically, the problem is that the overhead of generating random numbers is greater than the useful calculation itself.

PS:
about multiplication
At start, BitCrack pre-computes the starting points using multiplication.
It is so slow that starting from version 0.15 the author moved the procedure to the gpu.

see the 1st post - v0.0.6... the author has not come here for more than a year
Post
Topic
Board Bitcoin Discussion
Re: Bitcoin challenge transaction: ~100 BTC total bounty to solvers!
by
Telariust
on 01/09/2019, 18:38:49 UTC
compare
Code:
>cuBitCrack -c -d 0  1AVJKwzs9AskraJLGHAZPiaZcrpDr1U6AB
[2019-09-01.21:24:56] [Info] Compression: compressed
...
[2019-09-01.21:24:56] [Info] Initializing GeForce GTX 1070
[2019-09-01.21:24:56] [Info] Generating 262,144 starting points (10.0MB)
...
GeForce GTX 1070 232/8192MB | 1 target 73.28 MKey/s (403,701,760 total) [00:00:10]
and
Code:
>cuBitCrack -c -d 0 -b 15 -t 512 -p 1024 1AVJKwzs9AskraJLGHAZPiaZcrpDr1U6AB
[2019-09-01.21:24:56] [Info] Compression: compressed
...
[2019-09-01.21:23:20] [Info] Initializing GeForce GTX 1070
[2019-09-01.21:23:21] [Info] Generating 7,864,320 starting points (300.0MB)
...
GeForce GTX 1070 928/8192MB | 1 target 241.32 MKey/s (1,753,743,360 total) [00:00:10]
see?

need new column - optimal argv
i know
# GTX1060  = -b 9 -t 512 -p 1024
# GTX1070  = -b 15 -t 512 -p 1024
# GTX1080Ti= -b 28 -t 512 -p 1024

and 2cols of hashrate - with default and with optimal argv

..and it's good content to add to the 1st post
Post
Topic
Board Bitcoin Discussion
Merits 1 from 1 user
Re: Bitcoin challenge transaction: ~100 BTC total bounty to solvers!
by
Telariust
on 31/08/2019, 10:12:18 UTC
⭐ Merited by th3nolo (1)
origin link
https://fe57.org/forum/thread.php?board=4&thema=1#1

..Let's discuss the Pollard's kangaroo script from 57fe since it is not possible to register on its forum. Has anyone managed to increase the speed above 150k?..
Let's!

I will conduct a lot of tests and update this post.
Last update at Oct 09 2019, edit#34

#################
Notice, these power notations are all the same

2^123 == 2**(123) == pow(2,123) == pow2(123) == 1<<123
sqrt(2) == 2^(1/2) == 2**0.5

superscript (2 with a raised 123) - works only on this forum, outside of quote/code tags
2** - python specific
pow() - popular in many programming langs
2^ - the classic math text format for powers, so i use it 99% of the time; not beautiful, but it works

here x^y is the math text format of the power "x to the y", and not the bit operation xor [x xor y], do not confuse them!
#################
# gmpy2

When I ported the code to the coincurve lib, I was surprised to find that gmpy2 is faster.
Until that moment I had not seen an implementation faster than coincurve, my respect!
1core i5-2540 python37x64 win7x64
Code:
coincurve: 30528.2 j/s
coincurve+cffi: 41165.6 j/s
gmpy2: 90194.8 j/s
(with python27x64 -10%)
(on linux +20% and more!)

how to install gmpy2 on windows
 1) download the .whl file from https://www.lfd.uci.edu/~gohlke/pythonlibs/
 2) put it in the root dir of python
 3) install
Python27>python -m pip install gmpy2-2.0.8-cp27-cp27m-win_amd64.whl
Python37>python -m pip install gmpy2-2.0.8-cp37-cp37m-win_amd64.whl
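A quick way to check that gmpy2 gets picked up, with the pure-python fallback the script's auto-adaptation also uses (`inv` is an illustrative name, not from the script; the fallback needs Python 3.8+):

```python
# Use gmpy2.invert for modular inversion when the lib is present,
# else fall back to Python 3.8+ pow(a, -1, m) -- noticeably slower.
try:
    import gmpy2
    def inv(a, m):
        return int(gmpy2.invert(a, m))
except ImportError:
    def inv(a, m):              # pure-python fallback
        return pow(a, -1, m)

p = 2**256 - 2**32 - 977        # secp256k1 field prime
assert inv(12345, p) * 12345 % p == 1
```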

#################
# speed cpu

Quote
..Tests with C++ (VS2013) leads to x2..x3 faster calculations only (single-thread), but much more complicated code..
..because this is the limit of computation on the cpu
take break_short for example
1core i5-2540 = 0,16 Mkeys/s (C secp256k1 x64 without calc Y) (A + J -> J; xJ -> xA)
add(points) is used on every loop iteration in break_short
add(points) is used on every jump in kangaroo
therefore their speeds are equivalent

#################
# about kangaroos algo by 57fe

Quote
..Now we give the improved algorithm of [11] (for a serial computer).
The two kangaroos now make alternate jumps, and T starts at the middle of the interval, i.e. x0 = rM , where M = [(L + U)/2].
Whenever x_i falls in D, we store the pair (x_i, d_i) together with the name of the kangaroo (T or W).
This information may be stored in a hash table, hashing on x_i.
When some x_i is stored for the second time, having been reached by both kangaroos, we can compute the solution.
Thus, if x_i (for T) = x'_j (for W), we have z = M + d_i - d'_j.
In this version, unlike [5], we do not know which kangaroo is in the lead,..
this is based on [5]

Quote
Launch m/2 tame kangaroos with starting points g^((b/2)+iv), for 0 <= i < m/2 (avoid a power of 2 for v so that the probability of tame kangaroos following the same path is not abnormally high).
At the same time, launch m/2 wild kangaroos with starting points y*g^(b+iv), for 0 <= i < m/2. Whenever a kangaroo lands on a distinguished point, store it in a list common to all processors.
With each distinguished point, store a flag indicating the kangaroo type (tame or wild) and the distance
from g0 (for tame kangaroos) or from y (for wild kangaroos).
Suppose that a kangaroo reaches a distinguished point that is already in the list.
If the kangaroos are of the same type, then move the trailing kangaroo forward by some small random distance so that it will not repeat the other kangaroo’s path.
If the distinguished point was produced by kangaroos of different type, then subtract the distances to get the desired logarithm and we are done.
and this is parallelism of [11]

[5] J. M. Pollard, Monte Carlo methods for index computation (mod p), Math. Comp., #32 (1978)
[11] P. C. van Oorschot and M. J. Wiener, Parallel collision search with cryptanalytic applications, J. Cryptology, #12 (1999)

I studied the source:
 - uses a pre-calculated set of jumps with pow2 sizes (by Pollard)
 - creates 8 Tame kangaroos + 8 Wild kangaroos (the 1T+1W idea by Oorschot&Wiener of [11]);
..but why 8/8?? 1/1 must be the norm.. if you set 1/1 the speed drops - the calculation of jump size and start range is not optimal..
..i think the author tried to fix the low efficiency (see below, "anomaly detected"), or it's a trick to average the runtime, but the fact is - it's slower, -30%
(answer: the variable kangoo_power (number of kangaroos in the herd) affects the step size, so the illusion is created that T8+W8 is optimal, although in fact it should not differ from T1+W1)
 - gradually adds new distinguished points to the T+W sets, filtered by DP_rarity (distinguished points to economize RAM, by Oorschot&Wiener);
hmm... there is a problem:
Quote
..where 0 <= uj,vj < r are chosen uniformly at random..
...
Launch m/2 tame kangaroos with starting points g^((b/2)+iv), for 0 <= i < m, where v is a small constant
(avoid a power of 2 for v so that the probability of tame kangaroos following the same path is not abnormally high).
...
We analyse a variation of their method, which avoids the possibility of “collisions” between members of the same herd.
The numbers of kangaroos in the herds will be coprime integers u and v, nearly equal, with u+v=P.
Thus if P=4p, we could take u=2p+1 and v=2p-1. Their jumps will all be multiples of uv..
coprime integers! or 2p(+/-)1 integers! not random.randint(1,(1<<(problem-1))) !
and the uniqueness of a new point is not checked.
some kangaroos can fall into the trap of identical sequences (without new distinguished points);
Quote
..If the kangaroos are of the same type, then move the trailing kangaroo forward by some small random distance so that it will not repeat the other kangaroo’s path..
this reduces speed; the jump counter looks normal, but in reality many calculations are wasted on identical sequences.
Quote
..If there is a collision between two kangaroos of the same herd then it will eventually be detected when the second one lands on the same distinguished point as the first.
In [473] it is suggested that in this case the server should instruct the second kangaroo to take a jump of random length so that it no longer follows the path of the front kangaroo.
Note that Teske [606] has shown that the expected number of collisions within the same herd is 2, so this issue can probably be ignored in practice.
omg, but ok

#################
# Most famous optimizations of Pollard
# about complexity, runtime and RAM

Quote
1) not use distinguished points (early, of [5])
..runtime of approximately 3.28*w^(1/2) group operations
minimal RAM

 2) use distinguished points
..The kangaroo method (of [11, p. 13]) can solve in about 2*w^(1/2) expected operations..
It is expected to be 1.64 times faster than the early method of [5].
need more RAM for saved points

why using distinguished points is more efficient
Quote
..by keeping previous data, the number of collisions found grows as the square of the time spent..
(because, the number of pairs of points grows as the square of the number of points produced)

 "three-kangaroo algorithm"
("Using Inversion", but I think it means symmetry of Y)
One can perform a version of the kangaroo method with three kangaroos:
One tame kangaroo starting from g^u for an appropriate value of u and two wild kangaroos starting from h and h^(-1) respectively.
runtime (average-case expected) of 1.897*w^(1/2) group operations if m=0.316*w^(1/2) (m is the avg size of 1 jump)
runtime (average-case expected) of 1.818*w^(1/2) group operations if m=0.375*w^(1/2)

 "four-kangaroo algorithm"
runtime (average-case expected) of 1.714*w^(1/2) group operations if m=0.375*(2*w^(1/2))


other algo avg runtime for compare

 "Gaudry-Schost algorithm"
The basic idea of the Gaudry-Schost algorithm is as follows. One has pseudorandom walks in two (or more) subsets of the group such that a collision between walks of different types leads to a solution to the discrete logarithm problem.
runtime (average-case expected) of 1.660*w^(1/2) group operations

 "Baby-step Giant-step” method of Shanks
runtime (average-case expected) of (3/2)*w^(1/2) group operations if we first compute the Baby steps and save them in a hashtable
runtime (average-case expected) of (4/3)*w^(1/2) group operations if we compute both sets at the same time (double RAM)
much more RAM, w^(1/2)
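The BSGS trade-off above (store w^(1/2) baby steps, then walk giant steps) can be sketched in a toy multiplicative group mod a small prime; this is a hedged illustration, not tied to secp256k1, and it needs Python 3.8+ for `pow(g, -m, p)`:

```python
# Baby-step giant-step for g^x = h (mod p), x in [0, w):
# the hashtable of baby steps is the RAM cost, ~w^(1/2) entries.
from math import isqrt

def bsgs(g, h, p, w):
    m = isqrt(w) + 1
    baby = {pow(g, j, p): j for j in range(m)}   # m baby steps, stored
    step = pow(g, -m, p)                         # one giant step: multiply by g^(-m)
    y = h
    for i in range(m):
        if y in baby:
            return i * m + baby[y]               # x = i*m + j
        y = y * step % p
    return None

# toy parameters (illustrative only): find x with 2^x = h (mod 1019)
x = bsgs(2, pow(2, 357, 1019), 1019, 509)
```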

pollard-rho
Theorem1 - 1.25*w^(1/2)
Quote
..Theorem1 makes the assumption of true randomness.
However, it has been shown empirically that this assumption does not hold exactly for Pollard’s iteration function.
The actual performance is worse than the expected value given in Teske..
In prime order subgroups.. it is 1.55*w^(1/2) and Teske is 1.27*w^(1/2)
..while in groups of points of elliptic curves over finite fields.. it is 1.60*w^(1/2) and 1.29*w^(1/2)
1) Floyd improvement, the y value is updated at each step: y=F(F(y))
gcd is calculated for N and (y-x)
Floyd’s method needs on average 2.47(l_t + l_h) group operations (2.47 is three times the expected value of the epact).
2) Brent improvement (Brent improves Floyd)
check only (x_i, x_j), where j=2^k; k=1,2,3..a; i=[2^k+1; 2^(k+1)]
Brent has given an improved cycle finding method that still only requires storage for two group elements but which requires fewer group operations.
Brent’s method is 30-40% faster than Floyd’s. minimal RAM.
3) Teske improvement, uses 20 branches in the walk instead of 3 (rho), +20%.
..running time the assumption is made that a random walk in the underlying group is simulated.
..this assumption does not hold for the walk originally suggested by Pollard: its performance is worse than in the random case.
..introduce a class of walks that lead to the same performance as expected in the random case.
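Floyd's cycle finding from item 1) can be sketched generically (a toy iteration function, not ECDLP itself):

```python
# Floyd's cycle detection: the slow pointer takes one step per iteration,
# the fast pointer two; storage is only two elements (contrast with a
# distinguished-point hashtable).
def floyd(f, x0):
    slow, fast = f(x0), f(f(x0))
    while slow != fast:
        slow = f(slow)
        fast = f(f(fast))
    return slow                      # a point inside the cycle

f = lambda v: (v * v + 1) % 1013     # toy pseudorandom walk
meet = floyd(f, 2)
```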

Quote
Thus the complexity of Pollard rho method is determined by the size of the base group G on which random walk is performed and the running time of the iterating function.
Reducing either one of these two factors will result in speedup of the Pollard rho
...
In fact, the result on generic algorithm complexity implies that, to achieve results better than normal Pollard rho, we should either exploit the group encoding or invent a method that mostly uses the fastest of the group operations.

#################
# Compare pollard-rho vs pollard-kangaroo(lambda method*)

*the parallel version of rho resembles the shape of the greek letter lambda, do not confuse it!
*endomorphism has a constant called lambda, do not confuse it either!

the pollard-rho method is effective for the full field size Fp;
the pollard-kangaroo method is effective for a part of the field Fp;
we can use rho for a part of Fp, but the runtime will remain as for the full field;
we can use kangaroo for the full Fp, but it is about 1.60 times slower than rho; it becomes faster when b < 0.39p;

#################
# batch inversion, affine/jacobian, endomorphism, symmetry Y

Batch inversion is impossible.
Because we add single points with pseudo-random step sizes, instead of one total sequential computation.

Quote from: arulbero
"Are you using affine or jacobian coordinates for the points?"
its affine
Code:
# A + A -> A
# 1I, 2M, 1S
def add(P, Q, p=modulo):
R = Point()
dx = Q.x - P.x
dy = Q.y - P.y
if flag_gmpy2: # 1I, 1M
c = dy * gmpy2.invert(dx, p) % p
else:
c = dy * rev(dx, p) % p
R.x = (c*c - P.x - Q.x) % p # 1S
R.y = (c*(P.x - R.x) - P.y) % p # 1M
return R
But it's useless; we cannot save multiplications without batch inversion.

..how about using jacobian coordinates for adding points?
(to save on inversions; 1I equals ~20M, it's very expensive)
Quote
One often uses projective coordinates to speed up elliptic curve arithmetic, so it is natural to use
projective coordinates when implementing these algorithms. But to define the pseudorandom walk one
needs a unique representation for points, so projective coordinates are not appropriate. See Remark 13.3.2.
tests showed that jacobian X is not a unique representation, and distinguished points are not found.
after converting J->A (only X) everything works, but at -40% speed. conclusion - A+A->A is optimal. closed.

Endomorphism is impossible.
Applying it to the X coordinate is the same as changing the discriminator (DP_rarity) - it will not give any acceleration.
Speed depends only on the number and size of the jumps.

SymmetryY is used in the "three-kangaroo algorithm",
but 1.818*w^(1/2) vs 2*w^(1/2) is a very small speed-up. closed.
SymmetryY is used in the "four-kangaroo algorithm",
1.714*w^(1/2) ..that's (1.714/2)*100 of the work, about +15%. closed.


Quote from: st0ffl
The idea is of course to only make sqr(2^(bit-2)) operations.
a nice idea by st0ffl about an x2 boost using SymmetryY
https://bitcointalk.org/index.php?topic=5173445.msg52616670#msg52616670


lol, in python x*x*x*... is faster than x**y

and the bit shift 1<<123 is much faster (x10) than 2**123

#################
# popular questions
# I carefully read your posts in the thread and reply to them here

Quote
Why does the work suddenly increase (by a factor of about 8 to 10, depending on time or jumps)
Quote
why do you need 10 Ntimeit(N_tests)?
Pollard's algo is probabilistic in nature (Birthday Paradox, Kruskal's card trick, etc).
The solution time is not stable; several attempts are needed to obtain the avg runtime and jump count.
Quote
..(heuristic time rather than deterministic)..
######
Quote
pubkeys = {
   , 33: ('02ed949eaca31df5e8be9bf46adc1dfae1734b8900dcc303606831372955c728da', False) #0x01abcd1234
   ,105: ('03bcf7ce887ffca5e62c9cabbdb7ffa71dc183c52c04ff4ee5ee82e0c55c39d77b', False)
..what does 3 column mean? And why do 33 and 105 matter False?..
it is the private key, for double-checking the solution (demo/debug mode)
False if it is unknown (recovery mode)
######
Quote
what i don't understand is, if a 2^50 range can be found in 570 seconds,
then why can't we find the 2^105 bits puzzle address in 1140 seconds? or a bit more? but why does it take so much more time than it seemingly should...
105/50 = 2,1 ? no, wrong logic!
2^50  = 1125899906842624
2^105 = 40564819207303340847894502572032
2^105 / 2^50 = 36028797018963968 times more?
..no, it's a square dependence - need sqrt: x^(1/2)
((2^105)^(1/2)) / ((2^50)^(1/2)) =
 ((40564819207303340847894502572032)^(1/2)) / ((1125899906842624)^(1/2)) =
6369051672525772 / 33554432 =
189812531 times more! that's the right result
and now, if 2^50 is found in 570sec, then 2^105 is
189812531 * 570 = 108193142670 sec = 3478y  5m  5d 10:44:30s
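The same square-root arithmetic, checked in a few lines of python:

```python
# Kangaroo runtime scales with the SQUARE ROOT of the interval width:
# going from a 2^50 range to a 2^105 range costs 2^((105-50)/2) = 2^27.5 more.
from math import isqrt

ratio = isqrt(2 ** 105) // isqrt(2 ** 50)   # ~2^27.5 ~= 189812531
t50 = 570                                   # seconds measured for the 2^50 range
t105 = ratio * t50                          # ~1.08e11 seconds, thousands of years
```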
######
Quote
..multi pubkeys at once check inside bit range..
..this script check only 1 pubkey in bit range..
..but would it work does someone a test if multiple pub keys work?..
yes, only 1
a "multi" check comes at the cost of a "multi"-fold drop in performance!
(if u don't know, break_short has the same problem in the Giant-step calc)
sry, but no
######
Quote
The Pollard-Kangaroo ... seems to find keys even when the specified bit size doesn't match the key mask, but it does take longer. I guess the bits parameter is a hint at where to start searching, rather than a maximum search space limit.
Pollard answers
Quote
If z is outside the interval, the algorithm will still work but will take longer.

#################
# CPU build

Python implementation:
 - python2/3 compatibility (print/time/input/xrange/IntDiv)
 - auto adaptation under environment
 - raw python/coincurve+cffi/gmpy2
 - some checks, debug, more info, minor accelerate
 - (un)compress/custom/random pubkey
 - format time: 0y 0m 0d 00:00:00s 000ms
 - format prefix SI: Kilo/Mega/Giga/Tera/Peta/Exa
 - heuristic prognosis (avg)time and (avg)jump per bits
 - repair of kangaroos that fall into the trap of identical sequences
 - location of the privkey in the keyspace strip
 - percent progress status, elapsed/remaining time
 - support arbitrary range (keyspace) start:end

Python multicore implementation:
1) naive parallelism, split the search range equally
 - warning! the lowest efficiency, (w/Ncores)^(1/2)
2) parallelism by Oorschot&Wiener
 - normal efficiency, w^(1/2)/Ncores
 - has the problem of collisions between members of the same herd
3) parallelism by Pollard (best choice)
 - normal efficiency, w^(1/2)/Ncores
 - coprime/odd numbers U,V = 2p(+/-)1, U+V <= Ncores
 - no problem of collisions between members of the same herd
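For option 3), picking the two nearly equal coprime herd sizes can be as simple as this (a hypothetical helper, not taken from the script):

```python
# Pollard's parallel version wants U tame + V wild kangaroos with
# U + V = Ncores, gcd(U, V) = 1 and U roughly equal to V.
from math import gcd

def split_herds(ncores):
    u = ncores // 2
    while gcd(u, ncores - u) != 1:
        u -= 1
    return u, ncores - u
```

For example, 8 cores split into herds of 3 and 5, matching the "coprime, nearly equal" requirement.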


*you can already use option 1) by running the script in Ncores copies, manually splitting the search range

*Acceleration across several devices (i.e. with shared RAM) for 2) and 3) (w^(1/2)/Ncores) can only be achieved using a pool.
Filtered distinguished points should be sent to the pool into a single hashtable.

*removed support for the coincurve lib; reason: if u can install coincurve, then u can install gmpy2


python multicore cpu, python2/3, raw/gmpy2
https://github.com/Telariust/pollard-kangaroo/blob/master/pollard-kangaroo-multi.py


my cpu benchmark libs
Algo: 1 Tame + 1 Wild with distinguished points,  expected of 2w^(1/2) group operations
1core i5-2540, win7x64
Code:
python2x64
[lib#raw] 7.1K j/s
[lib#coincurve] 24.5K j/s
[lib#coincurve+cffi] 35.4K j/s
[lib#gmpy2] 89.4K j/s

python3x64
[lib#raw] 8.9K j/s
[lib#coincurve] 31.2K j/s
[lib#coincurve+cffi] 43.2K j/s
[lib#gmpy2] 97.8K j/s

1core i7-6820
Code:
python3x64
[lib#gmpy2] 174.3K j/s;


my cpu heuristic prognose
Algo: 1 Tame + 1 Wild with distinguished points, expected 2*w^(1/2) group operations
avg stat after 10tests
1core i5-2540, python37x64 + gmpy2, win7x64
Code:
[i] 97.8K j/s;..
...
[prognose] expected of 2w^(1/2) group operations
-------|--------/--------|---------------------------------/---------------------------------|
   W   |jump avg/2w^(1/2)| time                         avg/2w^(1/2)                         |
-------|--------/--------|---------------------------------/---------------------------------|
..
 2^020 |    1.8K/   2.0K |              0d 00:00:00s 018ms /              0d 00:00:00s 020ms |
..
 2^030 |   57.3K/  65.5K |              0d 00:00:00s 586ms /              0d 00:00:00s 670ms |
>2^031 |   81.1K/  92.7K |              0d 00:00:00s 829ms /              0d 00:00:00s 948ms |
 2^032 |  114.6K/ 131.1K |              0d 00:00:01s 173ms /              0d 00:00:01s 341ms |
..
 2^040 |    1.8M/   2.1M |              0d 00:00:18s 777ms /              0d 00:00:21s 471ms |
..
 2^050 |   58.7M/  67.1M |              0d 00:10:00s 886ms /              0d 00:11:27s 086ms |
..
 2^060 |    1.9G/   2.1G |              0d 05:20:28s 355ms /              0d 06:06:26s 759ms |
..
 2^070 |   60.1G/  68.7G |              7d 02:55:07s 378ms /              8d 03:26:16s 292ms |
..
 2^080 |    1.9T/   2.2T |          7m 17d 21:23:56s 096ms /          8m 20d 14:00:41s 355ms |
..
 2^090 |   61.5T/  70.4T |     20y  3m  2d 12:45:55s 095ms /     23y  1m 28d 16:22:03s 390ms |
..
 2^100 |    2.0P/   2.3P |    648y  2m 21d 00:29:23s 067ms /    741y  2m 17d 19:45:48s 481ms |
..
 2^105 |   11.1P/  12.7P |   3.7Ky 10m 29d 06:43:07s 581ms /   4.2Ky 11m 12d 16:12:57s 514ms |
..
 2^110 |   63.0P/  72.1P |  20.7Ky  2m 12d 15:40:18s 166ms /  23.7Ky 11m  0d 08:25:51s 416ms |
..
 2^120 |    2.0E/   2.3E | 663.8Ky  5m 14d 21:29:41s 312ms / 759.0Ky  4m 11d 05:47:25s 328ms |
----------------------------------------------------------------------------------------------


C99 implementation
 - singlecore
 - bitcoin-core/secp256k1 library
 - A+A->A; 0.219M j/s (1core i5-2540)
https://bitcointalk.org/index.php?topic=5173445.msg52473992#msg52473992


C++ implementation
 - multicore
 - based on engine VanitySearch1.15, vs bignum lib
 - A+A->A; 0.175M j/s (1core i5-2540)
https://bitcointalk.org/index.php?topic=5173445.msg52698284#msg52698284


#################
# GPU build (cuda/opencl)

Quote from: TRINITY
Tank, I need a pilot program for a V-212 helicopter cuda/opencl.. Let’s go.

https://bitcointalk.org/index.php?topic=1306983.msg51796187#msg51796187
https://imgur.com/a/f3zTmTa
thank you, it helped a lot

i imagine j2002ba2 reading this topic and thinking
Quote
omg, he used python.. what a primitive primate..
Smiley

article cuda + pollard-rho
https://github.com/beranm14/pollard_rho/blob/master/report/main.pdf

rho, by brichard19, good candidate to rewrite
https://github.com/brichard19/ecdl

roadmap:
 - rewrite VanitySearch to pollard-kangaroo

in process, wait..

#################
# parallelism problems

Quote
Maybe you can split equally the search range among threads ...

Pollard answers
Quote
6. The Kangaroo Method on a Parallel Computer
..A bad method is to divide the interval [L, U] into P equal parts, and use each of P processors(cores) to search one part, using the serial method. This will take time O((w/P)^(1/2))..
van Oorschot and Wiener [11] employ two herds of kangaroos in a method which runs in time O(w^(1/2)/P)..
i.e. if you have P cpus and split the search range equally, the speed-up is only P^(1/2)

Quote
..The lambda method suffers the same parallelization problem as the rho method..
Note that here each parallel processor is producing its own sequence of points independently of the others and each particular processor
does not increase the probability of success of any other processor.
..m cores give m^(1/2) acceleration..

Quote
..Suppose that there are P identical artists.
If we use P different sequences (that is, different polynomials F(x)), then the probability that the first k numbers in these sequences will be different modulo p will be approximately equal to exp((-k^2*P)/(2p)). Thus, the maximum acceleration can be estimated as P^(1/2)..

Quote
One important property of the lambda method is the fact that it is easily distributed on several computers. Each participant in distributed computing selects a random number r and begins to take pseudo-random steps from the number b^r, where b is the element of the group for which a discrete logarithm is sought. Each participant uses the same easily computable pseudorandom function f(G)->S, where S is a relatively small set of numbers with an average value comparable to the size of a group G of order n. The powers a^s for S are calculated in advance..
...
A small difficulty is that a match can occur within the same sequence. However, if the number of participants in the calculations is large enough, then the probability of a match between sequences is greater than the probability of a match within one sequence.
...
The central computer must track all sequences from all participants to identify matches. According to the birthday paradox, a match is expected when the number of elements in all sequences is of the order of O(N^(1/2)). Obviously, in the described form, this method requires a large expenditure of memory of the central computer. The following idea, described by van Oorschot, greatly reduces memory requirements and thus makes this method applicable to solving complex problems. The idea is to consider the so-called distinguished points. It is assumed that the elements of the group are represented by integers (or, possibly, sets of integers). A distinguished binary field of length k in this number will consist of zeros about (1/2)^k of the entire time. A random walk will pass through such marked points on average every 2^k steps. If two random sequences intersect somewhere, then they will intersect further and together will get to the next distinguished point. So, the idea is to send only these distinguished points to the central computer, which will reduce the required memory size by a factor of 2^k.
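The distinguished-point test described above is just a mask on the low bits (a sketch; the script's actual DP_rarity test may differ):

```python
# A point is "distinguished" when its low K bits are zero:
# on average 1 point in 2^K passes, so the central table shrinks 2^K times.
K = 20
MASK = (1 << K) - 1

def is_distinguished(x):
    return x & MASK == 0
```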

Quote
There are two versions of the distributed algorithm, one by van Oorschot and Wiener [473] and another by Pollard [488].
The difference is how they handle the possibility of collisions between kangaroos of the same herd.
The former has a mechanism to deal with this, which we will explain later. The latter paper elegantly ensures that there will not be collisions between individuals of the same herd.

 Van Oorschot and Wiener Version
..kangaroos are spaced a (small) distance s apart..
..If there is a collision between two kangaroos of the same herd then it will eventually be detected when the second one lands on the same distinguished point as the first.
In [473] it is suggested that in this case the server should instruct the second kangaroo to take a jump of random length so that it no longer follows the path of the front kangaroo.
Note that Teske [606] has shown that the expected number of collisions within the same herd is 2, so this issue can probably be ignored in practice.

 Pollard Version
Let N be the number of processors and suppose we can write N=U+V where gcd(U,V)=1 (i.e. U,V are coprime) and U,V =~ N/2.
The number of tame kangaroos is U and the number of wild kangaroos is V.
The (super) kangaroos perform the usual pseudorandom walk with steps {U*V*u_0,...,U*V*u_(n-1)} having mean m = N*w^(1/2)/4
(this is UV times the mean step size for solving the DLP in an interval of length w/UV =~ 4w/(N^2)).
As usual we choose either u_j = 2^j or else random values between 0 and 2m/UV.
The U tame kangaroos start at g^((w/2)+iV) for 0 <= i < U.
The V wild kangaroos start at h*g^(jU)     for 0 <= j < V.

Lemma 14.6.5. Suppose the walks do not cover the whole group, i.e., 0 <= a_n < r (where r is the group order).
Then there is no collision between two tame kangaroos or between two wild kangaroos.
There is a unique pair of one tame and one wild kangaroo that can collide.
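The lemma can be checked numerically: every jump is a multiple of UV, so each tame kangaroo's exponent is frozen at w/2 + iV modulo UV and each wild one's at x + jU modulo UV; within a herd these residues are pairwise distinct, and by CRT exactly one (tame, wild) pair shares a residue class. A sketch with toy parameters of my choosing:

```python
# Sketch of Lemma 14.6.5 with toy parameters (not from the source text):
# jumps are multiples of U*V, so each kangaroo's exponent stays in one
# residue class modulo U*V; distinct classes => no collision possible.
U, V = 3, 4          # coprime herd sizes, N = U + V = 7 processors
w = 1000             # interval length
x = 123              # unknown wild offset (any integer works)

tame = [(w // 2 + i * V) % (U * V) for i in range(U)]  # tame residues mod UV
wild = [(x + j * U) % (U * V) for j in range(V)]       # wild residues mod UV

# within each herd, residues are pairwise distinct
assert len(set(tame)) == U and len(set(wild)) == V
# exactly one tame/wild pair shares a residue class (CRT), so only
# that pair can ever collide
shared = set(tame) & set(wild)
assert len(shared) == 1
```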


#################
How Bitcoin works, a simple explanation.

A private key is a number N (very large, up to 2^256, so that it is impossible to guess or enumerate).
A public key is a point (x, y) on the elliptic curve "secp256k1".
The privkey is primary, and the pubkey is computed from it. They exist in pairs.

Knowing the pubkey, it is practically impossible to recover its privkey. This is what the security is based on.

There is a base point (i.e., a base pubkey, though nobody calls it that); everyone knows it, and the privkey for this point is 1.
To compute pubkeyX from privkey = X, multiply the base point by X.

#################
# How pollard-kangaroo works, the Tame and Wild kangaroos, is a simple explanation.

Suppose there is a pubkeyX with unknown privkeyX, but privkeyX is known to lie in the range w = [L..U].
Keys have a property: if we increase a pubkey by S, then its privkey also increases by S.
We start step by step to increase pubkeyX by S(i), keeping the sum sumX(S(i)). This is the Wild kangaroo.
We select a random privkeyT from the range [L..U] and compute pubkeyT.
We start step by step to increase pubkeyT by S(i), keeping the sum sumT(S(i)). This is the Tame kangaroo.
The size of the jump S(i) is determined by the x coordinate of the current point, so if the Wild and Tame kangaroos ever land on the same point, their paths merge from then on.
(we are concerned with pseudorandom walks whose next step is determined by the current position)
Thanks to the birthday paradox (cf. Kruskal's card trick), their paths will eventually meet.
Knowing each traveled distance, privkeyX is computed from their difference.
The number of jumps is approximately 2·w^(1/2) group operations, which is much less than a full search over w.
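The same scheme works in any cyclic group. A compact illustration in the multiplicative group mod a prime (all parameters are toy choices of mine, not secp256k1): the tame kangaroo starts from the top of the interval and leaves footprints; the wild one starts from the target and walks until it steps on one.

```python
# Toy Pollard kangaroo in the multiplicative group mod a prime.
import math

p = 1000003                  # toy prime modulus
g = 2                        # base element ("basepoint")
w = 2**16                    # the secret lies in [0, w)
secret = 31337               # pretend-unknown "privkey"
h = pow(g, secret, p)        # the "pubkey" we are given

k = 11                       # jump-table size; mean jump ~ sqrt(w)/2

def solve():
    steps = 4 * math.isqrt(w)             # length of the tame walk
    for salt in range(64):                # re-randomize the walk on a rare miss
        def jump(y):                      # step size from current position
            return 1 << ((y ^ salt) % k)
        # Tame kangaroo: start at g^w, leave a trap on every footprint.
        traps, y, d = {}, pow(g, w, p), 0
        for _ in range(steps):
            traps[y] = d
            s = jump(y)
            y = y * pow(g, s, p) % p
            d += s
        limit = d + w
        # Wild kangaroo: start at h, walk until stepping on a trap.
        y, d = h, 0
        while d <= limit:
            if y in traps:
                return w + traps[y] - d   # tame distance minus wild distance
            s = jump(y)
            y = y * pow(g, s, p) % p
            d += s
    return None                           # extremely unlikely after 64 salts

x = solve()
assert x is not None and pow(g, x, p) == h
```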

#################
# articles
(links from other users, all in one place)

base
https://andrea.corbellini.name/2015/05/17/elliptic-curve-cryptography-a-gentle-introduction/
hand-translated to ru/ua (recommended over translate.google)
https://habr.com/ru/post/335906/

0) best of the best, all in 1, epic,  2012
Chapter 14. Factoring and Discrete Logarithms using Pseudorandom Walks
https://www.math.auckland.ac.nz/~sgal018/crypto-book/ch14.pdf

1) with reference to old
J. M. Pollard, “Kangaroos, monopoly and discrete logarithms,” Journal of Cryptology, #13 (2000).
https://web.northeastern.edu/seigen/11Magic/KruskalsCount/PollardKangarooMonopoly.pdf
(good dir web.northeastern.edu/seigen/11Magic/KruskalsCount/)

2) about parallelism problems
P. C. van Oorschot and M. J. Wiener, Parallel collision search with cryptanalytic applications, J. Cryptology, #12 (1999)
https://people.scs.carleton.ca/~paulv/papers/JoC97.pdf

advice acceleration tips by fernand77
https://github.com/JeanLucPons/VanitySearch/issues/25#issuecomment-509293732

#################
# pollard implementations
(links from other users, all in one place)

kangaroo (now no info), by BurtW, (openssl,secp256k1?)
https://bitcointalk.org/index.php?topic=5173445.0

kangaroo (classic, does not use distinguished points), by natmchugh, python
https://bitcointalk.org/index.php?topic=5166284.msg52233957#msg52233957

rho, by brichard19
github.com/brichard19/ecdl
python, rho
rho, by Andrea Corbellini
github.com/andreacorbellini/ecc/blob/master/logs/pollardsrho.py
rho, by qubd
raw.githubusercontent.com/qubd/mini_ecdsa/master/mini_ecdsa.py
rho, by David Cox 2016
gist.github.com/davidcox143/c68a477b8390f35cb9a19084edd2a1e5
rho, by ralphleon
github.com/ralphleon/Python-Algorithms/blob/master/Cryptology/pollard.py
rho, by thomdixon
gist.github.com/thomdixon/dd1e280681f16535fbf1

other lang
github.com/nikitaneo/rho_pollard
github.com/twjvdhorst/Parallel-Pollard-Rho/
github.com/beranm14/pollard_rho/
github.com/grouville/factrace/

#################
# motivation

..they don't want to share tools like the Pollard kangaroo script..

Quote from: BENDER
..Well, I’m gonna go build my own theme park, with blackjack and hookers!..

#################

why not use the endomorphism for pollard-rho?
is a point, after multiplying by beta, sufficiently pseudorandom?
if I'm right, this would be a new superfast way to advance pollard-rho at the cost of 1M (one field multiplication) per point
has no one thought of this before me? I need to test it

#################

to be continued..
Post
Topic
Board Bitcoin Discussion
Re: Science Fair Project to trap Bitcoin private keys using Kangaroos!
by
Telariust
on 24/08/2019, 18:30:14 UTC
..the OpenSSL library is pretty slow.
OpenSSL's advantage is versatility, not speed.

I bet python+coincurve will be faster!
Ideally you need C bitcoin/secp256k1.
I will port your code to bitcoin/secp256k1 as soon as the source code is published.
(it is just ordinary point addition and multiplication, nothing complicated)
(..but I feel you want to do it yourself, with your own head and not someone else's?)
Post
Topic
Board Bitcoin Discussion
Merits 1 from 1 user
Re: Bitcoin challenge transaction: ~100 BTC total bounty to solvers!
by
Telariust
on 22/08/2019, 13:47:03 UTC
⭐ Merited by zielar (1)
https://gist.github.com/natmchugh/e094232a3975b89bff2d

...
The problem is at the initial stage, because of a python error:
Code:
Traceback (most recent call last):
  File "polard3.py", line 2, in
    from Ecc import Ecc
ImportError: cannot import name Ecc

Uncle Google has no idea how to get out of it :-)

In terminal:
Code:
pip install ecc
lol, it's not so simple

The code imports some old version (maybe a clone) of the ecdsa library (which also has a "class Point").
Alternatively we can use raw.githubusercontent.com/andreacorbellini/ecc/master/logs/common.py
or raw.githubusercontent.com/qubd/mini_ecdsa/master/mini_ecdsa.py
(mini_ecdsa needs some fixes for python3)
(and that lib has its own pollard-rho! but don't try running it on secp256k1; it needs a custom curve with low order - see the example on its homepage)
Alternatively, use the fastest option, coincurve: github.com/ofek/coincurve

######

Fact: the original base point in the code is corrupted.. and the original value is somewhat exotic..
Code:
   if self.__curve: assert self.__curve.contains_point( x, y )
AssertionError

######

This code is the classic/early/unoptimized kangaroo:

 - does not use distinguished points
 - total run-time of approximately 3.28·w^(1/2) group operations

######

For anyone who cares, I built 4 releases, one with each library
www.sendspace.com/file/wdyg2o
here is the fastest, based on coincurve
------------------------------------
update: added cffi, +30% speed
getting X,Y via .point() is slow; cffi is faster
Code:
coincurve: 30528.2 j/s
cffi+coincurve: 41165.6 j/s
Code:
from cffi import FFI
ffi = FFI()
...
#x,y = Y.point()
tmp_pubkey = ffi.buffer(Y.public_key, 64)[:]
#x = bytes_to_int(tmp_pubkey[31::-1]);
y = bytes_to_int(tmp_pubkey[:31:-1]);
------------------------------------
Code:
#!/usr/bin/python

# [windows:python -m] pip install coincurve
from coincurve import PrivateKey as ECprvKey, PublicKey as ECpubKey
from coincurve.utils import int_to_bytes, hex_to_bytes, bytes_to_int, bytes_to_hex, int_to_bytes_padded

from cffi import FFI
ffi = FFI()

#####################
# secp256k1
A  = 0
B  = 7
p  = 2**256-2**32-2**9-2**8-2**7-2**6-2**4-1
n  = 115792089237316195423570985008687907852837564279074904382605163141518161494337
Gx = 55066263022277343669578718895168534326250603453777594175500187360389116729240
Gy = 32670510020758816978083085130507043184471273380659243275938904335757337482424
k  = 15

#Gx,Gy = ECpubKey.from_valid_secret(int_to_bytes_padded(1)).point()

#####################
#
# PublicKey(data(bytes))
ecc = ECpubKey

G = ecc.from_point(Gx, Gy) # basePoint

#####################

import random
#prvkey = random.randint(1, n-1)
prvkey = random.randint(1, 2**20) # for pollard kangaroo


print('[privkey] %s (%s)' % (prvkey, hex(prvkey)))

pubkey = G.multiply(int_to_bytes(prvkey))

#####################

def f(Y):
    #x,y = Y.point()
    tmp_pubkey = ffi.buffer(Y.public_key, 64)[:]
    #x = bytes_to_int(tmp_pubkey[31::-1]);
    y = bytes_to_int(tmp_pubkey[:31:-1]);
    return pow(2, (y % k))

a = prvkey - pow(2, 20)
b = prvkey + pow(2, 20)

print('a = %s' % a)
print('b = %s' % b)
print('k = %s' % k)
"""
Tame Kangaroo
    xT := 0
    yT := g^b

    for i in 1..N:
        xT := xT + f(yT)
        yT := yT * g^f(yT)

"""

xT = 0
yT = G.multiply(int_to_bytes(b))

y  = pubkey

N = ( f(G) + f(G.multiply(int_to_bytes(b)))) / 2  * 2
N = int(N)

for i in range(1, N):
    xT += f(yT)
    yT = ecc.combine_keys([yT, G.multiply(int_to_bytes(f(yT)))]);

print(" %s %s" % (xT, yT.point()))

"""
Wild Kangaroo
    xW := 0
    yW := y

    while xW < b - a + xT:
        xW := xW + f(yW)
        yW := yW * g^f(yW)

        if yW = yT:
            return b + xT - xW
"""

print(" Setting wild kangaroo off")

def wildKangaroo(ecc, y, yT, xT, G,  b, a):
    xW = 0
    yW = y
    while xW < (b - a + xT):
        xW += f(yW)
        yW = ecc.combine_keys([yW, G.multiply(int_to_bytes(f(yW)))]);

        if yW == yT:
            print(' Catch: %s %s' % (yW.point(),yT.point()))
            return b + xT - xW

    print("Not found.")


A = wildKangaroo(ecc, y, yT, xT, G, b, a)
print(" b + xT - xW = %s (%s)" % (A, hex(A)) )

Post
Topic
Board Bitcoin Discussion
Re: Bitcoin challenge transaction: ~100 BTC total bounty to solvers!
by
Telariust
on 20/08/2019, 03:17:34 UTC
For my real job I am writing all the TCG and secure boot ROM firmware for a next gen SSD controller ASIC.  This SSD controller ASIC happens to have a built in hardware crypto engine for AES, SHA, HMAC, RSA, ECC, etc.  I was thinking I could download a special test firmware into the SSD that would use the built in hardware crypto engine to do this calculation.  It would be incredibly fast.  I could justify downloading it to an entire rack of SSDs during manufacturing in order to do a "burn in test" of the crypto hardware on the drive.  Should be fun.
!!!!!
then
(auto-translated)
Now there is talk of some super-secret asic boost algorithm. Who knows anything about it?
Consider several ways to find a hash with a given complexity
1. Sequential nonce from zero
2. Sequential search not from zero, but from a random point
3. Random nonce search.

Now take each method separately. For clarity, let nonce vary from 0 to 1,000,000 and let us look for a hash with three zeros.

Sequential nonce search from zero.
In this case, the miner finds the hash on average in 3845 iterations, and with probability 61% it takes fewer. That is, out of 100 hashes found, 61 will be found faster than in 3845 iterations.

Sequential search not from zero, but from a random point.
In this case, the miner finds the hash on average in 3845/2 = 1923 iterations, and with probability 39.4% it takes fewer (I computed this separately). That is, out of 100 hashes, 40 will be found faster than in 1923 iterations.

Twice as fast as in the first case! This is because the random starting point, with a normal distribution, falls on average in the middle of the search segment.
Also, with this method, with probability 259/1,000,000 the hash is found immediately.

Random nonce search.
In this case, you may never find the desired hash at all, but with probability 259/1,000,000 the hash is found on the first try. If the random values are distributed normally, the probabilities should add up, which means that after 1923 iterations the probability of finding a hash is already 50%. That is, out of 100 hashes, 50 will be found faster than in 1923 iterations.

As a result, the first algorithm turns out to be the slowest and the third the fastest. If I haven't messed anything up, of course...
and googling "Probabilistic Bitcoin Mining Method"
What do you know about this?
(given the probabilistic nature of the kangaroo, this could help)
Post
Topic
Board Bitcoin Discussion
Re: Bitcoin challenge transaction: ~100 BTC total bounty to solvers!
by
Telariust
on 20/08/2019, 03:00:13 UTC
Yes I'm using bitcrack. RX 580 89MKey / Sec

Code:
>cuBitCrack-0.30 -d 0 -t 512 -p 1024 1CABDYTie48wXV93XJ4Bdk7MFSTyshTXxg
[2019-08-20.06:44:00] [Info] Compression: compressed
[2019-08-20.06:44:00] [Info] Generating 16,777,216 starting points (640.0MB)
GeForce GTX 980  2059 / 8192MB | 1 target 153.59 MKey/s (1,694,498,816 total) [00:00:09]

>clBitCrack-0.30 -d 0 -t 512 -p 1024 1CABDYTie48wXV93XJ4Bdk7MFSTyshTXxg
[2019-08-20.05:50:09] [Info] Compression: compressed
[2019-08-20.05:50:11] [Info] Generating 16,777,216 starting points (640.0MB)
GeForce GTX 980  1024 / 8192MB | 1 target 67.72 MKey/s (671,088,640 total) [00:00:07]

>oclvanitygen64.exe -v -D 0:0 -F compressed 12345678
Device: GeForce GTX 980
Grid size: 4096x4096
[59.23 Mkey/s][total 402653184][Prob 0.0%][50% in 2.9h]
yes, it is sad


It's just that CUDA compiler optimization has advanced a lot.
Look for example: https://arxiv.org/ftp/arxiv/papers/1005/1005.2581.pdf

on the other hand... new release JohnTheRipper 1.9
https://www.openwall.com/lists/announce/2019/05/14/1
Quote
The release of the new version 1.9.0-jumbo-1 took place more than four years after the release of the previous 1.8.0-jumbo-1.

In the new version of John the Ripper, developers abandoned the CUDA architecture due to a decrease in interest in it and focused on the more portable OpenCL framework that works great on NVIDIA graphics cards.

Is OpenCL trash? Or does brichard19 need more practice? Unclear.
Post
Topic
Board Development & Technical Discussion
Re: BitCrack - A tool for brute-forcing private keys
by
Telariust
on 19/08/2019, 13:34:02 UTC
Quote from: Telariust
Quote from: chan111
Nothing found.. any clues what is going on / what Im doing wrong?

hmm.. it's not you.

When I wrote that, I wanted to demonstrate that this is possible in general.
Now I see that I did not take special cases into account...
This needs to be reviewed and completed.
.. calculating this manually is inconvenient; it's time to write a script

I successfully wrote a python3 script and added it to the faq post:
bitcointalk.org/index.php?topic=4453897.msg52106273#msg52106273

If the checksum (and/or the 0x01 compression flag) is corrupted, then recovery is currently impossible
(the --stride option would have to be a float, but it must be an integer!).
Recovery of such keys will only be possible if brichard19 implements float support for the --stride option.

To explain it more simply:
in a WIF base58 privkey, each lost symbol (*/?) further to the right divides the stride by 58.
From about the middle of the string onward, the stride no longer divides evenly.
What does the checksum have to do with it? The checksum is what forms the fractional part.
While it stays equal, it cancels out in the subtraction.
As soon as it differs between lowerKey and lowerKey+1, the stride becomes a float.
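The divisibility argument can be sketched as follows (my sketch, assuming an uncompressed WIF, where the last 4 bytes of the base58-decoded number are the checksum): a missing character at position i from the right changes the decoded integer in steps of 58^i, so the stride in private-key units is 58^i / 2^32, an integer only when 58^i is divisible by 2^32.

```python
# Sketch: stride between WIF candidates when one base58 character is
# unknown. Assumed layout of the base58-decoded integer (uncompressed
# WIF): [version | 32-byte key | 4-byte checksum], so the key sits a
# factor of 2^32 above the checksum.
from fractions import Fraction

def key_stride(i: int) -> Fraction:
    """Stride in private-key units for a missing char at position i
    (counting from the right end of the WIF string)."""
    return Fraction(58**i, 2**32)

# near the right end the stride only moves the checksum (stride < 1)
assert key_stride(3) < 1
# 58^i = 2^i * 29^i, so it is divisible by 2^32 only when i >= 32:
assert key_stride(10).denominator != 1   # fractional -> unusable stride
assert key_stride(32).denominator == 1   # integer stride again
```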
Post
Topic
Board Development & Technical Discussion
Re: VanitySearch (Yet another address prefix finder)
by
Telariust
on 17/08/2019, 20:43:26 UTC
Look to:

GPU/GPUEngine.h
Code:
// Number of thread per block
#define NB_TRHEAD_PER_GROUP 128

try increase it to 256 or 512..

Code:
>VanitySearch-1.1.5_th128gr -t 0 -gpu  12345689
GPU: GPU #0 GeForce GTX 1070 (15x128 cores) Grid(120x128)
549.958 MK/s (GPU 549.958 MK/s) (2^33.98) [P 1.88%][50.00% in 00:18:09][0]

>VanitySearch-1.1.5_th256gr -t 0 -gpu  12345689
GPU: GPU #0 GeForce GTX 1070 (15x128 cores) Grid(120x256)
664.534 MK/s (GPU 664.534 MK/s) (2^34.08) [P 2.02%][50.00% in 00:14:59][0]

>VanitySearch-1.1.5_th512gr -t 0 -gpu  12345689
GPU: GPU #0 GeForce GTX 1070 (15x128 cores) Grid(120x512)
710.364 MK/s (GPU 710.364 MK/s) (2^34.35) [P 2.43%][50.00% in 00:13:56][0]
hashrate diff:
664/549 = x1.20
710/549 = x1.29

what about the -g argument?
120x512 = 61440
480x128 = 61440
but the performance is not equal:
Code:
>VanitySearch-1.1.5_th128gr  -t 0 -gpu  -g 480  12345689
GPU: GPU #0 GeForce GTX 1070 (15x128 cores) Grid(480x128)
566.206 MK/s (GPU 566.206 MK/s) (2^32.40) [P 0.64%][50.00% in 00:17:57][0]
the -g option is useless here; recompilation is necessary
https://github.com/JeanLucPons/VanitySearch/files/3568897/VanitySearch-1.1.5_th128gr.orig.zip
https://github.com/JeanLucPons/VanitySearch/files/3568900/VanitySearch-1.1.5_th256gr.zip
https://github.com/JeanLucPons/VanitySearch/files/3568904/VanitySearch-1.1.5_th512gr.zip
recompiled, unofficial builds from me (while Jean_Luc decides what to do)
Post
Topic
Board Development & Technical Discussion
Re: BitCrack - A tool for brute-forcing private keys
by
Telariust
on 14/08/2019, 03:55:07 UTC
Why not add the ability to search by ripemd160 [...] to speed up
ripemd160(base16) <--> btcaddr(base58) <--> Int/Dec(base10) <--> Hex(base16) <--> Binary(base2)
it's the same thing, just in different bases; it's a conversion
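That conversion can be sketched directly (standard Base58Check decoding, my own sketch; the address below is the one quoted elsewhere in this thread):

```python
# Base58Check decode: address -> version byte + ripemd160 hash + checksum.
import hashlib

B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def addr_to_hash160(addr: str) -> bytes:
    n = 0
    for c in addr:                       # base58 string -> integer
        n = n * 58 + B58.index(c)
    raw = n.to_bytes(25, "big")          # version(1) + hash160(20) + csum(4)
    payload, csum = raw[:21], raw[21:]
    # checksum = first 4 bytes of double-SHA256 of the payload
    assert hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] == csum
    return payload[1:]                   # the 20-byte ripemd160 hash

h160 = addr_to_hash160("1CABDYTie48wXV93XJ4Bdk7MFSTyshTXxg")
assert len(h160) == 20
```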
Why not add the ability to search by [...] known public key to speed up
Matching against pubkey would be handy (if it doesn't already do this?)
..because the main target (puzzle) assumes the public key is unknown (only its ripemd160 hash is known).
And if the pubkey is known, then sequential key search in a limited space (which BitCrack uses) is the stupidest and slowest way to search, compared to the baby-step giant-step and pollard-rho/kangaroo algorithms:
when BitCrack computes 10M keys, it has checked 10M keys;
when BSGS/Pollard computes 10M keys, it has effectively checked (10M)^2 = 100000000M keys (!!! ..understand?)
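The quadratic claim is just baby-step giant-step: m baby steps plus m giant steps cover m·m keys. A toy sketch in a multiplicative group (illustrative parameters of my choosing, not secp256k1):

```python
# Toy baby-step giant-step: ~2*m group operations cover m*m keys.
import math

p, g = 1000003, 5          # toy prime modulus and base
secret = 424242
h = pow(g, secret, p)      # the "pubkey" whose exponent we want

def bsgs(limit):
    """Find e with g^e = h and e < limit, in ~2*sqrt(limit) operations."""
    m = math.isqrt(limit) + 1
    baby = {pow(g, j, p): j for j in range(m)}       # m baby steps
    step = pow(g, -m, p)                             # g^(-m) mod p
    y = h
    for i in range(m):                               # m giant steps
        if y in baby:
            return i * m + baby[y]
        y = y * step % p
    return None

x = bsgs(2**20)
assert x is not None and pow(g, x, p) == h
```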

And which bitcrack version supports random key space function ?
This fork https://github.com/pikachunakapika/BitCrack
Code:
-r, --random
    Each point will start in random KEYSPACE
Post
Topic
Board Altcoin Discussion
Re: Vanitygen support for 70+ altcoins! +oclvanitygen and encrypt decrypt privkeys
by
Telariust
on 14/08/2019, 03:26:19 UTC
Hi, dev ExploitAgency!

Look at:
https://github.com/exploitagency/vanitygen-plus/issues/205
origin link
https://bitcointalk.org/index.php?topic=25804.msg52110068#msg52110068
extended
https://bitcointalk.org/index.php?topic=25804.msg52124850#msg52124850

If you can and want to (Last Active: November 26, 2017, 01:07:26 AM)..
It would be great if you released an official minimal update with the fixed opencl kernel, increased rekey_at, fixed auto-detection of gridSize, and both 32-bit and 64-bit binary releases.
Post
Topic
Board Development & Technical Discussion
Re: BitCrack - A tool for brute-forcing private keys
by
Telariust
on 12/08/2019, 12:57:35 UTC
please, do not use quoting huge messages

Quote
4) BitCrack Performance Tips.
...
run with the option "--continue sess.txt", any address, for example
xxBitCrack --continue sess.txt 17GD2bc5b7SvU7bzN7CYUSc6h4zBJ36Rx6
wait a little longer than a minute until the progress-save file has been created, then shut it down;
Quote
In which folder is the sess.txt file saved - in the one where BitCrack is located OR in some other?

in the xxBitCrack.exe folder
you can open a cmd shell directly in the folder with this trick: Shift+RightClick -> Open command window here

https://github.com/brichard19/BitCrack/releases/tag/0.24
Quote
...FILE is updated every 60 seconds.

Nothing found.. any clues what is going on / what Im doing wrong?

hmm.. it's not you.

When I wrote that, I wanted to demonstrate that this is possible in general.
Now I see that I did not take special cases into account...
This needs to be reviewed and completed.
.. calculating this manually is inconvenient; it's time to write a script