Showing 6 of 6 results by Zoning5264
Post
Board: Development & Technical Discussion
Topic: Re: Pollard's kangaroo ECDLP solver
by Zoning5264 on 20/02/2025, 11:02:12 UTC
Using JeanLucPons/Kangaroo's program:

On my 4060 Ti machine, the graphics card does not run at full capacity: it shows only 650 MKey/s, sits at around 50 °C, and its fan spins at only 1200 rpm.

Does the program need optimization? What is the cause?


Guys, I have a question. Is a performance of over 500 billion steps per minute with Pollard's Kangaroo good or not? What is the best result reported by other forum users?

It's a very good speed if you plan on breaking 135 somewhere between 500 and 900 years from now.

Thanks, I've improved a bit today:

candidate=00000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16 (AVX2=6600000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16) elapsed=2 d:08 h:18 m:19 s total_all=332963840000000

Me too!

candidate=0000000000000000000000719D05AAF5E5E7329CEF66C917D6B51980FA8B2CCE (AVX2=6600000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16) elapsed=0 d:03 h:43 m:07 s total_all=2901745600000000000000

I think mine has improved a bit more, but no doubt you will get there!!

I thought you were a serious guy, but you just forgot to change the AVX2 key when you copied my post.
Post
Board: Development & Technical Discussion
Topic: Re: Pollard's kangaroo ECDLP solver
by Zoning5264 on 19/02/2025, 20:42:41 UTC
Using JeanLucPons/Kangaroo's program:

On my 4060 Ti machine, the graphics card does not run at full capacity: it shows only 650 MKey/s, sits at around 50 °C, and its fan spins at only 1200 rpm.

Does the program need optimization? What is the cause?


Guys, I have a question. Is a performance of over 500 billion steps per minute with Pollard's Kangaroo good or not? What is the best result reported by other forum users?

It's a very good speed if you plan on breaking 135 somewhere between 500 and 900 years from now.

Thanks, I've improved a bit today:

candidate=00000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16 (AVX2=6600000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16) elapsed=2 d:08 h:18 m:19 s total_all=332963840000000

Me too!

candidate=0000000000000000000000719D05AAF5E5E7329CEF66C917D6B51980FA8B2CCE (AVX2=6600000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16) elapsed=0 d:03 h:43 m:07 s total_all=2901745600000000000000

I think mine has improved a bit more, but no doubt you will get there!!

Wow, 210271420289855072 keys/s, great result, congratulations. I can't get more on my 4060.
Post
Board: Development & Technical Discussion
Topic: Re: Pollard's kangaroo ECDLP solver
by Zoning5264 on 19/02/2025, 19:57:52 UTC
Using JeanLucPons/Kangaroo's program:

On my 4060 Ti machine, the graphics card does not run at full capacity: it shows only 650 MKey/s, sits at around 50 °C, and its fan spins at only 1200 rpm.

Does the program need optimization? What is the cause?


Guys, I have a question. Is a performance of over 500 billion steps per minute with Pollard's Kangaroo good or not? What is the best result reported by other forum users?

It's a very good speed if you plan on breaking 135 somewhere between 500 and 900 years from now.

Thanks, I've improved a bit today:

candidate=00000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16 (AVX2=6600000000000000000000000000000076e6eda5e63ddb1b05451e1f6ba3118d16) elapsed=2 d:08 h:18 m:19 s total_all=332963840000000
Post
Board: Development & Technical Discussion
Topic: Re: Pollard's kangaroo ECDLP solver
by Zoning5264 on 18/02/2025, 19:20:29 UTC
Using JeanLucPons/Kangaroo's program:

On my 4060 Ti machine, the graphics card does not run at full capacity: it shows only 650 MKey/s, sits at around 50 °C, and its fan spins at only 1200 rpm.

Does the program need optimization? What is the cause?


Guys, I have a question. Is a performance of over 500 billion steps per minute with Pollard's Kangaroo good or not? What is the best result reported by other forum users?
Post
Board: Tablica ogłoszeń (Announcements)
Topic: Re: Poszukuję dev do stworzenia kryptowaluty (Looking for a dev to create a cryptocurrency)
by Zoning5264 on 10/02/2025, 22:31:54 UTC
Post
Board: Bitcoin Discussion
Topic: Re: Bitcoin puzzle transaction ~32 BTC prize to who solves it
by Zoning5264 on 21/01/2025, 11:46:31 UTC
While we're waiting for the RTX 5090, here's a really fast jumper for 64-bit CPUs.

This is 100% working code: I'm using it to test that my CUDA kernel jumps correctly. I needed it to be as fast as possible so I don't grow old waiting for results to validate.

This uses libsecp256k1's internal headers with inlined basic field and group operations, and does batched addition with non-dependent tree-inversion loops (translation: a good compiler will use unrolling, SIMD, and other CPU instructions to speed things up).

Group operations / second / thread is around 15 - 20 Mo/s (million operations per second) on a high-end Intel CPU.

Compile with "-march=native" for best results.

No, this is not a fully working puzzle breaker. You need to use your brain to add DP logic, saving, and collision checks. This is just the lowest-level building block: a very fast CPU kangaroo jumper for secp256k1.

This function also assumes that a jumped kangaroo can never land on a point in the set of jump points, nor on its negation. My Kangaroo algorithm guarantees this by design, so no point-doubling or point-at-infinity logic is needed at all.

Code:
#include "field_impl.h"                 // field operations
#include "group_impl.h"                 // group operations

#define FE_INV(r, x)        secp256k1_fe_impl_inv_var(&(r), &(x))
#define FE_MUL(r, a, b)     secp256k1_fe_mul_inner((r).n, (a).n, (b).n)
#define FE_SQR(r, x)        secp256k1_fe_sqr_inner((r).n, (x).n)
#define FE_ADD(r, d)        secp256k1_fe_impl_add(&(r), &(d))
#define FE_NEG(r, a, m)     secp256k1_fe_impl_negate_unchecked(&(r), &(a), (m))

static
void jump_batch(
    secp256k1_ge * ge,
    const secp256k1_ge * jp,
    secp256k1_fe * xz,                  // product tree leafs + parent nodes
    secp256k1_fe * xzOut,
    U32 batch_size
) {
    secp256k1_fe t1, t2, t3;

    int64_t i;

    for (i = 0; i < batch_size; i++) {
        uint8_t jIdx;

#if JUMP_FUNC == JUMP_FUNC_LOW_52
        jIdx = ge[i].x.n[0] % NUM_JUMP_POINTS;
#elif JUMP_FUNC == JUMP_FUNC_LOW_64
        jIdx = (ge[i].x.n[0] | (ge[i].x.n[1] << 52)) % NUM_JUMP_POINTS;
#endif

        xz[i] = ge[i].x;
        FE_NEG(t1, jp[jIdx].x, 1);
        FE_ADD(xz[i], t1);                          // XZ[i] = x1 - x2
    }

    for (i = 0; i < batch_size - 1; i++) {
        FE_MUL(xz[batch_size + i], xz[i * 2], xz[i * 2 + 1]);
    }

    FE_INV(xzOut[batch_size * 2 - 2], xz[2 * batch_size - 2]);

    for (i = batch_size - 2; i >= 0; i--) {
        FE_MUL(xzOut[i * 2], xz[i * 2 + 1], xzOut[batch_size + i]);
        FE_MUL(xzOut[i * 2 + 1], xz[i * 2], xzOut[batch_size + i]);
    }

    secp256k1_ge * _a = ge;
    const secp256k1_fe * _inv = xzOut;

    for (i = 0; i < batch_size; i++) {
        uint8_t jIdx;

#if JUMP_FUNC == JUMP_FUNC_LOW_52
        jIdx = ge[i].x.n[0] % NUM_JUMP_POINTS;
#elif JUMP_FUNC == JUMP_FUNC_LOW_64
        jIdx = (ge[i].x.n[0] | (ge[i].x.n[1] << 52)) % NUM_JUMP_POINTS;
#endif

        const secp256k1_ge * _b = &jp[jIdx];

        FE_NEG(t1, _b->y, 1);                       // T1 = -y2
        FE_ADD(_a->y, t1);                          // Y1 = y1 - y2                     m = max_y + 2(1)
        FE_MUL(_a->y, _a->y, *_inv);                // Y1 = m = (y1 - y2) / (x1 - x2)   m = 1
        FE_SQR(t2, _a->y);                          // T2 = m**2                        m = 1
        FE_NEG(t3, _b->x, 1);                       // T3 = -x2
        FE_ADD(t2, t3);                             // T2 = m**2 - x2                   m = 1 + 2(1) = 3(2)
        FE_NEG(_a->x, _a->x, 1);                    // X1 = -x1                         m = max_x + 1
        FE_ADD(_a->x, t2);                          // X1 = x3 = m**2 - x1 - x2         max_x = 3 + max_x + 1
        secp256k1_fe_normalize_weak(&_a->x);

        FE_NEG(t2, _a->x, 1);                       // T2 = -x3                         m = 1 + 1 = 2
        FE_ADD(t2, _b->x);                          // T1 = x2 - x3                     m = 2 + 1 = 3
        FE_MUL(_a->y, _a->y, t2);                   // Y1 = m * (x2 - x3)               m = 1
        FE_ADD(_a->y, t1);                          // Y1 = y3 = m * (x2 - x3) - y2     m = 1 + 2 = 3
        secp256k1_fe_normalize_weak(&_a->y);

        ++_a;
        ++_inv;
    }
}

This is easy to parallelize. Let's add a wrapper that jumps a specific buffer of kangaroos:

Code:
static
void computeBatchJump(
    secp256k1_ge * ge,
    const secp256k1_ge * jp,
    U32 batch_size,
    U32 num_jumps
) {
    size_t tree_sz = (batch_size * 2 - 1) * sizeof(secp256k1_fe);

//    printf("Allocating %zu bytes for tree\n", tree_sz);

    secp256k1_fe * xz_1 = malloc(tree_sz);
    if (NULL == xz_1) return;

    secp256k1_fe * xz_2 = malloc(tree_sz);
    if (NULL == xz_2) {
        free(xz_1);                 // don't leak xz_1 if the second alloc fails
        return;
    }

    for (uint32_t loop = 0; loop < num_jumps; loop++) {
        jump_batch(ge, jp, xz_1, xz_2, batch_size);
    }

    free(xz_1);
    free(xz_2);
}

And now, once you have a really big buffer of kangaroos, you can run the jumps on all of your physical cores:

Code:
#define JUMPS_PER_STAGE   32768

    secp256k1_ge * secp_ge = malloc(numElements * sizeof(secp256k1_ge));
    secp256k1_ge * secp_jp = malloc(NUM_JUMP_POINTS * sizeof(secp256k1_ge));

    // init the jump points, init the kangaroos to your needs
    // ...

    int numLaunches = 1;    // extra multiplier for the total number of jumps
    int numThr = omp_get_max_threads();

    // use the max amount of threads that exactly divides the number of items
    while (numThr > 0 && numElements % numThr) numThr--;

    U64 gePerPart = numElements / numThr;
    printf("\tThreads: %d; elements/thread: %llu\n", numThr, (unsigned long long)gePerPart);

    double ompStartTime = omp_get_wtime();

    for (U32 launchIdx = 0; launchIdx < numLaunches; launchIdx++) {
#pragma omp parallel for
        for (U32 tIdx = 0; tIdx < numThr; tIdx++) {
            U64 offset = tIdx * gePerPart;
            secp256k1_ge * localGE = secp_ge + offset;

            computeBatchJump(localGE, secp_jp, gePerPart, JUMPS_PER_STAGE);
        }
    }

    double ompEndTime = omp_get_wtime();
    elapsedTime = ompEndTime - ompStartTime;
    speed = (double) totalCount / elapsedTime;

Good luck.
Hello kTimesG,

I'm working on a Pollard's Kangaroo implementation for secp256k1 and I'd love to achieve high performance for point arithmetic on the CPU (in particular, large-scale multiplications of G and other points). Could you please share or publish your HPC-optimized code and techniques for kTimesG? I'm especially interested in any optimized field/group operations, batched inversions, or other CPU-level optimizations you've used to speed up these computations.

Thank you,
Zoning5264