Re: Bitcoin puzzle transaction ~32 BTC prize to who solves it

Quote from: Zoning5264 on Today at 11:46:31 AM

Quote from: kTimesG on January 17, 2025, 07:55:53 PM

While we're waiting for RTX 5090 here's some really fast jumper for 64-bit CPUs.

I’m working on a Pollard’s Kangaroo implementation for secp256k1 and I’d love to achieve high performance for point arithmetic on CPU (in particular, large-scale multiplications of G and other points). Could you please share or publish your HPC‐optimized code and techniques? I’m especially interested in any optimized field/group operations, batched inversions, or other CPU‐level optimizations you’ve used to speed up these computations.

Why would you need large-scale multiplications of G? They're only needed to create the initial kangaroos. Anyway, you can use libsecp256k1 for that, or extract the relevant code from it, like I did. You can also further optimize the code I posted, for example by always keeping Y2 in negated form and caching the jump index. Beyond that, IDK if there's more to do on a CPU: the batched inversion I presented is already the "parallel" tree-based version (hence the tradeoff of double-size tree storage, to avoid race conditions and r/w overlaps), not the "serial" version. Also, an RTX 4090 is roughly 1000 times faster than a single core of a high-end CPU, so my optimizations start there, not with CPU code.
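For anyone wondering what a tree-based batched inversion looks like, here is a minimal Python sketch. It is not the code I posted earlier, just an illustration of the general idea under some assumptions: the name batch_inverse_tree is made up for this example, the inputs are nonzero field elements modulo the secp256k1 field prime, and pow(x, -1, p) (Python 3.8+) stands in for the single real modular inversion. The 2n-entry arrays are the "double-size" storage: every node gets its own slot, so all nodes on one level can be computed independently.

```python
# secp256k1 field prime
P = 2**256 - 2**32 - 977


def batch_inverse_tree(values, p=P):
    """Invert every element of `values` mod p with one modular inversion.

    Hypothetical helper for illustration. Layout: leaves live at indices
    n..2n-1, and internal node i holds the product of nodes 2i and 2i+1.
    """
    n = len(values)
    if n == 0:
        return []

    # Upward pass: build the product tree (n - 1 multiplications).
    tree = [1] * (2 * n)
    for i, v in enumerate(values):
        tree[n + i] = v % p
    for i in range(n - 1, 0, -1):
        tree[i] = tree[2 * i] * tree[2 * i + 1] % p

    # The only real inversion: the root holds the product of all inputs.
    # pow raises ValueError if any input is 0 mod p.
    inv = [0] * (2 * n)
    inv[1] = pow(tree[1], -1, p)

    # Downward pass: the inverse of a child is the inverse of its parent
    # times the sibling's product (2 * (n - 1) multiplications).
    for i in range(1, n):
        inv[2 * i] = inv[i] * tree[2 * i + 1] % p
        inv[2 * i + 1] = inv[i] * tree[2 * i] % p

    return inv[n:]


if __name__ == "__main__":
    xs = [3, 5, 7, 11]
    for x, x_inv in zip(xs, batch_inverse_tree(xs)):
        print(x, (x * x_inv) % P)
```

The multiplication count is the same as in the serial prefix-product version, about 3(n - 1) plus one inversion. What changes is the dependency structure: the serial version is one long chain, while here each tree level only depends on the level next to it, at the cost of the extra storage.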