Post
Topic
Board Bitcoin Discussion
Re: Bitcoin puzzle transaction ~32 BTC prize to who solves it
by
kTimesG
on 21/05/2024, 12:31:07 UTC

Is it non-pointless even if writing it in assembler?

Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.

Double SHA-256 hashing is the ultimate bottleneck  Grin
That was my point, read carefully the question Smiley

Any sort of strategy is useless if you use either Python or ASM as long as any sort of higher-level op like SHA / RIPEMD is the actual bottleneck.

But if we only operate on EC then it's another story, this is where kangaroo / rho / bsgs comes into play, and we can start optimizing the operations that are the bottleneck at THAT level. It's one thing to do a few thousand mul/s in Python and another to do batched additions (with a single inversion) on a GPU, to reach giga adds/second. Millions of times faster for identical results. And ofcourse it's still not enough even at that level, so...

Something to think about on 256-bit modular math:
- a single inversion is ~60 times slower than a multiplication (GCD algo, not a simple operation at all)
- a multiplication is ~2 times slower than a squaring
- a squaring is ~3 times slower than an addition, ~2 slower than a normalization

A simple point addition requires one inversion, 2 multiplications, a squaring, 2 normalizations, and 6 additions.
A simple point multiplication requires a bunch of point additions (in truth, it's somewhat faster bcz of Jacobian shortcut, but still a lot of multiplications).

So stuff like doing point-scalar multiplication is a bad joke, compared to the problem of optimizing the primitives Smiley