Post
Topic
Board Development & Technical Discussion
Re: Mark1 - pollard rho implementation (38 minutes for 80 bits solving on CPU)
by
nomachine
on 28/06/2025, 20:38:29 UTC
Can this be further optimized in terms of speed? Tongue


The current implementation is already highly optimized, so further gains would likely be in the 10-30% range rather than order-of-magnitude improvements. The most promising areas would be hash function optimization and fine-tuning batch sizes for specific hardware.

These batch sizes could be tuned based on:

CPU cache sizes (L1/L2/L3)

Available SIMD registers

Benchmark different sizes (256, 512, 1024)

Current DP Table Structure:

Code:
#pragma pack(push,1)
struct DPSlot{ fp_t fp; Scalar256 key; };
#pragma pack(pop)
static_assert(sizeof(DPSlot)==40);

8 bytes for the fingerprint (fp_t)

32 bytes for the scalar key (Scalar256)

Total: 40 bytes per slot


Could potentially reduce to 32-bit with:

More frequent collisions (manageable)

Secondary verification when matches occur

Savings: 4 bytes per slot (10% reduction)