Re: Mark1 - pollard rho implementation (38 minutes for 80 bits solving on CPU)

Quote from: Akito S. M. Hosana on Today at 08:15:05 PM

Can this be further optimized in terms of speed?

The current implementation is already highly optimized, so further gains would likely be in the 10-30% range rather than order-of-magnitude improvements. The most promising areas would be hash function optimization and fine-tuning batch sizes for specific hardware.

These batch sizes could be tuned based on:

CPU cache sizes (L1/L2/L3)

Available SIMD registers

Benchmarks of different sizes (256, 512, 1024) on the target machine
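A rough sketch of that tuning loop, assuming C++ (the language of the DP table code). The hypothetical helpers below first drop candidate batch sizes whose working set would not fit a given cache budget, then time the remaining ones on the actual machine. The bytes-per-item figure and the `step` callback are stand-ins for the solver's real batch loop; neither comes from the Mark1 code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: largest candidate batch size whose working set
// (batch_size * bytes_per_item) fits within the given cache budget.
// Returns 0 if no candidate fits.
std::size_t largest_batch_fitting(std::size_t cache_bytes,
                                  std::size_t bytes_per_item,
                                  const std::vector<std::size_t>& candidates) {
    assert(bytes_per_item > 0);
    std::size_t best = 0;
    for (std::size_t n : candidates) {
        if (n * bytes_per_item <= cache_bytes && n > best) best = n;
    }
    return best;
}

// Hypothetical benchmark harness: runs step(batch_size) `reps` times for each
// candidate and returns the size with the lowest measured time per item.
std::size_t fastest_batch(const std::vector<std::size_t>& candidates,
                          const std::function<void(std::size_t)>& step,
                          int reps = 10) {
    assert(!candidates.empty());
    std::size_t best = candidates.front();
    double best_ns_per_item = -1.0;
    for (std::size_t n : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) step(n);
        auto t1 = std::chrono::steady_clock::now();
        double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
        double per_item = ns / (static_cast<double>(n) * reps);
        if (best_ns_per_item < 0 || per_item < best_ns_per_item) {
            best_ns_per_item = per_item;
            best = n;
        }
    }
    return best;
}
```

For example, with a 48 KiB L1 data cache and an assumed 96 bytes of state per item, `largest_batch_fitting(48 * 1024, 96, {256, 512, 1024})` returns 512, since 1024 × 96 bytes would exceed 48 KiB. SIMD register pressure is not modeled here, so timing the real batch step on the target CPU remains the deciding test.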

Current DP Table Structure:

Code:

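#pragma pack(push,1)                        // pack fields with no padding
struct DPSlot{ fp_t fp; Scalar256 key; };   // fp_t: 8 bytes, Scalar256: 32 bytes
#pragma pack(pop)                           // restore default packing
static_assert(sizeof(DPSlot)==40, "DPSlot must be exactly 40 bytes");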

8 bytes for the fingerprint (fp_t)

32 bytes for the scalar key (Scalar256)

Total: 40 bytes per slot
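The 40-byte slot size translates directly into table memory. A minimal sketch of that arithmetic, using hypothetical `table_bytes` / `table_mib` helpers and ignoring any load-factor headroom or allocator overhead (the post does not say how the table is sized):

```cpp
#include <cstdint>

// Slot layout from the post: 8-byte fingerprint + 32-byte scalar key.
constexpr std::uint64_t kFpBytes   = 8;
constexpr std::uint64_t kKeyBytes  = 32;
constexpr std::uint64_t kSlotBytes = kFpBytes + kKeyBytes;  // 40
static_assert(kSlotBytes == 40, "matches static_assert(sizeof(DPSlot)==40)");

// Bytes needed for a DP table with `slots` entries (hypothetical helper).
constexpr std::uint64_t table_bytes(std::uint64_t slots,
                                    std::uint64_t slot_bytes = kSlotBytes) {
    return slots * slot_bytes;
}

// Same value in MiB, rounded down.
constexpr std::uint64_t table_mib(std::uint64_t slots,
                                  std::uint64_t slot_bytes = kSlotBytes) {
    return table_bytes(slots, slot_bytes) >> 20;
}
```

For instance, 2^24 slots take 16,777,216 × 40 = 671,088,640 bytes (640 MiB); at 36 bytes per slot the same table would need 576 MiB.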

The fingerprint could potentially be reduced to 32 bits, at the cost of:

More frequent fingerprint collisions, i.e. false matches (manageable)

Secondary verification when matches occur

Savings: 4 bytes per slot (40 → 36 bytes, a 10% reduction)
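One way the 32-bit variant could look, as a sketch under several assumptions: `Scalar256` is replaced by a stand-in of four 64-bit limbs, the stored 32 bits are taken from the high half of the existing 64-bit fingerprint, and `RecomputeFp` is a hypothetical callback that rebuilds the full fingerprint from a stored key (in a real solver, presumably by redoing the point arithmetic). The 32-bit comparison acts as a cheap filter, and only candidates that pass it pay for secondary verification. The packing pragma matters here: without it, a 4-byte field followed by an 8-byte-aligned member would typically be padded back to 40 bytes, erasing the saving.

```cpp
#include <cassert>
#include <cstdint>

// Assumed stand-in for the project's 32-byte scalar type (4 x 64-bit limbs).
struct Scalar256 { std::uint64_t w[4]; };

#pragma pack(push,1)
struct DPSlot32 { std::uint32_t fp; Scalar256 key; };  // hypothetical 32-bit variant
#pragma pack(pop)
static_assert(sizeof(DPSlot32) == 36, "4-byte fp + 32-byte key, no padding");

// Truncate the 64-bit fingerprint to the 32 bits stored in the slot
// (keeping the high half is an arbitrary choice for this sketch).
inline std::uint32_t fp32_of(std::uint64_t fp64) {
    return static_cast<std::uint32_t>(fp64 >> 32);
}

// Hypothetical callback: recompute the full 64-bit fingerprint implied by a key.
using RecomputeFp = std::uint64_t (*)(const Scalar256&);

// A 32-bit match is only a candidate; confirm it against the full 64-bit
// fingerprint before treating it as a real collision.
inline bool confirmed_match(const DPSlot32& slot, std::uint64_t probe_fp64,
                            RecomputeFp recompute) {
    assert(recompute != nullptr);
    if (slot.fp != fp32_of(probe_fp64)) return false;  // cheap 32-bit filter
    Scalar256 key = slot.key;                          // copy out of the packed slot
    return recompute(key) == probe_fp64;               // secondary verification
}
```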