I found the "bug" in my previous attempts to increase DP without increasing the runtime cofactor.
39 bits, DP 8
[20] Ops: avg 698950 = 0.942 * sqrt(b) min 219204 max 1848433 dp_ovh: 384.0 mul: 9.35
Stored footprints: avg 6410 min 1987 max 16978
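For anyone who wants to check the 0.942 figure: it looks like the average op count divided by sqrt(b), assuming b is the interval size 2^39 (the log does not say this, it is my reading of "39 bits"). A quick sketch that recomputes that ratio from the numbers in the log above; `k_factor` and the variable names are just for illustration:

```python
import math

def k_factor(ops: float, range_bits: int) -> float:
    """Group operations divided by sqrt(interval size), assuming size = 2**range_bits."""
    return ops / math.sqrt(2.0 ** range_bits)

RANGE_BITS = 39  # assumption: "39 bits" means an interval of size 2^39

# Op counts copied from the log line above (avg / min / max over 20 runs).
stats = {"avg": 698950, "min": 219204, "max": 1848433}

k_values = {name: k_factor(ops, RANGE_BITS) for name, ops in stats.items()}

for name, k in k_values.items():
    print(f"{name}: K = {k:.3f}")
```

The min and max rows show how much a single run can deviate, so only the average over many runs says anything about K.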
No cycles. No "hey, let's create kangaroos and pretend they don't count as a group operation".
This only works for secp256k1. It is basically 100% faster than the usual approach.
So basically you are saying that you have invented a new method with K<1.0 at DP>6? Without cycles? Are you sure?