Board: Development & Technical Discussion
Topic: Re: Solving ECDLP with Kangaroos: Part 1 + 2 + RCKangaroo
by kTimesG on 18/12/2024, 15:22:19 UTC
I must admit you used some really clever tricks to make maximum use of shared memory (L1) and the L2 cache. I'm still trying to figure out how you keep track of the jump distances in shared memory instead of updating them through L2.

After adapting my own kernel to load/store stuff through L2 during the jumps (instead of only once, before and after all the jumps), I reached 9.7 GK/s on an RTX 4090 (64 jump points, DP 32), which was a 75% increase in speed, and I haven't even tried to micro-optimize it the way I did before. So I guess this was the missing piece of knowledge needed to go beyond the 8+ GK/s advertised by others around here, after I had tried every advanced optimization I could think of to speed things up.
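To make the two access patterns concrete, here is a toy sketch. It is not my actual kernel and not RCKangaroo's: the "point" is just a 64-bit integer, `toy_jump` stands in for the real point addition, and every name and constant except the 64-entry jump table is made up for illustration. One kernel loads each kangaroo once and keeps it in registers for all the jumps, the other reloads and stores it through global memory (served by L2) on every jump.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

constexpr int JUMPS   = 64;   // jump table size ("64 jump points")
constexpr int STEPS   = 256;  // jumps per launch (arbitrary, for illustration)
constexpr int KPT     = 4;    // kangaroos per thread (arbitrary)
constexpr int THREADS = 256;
constexpr int BLOCKS  = 64;

// Toy stand-in for a point addition; a real kernel does field arithmetic here.
__device__ inline void toy_jump(uint64_t &x, uint64_t &dist, const uint64_t *tbl) {
    uint64_t d = tbl[x % JUMPS];          // jump chosen by the current position
    x = x * 6364136223846793005ULL + d;   // placeholder for "X = X + J"
    dist += d;                            // accumulated jump distance
}

// Copy the jump table from global memory into shared memory, once per block.
__device__ inline void load_table(uint64_t *tbl, const uint64_t *tbl_g) {
    for (int i = threadIdx.x; i < JUMPS; i += blockDim.x) tbl[i] = tbl_g[i];
    __syncthreads();
}

// Variant A: load each kangaroo once, do all jumps in registers, store once.
__global__ void jumps_in_registers(uint64_t *x, uint64_t *dist, const uint64_t *tbl_g) {
    __shared__ uint64_t tbl[JUMPS];
    load_table(tbl, tbl_g);
    int id = blockIdx.x * blockDim.x + threadIdx.x, n = gridDim.x * blockDim.x;
    for (int k = 0; k < KPT; ++k) {
        uint64_t px = x[k * n + id], pd = dist[k * n + id];
        for (int s = 0; s < STEPS; ++s) toy_jump(px, pd, tbl);
        x[k * n + id] = px; dist[k * n + id] = pd;
    }
}

// Variant B: on every step, stream each kangaroo through global memory (L2).
// volatile keeps the compiler from collapsing this back into variant A.
__global__ void jumps_via_l2(volatile uint64_t *x, volatile uint64_t *dist,
                             const uint64_t *tbl_g) {
    __shared__ uint64_t tbl[JUMPS];
    load_table(tbl, tbl_g);
    int id = blockIdx.x * blockDim.x + threadIdx.x, n = gridDim.x * blockDim.x;
    for (int s = 0; s < STEPS; ++s) {
        for (int k = 0; k < KPT; ++k) {
            uint64_t px = x[k * n + id], pd = dist[k * n + id];
            toy_jump(px, pd, tbl);
            x[k * n + id] = px; dist[k * n + id] = pd;
        }
    }
}

int main() {
    const int total = BLOCKS * THREADS * KPT;
    uint64_t *xa, *da, *xb, *db, *tbl;
    cudaMallocManaged(&xa, total * sizeof(uint64_t));
    cudaMallocManaged(&da, total * sizeof(uint64_t));
    cudaMallocManaged(&xb, total * sizeof(uint64_t));
    cudaMallocManaged(&db, total * sizeof(uint64_t));
    cudaMallocManaged(&tbl, JUMPS * sizeof(uint64_t));
    for (int i = 0; i < JUMPS; ++i) tbl[i] = (i + 1) * 0x9E3779B97F4A7C15ULL;
    for (int i = 0; i < total; ++i) { xa[i] = xb[i] = i + 1; da[i] = db[i] = 0; }

    jumps_in_registers<<<BLOCKS, THREADS>>>(xa, da, tbl);
    jumps_via_l2<<<BLOCKS, THREADS>>>(xb, db, tbl);
    if (cudaDeviceSynchronize() != cudaSuccess) { printf("CUDA error\n"); return 1; }

    // Kangaroos are independent, so both variants must end in the same state.
    bool same = true;
    for (int i = 0; i < total; ++i) same = same && xa[i] == xb[i] && da[i] == db[i];
    printf("%s\n", same ? "results match" : "results differ");
    cudaFree(xa); cudaFree(da); cudaFree(xb); cudaFree(db); cudaFree(tbl);
    return same ? 0 : 1;
}
```

The toy says nothing about speed by itself. It only shows that the second variant needs registers for one kangaroo at a time, which is what lets a real kernel keep more kangaroos in flight per thread.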

So did you start work on solving 135?