It’s not as difficult as AVX secp256k1.
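Yo, fam! If your code’s crawling at 150ns when it should be flying at 6ns, something’s off. AVX2’s 256-bit registers hold eight 32-bit lanes, so they can run 8 independent hashes at once (4 if you stay on 128-bit SSE). That W array expansion (the σ0/σ1 math) is begging for SIMD. And if your CPU has SHA-NI, sha256rnds2 goes brrr, like 5x faster than scalar. Just keep in mind that SHA-NI is a separate extension from AVX2 and works on 128-bit registers.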
You’re probably vectorizing just ONE hash instead of stacking 4-8 like pancakes. Double-check that you’re actually building with the flags: -mavx2 -mbmi2 -madx -fwrapv (plus -msha if you go the sha256rnds2 route).
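Here’s a rough sketch of what "stacking hashes" means for the W expansion, in case it helps. It assumes a transposed layout `w[t][lane]` where each of the 8 lanes is a different message; that layout, and every function name in it (`expand_x8`, `expand_x8_selfcheck`, and so on), is made up for illustration and not taken from any library. It only covers the message schedule, not the 64 compression rounds, and it uses GCC/Clang’s `target("avx2")` attribute plus a runtime CPU check so it builds even without -mavx2.

```c
#include <assert.h>     /* only needed for the assert-style usage shown below */
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: SHA-256 small sigma functions (FIPS 180-4). */
static uint32_t rotr32(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }
static uint32_t sigma0(uint32_t x) { return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3); }
static uint32_t sigma1(uint32_t x) { return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10); }

/* Portable path: w[t][lane], one message per lane, expanded lane by lane. */
static void expand_x8_portable(uint32_t w[64][8]) {
    for (int t = 16; t < 64; t++)
        for (int lane = 0; lane < 8; lane++)
            w[t][lane] = sigma1(w[t - 2][lane]) + w[t - 7][lane]
                       + sigma0(w[t - 15][lane]) + w[t - 16][lane];
}

/* AVX2 has no 32-bit rotate, so build one from two shifts and an OR. */
#define ROTR_V(x, n) \
    _mm256_or_si256(_mm256_srli_epi32((x), (n)), _mm256_slli_epi32((x), 32 - (n)))
#define XOR3_V(a, b, c) _mm256_xor_si256(_mm256_xor_si256((a), (b)), (c))

/* Same recurrence, but every operation handles all 8 lanes at once. */
__attribute__((target("avx2")))
static void expand_x8_avx2(uint32_t w[64][8]) {
    for (int t = 16; t < 64; t++) {
        __m256i w2  = _mm256_loadu_si256((const __m256i *)w[t - 2]);
        __m256i w7  = _mm256_loadu_si256((const __m256i *)w[t - 7]);
        __m256i w15 = _mm256_loadu_si256((const __m256i *)w[t - 15]);
        __m256i w16 = _mm256_loadu_si256((const __m256i *)w[t - 16]);
        __m256i s1 = XOR3_V(ROTR_V(w2, 17), ROTR_V(w2, 19), _mm256_srli_epi32(w2, 10));
        __m256i s0 = XOR3_V(ROTR_V(w15, 7), ROTR_V(w15, 18), _mm256_srli_epi32(w15, 3));
        __m256i wt = _mm256_add_epi32(_mm256_add_epi32(s1, w7),
                                      _mm256_add_epi32(s0, w16));
        _mm256_storeu_si256((__m256i *)w[t], wt);
    }
}

/* Runtime dispatch: use AVX2 when the CPU has it, otherwise the portable loop. */
static void expand_x8(uint32_t w[64][8]) {
    if (__builtin_cpu_supports("avx2")) expand_x8_avx2(w);
    else expand_x8_portable(w);
}

/* Returns 1 if the dispatched path agrees with the portable path. */
static int expand_x8_selfcheck(void) {
    uint32_t a[64][8], b[64][8];
    uint32_t seed = 0x12345678u;
    memset(a, 0, sizeof a);
    for (int t = 0; t < 16; t++)
        for (int lane = 0; lane < 8; lane++) {
            seed = seed * 1664525u + 1013904223u;  /* LCG filler data, not crypto */
            a[t][lane] = seed;
        }
    memcpy(b, a, sizeof a);
    expand_x8(a);
    expand_x8_portable(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

To try it, call `assert(expand_x8_selfcheck() == 1);` from your own `main`. The point of the layout is that each vector instruction does the same step for 8 different messages, which is usually where the speedup comes from; vectorizing inside a single hash fights the data dependencies instead.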
Shoutout to the guy who nailed it:
https://github.com/ulhaocheng/avxecc
The core math is similar, just different numbers.
Key takeaways:
AVX2-optimized field multiplication (the slowest part of ECC).
Parallel limb ops (256-bit registers crunching 32/64-bit chunks).
Just tweak it for secp256k1’s constants, and you’re golden.
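To make the "parallel limb ops" idea concrete, here’s a minimal sketch of the multiply step only. I haven’t checked how the linked repo lays out its limbs, so treat everything here as my own assumption: 10 limbs of 26 bits per field element, stored in 64-bit slots, with 4 independent multiplications running side by side (one per lane). The type and function names (`fe_x4`, `prod_x4`, `mul_cols`, ...) are hypothetical. It produces the unreduced column sums of the schoolbook product; carry propagation and the reduction modulo the secp256k1 prime p = 2^256 - 2^32 - 977 (using 2^256 ≡ 2^32 + 977 mod p) are left out.

```c
#include <assert.h>     /* only needed for the assert-style usage shown below */
#include <immintrin.h>
#include <stdint.h>

#define NLIMBS 10                 /* 10 limbs x 26 bits = 260 bits >= 256 */
#define NCOLS  (2 * NLIMBS - 1)   /* columns of the schoolbook product */
#define LANES  4                  /* four independent multiplications */

/* limb[i][lane]: limb i of the field element in that lane, each < 2^26. */
typedef struct { uint64_t limb[NLIMBS][LANES]; } fe_x4;
/* col[k][lane]: unreduced sum of a_i * b_j over all i + j == k. */
typedef struct { uint64_t col[NCOLS][LANES]; } prod_x4;

/* Portable reference: plain 64-bit arithmetic, lane by lane. */
static void mul_cols_portable(prod_x4 *r, const fe_x4 *a, const fe_x4 *b) {
    for (int k = 0; k < NCOLS; k++)
        for (int l = 0; l < LANES; l++) r->col[k][l] = 0;
    for (int i = 0; i < NLIMBS; i++)
        for (int j = 0; j < NLIMBS; j++)
            for (int l = 0; l < LANES; l++)
                r->col[i + j][l] += a->limb[i][l] * b->limb[j][l];
}

/* AVX2 path: each instruction handles the same limb pair in all 4 lanes. */
__attribute__((target("avx2")))
static void mul_cols_avx2(prod_x4 *r, const fe_x4 *a, const fe_x4 *b) {
    __m256i acc[NCOLS];
    for (int k = 0; k < NCOLS; k++) acc[k] = _mm256_setzero_si256();
    for (int i = 0; i < NLIMBS; i++) {
        __m256i ai = _mm256_loadu_si256((const __m256i *)a->limb[i]);
        for (int j = 0; j < NLIMBS; j++) {
            __m256i bj = _mm256_loadu_si256((const __m256i *)b->limb[j]);
            /* Multiplies the low 32 bits of each 64-bit lane into a 64-bit product.
               26-bit limbs give 52-bit products, and 10 of them sum to < 2^56. */
            acc[i + j] = _mm256_add_epi64(acc[i + j], _mm256_mul_epu32(ai, bj));
        }
    }
    for (int k = 0; k < NCOLS; k++)
        _mm256_storeu_si256((__m256i *)r->col[k], acc[k]);
}

/* Runtime dispatch, so the sketch also runs on CPUs without AVX2. */
static void mul_cols(prod_x4 *r, const fe_x4 *a, const fe_x4 *b) {
    if (__builtin_cpu_supports("avx2")) mul_cols_avx2(r, a, b);
    else mul_cols_portable(r, a, b);
}

/* Column k (lane 0) of the product when every limb of both inputs is 1. */
static uint64_t ones_column(int k) {
    fe_x4 one; prod_x4 r;
    for (int i = 0; i < NLIMBS; i++)
        for (int l = 0; l < LANES; l++) one.limb[i][l] = 1;
    mul_cols(&r, &one, &one);
    return r.col[k][0];
}

/* Returns 1 if the dispatched path matches the portable one on filler data. */
static int mul_cols_selfcheck(void) {
    fe_x4 a, b; prod_x4 v, s;
    uint64_t seed = 42;
    for (int i = 0; i < NLIMBS; i++)
        for (int l = 0; l < LANES; l++) {
            seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
            a.limb[i][l] = seed >> 38;            /* keep the top 26 bits */
            seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
            b.limb[i][l] = seed >> 38;
        }
    mul_cols(&v, &a, &b);
    mul_cols_portable(&s, &a, &b);
    for (int k = 0; k < NCOLS; k++)
        for (int l = 0; l < LANES; l++)
            if (v.col[k][l] != s.col[k][l]) return 0;
    return 1;
}
```

A quick sanity check from your own `main` would be `assert(mul_cols_selfcheck() == 1);`. The reason for 26-bit limbs instead of full 32-bit ones is headroom: the column sums stay well inside 64 bits, so you can postpone carries until after all the multiplies, which is what makes the lanes easy to keep busy.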
