Shoutout to the guy who nailed it:
https://github.com/ulhaocheng/avxecc
Key takeaways:
AVX2-optimized field multiplication (the slowest part of ECC).
Parallel limb ops (256-bit registers crunching 32/64-bit chunks).
Just tweak it for secp256k1's constants, and you're golden.

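I think you meant scalar multiplication there... which is basically a no-op if you're scanning ranges with sequential keys: once you have the first public key, every next one is just a single point addition (add G) away.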
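To make the sequential-keys point concrete, here's a rough sketch in plain Python (not anyone's actual scanner, and nowhere near fast). The curve constants are the standard secp256k1 ones; the function names (point_add, scalar_mult, scan_range) are made up for illustration. It does one scalar multiplication for the start of the range, then only point additions:

```python
# Standard secp256k1 constants: field prime and generator point.
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(p1, p2):
    """Affine addition on y^2 = x^3 + 7 over F_P. None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 == -p1
    if p1 == p2:
        # Tangent slope for doubling (curve coefficient a is 0).
        lam = 3 * x1 * x1 * pow((2 * y1) % P, -1, P) % P
    else:
        # Chord slope for two distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mult(k, point):
    """Plain double-and-add: roughly 256 doublings plus adds per key."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def scan_range(start, count):
    """Yield (k, k*G) for k = start .. start+count-1.

    Only the first key costs a scalar multiplication; every
    following key costs exactly one point addition.
    """
    pub = scalar_mult(start, G)
    for k in range(start, start + count):
        yield k, pub
        pub = point_add(pub, G)
```

In a real scanner the expensive part left over is the modular inversion inside each addition, which is usually batched across many keys, so the field multiplication speed still matters, just not through scalar multiplication.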
I bet libsecp256k1 is faster than that though, since it's already SIMD-ed by the compiler, because it uses carry-free independent limbs. That's like, free vectorization out of the box, fewer cycles per op, and so on.
Ideal for ending up with a fried CPU, which seems like it's what people want.
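
For anyone wondering what "carry-free independent limbs" means: on 64-bit builds libsecp256k1 stores a field element as 5 limbs of 52 bits each, so every 64-bit word has spare bits on top and additions can skip the carry chain. Here's a toy model in Python, just to show the idea. The helper names (to_limbs, lazy_add, etc.) are mine, not the library's, and Python ints don't overflow, so the 64-bit headroom is checked explicitly:

```python
P = 2**256 - 2**32 - 977   # secp256k1 field prime
LIMB_BITS = 52
NUM_LIMBS = 5
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x):
    """Split a 256-bit integer into 5 limbs of 52 bits (top limb uses 48)."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(NUM_LIMBS)]

def from_limbs(limbs):
    """Recombine limbs; also works when limbs have grown past 52 bits."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))

def lazy_add(a, b):
    """Limb-wise add with no carry propagation.

    Each lane depends only on the matching lane of the inputs,
    which is the property that makes the loop easy to vectorize.
    """
    return [x + y for x, y in zip(a, b)]

def fits_in_u64(limbs):
    """True while every limb would still fit in a 64-bit word."""
    return all(limb < 2**64 for limb in limbs)

def normalize(limbs):
    """Reduce mod P and return canonical 52-bit limbs (toy version)."""
    return to_limbs(from_limbs(limbs) % P)
```

With 12 spare bits per limb you can stack a lot of lazy additions before a normalize is needed. Whether the compiler actually turns that into SIMD for a given build is a separate question that only the disassembly answers.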