Yes, I see. Half of this will have to be rewritten.

You have two options: either rewrite the whole thing from scratch, or port the entire existing SECP256K1 implementation to AVX2 (the GPU version is a whole different story).

I won't have anything to test the GPU version on, since I have a card from the reds (AMD), not the greens (NVIDIA), and the greens (CUDA) are better in this regard.
