You want this code:
https://github.com/bitcoin-core/secp256k1/pull/507
It will be astronomically faster than your current code.
I believe that when I previously implemented the techniques in this code, my result was faster than vanitygen running on a GPU.
It could also be made faster still with some improvements. For example, it doesn't actually need to compute the y coordinate of the points, so several field multiplications could be avoided in the gej_to_ge batch conversion. It could also avoid computing the scalar for any given point unless you find a match (e.g. by splitting the scalar construction into a separate function that you don't bother calling unless there is a match).
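To make the "skip the y coordinate" idea concrete, here is a minimal Python sketch of a batched Jacobian-to-affine conversion (Montgomery's batch-inversion trick) that recovers only x = X/Z^2. It is not the gej_to_ge code from the PR; batch_x_only and find_matches are hypothetical names, and it assumes point i of a batch corresponds to scalar start + i, so the scalar is only built for points that match.

```python
# Sketch (hypothetical helper names): batch Jacobian -> affine conversion
# on secp256k1 that recovers only the x coordinate.
P = 2**256 - 2**32 - 977  # secp256k1 field prime


def batch_x_only(points):
    """points: list of Jacobian (X, Y, Z) with Z != 0; returns x = X / Z^2 for each.

    Montgomery's trick: a single field inversion for the whole batch.
    Y is never read, which skips the Z^-3 and Y * Z^-3 multiplications
    that a full conversion would need for every point.
    """
    n = len(points)
    if n == 0:
        return []
    prefix = [0] * n  # prefix[i] = Z_0 * Z_1 * ... * Z_i
    acc = 1
    for i, (_, _, z) in enumerate(points):
        acc = acc * z % P
        prefix[i] = acc
    inv = pow(acc, P - 2, P)  # 1 / prefix[n-1] via Fermat (P is prime)
    xs = [0] * n
    for i in range(n - 1, -1, -1):
        x, _, z = points[i]
        # Invariant: inv == 1 / prefix[i] at this point.
        zinv = inv * prefix[i - 1] % P if i > 0 else inv  # 1 / Z_i
        inv = inv * z % P  # now 1 / prefix[i-1]
        xs[i] = x * (zinv * zinv % P) % P
    return xs


def find_matches(start, xs, is_match):
    """Only build a scalar when a point matches; point i is assumed to be (start + i) * G."""
    return [start + i for i, x in enumerate(xs) if is_match(x)]
```

The expensive part (one inversion) is shared across the batch, and the per-point work stays at a handful of multiplications because the y coordinate is never reconstructed.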
Another advantage of this code is that it is set up to allow an arbitrary base point. This means you could use untrusted computers to search for you: they only need your public key as the base point, so they never learn the final private key.
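A sketch of why an arbitrary base point enables untrusted searching, assuming a split-key scheme: you keep a secret scalar a and hand out only Q = a*G; a worker looks for an offset b such that Q + b*G matches, and the final private key is a + b mod n. The worker learns b but never a. This is plain affine arithmetic in Python (slow, not constant-time), and search, combine and the toy match predicate are hypothetical illustrations, not the PR's API.

```python
# Sketch of split-key searching with an arbitrary base point (hypothetical helpers).
# Plain affine arithmetic: slow and NOT constant-time; for illustration only.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, P - 2, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def mul(k, pt):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def search(base, is_match, limit):
    """Worker side: knows only base = a*G. Walks R = base + b*G, b = 1..limit."""
    r = base
    for b in range(1, limit + 1):
        r = add(r, G)
        if r is not None and is_match(r):
            return b, r
    return None


def combine(a, b):
    """Owner side: the private key for the matching point is a + b mod n."""
    return (a + b) % N
```

Usage: the owner publishes Q = mul(a, G) and keeps a; a worker returns b; mul(combine(a, b), G) equals the point the worker found. A real matcher would hash the public key into an address rather than test raw coordinates.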
Sipa also has AVX2 8-way SHA-2 and RIPEMD-160 implementations that he might post somewhere if you asked. An 8-way bech32 checksum generator should be really easy to do, though since the checksum occupies the final 6 characters of a bech32 address, if your expression doesn't constrain those characters you should avoid running the checksum at all.
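The checksum below follows the reference polymod from BIP 173; the skip logic is a hypothetical illustration of that advice, assuming a simple pattern format where each character is matched against one position of the data part (after "bc1") and '?' means "any". It is a scalar Python sketch, not an 8-way SIMD version.

```python
# bech32 checksum per BIP 173, plus a hypothetical "skip the checksum" check.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def bech32_polymod(values):
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]


def create_checksum(hrp, data):
    """data: 5-bit values of the data part, excluding the 6 checksum values."""
    pm = bech32_polymod(hrp_expand(hrp) + data + [0] * 6) ^ 1
    return [(pm >> 5 * (5 - i)) & 31 for i in range(6)]


def needs_checksum(pattern, total_len):
    """Hypothetical pattern format: one char per data-part position, '?' = any.
    True only if the pattern constrains one of the last 6 positions (the checksum)."""
    return any(c != '?' for c in pattern.ljust(total_len, '?')[-6:])


def matches(hrp, payload, pattern):
    """payload: 5-bit values without checksum. Computes the checksum only if needed."""
    total = len(payload) + 6
    if len(pattern) > total:
        return False
    pat = pattern.ljust(total, '?')
    if any(c != '?' and c != CHARSET[v] for c, v in zip(pat, payload)):
        return False  # cheap rejection on the non-checksum characters
    if not needs_checksum(pattern, total):
        return True  # the checksum is never computed
    chk = create_checksum(hrp, payload)
    return all(c == '?' or c == CHARSET[v] for c, v in zip(pat[-6:], chk))
```

Rejecting on the payload characters first means the checksum is only computed for candidates that already match everywhere else, and never at all for prefix-only patterns.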
Thanks for this, Greg. There's a lot I don't know about ECC, but I'm hoping I can contribute in other ways.
I just tried to use the Makefile and work through the errors in order, but I didn't get there.
The "Makefile" is for FreeBSD (and probably other BSDs); the "GNUmakefile" is for Linux and should automatically take precedence with gmake. I assume you are using a GNU toolchain on Windows? I could try to produce a makefile that makes isolating errors easier; let me add a no-libcrypto build option first.
Edit: I thought tossing in the hash implementation C files would be a quick fix. Oops. This may take a few minutes.
I'm using MinGW, so I've been running
mingw32-make.exe CC=mingw32-gcc
inside the project directory. I'll start using the GNUmakefile, since I'm using the MinGW version of gcc.