If you are on Python, just use this library:
https://github.com/iceland2k14/secp256k1 It is the fastest out there for Python, since the .dll (Windows) and .so (Linux) files are shared libraries containing compiled functions written in C++, ready to load and use.
Most of my code there is only good for testing and research. That is what I mostly do.
For CPU, the best option is the VanitySearch code base; it is the fastest.
It is written in C++ and is easy to understand and use.
I want to make one that will do 300 million keys/s from Python, on one or more GPUs. The only way is through Numba and JIT compilation.
https://numba.pydata.org/ And there is no example anywhere of how to calculate secp256k1 through @jit. These are simple mathematical formulas, not rocket science.
It's as if someone doesn't want this to speed up on purpose.
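
For reference, the "simple mathematical formulas" in question are the affine point addition and doubling rules on y^2 = x^3 + 7 over the secp256k1 prime field. Below is a minimal pure-Python sketch of them, not a Numba or GPU implementation. The helper names (inv, point_add, scalar_mult) are made up for illustration and do not come from any of the libraries mentioned above. It leans on Python's arbitrary-precision integers; as far as I know, Numba's nopython mode only handles fixed-width machine integers, so a @jit or CUDA port would have to split each 256-bit value into limbs (for example 4 x 64-bit) and implement the modular arithmetic by hand, which is probably why ready-made examples are scarce.

```python
# secp256k1 domain parameters (public constants from the SEC 2 standard)
P = 2**256 - 2**32 - 977  # field prime
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8
G = (GX, GY)  # generator point; None stands for the point at infinity


def inv(v):
    """Modular inverse in the field, via Fermat's little theorem."""
    return pow(v % P, P - 2, P)


def point_add(a, b):
    """Add two affine points (hypothetical helper, handles doubling too)."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2:
        if (y1 + y2) % P == 0:
            return None  # a + (-a) = point at infinity
        lam = 3 * x1 * x1 * inv(2 * y1) % P  # tangent slope (curve a = 0)
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point=G):
    """Compute k * point with plain double-and-add (not constant time)."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


if __name__ == "__main__":
    x, y = scalar_mult(2)
    print(hex(x))
```

This is only good for testing and checking results: it is not constant time, and one modular inversion per addition makes it slow. Fast code bases typically avoid that by using Jacobian coordinates or by batching the inversions.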

It is simple. You know what you need. Just code it yourself and it is done.