Should I assume that you do not wish to make the math available? Or is it already implemented in libsecp256k1? Because millions of combos per second is pretty good, and on a CPU too.
The math is available here, but not the code:
https://bitcointalk.org/index.php?topic=1573035.msg17676647#msg17676647
I think that almost everything is implemented in libsecp256k1 already, but my software does only one thing: it generates consecutive keys.
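To illustrate why consecutive keys are cheap to generate, here is a minimal Python sketch. It is not my actual code and it does not use libsecp256k1. It only shows the basic idea that the public key for k+1 is the public key for k plus G, so each step costs one point addition instead of a full scalar multiplication. The function names (point_add, scalar_mult, consecutive_keys) are made up for this example; the curve constants are the standard secp256k1 parameters.

```python
# Illustrative sketch only: plain-Python secp256k1 arithmetic, far slower
# than an optimized C implementation and not constant-time.

# Standard secp256k1 domain parameters (curve y^2 = x^3 + 7 over F_p).
P = 2**256 - 2**32 - 977
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        # Tangent slope for doubling (curve coefficient a = 0).
        lam = 3 * x1 * x1 * pow(2 * y1 % P, P - 2, P) % P
    else:
        # Chord slope for two distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add: roughly 256 doublings plus additions per key."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def consecutive_keys(start, count):
    """Yield (private_key, public_key) for start, start+1, ...

    Only the first key needs a full scalar multiplication; every
    following key is obtained with a single addition of G.
    """
    point = scalar_mult(start, G)
    for i in range(count):
        yield start + i, point
        point = point_add(point, G)
```

Usage: `for priv, pub in consecutive_keys(1000, 5): print(priv, hex(pub[0]))`. In this sketch the modular inverse inside point_add still dominates the cost of each step, so it shows the structure of the idea rather than the speed.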