Just use NumPy and you have your 250 million or more keys/second.
You need a lot of RAM, though.
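For a rough sense of the RAM involved, here is a back-of-the-envelope sketch. It assumes 32-byte private keys and that one second's worth of keys is held in a single `uint8` array; neither assumption comes from the post above, and `batch_bytes` is a made-up helper name.

```python
import numpy as np

def batch_bytes(n_keys: int, key_size: int = 32) -> int:
    """Bytes needed to hold n_keys raw keys of key_size bytes each (assumed layout)."""
    # One uint8 element per byte of key material, no per-key overhead.
    return n_keys * key_size * np.dtype(np.uint8).itemsize

# Sanity check against a real (small) array so the arithmetic matches NumPy's own count.
small = np.zeros((1000, 32), dtype=np.uint8)
assert small.nbytes == batch_bytes(1000)

# One second of keys at the claimed rate, in gigabytes (10**9 bytes).
gb_per_second_batch = batch_bytes(250_000_000) / 1e9
print(gb_per_second_batch)
```

That counts only the raw key bytes; any intermediate arrays for public keys or hashes would come on top of it.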
Why stop at 250? Let's get that RAM to speed things up to 100 Gk/s or more. NumPy FTW!
On a single thread. Hell, actually let's dump NumPy and do things in CPU machine code directly. I heard we can reach 70 Petakeys/s that way. There's this undocumented "h160" opcode that computes around 262144 hashes for an entire range of private keys, in a single clock instruction! Imagine doing this on 16 cores! It also works with hyper-threading, and Turbo Boost is enabled automatically for all cores if this secret instruction ends up on the CPU's stack instruction pointer register.