I am running Rotor Cuda on my second PC with an old Zotac Nvidia GTX 1060 and I am getting about 150,000,000 keys per second...
I am learning Python and also trying to find my own way to scan private/public keys...
I think the fastest way to generate public keys on the CPU is to use the ice library
https://github.com/iceland2k14/secp256k1
You can use ice.point_sequential_increment(3500000, P)
So you have a point P, and in 2 seconds I get the next 3.5 million points (+G+G+G+G+G+G....), and then you can work with that array... That is faster than generating and checking the points one by one
When you are finished with those 3.5 million points, the last point from that batch becomes the next point P, and then you call ice.point_sequential_increment(3500000, P) again
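Here is a rough sketch of that batch loop, in case it helps to see it written out. It is plain Python with no ice library at all, so it is far slower and only shows the idea. The names add_points, sequential_increment and scan are made up for this example, and sequential_increment is only a stand-in for what I assume ice.point_sequential_increment does (the real function may return its points in a different format, so check the repo before swapping it in).

```python
# Pure-Python stand-in for batched "+G" stepping on secp256k1.
# Illustrative only: the function names here are hypothetical, not the ice API.

P_FIELD = 2**256 - 2**32 - 977  # secp256k1 field prime
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def add_points(a, b):
    """Affine point addition; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None  # a + (-a)
    if a == b:
        # Tangent slope for doubling (curve is y^2 = x^3 + 7)
        slope = 3 * x1 * x1 * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (slope * slope - x1 - x2) % P_FIELD
    y3 = (slope * (x1 - x3) - y1) % P_FIELD
    return (x3, y3)


def sequential_increment(count, start):
    """Return [start+G, start+2G, ..., start+count*G]."""
    points = []
    current = start
    for _ in range(count):
        current = add_points(current, G)
        points.append(current)
    return points


def scan(start, batch_size, num_batches, is_hit):
    """Walk forward batch by batch; the last point of each batch seeds the next."""
    point = start
    hits = []
    for _ in range(num_batches):
        batch = sequential_increment(batch_size, point)
        hits.extend(pt for pt in batch if is_hit(pt))
        point = batch[-1]  # becomes the next P
    return hits, point
```

For example, scan(G, 1000, 5, lambda pt: pt in targets) would walk 5000 points starting after G and collect any that are in a set called targets (also a made-up name). With the ice library you would replace sequential_increment with the real call and use a much bigger batch size.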
Keep experimenting... that is the only way to succeed