Single core python script...294k keys/s
riiiight, so either the script is running on
dozens of single cores, or is leveraging cupy/pycuda/etc., or is orchestrating the cuda backend instantiation, or..?
as in "An Aes Sedai never lies, but the truth she speaks, may not be the truth you think you hear."

or perhaps you forgot to x6 your speed for the endo and x1000 for the milliseconds

I jest, but that speed seems hard to swallow when I'm running at several times that and have several orders of magnitude less results
Yeah I remember the debate about speed when using all 12 different points.
You are checking 1 key, in a single position on the curve (multi/add), but hashing 12 different points. But the others don't require the heavy lifting of the initial multiplication or add.
So one could say 294k x 12 = Hashing rate. Or, is it that the speed....?