I haven't found a John the Ripper implementation for ECDSA; as far as I can tell, none exists.
From the code you pasted, it seems that the real issue here is the 10000 rounds of SHA256, not the ECDSA. It may be worth putting only the "stretch_key" method on a GPU and then doing the ECDSA on the CPU. That is probably much less work to implement, but still a significant speed-up compared to naively running this Python script.
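To make it concrete which part would move to the GPU, here is a minimal CPU sketch of the stretching step. I'm guessing at the exact construction: the round count is the 10000 mentioned above, but re-appending the original seed on every round is an assumption, so check it against the real "stretch_key" before relying on it.

```python
import hashlib

ROUNDS = 10000  # round count mentioned in this thread


def stretch_key(seed: bytes, rounds: int = ROUNDS) -> bytes:
    """Hypothetical stand-in for the script's stretch_key: iterated SHA256.

    Assumption: each round hashes the previous digest concatenated with
    the original seed. The real script may differ.
    """
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest + seed).digest()
    return digest
```

Each candidate costs `rounds` sequential hashes, but candidates are independent of each other, which is what makes this step a natural fit for a GPU.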
A friend of mine said that he tried, but he can't get ECDSA to run at a good speed... Not sure why this doesn't agree with other posts here...
I think I read that OpenSSL's implementation is deliberately slowed down, because otherwise it leaks timing information that can be used to recover the private key. I don't know the details, but there was a thread about it on here at some point. Maybe that's the problem: try a different ECDSA library designed for speed rather than security.
ECDSA is slow, no matter how you look at it. Implementations slow it down further to improve robustness against things like timing attacks, but you're just not going to get a fast implementation on a CPU. I think about 1,000 ECDSA operations/sec/core is what you can expect.
But the other poster is right: this is not really an ECDSA problem, it's a hashing problem. And there are plenty of hashing algorithms already implemented on GPUs. It is probably reasonable to run the hashing on the GPU and then send a steady stream of results to the CPU, which checks each one. Even if your GPU has lots of cores, I'm not sure the stretched hashes will come out faster than the CPU can do the ECDSA checks, so the CPU side shouldn't end up being the bottleneck.
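Roughly, the hash-then-check split could look like the sketch below. It's only an illustration of the data flow: a plain thread stands in for the GPU stage, `stretch_key` uses an assumed construction (iterated SHA256 with the seed re-appended), and `matches_target` is a hypothetical callback where the real ECDSA check would go. None of these names come from the original script.

```python
import hashlib
import queue
import threading


def stretch_key(seed, rounds=10000):
    # Assumed construction; the real stretch_key may differ.
    digest = seed
    for _ in range(rounds):
        digest = hashlib.sha256(digest + seed).digest()
    return digest


def hash_stage(candidates, out_q, rounds):
    """Stand-in for the GPU stage: stretch each candidate, stream results."""
    for cand in candidates:
        out_q.put((cand, stretch_key(cand, rounds)))
    out_q.put(None)  # sentinel: no more work


def check_stage(in_q, matches_target):
    """CPU stage: run the expensive check on each stretched key."""
    while True:
        item = in_q.get()
        if item is None:
            return None  # candidates exhausted, nothing matched
        cand, key = item
        if matches_target(key):
            return cand


def crack(candidates, matches_target, rounds=10000):
    # Bounded queue: the hashing stage blocks if the checker falls behind.
    q = queue.Queue(maxsize=1024)
    producer = threading.Thread(
        target=hash_stage, args=(candidates, q, rounds), daemon=True
    )
    producer.start()
    return check_stage(q, matches_target)
```

For example, `crack(wordlist, lambda key: derive_address(key) == target)` would return the first matching candidate or `None`, where `derive_address` is a placeholder for whatever ECDSA-based derivation the script does. The bounded queue is the design point: if it stays full, the CPU check is the bottleneck; if it stays empty, the hashing is.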