After reading over the description of Pollard's kangaroo algorithm, I think I understand it well enough to explain it to my 13-year-old daughter so she can write the code as a fun educational exercise. She is always looking for a good subject for her next science fair project, and I think this would make a good one.
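For anyone following along, here is roughly what a toy CPU version of the idea could look like. This is only a sketch under some assumptions of mine: it works in the multiplicative group modulo a prime instead of on an elliptic curve, and the function name `kangaroo`, the power-of-two jump sizes, the number of tame steps, and the retry-by-offset loop (`max_tries`) are illustrative choices, not taken from any particular implementation.

```python
from math import isqrt

def kangaroo(g, h, p, a, b, max_tries=32):
    """Search for x in [a, b] with pow(g, x, p) == h. Returns x or None."""
    width = b - a
    k = width.bit_length() // 2 + 1        # number of distinct jump sizes (assumed heuristic)

    def jump(y):
        # Pseudo-random jump that depends only on the current group element.
        return 1 << (y % k)

    # Tame kangaroo: starts at the known exponent b, hops forward, sets a trap.
    tame, d_tame = pow(g, b, p), 0
    for _ in range(isqrt(width) + 1):
        step = jump(tame)
        tame = tame * pow(g, step, p) % p
        d_tame += step
    # At this point tame == g^(b + d_tame) mod p.

    for z in range(max_tries):
        # Wild kangaroo: starts at the unknown exponent x + z.
        wild, d_wild = h * pow(g, z, p) % p, 0
        while d_wild <= width + d_tame:    # stop once it must have passed the trap
            step = jump(wild)
            wild = wild * pow(g, step, p) % p
            d_wild += step
            if wild == tame:
                # Same element, so x + z + d_wild matches b + d_tame.
                x = b + d_tame - d_wild - z
                if a <= x <= b and pow(g, x, p) == h:
                    return x
                break                      # false alarm, try the next offset
    return None                            # the method is probabilistic and can miss

if __name__ == "__main__":
    p, g = 2147483647, 7                   # toy parameters: the prime 2^31 - 1 and base 7
    secret = 1234
    found = kangaroo(g, pow(g, secret, p), p, 0, 2000)
    print(found is not None and pow(g, found, p) == pow(g, secret, p))
```

The wild kangaroo only finds the answer if it lands on one of the tame kangaroo's footprints, so a single run can fail. The sketch retries with the wild start shifted by one each time, and every candidate is checked with `pow(g, x, p) == h` before it is returned.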
It's not so hard to write a working Pollard's kangaroo, and there are some example implementations. The problem is writing a CUDA implementation of it; as I understand it, a CPU implementation cannot compete with a CUDA one on speed.
Good point.
For my real job, I am writing all the TCG and secure boot ROM firmware for a next-gen SSD controller ASIC. This SSD controller ASIC happens to have a built-in hardware crypto engine for AES, SHA, HMAC, RSA, ECC, etc. I was thinking I could download a special test firmware into the SSD that would use the built-in hardware crypto engine to do this calculation. It would be incredibly fast. I could justify downloading it to an entire rack of SSDs during manufacturing in order to do a "burn-in test" of the crypto hardware on the drive. Should be fun.