So far I don't like any proposed proof-of-work algorithm better than Bitcoin's (at least given what I think we know about Cuckoo Cycle: it seems highly parallelizable, even if the speedup is slightly sublinear, so I don't think it will keep GPUs at parity. It might have some role if the number of lightweight cores on mobile devices grows to some huge number).
What my Cuckoo Cycle benchmarking has shown so far is that 40 Xeon threads is not quite enough to saturate memory. But I imagine a few hundred will. An FPGA or ASIC will be able to generate the memory requests at a much faster rate using hardwired siphash24 computation, and so will hit the parallelization limit much earlier.
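To make "hardwired siphash24 computation" concrete: the per-edge work is a keyed hash built only from 64-bit add, rotate and xor, with no memory traffic, which is why dedicated logic can issue memory requests so much faster than a general-purpose core. Below is a minimal Python sketch of standard SipHash-2-4 (checked against the published test vector), plus a hypothetical `edge_endpoint` helper showing how a nonce might be mapped to a node index. The exact SipHash variant and nonce-to-node mapping in https://github.com/tromp/cuckoo may differ; treat `edge_endpoint` and its bit layout as assumptions.

```python
import struct

MASK64 = (1 << 64) - 1


def _rotl(x: int, b: int) -> int:
    """Rotate a 64-bit word left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK64


def _sipround(v0: int, v1: int, v2: int, v3: int):
    """One SipRound: the add-rotate-xor network an ASIC would hardwire."""
    v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash24(key: bytes, msg: bytes) -> int:
    """Standard SipHash-2-4: 2 rounds per 8-byte block, 4 finalization rounds."""
    if len(key) != 16:
        raise ValueError("SipHash needs a 128-bit key")
    k0, k1 = struct.unpack("<QQ", key)
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573
    full = len(msg) - len(msg) % 8
    for i in range(0, full, 8):
        m = struct.unpack_from("<Q", msg, i)[0]
        v3 ^= m
        for _ in range(2):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m
    # Final block: leftover bytes, with the message length (mod 256) in the top byte.
    b = ((len(msg) & 0xFF) << 56) | int.from_bytes(msg[full:], "little")
    v3 ^= b
    for _ in range(2):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    v0 ^= b
    v2 ^= 0xFF
    for _ in range(4):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3


def edge_endpoint(key: bytes, nonce: int, uorv: int, node_bits: int) -> int:
    """Hypothetical mapping: hash the word 2*nonce+uorv, keep the low node_bits bits."""
    word = struct.pack("<Q", (2 * nonce + uorv) & MASK64)
    return siphash24(key, word) & ((1 << node_bits) - 1)


if __name__ == "__main__":
    k = bytes(range(16))
    print(hex(siphash24(k, bytes(range(15)))))
    print(edge_endpoint(k, 42, 0, 20), edge_endpoint(k, 42, 1, 20))
```

Each call is a fixed sequence of 16 SipRounds (for the 8-byte input), so a pipelined circuit can keep many hashes in flight at once, and the bottleneck quickly moves to the memory accesses the results drive.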
Because GPU memory is ill-suited to Cuckoo Cycle's random accesses to bitpairs in global memory,
which resist coalescing (1M consecutive accesses are on average 512 bytes apart on a large instance),
I expect the GPU to struggle to put hundreds of threads to use. That's why
I posted a $1000 bounty on the speed parity of GPUs and server CPUs for Cuckoo Cycle in
https://bitcointalk.org/index.php?topic=707879.0
which is duplicated in the README at
https://github.com/tromp/cuckoo
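The "512 bytes apart" figure follows from simple arithmetic: if the bitpair array is 512 MiB (an assumption implied by the numbers, not a stated instance size), then 2^29 bytes / 2^20 accesses = 2^9 = 512 bytes between neighbouring addresses on average. The small Python simulation below models each access as hitting a uniformly random byte of that array and also counts how many distinct 128-byte segments a group of 32 consecutive requests touches. The 32-wide group and 128-byte segment are assumed GPU parameters (a warp and a common transaction size), chosen for illustration only.

```python
import random


def coalescing_stats(array_bytes: int, n_accesses: int,
                     warp: int = 32, segment: int = 128, seed: int = 1):
    """Model uniformly random accesses into an array of array_bytes bytes.

    Returns (mean gap in bytes between address-sorted accesses,
             mean number of distinct segment-sized blocks touched per warp).
    """
    rng = random.Random(seed)
    addrs = [rng.randrange(array_bytes) for _ in range(n_accesses)]

    # Average spacing between neighbouring addresses once sorted.
    s = sorted(addrs)
    mean_gap = (s[-1] - s[0]) / (n_accesses - 1)

    # Group requests as a warp would issue them (in generation order) and count
    # distinct segments; perfect coalescing would give 1, no coalescing gives warp.
    total_segments = 0
    warps = 0
    for i in range(0, n_accesses - warp + 1, warp):
        total_segments += len({a // segment for a in addrs[i:i + warp]})
        warps += 1
    return mean_gap, total_segments / warps


if __name__ == "__main__":
    gap, segs = coalescing_stats(1 << 29, 1 << 20)  # 512 MiB array, 1M accesses
    print(f"mean gap: {gap:.1f} bytes, segments per warp: {segs:.4f}")
```

Under this model almost every warp touches 32 separate segments, i.e. each request costs a full memory transaction for the 2 bits it actually needs, which is the sense in which these accesses resist coalescing.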
So you don't think my bounty is safe... care to have a go at it yourself?!
(a) It's not a good wage for the time it would take the kind of people who could do it. It's probably more than 20 hours of work, which suggests that a $5k bounty might start to be in the right range.
(b) From the rumor mill, I believe a fast CryptoNight implementation from day 1 of XMR was worth something north of $400k in net profit. That's just for comparison with the value of keeping an optimized implementation private.
I don't think it's reasonable to ask you to put up more of your personal cash - but I think it's reasonable for anyone seriously considering adopting CC to help boost that bounty into the range where it would be attractive for someone to actually demonstrate it.