The performance numbers sound reasonable. I have no idea why he only gets 8MHz, but it sounds like that can be improved. A full pipeline needs roughly 100K flip-flops plus all the logic. Scaling that up 100 times would mean 10M flip-flops plus logic, which would make a rather big ASIC, but it's certainly possible.
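A back-of-the-envelope sketch of the register count, assuming flip-flop usage grows linearly with the number of replicated pipelines and ignoring any logic the copies might share. The 100K figure is the rough estimate above; the function name is just for illustration.

```python
# Rough estimate from above: ~100K flip-flops per full pipeline.
FLIPFLOPS_PER_PIPELINE = 100_000


def total_flipflops(pipelines: int, per_pipeline: int = FLIPFLOPS_PER_PIPELINE) -> int:
    """Flip-flops for `pipelines` independent copies, assuming linear scaling
    (no shared registers between copies, combinational logic not counted)."""
    return pipelines * per_pipeline


if __name__ == "__main__":
    print(f"{total_flipflops(100):,} flip-flops")
```

With 100 copies this gives 10,000,000 flip-flops, before counting the combinational logic.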
I had a co-worker try an Altera HardCopy synthesis and that one ended up with 20 pipelines running at >200MHz each, yielding >4GH/s per chip.
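The 4GH/s figure follows from simple multiplication if you assume each fully unrolled pipeline finishes one hash per clock once it is full. That assumption is mine, not something stated about the HardCopy result, so treat this as a sketch:

```python
def hash_rate(pipelines: int, clock_hz: float, hashes_per_cycle: float = 1.0) -> float:
    """Aggregate hashes per second for replicated pipelines.

    Assumes each pipeline completes `hashes_per_cycle` hashes every clock
    once it is full (1.0 for a fully unrolled design).
    """
    return pipelines * clock_hz * hashes_per_cycle


if __name__ == "__main__":
    rate = hash_rate(pipelines=20, clock_hz=200e6)
    print(f"{rate / 1e9:.1f} GH/s")
```

Since both the pipeline count and the clock are lower bounds (">200MHz"), the result is a lower bound too.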
I don't trust these cost estimates at all, though. The ASIC price estimate might be sensible for volumes above 10,000, but $30 for the board, host interface and assembly seems impossible. The VRM alone will cost that much.
He can't go any faster because of hazards. The propagation delay is way too big to allow a higher clock frequency.
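For context, the usual first-order bound is that the clock period can't be shorter than the slowest register-to-register path: clock-to-Q delay, plus combinational propagation delay, plus setup time. A minimal sketch of that bound follows; the delay values are illustrative, not measurements from his design.

```python
def max_clock_hz(t_comb_s: float, t_clk_to_q_s: float = 0.0, t_setup_s: float = 0.0) -> float:
    """First-order upper bound on clock frequency for the critical path:
    f_max = 1 / (t_clk_to_q + t_comb + t_setup). Ignores clock skew and jitter."""
    period = t_clk_to_q_s + t_comb_s + t_setup_s
    if period <= 0:
        raise ValueError("total path delay must be positive")
    return 1.0 / period


if __name__ == "__main__":
    # Illustrative only: a 125 ns critical path caps the clock at 8 MHz,
    # while a 5 ns path would allow 200 MHz.
    for delay in (125e-9, 5e-9):
        print(f"{delay * 1e9:.0f} ns -> {max_clock_hz(delay) / 1e6:.1f} MHz")
```

In other words, getting from 8MHz to 200MHz means cutting the longest path by roughly 25x, which usually means shorter logic between registers rather than just a faster process.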