but after that short delay the circuit spits out 1 solution every clock cycle.
Ah, so you're saying that as soon as the signal for one hash leaves a transistor to go to the next, the next hash is already incoming to that first transistor? Then the limit on clock speed is only the switching speed of
one transistor, and so there's no way to really improve it? Would reducing the number of transistors just reduce the power consumption then?
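
For anyone following along, here is a minimal Python sketch of the pipelining idea being discussed: a short start-up delay, then one result per clock cycle. It is a toy model, not a description of any real mining chip. The function name `simulate_pipeline`, the stage count, and the assumption that every stage takes exactly one clock cycle are all made up for illustration.

```python
def simulate_pipeline(num_stages, num_inputs):
    """Toy model: return (cycle, input_id) for each result leaving the pipeline.

    Assumption: every stage takes exactly one clock cycle, and a new
    input can enter stage 0 on every cycle.
    """
    stages = [None] * num_stages   # stages[i] = input currently held in stage i
    completed = []
    next_input = 0
    cycle = 0
    while len(completed) < num_inputs:
        cycle += 1
        # Feed a new input into the first stage if any are left.
        new_item = next_input if next_input < num_inputs else None
        if new_item is not None:
            next_input += 1
        # On each clock edge, everything in flight moves one stage forward.
        stages = [new_item] + stages[:-1]
        # Whatever now sits in the last stage is a finished result.
        if stages[-1] is not None:
            completed.append((cycle, stages[-1]))
    return completed


if __name__ == "__main__":
    for cycle, item in simulate_pipeline(num_stages=4, num_inputs=3):
        print(f"cycle {cycle}: result for input {item}")
```

With 4 stages and 3 inputs, the first result appears at cycle 4 and the others follow at cycles 5 and 6, so the depth of the pipeline sets the initial delay but not the rate of results. Note that in this model a "stage" is whatever logic sits between two clocked registers, which in real hardware is generally a block of gates rather than a single transistor.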