It's technically doable - it's called asynchronous logic, and there are even a number of CPUs based around it, including the GA144 that ElectricMucus has. It's just not commercially viable because it's so hard to design and debug. So in a way I'm actually kind of surprised BFL aren't using it.

There are a bunch of things which are async logic but aren't really marketed as such, e.g. Octasic DSPs. For a miner the complicated validation problems aren't really an issue, and I wouldn't be surprised if someone could get a decent efficiency improvement by slicing SHA-256 into groups of 8 words and running them async across a span of 8 rounds (where they should naturally time back up, at least if the wiring in the round is totally regular).
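For anyone wondering why 8 rounds is the natural span: each SHA-256 round only writes two of the eight working words and shifts the other six along, so the roles a..h rotate through the same eight registers with a period of 8. Here is a rough Python sketch of that regularity. It is plain software, it says nothing about async timing or what any actual miner does, and the names (`compress`, `sha256_hex`, `icbrt`) are made up for illustration.

```python
import math
import struct

MASK = 0xFFFFFFFF

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def icbrt(n):
    # Exact integer cube root by binary search (avoids float rounding).
    lo, hi = 0, 1 << 36
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

PRIMES = first_primes(64)
# Round constants: fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# Initial state: fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]

def compress(state, block):
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((w[i - 16] + s0 + w[i - 7] + s1) & MASK)

    s = list(state)  # eight fixed registers; only their roles rotate
    for group in range(0, 64, 8):
        for j in range(8):
            i = group + j
            # Slot index currently playing each role a..h in round j of the group.
            a, b, c, d, e, f, g, h = ((k - j) % 8 for k in range(8))
            t1 = (s[h]
                  + (rotr(s[e], 6) ^ rotr(s[e], 11) ^ rotr(s[e], 25))
                  + ((s[e] & s[f]) ^ (~s[e] & s[g]))
                  + K[i] + w[i]) & MASK
            t2 = ((rotr(s[a], 2) ^ rotr(s[a], 13) ^ rotr(s[a], 22))
                  + ((s[a] & s[b]) ^ (s[a] & s[c]) ^ (s[b] & s[c]))) & MASK
            s[d] = (s[d] + t1) & MASK   # this slot is next round's "e"
            s[h] = (t1 + t2) & MASK     # this slot is next round's "a"
        # After 8 rounds every role is back in its original slot.
    return [(x + y) & MASK for x, y in zip(state, s)]

def sha256_hex(msg):
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack(">Q", len(msg) * 8))
    state = H0
    for off in range(0, len(padded), 64):
        state = compress(state, padded[off:off + 64])
    return struct.pack(">8I", *state).hex()
```

Comparing `sha256_hex(b"abc")` against `hashlib.sha256(b"abc").hexdigest()` is a quick sanity check that the rotated-slot version still computes the standard hash. No data ever moves between registers here, only the role labels do, which is the "totally regular wiring" part.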
I wouldn't say that too loudly =P I tried it in another thread, got shouted down, and then got locked into a debate over the advantages and disadvantages of "unrolling process" (their words, not mine).