Just coded a fully unrolled SHA256 in VHDL using two different approaches to maximize clock rate, a simple approach that involves precalculating H + K + W, and a more advanced approach that further pipelines each stage. Initial compiles targetted Cyclone IV using web edition quartus (which sucks), with the simple version achieving 110MHz and the advanced version 133MHz. Will be interested to see maximum clock rate that can be achieved on Stratix IV.