Okay, so you fit "around" 1.5 engines on a chip. Is it just me, or does that make no sense at all?
Edit:
Yes, I make assumptions about SHA-256. It's SHA-256. The round function, including the W update, needs at least eight 32-bit adders. No amount of "optimizing" changes that.
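For anyone who wants to check the adder count, here is a rough Python sketch of one SHA-256 round plus one W update that routes every 32-bit addition through a counter. The names (Adder, round_step, schedule_step) are made up for this post, and it is the plain textbook formulation, not anybody's actual core.

```python
MASK = 0xFFFFFFFF

class Adder:
    """Hypothetical helper: a 32-bit modular adder that counts its uses."""
    def __init__(self):
        self.count = 0

    def __call__(self, x, y):
        self.count += 1
        return (x + y) & MASK

def rotr(x, n):
    """Rotate a 32-bit word right by n bits (wiring only, no adder)."""
    return ((x >> n) | (x << (32 - n))) & MASK

def round_step(state, k, w, add):
    """One SHA-256 round on the working variables a..h."""
    a, b, c, d, e, f, g, h = state
    s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
    ch = (e & f) ^ (~e & g)
    s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
    maj = (a & b) ^ (a & c) ^ (b & c)
    t1 = add(add(add(add(h, s1), ch), k), w)          # 4 additions
    t2 = add(s0, maj)                                 # 1 addition
    return (add(t1, t2), a, b, c, add(d, t1), e, f, g)  # 2 additions

def schedule_step(w16, add):
    """W update: w16 holds W[t-16..t-1], returns W[t]."""
    x15, x2 = w16[1], w16[14]
    sig0 = rotr(x15, 7) ^ rotr(x15, 18) ^ (x15 >> 3)
    sig1 = rotr(x2, 17) ^ rotr(x2, 19) ^ (x2 >> 10)
    return add(add(add(sig1, w16[9]), sig0), w16[0])  # 3 additions

round_add, sched_add = Adder(), Adder()
round_step(tuple(range(1, 9)), 0x428A2F98, 0x61626380, round_add)
schedule_step(list(range(16)), sched_add)
print("round:", round_add.count, "W update:", sched_add.count)
```

The naive count comes out at 10 additions per round, which sits above the floor of 8 claimed here. Which additions a particular design manages to share or precompute is not something this sketch tries to model.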
And those "highly optimized" commercial cores? Barely 120 MHz on an S6, 65+ clocks/block, and you can fit maybe 70 on an LX150. 65 MH/s, wooo...
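The arithmetic behind that number, as a quick Python sketch. The function name is made up, and the factor of two is my assumption that each Bitcoin hash costs two SHA-256 block compressions (double SHA-256 with the first block of the header precomputed as a midstate).

```python
def hashrate_mhs(clock_mhz, clocks_per_block, cores, blocks_per_hash=2):
    """Aggregate hash rate in MH/s for iterative cores running in parallel.

    blocks_per_hash=2 is an assumption: two SHA-256 compressions per
    Bitcoin hash attempt once the midstate is precomputed.
    """
    blocks_per_us = clock_mhz / clocks_per_block * cores  # millions of blocks/s
    return blocks_per_us / blocks_per_hash

# 120 MHz, 65 clocks per block, 70 cores on the chip
print(round(hashrate_mhs(120, 65, 70), 1))  # 64.6
```

Since the figures are "barely 120 MHz" and "65+ clocks", the real number would land at or below this.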