Could someone familiar with any of those technologies make another post with relevant calculations?
Edit:
Btw, did some reading on algebraic logic minimization last night along with a couple other techniques. This is already done, automatically, during synth (but can be turned off). Seeing the process, yes, it's something that could be added to simplify logic circuits. HOWEVER, Vivado already does it! Starting to question OP and if this bittware account is even really bittware. I might have to put my foot in my mouth in 18 days, but the more I look at it, the more I'm thinking it's not possible. Elaborate scam?
Im not an expert in that part, but I believe the synthesizer does basic reduction and propagates constants/eliminates unused NETs. Actual SAT3 style reduction is a very hard problem (I think NP) and I am fairly certain the default options are not generating a provable minimal circuit. I believe that would take exhausting analysis of the 640+ bit input state to output truth table.
You can very likely achieve better for some definition of better with a longer running algorithm, much like the place and route problem.
Additionally, the synthesizer cant help you at all if you dont have all your constraints constrained. If your design allows arbitrary input to the Keccak engine, or uses the same module for each round, etc. the synthesizer cant make those optimizations for you.
Re memory bandwidth: those specs are published for those memory types? No need to guess. I glossed over the details of overhead and latency vs I/O bandwidth, but for the most part you can hide those.
Edit:
I agree with Ethash being a bonus / using extra memory - but your HMC costs more
than a GPU that gives you the same bandwidth.