Sorry, I've been meaning to make a post related to this. We have been working with the OP to determine the optimal configuration and board for this application, as we have a few to choose from.
If you could include a table showing how much (LP)DRAM and SRAM (or low-latency alternative) can be hooked up to each FPGA, as well as the resulting cost, that would be very helpful. Thanks in advance!
The best thing to hook up to the FPGA would be a hybrid memory cube (HMC), at roughly +$500. That would get you the same level of memory performance as HBM. The other nice thing is that the HMC has a silicon memory controller on it, along with some basic logic functions (XOR, AND, OR) that can speed up certain applications.
Can the hybrid memory cube be used with the VCU1525?
HMC is usually soldered onto the board, just like HBM. HMC can provide a staggering amount of bandwidth, although it suffers in latency. It uses SERDES communication.
There are HMC+Altera FPGA paired boards by PicoComputing iirc.
What do you hope to accomplish with HBM/HMC? Which algorithms? These typically offer on the order of 500 GB/s. That means that for ETH they support a max of about 62 MH/s, and for CN7 only about 16 kH/s on their own, yet they add huge cost premiums to FPGAs and offer no advantage over GPUs.
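For anyone who wants to check those ceilings, here is a rough sketch of the arithmetic. It assumes 512 GB/s as the "order of 500 GB/s" figure, 64 DAG accesses of 128 bytes per Ethash hash, and roughly 32 MB of scratchpad traffic per CN7 hash. That last number is an assumption chosen to match the 16 kH/s estimate, not a value taken from a spec, and the function name is just illustrative.

```python
# Back-of-envelope: hashrate ceiling when memory bandwidth is the only limit.
ETHASH_BYTES_PER_HASH = 64 * 128   # 64 DAG accesses x 128 bytes = 8192 bytes
CN7_BYTES_PER_HASH = 32 * 10**6    # ASSUMPTION: ~32 MB of scratchpad traffic per hash


def bandwidth_bound_hashrate(bandwidth_bytes_per_s, bytes_per_hash):
    """Upper bound on hashes/second if every hash must move bytes_per_hash."""
    return bandwidth_bytes_per_s / bytes_per_hash


bandwidth = 512e9  # ASSUMPTION: 512 GB/s stands in for "order of 500 GB/s"
eth = bandwidth_bound_hashrate(bandwidth, ETHASH_BYTES_PER_HASH)
cn7 = bandwidth_bound_hashrate(bandwidth, CN7_BYTES_PER_HASH)

print(f"ETH: {eth / 1e6:.1f} MH/s")   # ETH: 62.5 MH/s
print(f"CN7: {cn7 / 1e3:.1f} kH/s")   # CN7: 16.0 kH/s
```

These are ceilings only: latency, controller efficiency, and the compute side all push real numbers lower.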
The FPGA advantage is in calculation-bound code, or in algorithms whose work happens in memory spaces < 40 MB or so, where the on-chip 11+ TB/s of UltraRAM/BRAM can be used.
External QDR is going to top out at 80-160 GB/s too, because of pin count. Sure, it is low latency, but that is only part of the equation.
The absolute most bandwidth you can get off chip is if you use 128 transceivers at 32 Gbit/s each in a very expensive chip, and that will give you... 512 GByte/s again.
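The transceiver figure is simple to reproduce. A minimal sketch, assuming raw line rate with no encoding or protocol overhead (so the usable number would be somewhat lower); the function name is hypothetical:

```python
def serdes_bandwidth_gbytes(lanes, gbit_per_lane):
    """Aggregate raw SERDES bandwidth in GByte/s (8 bits per byte, no overhead)."""
    return lanes * gbit_per_lane / 8


total = serdes_bandwidth_gbytes(128, 32)
print(f"{total:.0f} GByte/s")  # 512 GByte/s
```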
There's little to be gained from off-chip memory of the very expensive variety, in most cases.