What about the upcoming HBM2 FPGAs? Do they drastically change the playing field?
Probably not. You get a 20x performance improvement on the memory<->FPGA interface compared with DDR4, but no improvement anywhere else. So the algorithm you are accelerating must require large amounts of sequential access to a large memory bank. And any algorithm could be made resistant to acceleration by an HBM FPGA merely by requiring >16GB of memory.
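To put rough numbers on that argument, here is a minimal Amdahl's-law sketch. The 20x bandwidth gain and the 16GB limit come from the post above; the function name `hbm_speedup`, the `mem_fraction` parameter, and the simplification that a working set larger than the HBM gets no benefit at all are assumptions made purely for illustration.

```python
def hbm_speedup(mem_fraction, bandwidth_gain=20.0,
                working_set_gb=0.0, hbm_capacity_gb=16.0):
    """Estimate overall speedup when only memory-bound time gets faster.

    mem_fraction: share of runtime spent waiting on memory (0.0 to 1.0).
    bandwidth_gain: speedup of the memory interface alone (20x per the post).
    working_set_gb / hbm_capacity_gb: assumed model where a working set
    that does not fit in HBM sees no gain (hypothetical simplification).
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0 and 1")
    if working_set_gb > hbm_capacity_gb:
        # Algorithm needs more memory than the HBM holds: no acceleration.
        return 1.0
    # Amdahl's law: the non-memory part of the runtime is unchanged.
    return 1.0 / ((1.0 - mem_fraction) + mem_fraction / bandwidth_gain)


if __name__ == "__main__":
    for frac in (0.1, 0.5, 0.9, 1.0):
        print(f"memory-bound {frac:.0%}: {hbm_speedup(frac):.2f}x overall")
    print(f"32GB working set: {hbm_speedup(0.9, working_set_gb=32.0):.2f}x")
```

Under this model, an algorithm that spends half its time on memory gets less than 2x overall from a 20x faster interface, and one that needs 32GB gets nothing.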
... And the present price of the HBM devices is *shocking* ... Better from Intel/Altera than Xilinx, but still more than this market can tolerate. Also, Intel/Altera has not executed well in the last 2-3 years. Getting a Stratix 10 device has been a bit of a joke in the FPGA world.