Virtex-7 -> It will not be used!
I don't know why you ruled out the Virtex-7. I just ran the biggest member of the Virtex-7 family (xc7v2000t) through Vivado. I constrained the clocks to 200 MHz, and the timing was reported as 4.575 ns, or about 218 MHz. The utilization of a single core was:
+----------------------------+-------+-------+-----------+-------+
| Site Type                  | Used  | Loced | Available | Util% |
+----------------------------+-------+-------+-----------+-------+
| Slice LUTs                 | 44753 |     0 |   1221600 |  3.66 |
+----------------------------+-------+-------+-----------+-------+
Hence it might be possible to fit 27 hash cores in the device (less whatever logic is needed to communicate with the cores), which would give a hashing performance of about 5.8 GH/s, assuming the timing stays the same with 27 cores. Others on the forum have optimized the design using the Xilinx DSP blocks (this device has 2160 of them) and ran the Kintex-7 at much higher frequencies. This is a very expensive FPGA, but again, I don't know what kind of deal you can get.
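For anyone who wants to redo the estimate with their own numbers, here is a rough Python sketch of the arithmetic. The LUT counts and the 4.575 ns figure are taken from the report above; the function names are made up for illustration, and the sketch assumes each core produces one hash per clock cycle and that timing does not degrade as cores are added, neither of which has been verified on a full 27-core build.

```python
# Back-of-the-envelope estimate from the single-core Vivado run.
# Assumptions (not verified): one hash per clock per core, and the same
# achievable clock when the device is filled with cores.

LUTS_PER_CORE = 44753      # "Used" Slice LUTs for one core
LUTS_AVAILABLE = 1221600   # Slice LUTs available on the xc7v2000t
PERIOD_NS = 4.575          # timing reported by Vivado


def max_cores(luts_available: int, luts_per_core: int) -> int:
    """Upper bound on core count from LUT usage alone (ignores glue logic)."""
    return luts_available // luts_per_core


def fmax_mhz(period_ns: float) -> float:
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


def hashrate_ghs(cores: int, f_mhz: float, hashes_per_clock: float = 1.0) -> float:
    """Aggregate hash rate in GH/s under the one-hash-per-clock assumption."""
    return cores * f_mhz * hashes_per_clock / 1000.0


cores = max_cores(LUTS_AVAILABLE, LUTS_PER_CORE)
util_percent = 100.0 * LUTS_PER_CORE / LUTS_AVAILABLE
f_mhz = fmax_mhz(PERIOD_NS)
rate = hashrate_ghs(cores, f_mhz)

print(f"single-core LUT utilization: {util_percent:.2f} %")
print(f"max cores by LUT count:      {cores}")
print(f"fmax:                        {f_mhz:.1f} MHz")
print(f"estimated hash rate:         {rate:.2f} GH/s")
```

This only bounds the core count by LUTs. Routing congestion, the communication logic, and the fact that the xc7v2000t is a multi-die device could all push the real number lower.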