Still working on it. It runs at 22 kH/s, but I'm not completely convinced that a good portion of that isn't spitting out errors. I'm using a relaxed definition of "correct" to squeeze out a higher hashrate while allowing some rare corner cases to be incorrect. Currently more hashes than expected are incorrect.
Well, 22 kH/s is pretty decent. The guy who claims the VCU1525 gets 64 kH/s is R0land.
I won't say it's impossible, but I would be genuinely surprised. Each hash needs 32 MB of total bandwidth (read + write), plus 2 MB or so of stash per hash core.
In the biggest configuration you have 1280 dual-ported URAM blocks of 288 Kb each, with a 72-bit interface (360 Mbit, or 45 MB, in total). That's an incredible amount of internal bandwidth, but you can only store 23 or so simultaneous Cryptonight7 2 MB blocks in that. The absolute biggest part (which isn't on the 1525 board) has 360 Mbit of URAM, 96 Mbit of BRAM, and 48 Mbit of distributed RAM, for a theoretical 63 MB of pipeline storage, assuming you didn't need a single bit of that for the rest of your logic (you do).
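For anyone who wants to check the storage numbers, here is a rough sketch of the arithmetic. It assumes "2 MB" means 2 MiB and that Kb/Mbit are binary units; the function and variable names are made up for illustration. Under those assumptions the URAM alone comes out at 22.5 scratchpads, close to the "23 or so" quoted above.

```python
# Back-of-the-envelope check of the on-chip storage figures.
# Assumptions (not stated in the post): binary units throughout,
# i.e. 1 Kb = 1024 bits, 1 Mbit = 1024 * 1024 bits, 2 MB = 2 MiB.

KBIT = 1024                  # bits
MBIT = 1024 * 1024           # bits
MIB = 1024 * 1024            # bytes
SCRATCHPAD_BYTES = 2 * MIB   # one Cryptonight scratchpad


def bits_to_mib(total_bits: int) -> float:
    """Convert a bit count to MiB."""
    return total_bits / 8 / MIB


def scratchpads_that_fit(total_bits: int) -> float:
    """How many 2 MiB scratchpads fit if every bit were usable."""
    return total_bits / 8 / SCRATCHPAD_BYTES


# URAM only: 1280 blocks of 288 Kb each.
uram_bits = 1280 * 288 * KBIT

# URAM + BRAM + distributed RAM on the largest part quoted in the post.
all_ram_bits = (360 + 96 + 48) * MBIT

for label, bits in [("URAM only", uram_bits), ("all on-chip RAM", all_ram_bits)]:
    print(f"{label}: {bits_to_mib(bits):.1f} MiB, "
          f"{scratchpads_that_fit(bits):.1f} scratchpads")
```

These are upper bounds only: they ignore whatever memory the rest of the design needs.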
External memory at, say, four 64-bit DIMMs @ 2666 MT/s is only 85 GB/s, or 2.6 kH/s worth of bandwidth with a perfect access pattern.
Even if you could hypothetically use all 2000+ balls on the FPGA at DDR-style 2666 MT/s speeds, you'd still only clear 20 kH/s against external memory, and that isn't even real bandwidth.
Even if you took the biggest part, with 128 x 32 Gbps transceivers, and used them for SERDES-attached memory, you'd only have a 16 kH/s limit from bandwidth.
Unless you break the algorithm itself, there's nowhere to find the bandwidth + storage space for 64 kH/s on a single FPGA.
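The bandwidth ceilings above can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: 32 MB per hash is taken as 32e6 bytes (using MiB instead lowers every result by about 5%), the "2000+ balls" case is taken as exactly 2000 pins, and the helper names are hypothetical.

```python
# Rough hashrate ceilings implied by memory bandwidth alone.
# Assumption: each hash moves 32e6 bytes (read + write), per the post.

BYTES_PER_HASH = 32e6


def parallel_bus_bandwidth(pins: int, mega_transfers_per_s: float) -> float:
    """Bytes/s for a parallel bus carrying one bit per pin per transfer."""
    return pins * mega_transfers_per_s * 1e6 / 8


def serial_link_bandwidth(links: int, gbps_per_link: float) -> float:
    """Bytes/s for a set of serial transceivers, ignoring encoding overhead."""
    return links * gbps_per_link * 1e9 / 8


def hashrate_ceiling(bytes_per_second: float) -> float:
    """Hashes/s if every byte of bandwidth were perfectly used."""
    return bytes_per_second / BYTES_PER_HASH


scenarios = {
    "4 x 64-bit DIMMs @ 2666 MT/s": parallel_bus_bandwidth(4 * 64, 2666),
    "2000 pins @ 2666 MT/s": parallel_bus_bandwidth(2000, 2666),
    "128 x 32 Gbps transceivers": serial_link_bandwidth(128, 32),
}

for label, bandwidth in scenarios.items():
    print(f"{label}: {bandwidth / 1e9:.1f} GB/s, "
          f"{hashrate_ceiling(bandwidth) / 1e3:.1f} kH/s ceiling")

# Bandwidth a 64 kH/s design would need if nothing stayed on chip.
required = 64_000 * BYTES_PER_HASH
print(f"64 kH/s needs {required / 1e12:.3f} TB/s")
```

All three ceilings fall well short of the roughly 2 TB/s that 64 kH/s would need, which is the point of the comparison.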