Board: Mining (Altcoins)
Topic: Re: DIY FPGA Mining rig for any algorithm with fast ROI
by GPUHoarder on 05/05/2018, 23:34:43 UTC
I won’t say it’s impossible, but I would be really genuinely surprised. 32MB / hash of total bandwidth (read + write) is needed, and 2MB or so of stashes per hashcore.

You have 1280 URAM blocks of 288 Kb each, with a 72-bit dual-ported interface, in the biggest configuration. That’s an incredible amount of internal bandwidth, but you can only store 23 or so simultaneous Cryptonight7 2MB blocks in that. The absolute biggest part (which isn’t on the 1525 board) has 360Mbit URAM, 96Mbit BRAM, and 48Mbit Distributed RAM, holding a theoretical 63 MB of pipelines, assuming you didn’t need a single bit of that for the rest of your logic (you do).
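For anyone who wants to check the storage numbers, here is a rough sketch of the arithmetic. It only uses the figures quoted above (1280 URAM blocks of 288 Kb, 360/96/48 Mbit on the biggest part, 2 MB per scratchpad); the variable names are mine, and it assumes 1 Mbit = 1024 Kbit and 8 bits per byte with no parity or overhead.

```python
# On-chip storage arithmetic, using only the figures quoted in the post.
# Assumption: 1 Mbit = 1024 Kbit, 8 bits per byte, no parity/ECC overhead.

URAM_BLOCKS = 1280       # UltraRAM blocks in the biggest configuration
URAM_KBIT = 288          # capacity of one UltraRAM block, in Kbit
SCRATCHPAD_MB = 2        # memory one in-flight Cryptonight hash needs

# URAM only
uram_mbit = URAM_BLOCKS * URAM_KBIT / 1024   # total URAM in Mbit
uram_mb = uram_mbit / 8                      # total URAM in MB
scratchpads_in_uram = uram_mb / SCRATCHPAD_MB

# Biggest part: URAM + BRAM + distributed RAM, all in Mbit
total_mbit = 360 + 96 + 48
total_mb = total_mbit / 8
scratchpads_total = total_mb / SCRATCHPAD_MB

print(f"URAM: {uram_mbit:.0f} Mbit = {uram_mb:.0f} MB "
      f"-> {scratchpads_in_uram:.1f} scratchpads")
print(f"All on-chip RAM: {total_mbit} Mbit = {total_mb:.0f} MB "
      f"-> {scratchpads_total:.1f} scratchpads")
```

The URAM works out to 45 MB, i.e. about 22.5 scratchpads (so 22 whole ones, in line with the "23 or so" above), and the 63 MB total would hold about 31 if none of it went to the rest of the logic.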

The external memory at, say, 4 x 64-bit DIMMs @ 2666 MT/s is only 85 GB/s, or about 2.6 kH/s worth of bandwidth with a perfect access pattern.

Even if you could hypothetically use all 2000+ balls on the FPGA at 2666 MT/s DDR-style speeds, you’d still only clear 20 kH/s against external memory, and that isn’t even real bandwidth.

Even if you took the biggest part with 128 x 32 Gbps transceivers and used them for SERDES-attached memory, you’d only have a 16 kH/s limit from bandwidth.

Unless you break the algorithm itself, there’s nowhere to find the bandwidth + storage space for 64 kH/s on a single FPGA.
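The bandwidth ceilings above are easy to reproduce. This is just a back-of-the-envelope sketch, not a model of any real memory controller: it takes the 32 MB/hash figure from the post as given, treats MB and GB as decimal (10^6 and 10^9 bytes), assumes exactly 2000 usable balls at one bit per transfer each, and ignores all protocol overhead. The function and variable names are made up for illustration.

```python
# Upper bounds on Cryptonight hash rate from raw memory bandwidth.
# Assumptions: 32 MB of traffic per hash (figure from the post),
# decimal units, perfect access pattern, zero protocol overhead.

BYTES_PER_HASH = 32e6


def hashrate_ceiling(bytes_per_second):
    """Hashes per second if bandwidth were the only limit."""
    return bytes_per_second / BYTES_PER_HASH


# 4 DIMMs, 64-bit (8-byte) bus each, 2666 MT/s
dimm_bw = 4 * (64 / 8) * 2666e6

# 2000 I/O balls, each moving 1 bit per transfer at 2666 MT/s
balls_bw = 2000 * 2666e6 / 8

# 128 transceivers at 32 Gbit/s each
serdes_bw = 128 * 32e9 / 8

# Bandwidth a 64 kH/s design would need
needed_bw = 64e3 * BYTES_PER_HASH

for name, bw in [("4x DIMM", dimm_bw),
                 ("2000 balls", balls_bw),
                 ("128x32G SERDES", serdes_bw)]:
    print(f"{name}: {bw / 1e9:.0f} GB/s -> "
          f"{hashrate_ceiling(bw) / 1e3:.1f} kH/s")
print(f"64 kH/s needs {needed_bw / 1e9:.0f} GB/s")
```

Under those assumptions the three cases come out at roughly 2.7, 20.8 and 16 kH/s, while 64 kH/s would need about 2 TB/s, which is several times more than even the all-transceivers case.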

You're missing a really big part of the ultraram. One of the most attractive things that ultraram has to offer. True dual port single clock read/write. Also, when you chain ultrarams together it increases the bus width proportionally to the amount it increases the latency. I never completed monero but my estimates were in the 4-8Kh/s per board range at 100W.





I wasn’t missing it - the true dual port is the reason it works as well as it does: write completion before read in the dependency chaining. The issue (at least with cryptonight) isn’t the bandwidth at all, it is the amount of storage available. Ultraram is great in general.


There's also a bunch of block ram and distributed ram. AES itself is tiny (<30K LUTs) and the secondary hashes don't need to be completed on the FPGA (meaning, you don't really need to put groestl, jh, etc. on the FPGA, you can just read the 8Kh/s and complete the secondary on CPU). While I have those completed (the secondaries), I had never intended on putting them on the FPGA for cryptonight.



I’m not sure you actually read my posts on the topic, as you’re repeating a few things I already stated, such as that I don’t do secondary hashes on the FPGA. 30k LUTs for AES? That’s a huge amount more than my cores...