Re: Official Open Source FPGA Bitcoin Miner (Last Update: April 14th, 2013)
by iidx on 17/04/2013, 21:18:10 UTC
I think the problem is that linking 11 BRAMs together requires a lot of LUTs for address decode and routing, since the BRAMs are arranged in columns throughout the chip.  Plus, linking 11 together would probably result in a minimum period much higher than 2.0 ns (I think 2.0 ns is the figure for a single BRAM).

So, you would need 128 (hashers) * 11 (BRAMs) for one pipeline stage = 1408 total BRAMs.  Of course, you're not suggesting using BRAM for all of the delay.  However, I think the slices you would sacrifice to connect the BRAMs and create their address logic would cost more than just using the built-in FFs or DMEMs (plus there is the speed hit).
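To make the arithmetic above concrete, here is a minimal sketch of the BRAM budget. The 128, 11, and 264 figures are taken from this thread; the function and constant names are just made up for illustration.

```python
# Figures quoted in this thread; names are illustrative only.
HASHER_STAGES = 128     # "128 (hashers)"
BRAMS_PER_STAGE = 11    # primitives per 792-bit-wide delay memory
BRAMS_AVAILABLE = 264   # BRAM count quoted for the LX130T


def brams_needed(stages, brams_per_stage):
    """Total BRAM primitives if every stage's delay lived in BRAM."""
    return stages * brams_per_stage


needed = brams_needed(HASHER_STAGES, BRAMS_PER_STAGE)
shortfall = needed - BRAMS_AVAILABLE
print(f"needed={needed}, available={BRAMS_AVAILABLE}, shortfall={shortfall}")
```

So putting all of the delay in BRAM is well beyond what the device has, which is why only part of the delay could ever move there.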

I'm hoping that by floorplanning each hashing module I can get to a good clock speed.  Currently the logic delay I am facing is only ~2.0 ns, with the routes taking the rest of the period.  So with some nice routing I would hopefully meet my target.
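As a rough illustration of the timing budget, the sketch below splits a clock period into logic delay and what is left for routing. Only the ~2.0 ns logic delay comes from this post; the 200 MHz target in the usage line is a made-up example value, not my actual target, and the function name is hypothetical.

```python
def routing_budget_ns(target_mhz, logic_delay_ns):
    """Time left for routing in one clock period, in nanoseconds."""
    period_ns = 1000.0 / target_mhz   # 1 / MHz = microseconds, so x1000 gives ns
    return period_ns - logic_delay_ns


# Example only: 200 MHz is an assumed target, 2.0 ns is the logic delay above.
example_budget = routing_budget_ns(200.0, 2.0)
print(f"routing budget: {example_budget:.2f} ns")
```

A negative result would mean the logic alone already exceeds the period, so no amount of floorplanning would close timing at that frequency.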

The V6LX130 isn't even as big as the S6 150, but at least it has DSP48s.

I may also need to cut down the PCIe link from 4x to 1x and reduce its performance settings to regain some of the space that is being used up.

IIDX

Looks good!  I tried to do the same thing on a V6 LX130T (use almost all DSPs and pipeline the rest of the LUT adders), but there aren't enough registers in that device for the tx_w and tx_state delays :(  So many 512- and 256-bit registers...


   If you are short on flip-flops, have you considered using the BRAMs?  You would need 11 primitives (there are 264 in the LX130T) to make a 792-bit-wide memory.  You can set the BRAM to 'write first' mode, which will echo the data to the output.  The clk-to-out for an unpipelined BRAM is ~2.0 ns, which is slower than an FF.
   Since the BRAMs are dual port, you can use both sides of the memory (with different locked addresses) and get enough storage for 48 stages of a fully unrolled algorithm.
   I've never tried this, but was just thinking of how to make use of all the unused BRAM lying around.  I usually run out of LUTs, but need to rethink whether this is worthwhile with the DSP48 implementation.
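The 48-stage figure can be reproduced from the numbers in the quote. This is only a sketch of that arithmetic: the 72 bits per primitive is simply what 792 / 11 implies, and whether both ports can really be used at that width is assumed here, not checked. The function name is hypothetical.

```python
def delay_stages_available(total_brams, brams_per_word, ports_per_bram=2):
    """Number of wide delay words that fit if each port holds one stage."""
    groups = total_brams // brams_per_word   # full-width groups of primitives
    return groups * ports_per_bram


bits_per_bram = 792 // 11                     # width each primitive must supply
stages = delay_stages_available(264, 11)      # 264 BRAMs, 11 per 792-bit word
print(f"{bits_per_bram} bits per BRAM, {stages} stages")
```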