Post
Topic
Board Hardware
Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!)
by
paszczakojad
on 24/04/2013, 15:33:54 UTC
This is a DSP48E1-based design, and I have compiled and run it at 400 MH/s.

Have you done any testing to see which adders give the biggest increase in fmax? To get multiple cores in there, you're going to need to pick and choose which adders to replace with DSPs and which not to. I'm currently at 66% LUT usage, 99% memory LUT usage and 108% DSP usage with 2 unrolled cores (I had one core do even nonces while the other does odd nonces to make life easy). I've been slowly working down the number of DSPs used per core to make it fit. I'm thinking it might be possible to get 3 full cores on the A7 200.
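To illustrate the even/odd nonce split mentioned above, here is a minimal Python sketch of an interleaved partition of the 32-bit nonce space. The function name `nonces_for_core` and its parameters are hypothetical; in the actual hardware this would presumably be done by fixing the low nonce bit per core, which the post does not spell out.

```python
def nonces_for_core(core_index, num_cores, start=0, stop=2**32):
    """Return the nonces one core would test under an interleaved split.

    With num_cores == 2, core 0 gets the even nonces and core 1 the odd
    ones, so the cores never overlap and together cover the whole range.
    """
    return range(start + core_index, stop, num_cores)


even_core = nonces_for_core(0, 2)
odd_core = nonces_for_core(1, 2)

print(list(even_core[:4]))  # first nonces of core 0
print(list(odd_core[:4]))   # first nonces of core 1
```

The same function generalizes to three cores (`num_cores=3`), although a stride that is not a power of two is less convenient to implement in logic.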

Does the DSP performance increase compound? If I change one adder over to a DSP and it gives a 10% fmax increase, would changing additional adders down the chain affect that 10%, or will that one adder always give a 10% boost? I'm wondering whether it's possible to go through the adders one by one and measure the frequency increase for each, to find which adders are most effectively implemented in DSP48 blocks for the best timing.
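One way to think about the compounding question is a simplified critical-path model: fmax is set by the slowest register-to-register path, so speeding up one adder only helps until the next-slowest path becomes the limit. The sketch below uses made-up path delays purely for illustration (they are not measurements from this design), and real results also depend on placement and routing.

```python
def fmax_mhz(path_delays_ns):
    """fmax in MHz when the clock period equals the slowest path delay."""
    return 1000.0 / max(path_delays_ns.values())


# Hypothetical delays in nanoseconds, for illustration only.
paths = {"adder_a": 4.0, "adder_b": 3.8, "adder_c": 3.0}

baseline = fmax_mhz(paths)

# Speeding up only the slowest adder: adder_b now limits the clock.
only_a = fmax_mhz(dict(paths, adder_a=2.5))

# Speeding up only adder_b: adder_a still limits the clock, so no gain.
only_b = fmax_mhz(dict(paths, adder_b=2.5))

# Speeding up both: adder_c becomes the limit.
both = fmax_mhz(dict(paths, adder_a=2.5, adder_b=2.5))

for name, value in [("baseline", baseline), ("only_a", only_a),
                    ("only_b", only_b), ("both", both)]:
    print(f"{name}: {value:.1f} MHz")
```

Under this model the gain from a single adder is not a fixed percentage: it depends on which other paths have already been improved, so measuring the adders one at a time would not predict the combined result.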


I compiled fpgaminer's DSP code for the A7 200 and got 356 MHz on the -3 speed grade, 311 MHz on -2 and 262 MHz on -1. The -3 variant only exists in the extended temperature version, so it's much more expensive; the -2 is the best choice in my opinion.

The usage was 20% slice logic, 34% slice logic distribution and 92% DSP.

What were your results? I.e., what maximum clock do you get without DSPs?

Now I'm trying to replace some DSPs with the adder IP core. I think the best candidates are those that don't use the PCIN input (because they are simpler), like dsp_e, dsp_wp and dsp_t1p. When I replaced dsp_e with an adder I got 302 MHz (-2 version), 23% logic, 37% distribution, 75% DSP. Then I replaced dsp_wp: 271 MHz, 24% logic, 38% distribution, 63% DSP. Compilation took over 5 hours, while it takes 30 min when using only DSPs. Then I replaced dsp_t1p and the compilation is taking ages (it hasn't completed yet) :(

The estimate is that DSP usage will be 49%, so theoretically I should be able to fit two such cores. Even if I have to lower the clock to, say, 200 MHz, the total output would be 400 MH/s, which would be better than 311 MH/s with one DSP-only core.
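The arithmetic behind this comparison can be sketched in a few lines of Python. It assumes each fully unrolled core produces one hash per clock cycle (which is what the 2 x 200 MHz = 400 MH/s figure implies) and that DSP usage is the only limit on core count; the helper names are hypothetical, and logic usage and routing are ignored.

```python
def cores_that_fit(dsp_percent_per_core):
    """How many cores fit if DSP usage were the only constraint."""
    return int(100 // dsp_percent_per_core)


def total_mhs(cores, clock_mhz, hashes_per_clock=1.0):
    """Total hash rate in MH/s, assuming one hash per clock per core."""
    return cores * clock_mhz * hashes_per_clock


# DSP usage per core reported in the post for each variant.
for label, dsp in [("DSP-only", 92), ("dsp_e replaced", 75),
                   ("dsp_e + dsp_wp replaced", 63),
                   ("dsp_t1p also replaced (estimate)", 49)]:
    print(f"{label}: {cores_that_fit(dsp)} core(s)")

one_core = total_mhs(1, 311)   # single DSP-only core on the -2 grade
two_cores = total_mhs(2, 200)  # two reduced-DSP cores at a lower clock

# Clock at which two cores merely match the single DSP-only core.
break_even_mhz = one_core / 2

print(one_core, two_cores, break_even_mhz)
```

So under these assumptions two cores come out ahead as long as they can be clocked above roughly 155.5 MHz each.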