Post
Topic
Board Hardware
Re: Official Open Source FPGA Bitcoin Miner (Spartan-6 Now Tops Performance per $!)
by hardcore-fs on 01/11/2012, 23:56:59 UTC

I've been reading through sha256_transform.v, and while I've got a rough idea of what it's doing, it's going to take me a while to work out whether it's being implemented correctly in the device.


Why fuck about?
www.iscturkey.org/2010/2008/2007/pdf/sozlu/10.pdf
http://www.ee.usyd.edu.au/people/philip.leong/UserFiles/File/papers/sha_fpl02.pdf
http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html (check out the Sha2 VHDL source, NOT SHA3)
http://www.iis.ee.ethz.ch/~sha3/index.html


Normally with academic research, you research FIRST, then compare against your hypothesis.

The main way forward for speed is to unroll the calculations and optimize the Expander, then multi-core it.

HC

Many thanks, that's very useful background! Unfortunately maths was never my strongest subject, but I'll take the time to understand it. I'm more the "hack it together and see if it works" type than the academic type.

I was rather hoping that the fpgaminer code would work "out of the box", but it seems things are never that simple.

I have made some progress though. I've been comparing the different versions and compiled the xilinx branch LX150_makomk_Test. It needed a little bit of tweaking (GOLDEN_NONCE_OFFSET was out by one), but it's working at LOOP_LOG2=3 and generating valid hashes. It's bumped up the throughput by 50%, so now I'm getting 7.6 MH/s at 80MHz and 14.9 MH/s at 120MHz (OOPS, belay that remark, it's kicking out bad hashes at 120MHz, not so good).

Multi-core sounds good, perhaps mixing the sizes (say a LOOP_LOG2=3 plus a LOOP_LOG2=4) to fill up the device. However, I rather expect throughput to ultimately be thermally bound (the power dissipation will scale with MH/s rather than MHz, at least to a first approximation). I plan to see what performance I can get at -20C (freezer temperatures), as this is far more practical with a 10W FPGA than a 200W GPU! It would be nice to dynamically set the clock speed too, so the devices can self-calibrate and ramp themselves up to a maximum clock speed. As I said in my earlier post, this is going to be fun. And if I can get the kit to pay for itself, then that's just a bonus.

Again, many thanks, hope to stay in touch!

Maths is NOT my strong point either, but I can add up, multiply or divide by 2 (left or right shift), 'and' and 'xor'.

e0
e1
ch
maj
sigma0
sigma1
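
For reference, those six functions can be modelled in a few lines of Python. This is a software sketch, not the HDL from the repository. The rotate and shift amounts are the standard SHA-256 ones from FIPS 180-4, and the name mapping is my assumption: e0/e1 as the "big sigma" functions used in the Compressor, sigma0/sigma1 as the "small sigma" functions used in the Expander.

```python
MASK = 0xFFFFFFFF  # keep everything to 32 bits, like the hardware words


def rotr(x, n):
    """Rotate a 32-bit word right by n bits (in an FPGA this is just wiring)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def e0(x):
    # Big sigma 0 (assumed mapping), used in the Compressor
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def e1(x):
    # Big sigma 1 (assumed mapping), used in the Compressor
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(x, y, z):
    # "Choose": each bit of x selects the matching bit of y (1) or z (0)
    return ((x & y) ^ (~x & z)) & MASK


def maj(x, y, z):
    # "Majority": each output bit is the majority vote of the three inputs
    return (x & y) ^ (x & z) ^ (y & z)


def sigma0(x):
    # Small sigma 0 (assumed mapping), used in the Expander
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    # Small sigma 1 (assumed mapping), used in the Expander
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
```

None of these involve a carry, so they cost very little logic depth. The expensive part is the additions that combine their outputs.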

Basically the speed 'weakness' in this algorithm is the long chains of additions. The design can be broken down into TWO main sections: the Expander & the Compressor.

Since a 32-bit addition gives the same result however you group it, x+y+p+z can be done as (x+y)+(p+z), and you can calculate BOTH
(x+y)
(p+z)
at the SAME time, since neither partial result depends on the other.
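
A quick software check of that claim, as a sketch only: the function names here are made up for illustration, and the "levels" count is a simple model of adders in series, not a timing figure for any real device.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit wrap-around, as in the hardware adders


def add_chain(x, y, p, z):
    """((x + y) + p) + z : each adder waits for the one before it."""
    t = (x + y) & MASK
    t = (t + p) & MASK
    return (t + z) & MASK


def add_tree(x, y, p, z):
    """(x + y) + (p + z) : the two inner sums do not depend on each other."""
    left = (x + y) & MASK
    right = (p + z) & MASK
    return (left + right) & MASK


def adder_levels(n_terms, balanced):
    """Adders in series on the longest path when summing n_terms words."""
    if balanced:
        return (n_terms - 1).bit_length()  # equals ceil(log2(n_terms))
    return n_terms - 1


# Both groupings agree for random 32-bit inputs.
random.seed(0)
for _ in range(1000):
    vals = [random.getrandbits(32) for _ in range(4)]
    assert add_chain(*vals) == add_tree(*vals)
```

With four terms the chain has three adders in series and the tree has two. Whether the synthesis tool already regroups the sum for you depends on the tool and its settings, so check the timing report rather than assume.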

consider:

w_out(511 downto 480) <= s1 + w_in(319 downto 288) + s0 + w_in(31 downto 0);

Whilst it executes within a "single clock cycle"
process(clk)
....
.....

The sheer length of the addition chain DICTATES the number of logic levels and therefore the MINIMUM clock cycle length, due to the physical implementation of the logic and routing. (You cannot go faster than a CLK cycle; all you can do is ensure your logic lets that cycle be shorter.)
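
To see what that one line computes, here is a Python model of the Expander step with the sum split into two independent halves. It is a sketch under assumptions: the slice w_in(319 downto 288) is word 9 and w_in(31 downto 0) is word 0 of a 16-word window, and I am assuming s0 and s1 come from words 1 and 14 as in the standard SHA-256 message schedule. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def expand_word(w):
    """Next schedule word from a 16-word window (w[0] is the oldest word)."""
    s0 = sigma0(w[1])   # assumption: s0 is derived from word 1
    s1 = sigma1(w[14])  # assumption: s1 is derived from word 14
    left = (s1 + w[9]) & MASK   # these two partial sums are independent,
    right = (s0 + w[0]) & MASK  # so hardware can compute them side by side
    return (left + right) & MASK


def expand_schedule(block_words):
    """Expand 16 input words into the 64-word SHA-256 message schedule."""
    w = list(block_words)
    for t in range(16, 64):
        w.append(expand_word(w[t - 16:t]))
    return w


# The padded one-block message "abc" as sixteen 32-bit words.
abc_block = [0x61626380] + [0] * 14 + [0x00000018]
schedule = expand_schedule(abc_block)
```

In hardware, `left` and `right` could also be registered separately so the final addition lands in the next clock cycle, trading a cycle of latency for a shorter critical path.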

Also, if you are going to stick shit into the freezer:

1. It ain't going to be a profitable way to mine at 7.6MH/s, since the cooling cost outweighs the bitcoin value.
2. SEAL the device in a PLASTIC bag with some silica gel, because when you bring the stuff out of the freezer, moisture in the air is going to condense on the board and destroy it. (The poly bag keeps condensation off the board until it reaches ambient temperature, at which point it can be brought OUT of the bag. The silica gel acts as a buffer to keep the humidity inside the bag very low.)

3. It's NEVER going to pay for itself at 7.6MH/s.