Board Hardware
Re: Official Open Source FPGA Bitcoin Miner (Smaller Devices Now Supported!)
by
makomk
on 27/07/2011, 13:23:02 UTC
edit: update
if I use this for the K and K_next assignment when LOOP == 1, I don't get the LUT messages anymore:
Quote
`ifdef USE_RAM_FOR_KS
         if ( LOOP == 1) begin
            assign K = Ks_mem[ i ];
            assign K_next = Ks_mem[ i + 1 ];
         end else begin
...
I think the problem is that K and K_next are not assigned in a clocked block, so they become asynchronous combinatorial logic - and XST can't map that to a ROM? Or maybe it's the addition of using a multiplier output as an address selector? Something in there XST wasn't liking for me.
I'm guessing you don't get the LUT RAM messages anymore because XST is now interpreting each of those reads as a single constant value rather than as a table lookup. (Since i is a genvar, both i and i+1 are constant at synthesis time, and XST already knows Ks_mem is read-only.)

I was able to achieve 100MHz and route it with the current design, slightly modified. I changed the PLL to output clk0 so there is no clock division, and I changed the K/K_next assignments to what I described in my previous post. This routes with a minimum clock period of 9.742ns for me.
Congratulations, nice one! Wonder what exactly ISE was doing before...

It also seems like the worst-case critical path related to the 100MHz clock is between Hasher[8] and Hasher[13]. It looks like the output of an adder in Hasher[8], adding rx_state and k_next, gets registered into Hasher[13]'s shift_w1 register.
That's interesting.

Edit: Just noticed there was another page of discussion. fpgaminer's explanation is indeed correct, including for the bits I modified. (I'm afraid I'm actually responsible for a lot of the confusing bits, including the cur_ prefixes which are there to distinguish between the values of wn in this round and the variable new_w15 which holds the value of w15 used by the next round.)

It might be more useful to think of the w computation as being
Code:
w[i+16] = w[i] + s0(w[i+1]) + w[i+9] + s1(w[i+14])
because that's effectively how it's being calculated. cur_wn actually means w[i+n], and next_w15 will hold the value of w[i+16] one clock cycle later.
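For anyone who wants to check that indexing in software, here is a rough Python model of the recurrence above. It is only a sketch, not the Verilog from the miner: the function names (expand_indexed, expand_windowed) and the window list are made up for illustration, and s0/s1 are assumed to be the standard SHA-256 small sigma functions. The second function keeps a 16-word sliding window, so window[n] plays roughly the role that the cur_ values play for w[i+n].

```python
MASK32 = 0xFFFFFFFF  # keep every word to 32 bits, like the hardware registers


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def s0(x):
    """SHA-256 small sigma 0 (assumed to match s0 in the post)."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def s1(x):
    """SHA-256 small sigma 1 (assumed to match s1 in the post)."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def expand_indexed(block_words):
    """Direct form: w[i+16] = w[i] + s0(w[i+1]) + w[i+9] + s1(w[i+14])."""
    w = list(block_words)
    for i in range(48):
        w.append((w[i] + s0(w[i + 1]) + w[i + 9] + s1(w[i + 14])) & MASK32)
    return w


def expand_windowed(block_words):
    """Shift-register form: only 16 words are held at any time.

    window[n] stands for w[i+n]; each step computes the new word
    (w[i+16]) and shifts it in at the top of the window.
    """
    window = list(block_words)
    out = list(block_words)
    for _ in range(48):
        new_word = (window[0] + s0(window[1]) + window[9] + s1(window[14])) & MASK32
        window = window[1:] + [new_word]  # drop w[i], append w[i+16]
        out.append(new_word)
    return out
```

Both functions take the 16 words of one padded 512-bit block and return all 64 schedule words. They should agree on any input, which is a quick way to convince yourself that the shifted-window view and the indexed formula describe the same computation.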

This is only applicable to makomk's latest revisions, where I think he chose to do a long chain loop instead of tight feedback because of how the W shift registers are implemented at LOOP_LOG2>0. He'll have to chime in here to verify, as I haven't worked all the logic out for myself.
That's one reason for the change, but it's probably not the only one; all those muxes at every stage in the pipeline aren't exactly cheap from what I can tell. (Also: don't forget the extra added clock cycle of delay in the feedback path in order to make each piece of data arrive at the input with cnt one higher than on its previous pass through the pipeline.)