Re: Official Open Source FPGA Bitcoin Miner (Smaller Devices Now Supported!)

Quote

This would mean (for LOOP_LOG2=1) that it needs to be fed a burst of work for 32 clocks, and then nothing at all for the next 32 clocks, which seems unnecessarily complicated to me.

It's fed "new" work every other clock cycle, confusingly enough. HASHERS[0], for example, will take new work at the first clock, old work at the second, new work at the third, old work at the fourth, and so on. It's really not a terrible way to do it, apart from the long feedback chain required between the last and the first hasher (which is fine if those two instances are placed next to one another) and the fact that it's confusing and makes my brain squeal.

It may even be better, to be honest, because you have feedback in only one location instead of at every hasher. If you place the hashers in a U shape, the first and last hasher end up next to each other for fast feedback, and the rest can happily route forward. K_next is the only signal that changes the hashers' behavior from cycle to cycle, and is likely implemented as a 2:1 MUX at LOOP_LOG2=1.
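
To make the alternation concrete, here is a rough Python sketch of which pass and round each hasher sees in the long-chain arrangement at LOOP_LOG2=1. It assumes one clock of latency per hasher and that the second pass through HASHERS[i] is round i+32. The names chain_pass and chain_round are made up for illustration and are not taken from makomk's code, and the timing of the feedback path from the last hasher is ignored entirely.

```python
LOOP_LOG2 = 1
LOOP = 1 << LOOP_LOG2        # passes each work item makes through the chain
NUM_HASHERS = 64 // LOOP     # 32 hasher instances cover SHA-256's 64 rounds


def chain_pass(hasher: int, clock: int) -> int:
    """Pass index seen by a hasher: 0 = new work, 1 = fed-back (old) work.

    Assumption: one clock of latency per hasher, so HASHERS[i] sees at
    clock t whatever HASHERS[0] saw at clock t - i.
    """
    return (clock - hasher) % LOOP


def chain_round(hasher: int, clock: int) -> int:
    """Round index, i.e. which K constant the per-hasher MUX would select.

    Assumption: the second pass through HASHERS[i] is round i + NUM_HASHERS.
    """
    return hasher + chain_pass(hasher, clock) * NUM_HASHERS


if __name__ == "__main__":
    for clock in range(4):
        kind = "new" if chain_pass(0, clock) == 0 else "old"
        print(f"clock {clock}: HASHERS[0] takes {kind} work, "
              f"round {chain_round(0, clock)}")
```

Under those assumptions each hasher only ever switches between two K values (K[i] and K[i+32]), which fits the 2:1 MUX guess above.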

Quote

Mine works in a different way. It feeds back a stage's outputs to its own inputs on every second clock, making HASHERS[0] handle rounds 0 and 1, and HASHERS[31] handle rounds 62 and 63. This way it just seems to work at half the frequency externally, no need for bursts.

That is how the normal code works. The every-other-clock scheme only applies to makomk's latest revisions, where I think he chose a long chain loop instead of tight feedback because of how the W shift registers are implemented at LOOP_LOG2>0. He'll have to chime in here to verify, as I haven't worked all the logic out for myself.
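
For comparison, the tight-feedback arrangement described in the quote can be sketched like this. It is only a model of the round assignment, not the actual HDL, and the function names are hypothetical.

```python
LOOP_LOG2 = 1
LOOP = 1 << LOOP_LOG2        # clocks each hasher spends on one work item
NUM_HASHERS = 64 // LOOP     # 32 hasher instances cover SHA-256's 64 rounds


def tight_rounds(hasher: int) -> list[int]:
    """Rounds handled by one hasher when it feeds its output back to itself."""
    return [hasher * LOOP + k for k in range(LOOP)]


def accepts_new_work(clock: int) -> bool:
    """The pipeline takes a new work item once every LOOP clocks."""
    return clock % LOOP == 0


if __name__ == "__main__":
    print("HASHERS[0] :", tight_rounds(0))
    print("HASHERS[31]:", tight_rounds(31))
    print("new work on clocks:", [c for c in range(6) if accepts_new_work(c)])
```

Here HASHERS[0] gets rounds 0 and 1 and HASHERS[31] gets rounds 62 and 63, so from the outside the pipeline just looks like it runs at half the clock rate.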

Quote

Hm, this is weird. Maybe somehow related to Verilog?
With my VHDL code, it tries to implement every single occurrence of two registers in a row as a shift register, leading to lots of unused flip-flops and an unroutably high LUT usage. Setting the synthesis option so that only chains of at least 4 registers in a row are turned into shift registers improved things a lot.

Quite possibly. I am mostly just happy to finally have a routable design running on my LX150 at 100MHz :P

At least on Altera, Quartus couldn't figure out how to shove things into the M9Ks with vanilla code. makomk's version with explicit shifters is what got Quartus to use the M9Ks and cram the design into 75K LEs.

Perhaps the way VHDL is synthesized allows the compiler to better realize chains of registers? It seems odd, but I wouldn't put anything past these compilers :P