Please excuse the simplistic question, but does anyone know how the BitFury chip works? It apparently has 700+ rolled-up double-SHA256 engines, each of which must have a unique nonce to be useful. According to what I've read in this forum and the supporting links, it claims to perform a complete pass through each of those engines in about 65 clocks, at several hundred MHz. On the other hand, someone has posted a scope trace showing a load sequence, with Merkle data, pre-calc, and block data, along with about 110 µs of nonce initialization. Since the init sequence is over 300 µs long in total, and there are only a few nonces in it, I have two questions:
1. How do ~800 bits of nonce initialization data set up the 700 × 4 × 8 = 22,400+ bits of nonce needed for all 700+ engines?
2. If, indeed, the engines run for about 65 clocks at, say, 300 MHz - roughly 0.2 µs - at which point they have consumed that nonce data, how does the chip get more than a small fraction of 1% utilization? Run for ~0.2 µs, init for ~300 µs, repeat as desired...
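To put numbers on both questions (using only the figures quoted above, all of which are my assumptions rather than anything from a datasheet), here's the back-of-envelope arithmetic:

```c
#include <stdio.h>

int main(void)
{
    /* Figures quoted above - assumptions, not datasheet values */
    const int    engines    = 700;      /* "700+" engines              */
    const int    nonce_bits = 32;       /* 4 bytes * 8 bits per nonce  */
    const double clock_hz   = 300e6;    /* assumed core clock          */
    const double run_clocks = 65.0;     /* claimed clocks per pass     */
    const double init_s     = 300e-6;   /* observed init length        */

    /* Question 1: bits of nonce state needed vs. bits delivered */
    printf("nonce state needed : %d bits (vs ~800 bits of init data)\n",
           engines * nonce_bits);

    /* Question 2: naive duty cycle if every run needed a full re-init */
    double run_s = run_clocks / clock_hz;
    printf("one run            : %.3f us\n", run_s * 1e6);
    printf("naive duty cycle   : %.3f %%\n", 100.0 * run_s / (run_s + init_s));
    return 0;
}
```

That naive duty cycle comes out well under 0.1%, which is exactly why I don't believe the chip actually works that way.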
I am presuming that somewhere out there is a document that exposes what the chip internals do with the initialization nonce data, and what happens to the nonces once each round of 65 clocks has passed.
WHERE IS THAT INFORMATION?
Presumably, the initialization nonce data is expanded in some way, by a factor of roughly 22,400/800 = 28, to get initial nonces for all 700+ engines, and then each round of 65 clocks results in those nonces being incremented or shifted, or something along those lines. With only a one-bit "I found one!" notice coming from the whole chip, decoding which nonce was the golden one becomes rather time-consuming and depends upon knowing the number of 65-clock rounds since startup...
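For what it's worth, the kind of scheme I'm imagining looks something like this - purely my own guess, and ENGINES, ROUND_STRIDE, and base_nonce are made-up names, not anything from BitFury:

```c
#include <stdint.h>
#include <stdio.h>

#define ENGINES      756        /* guessed engine count, "700+"      */
#define ROUND_STRIDE ENGINES    /* guessed per-round nonce increment */

/* Hypothetical expansion: each engine starts at base_nonce + its index,
 * and every 65-clock round bumps it by ROUND_STRIDE.  If that (or
 * anything like it) is what the chip does, the golden nonce can only be
 * reconstructed if the host knows how many rounds have elapsed since
 * the init sequence. */
static uint32_t engine_nonce(uint32_t base_nonce, unsigned engine, unsigned round)
{
    return base_nonce + engine + (uint32_t)round * ROUND_STRIDE;
}

int main(void)
{
    uint32_t base = 0;          /* from the ~800-bit init data, presumably */
    unsigned hit_engine = 123;  /* which engine raised "I found one!"      */
    unsigned rounds = 40000;    /* rounds counted since startup            */

    printf("reconstructed nonce: 0x%08x\n",
           engine_nonce(base, hit_engine, rounds));
    return 0;
}
```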
So where is all this crucial information? I've searched and searched, and the best I've found is to reverse-engineer one or more of the miners, but there is a crucial step missing: I can see what the program does, but I can't see how that data is manipulated before being shipped into the chip over that Manchester-encoded serial pair.
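To be clear, the Manchester layer itself isn't the mystery; something like the sketch below reproduces it, with the caveat that I'm only guessing at which polarity convention the chip uses. It's the payload that goes through it that I can't reconstruct.

```c
#include <stdint.h>
#include <stdio.h>

/* Manchester-encode one byte, MSB first: each data bit becomes two
 * half-bit symbols with a guaranteed mid-bit transition.  Using the
 * IEEE 802.3 convention here (0 -> 10, 1 -> 01); the chip may well use
 * the opposite (G.E. Thomas) polarity - that part is a guess. */
static uint16_t manchester_byte(uint8_t b)
{
    uint16_t out = 0;
    for (int i = 7; i >= 0; i--) {
        out <<= 2;
        out |= ((b >> i) & 1) ? 0x1 : 0x2;   /* 1 -> 01, 0 -> 10 */
    }
    return out;
}

int main(void)
{
    uint8_t payload = 0xA5;  /* placeholder byte - the real framing is the mystery */
    printf("0x%02X -> line symbols 0x%04X\n", payload, manchester_byte(payload));
    return 0;
}
```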
Does anyone have a pointer to the real information? Has anyone analyzed the WU readouts of the demo'd miners to see if we are actually getting the number of results expected for the hash rate, and not just getting trillions of duplicated hashes on duplicated nonce streams?
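If WU here means cgminer-style work utility (difficulty-1 shares per minute), the check is straightforward: each unique hash is an independent 1-in-2^32 trial, so expected WU is hashrate × 60 / 2^32. Something along these lines, with a made-up example hash rate:

```c
#include <stdio.h>

int main(void)
{
    double ghs = 1.5;  /* example per-chip hash rate in GH/s - substitute your own */

    /* Each double-SHA256 result beats difficulty 1 with probability 2^-32,
     * so expected difficulty-1 shares per minute = hashrate * 60 / 2^32. */
    double expected_wu = ghs * 1e9 * 60.0 / 4294967296.0;

    printf("expected WU at %.2f GH/s: %.1f shares/min\n", ghs, expected_wu);
    printf("(duplicated nonce streams would show up as WU well below this)\n");
    return 0;
}
```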
Thanks for any help or pointers, folks. Making boards and firmware for this kind of system should be a piece of cake for any experienced design team - I've run such teams for nearly 45 years, and we've done many more complex systems. But it's impossible if the chip semantics aren't available, unless you have a government lab at your disposal that can treat the chip as a black box and run dozens of probing tests at a time - and they don't open-source their results.