Re: Block Erupter: Dedicated Mining ASIC Project (Open for Discussion)

Quote from: kano on November 24, 2012, 11:29:39 AM

1) Firstly, the double sha256 is a total of 3 rounds (with 64 steps each), and the whole first round is constant across a full nonce range (commonly known as the midstate), so you only need to do it once per nonce range.
2) Secondly, the first 3 steps of the 2nd round are constant across a full nonce range.
3) Thirdly, some of the W values are also constant across a full nonce range (easy to work out which).
4) Then finally, as you said, you don't need to complete the last 3 steps of the 3rd round.
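To make items 1) to 3) concrete, here is a minimal pure-Python sketch. It assumes standard FIPS 180-4 SHA-256 and Bitcoin's 80-byte header with the nonce in bytes 76-79 (little-endian). The helper names (`expand`, `run_rounds`, `compress`, `block2`, `sha256d_from_midstate`) are hypothetical, and the header bytes are made up. It computes the midstate once and finishes the double hash per nonce, checking against `hashlib`. It also shows that the state after the first 3 rounds of the 2nd block, and the schedule words W16/W17, do not depend on the nonce. It is deliberately slow and only meant to show where the constants come from.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n):
    """First n primes by trial division (fine for n = 64)."""
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def icbrt(n):
    """Largest integer x with x**3 <= n (integer Newton iteration from above)."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


# FIPS 180-4 constants: first 32 fractional bits of sqrt / cbrt of the first primes.
PRIMES = first_primes(64)
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def expand(block):
    """Message schedule W[0..63] for one 64-byte block."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w


def run_rounds(state, w, n=64):
    """Apply rounds 0..n-1 and return the working variables a..h (no feed-forward)."""
    a, b, c, d, e, f, g, h = state
    for t in range(n):
        s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + s1 + ch + K[t] + w[t]) & MASK
        s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (s0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]


def compress(state, block):
    """Full 64-round compression plus the feed-forward addition."""
    return [(x + y) & MASK for x, y in zip(state, run_rounds(state, expand(block)))]


def block2(tail12, nonce):
    """2nd padded block of the 80-byte header: 12 fixed bytes, nonce, padding, length."""
    return tail12 + struct.pack("<I", nonce) + b"\x80" + b"\x00" * 39 + struct.pack(">Q", 640)


def sha256d_from_midstate(midstate, tail12, nonce):
    first = struct.pack(">8I", *compress(midstate, block2(tail12, nonce)))
    padded = first + b"\x80" + b"\x00" * 23 + struct.pack(">Q", 256)
    return struct.pack(">8I", *compress(IV, padded))


# Hypothetical header: 76 made-up bytes, nonce appended per attempt.
prefix = bytes(range(76))
midstate = compress(IV, prefix[:64])  # item 1: computed once per nonce range
tail12 = prefix[64:76]

for nonce in (0, 1, 0xDEADBEEF):
    header = prefix + struct.pack("<I", nonce)
    reference = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    print(hex(nonce), sha256d_from_midstate(midstate, tail12, nonce) == reference)

wa, wb = expand(block2(tail12, 0)), expand(block2(tail12, 12345))
print("item 2, state after 3 rounds equal:", run_rounds(midstate, wa, 3) == run_rounds(midstate, wb, 3))
print("item 3, W16/W17 equal:", wa[16:18] == wb[16:18])
```

W16 and W17 are constant because, in the second block, W0 to W2 are fixed header bytes and W4 to W15 are padding. W16 draws only on W0, W1, W9 and W14, and W17 only on W1, W2, W10 and W15. W18 is the first expanded word to pick up the nonce (through W3). Only these two are checked here.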

Thanks. I'm quoting this because it is a very nice reference for the state-of-the-art GPU/FPGA optimizations. I remember item 4) on your list best because it most clearly shows the shift-register structure inherent in SHA-256.
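Item 4) is that shift register at work. The value written into e in round 60 is only moved e -> f -> g -> h by rounds 61 to 63, so the last output word H7 is already fixed after 61 rounds. Below is a minimal sketch for the final (third) compression. It assumes standard FIPS 180-4 SHA-256 and uses a made-up 32-byte input in place of a real first hash. The names `run_rounds` and `final_word_early` are hypothetical.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF


def first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def icbrt(n):
    """Largest integer x with x**3 <= n (integer Newton iteration from above)."""
    x = 1 << ((n.bit_length() + 2) // 3)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y


PRIMES = first_primes(64)
IV = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def expand(block):
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    return w


def run_rounds(state, w, n):
    """Apply rounds 0..n-1; return working variables a..h (no feed-forward)."""
    a, b, c, d, e, f, g, h = state
    for t in range(n):
        t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
        t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [a, b, c, d, e, f, g, h]


def final_word_early(block):
    """Last digest word H7 of a one-block SHA-256, using only rounds 0..60.

    The e produced by round 60 reaches h unchanged after rounds 61-63,
    so those three rounds are not needed to learn H7.
    """
    e60 = run_rounds(IV, expand(block), 61)[4]
    return (IV[7] + e60) & MASK


# Hypothetical stand-in for the 32-byte first hash, padded to one block.
first_hash = hashlib.sha256(b"example").digest()
block = first_hash + b"\x80" + b"\x00" * 23 + struct.pack(">Q", 256)
print(hex(final_word_early(block)))
print(hashlib.sha256(first_hash).digest()[28:].hex())
```

Hardware that only tests whether the last 4 digest bytes are zero can therefore stop after round 60 and compare e against (-IV[7]) mod 2^32 = 0xa41f32e7. That test is a necessary condition for a difficulty-1 share.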

Edit: Note to self: Kano swaps the standard terminology of step vs. round. In standard terminology, the first SHA-256 hash in Bitcoin consists of 2 steps (one compression-function call per 512-bit block of the padded 80-byte header) of 64 rounds each.
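The counts follow from SHA-256's padding rule: the message, a 0x80 byte, zeros, and an 8-byte bit length, rounded up to 64-byte blocks. The helper below is a hypothetical illustration of that rule, applied to the 80-byte header and the 32-byte intermediate digest.

```python
def sha256_blocks(msg_len):
    """Number of 64-byte blocks (compression-function calls) SHA-256 needs for msg_len bytes."""
    # payload + mandatory 0x80 byte + 8-byte bit length, rounded up to whole blocks
    return (msg_len + 1 + 8 + 63) // 64


HEADER_LEN = 80  # Bitcoin block header
DIGEST_LEN = 32  # output of the first SHA-256

first_hash_blocks = sha256_blocks(HEADER_LEN)   # 2
second_hash_blocks = sha256_blocks(DIGEST_LEN)  # 1
total_rounds = 64 * (first_hash_blocks + second_hash_blocks)  # 192
per_nonce_rounds = total_rounds - 64  # 128 once the midstate (first block) is cached
print(first_hash_blocks, second_hash_blocks, total_rounds, per_nonce_rounds)
```

So kano's "3 rounds of 64 steps" are three compression-function calls. The 128 rounds left per nonce match the "128 steps" in his second quote.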

Quote from: kano on November 24, 2012, 11:29:39 AM

In ASIC terms it would be risky to implement any of 2, 3 or 4.
While you may gain a few % overall (6 out of 128 steps plus W optimisations), it also means you can only sha256 an exact BTC block header.
If BTC continues to use sha256 but makes any changes to the block header, that wouldn't be a problem if none of 2, 3 or 4 were implemented in the silicon, since you could change the firmware to deal with a different header.
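Counting rounds only (the W savings come on top), and taking the 128 rounds per nonce that remain once the midstate is cached, the gain from items 2) and 4) works out roughly as:

```latex
\frac{3 + 3}{64 + 64} = \frac{6}{128} \approx 4.7\%
```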

At least for the chip discussed in this thread, it appears that the block header structure is fixed:

Quote from: friedcat on September 22, 2012, 05:32:59 AM

Bytes   Operation
0-31    writing midstate
32-43   writing data
44-47   reading nonce
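Read this way, the host writes 44 bytes (a 32-byte midstate plus 12 bytes of data) and reads back a 4-byte nonce, so the chip only ever sees the second half of an 80-byte header. Below is a minimal host-side sketch. It assumes the 12 data bytes are header bytes 64-75, the part of the second SHA-256 block that precedes the nonce. It also assumes byte orders the quote does not specify. `build_work_payload` and `parse_nonce` are hypothetical names, not friedcat's interface.

```python
import struct

MIDSTATE_LEN = 32  # offsets 0-31
DATA_LEN = 12      # offsets 32-43
NONCE_LEN = 4      # offsets 44-47


def build_work_payload(midstate_words, header):
    """Pack the bytes written at offsets 0-43: midstate, then header bytes 64..75.

    Assumption: midstate words are sent big-endian; the real chip may differ.
    """
    if len(midstate_words) != 8:
        raise ValueError("midstate must be 8 32-bit words")
    if len(header) != 80:
        raise ValueError("expected an 80-byte block header")
    payload = struct.pack(">8I", *midstate_words) + header[64:76]
    assert len(payload) == MIDSTATE_LEN + DATA_LEN
    return payload


def parse_nonce(reply):
    """Decode the 4 bytes read from offsets 44-47 (little-endian assumed, as in the header)."""
    if len(reply) != NONCE_LEN:
        raise ValueError("expected 4 nonce bytes")
    return struct.unpack("<I", reply)[0]


# Hypothetical inputs, only to exercise the packing.
header = bytes(80)
payload = build_work_payload([0] * 8, header)
print(len(payload))
```

The layout has no room for anything beyond those 44 bytes, so a header change that moved the nonce or lengthened the header would presumably not fit this chip. That is the risk kano describes.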