Re: [OPEN] Bitmine CoinCraft A1 28nm chip distribution / DIY support

Quote from: zefir on February 20, 2014, 10:39:07 AM

Quote from: totalslacker on February 20, 2014, 01:30:32 AM

That looks to work very well. Unfortunately my hashing test is hitting some errors at anything much over 800Mhz.

I don't see chips dropping out or bad results being produced, just that nonces are missed. My code checks for all six nonces in sequence for each job issued (all four chips are run simultaneously and the job queue is kept as full as possible).

I don't think the nonce queue is overflowing as I'm issuing the read result command very frequently (and then checking chip status for finished jobs). I will get a bunch of no results and then a chip reports a nonce that has skipped a previous result.

I haven't looked at power at the board level at all yet - hopefully that's the issue. I think this board is running a little low on the voltage front so hopefully I can clean things up with that.

The good news is that I did get it to run at up to 1.25Ghz (40GH/s). About two-thirds of the nonces were dropped but it successfully ran the test (101 jobs) to completion.

My SPI frequency is a little low (think it's around 250Khz) but I wouldn't expect that to cause issues unless it were so low that the nonce queues couldn't be serviced frequently enough?

For 1.25GHz to work stable, you will need extreme cooling. Once the initial stress is over, I plan to develop a stackable 16-chip board for a submerged setup. A 10 PCB stack will fit into one liter - 160 chips ran at 40GHps, or 6.4TH in-a-box. But for heatsink cooling 1.25GHz is definitively a challenge.

As for the work / result pipelining: luckily, with input and output queues the job feeding and collecting of results is de-coupled, leaving you the freedom to find a good trade-off between their priorities. We dimensioned the output queue based on statistics collected from mining sessions which resulted in 99.9% of all jobs have 4 or less results.

If you run the A1 at nominal speed (800MHz), it will crunch the nonce range in ~160ms. That is, if every 160ms you collect the results and feed new jobs, you should get 99.9% of all potential results. To catch all results of a job with more than 4 winning nonces, you need to poll for results in-between, i.e. check every 80ms and you reduce loosing the 5th result significantly. Obviously, you can't cover all potential use cases and prevent loosing results: assume you have a job with 5 winning nonces that happen to be very close to each other - the chip will spit them out before you have time to collect. But the chances for that are negligible.

Bottom line: if you feed the chip at least every nonce period (e.g. 160ms at 800MHz) and check for jobs every half nonce period, you should be on the safe side.

As for SPI clock, I made some theoretical considerations here: https://bitcointalk.org/index.php?topic=294235.msg4554508#msg4554508

Why 16, when you already have ref.design for 8, and 8 is easier to cool...

Even when you think about immersion cooling, if enything goes wrong, your whole chain (16) goes offline...

Wouldn't it then be smarter to stick with 8 ?