Post
Topic
Board Hardware
Re: Klondike - 16 chip ASIC Open Source Board - Preliminary
by
freeworm
on 15/07/2013, 06:26:45 UTC
From the chip statistics, I can see that sometimes 15 chips are working, but most of the time only 13.
Things I can think to check:

- obviously check power to each chip, but it has multiple connections for power so make sure it's getting to all of them. It's pretty hard to fully test that since the pads are underneath, so you have to kind of go by inspection and whether it looks like it's "dry".

- check the reset pin, it needs to be high. Don't measure at the resistor but try to get right to the pin. If it's still low then it's being held in reset.

- check the clock is getting to those chips.

- If the chips are both sequential in the chain, then the first will block the second, so focus first on the one lower in the chain. I don't have a good diagram yet - todo. The chain order is:

Bank 0: U6, U8, U5, U7, U2, U4, U1, U3
Bank 1: U9, U11, U10, U12, U13, U15, U14, U16

The weird order is because I laid out the chips before we had the docs, so took a guess as to which way the pins oriented for chaining.

- what method did you use for soldering ASICs? oven or reflow air gun? I had one that shorted power and I couldn't see it. I was lucky that reheating with the air gun and giving it a small bump with tweezers got the bridge under to clear. I've found that less paste is better than more.

- if you have a scope then view the data inputs, and chain outputs.

- you can alter the code that sets up the NonceRange values to push zeros for other chips and a start value for that chip close to the expected nonce. That will have that chip find it first before others, so if it's triggering then you should see what it outputs. But you cannot know if it actually came from that chip except by timing: quickly, or later in the cycle. Initializing to just before the nonce on that chip and just after it for the others gives the max difference between them.

- note the chip order above. If any chip is not working, the ones after it could be working but may not get data from up the chain.
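
To make the "focus on the one lower in the chain" advice concrete, here is a minimal Python sketch. It assumes the two lists above are written first-to-last in chain order; the helper name first_suspect is made up for illustration and is not part of the firmware or cgminer.

```python
# Chain order as listed above. ASSUMPTION: each list runs from the first
# chip in the chain to the last one.
BANK0 = ["U6", "U8", "U5", "U7", "U2", "U4", "U1", "U3"]
BANK1 = ["U9", "U11", "U10", "U12", "U13", "U15", "U14", "U16"]


def first_suspect(bank, failing):
    """Return the failing chip that sits earliest in the given bank's chain.

    `failing` is any collection of reference designators (e.g. {"U4", "U5"}).
    Returns None if none of the failing chips belong to this bank.
    """
    for ref in bank:
        if ref in failing:
            return ref
    return None


# Hypothetical example: U4 and U5 both look dead. U5 comes earlier in
# Bank 0, so it is the one to debug first.
suspect = first_suspect(BANK0, {"U4", "U5"})
```

With that (made-up) pair of failing chips, the helper picks U5 because it appears before U4 in the Bank 0 list.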

That's all that pops into my head atm.
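
The NonceRange trick could be sketched roughly like this in Python. This is not the actual firmware code, just an illustration of the "just before on that chip, just after on the others" idea; the function name, the `lead` parameter and the chip indexing are all assumptions, and nonces are treated as 32-bit values that wrap around.

```python
MASK32 = 0xFFFFFFFF  # nonces are 32-bit, so start values wrap modulo 2**32


def nonce_start_values(num_chips, target_chip, known_nonce, lead=0x1000):
    """Build one start value per chip for a work item with a known nonce.

    The chip under test starts `lead` nonces before the known nonce, so it
    should report quickly. Every other chip starts just after the known
    nonce, so it would only reach it after wrapping through nearly the
    whole 32-bit range.
    """
    starts = []
    for chip in range(num_chips):
        if chip == target_chip:
            starts.append((known_nonce - lead) & MASK32)
        else:
            starts.append((known_nonce + 1) & MASK32)
    return starts


# Hypothetical example: 16 chips, test chip index 11, known nonce 0x12345678.
starts = nonce_start_values(16, 11, 0x12345678)
```

A result that shows up quickly then most likely came from the chip under test, and one that shows up late in the cycle came from some other chip.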

Hi BBKCoins,

Thanks a lot for your information and great contribution. I have to say this is very very useful for people who are debugging this board.

I used a reflow air gun to solder the ASICs one by one. After soldering each ASIC, I double checked every pin on my oscilloscope to make sure the power, GND, CLK, config and report signals were correct. When the ASIC became hot after power-on and the config signal came out at the output pins, I knew it worked and I would do the next one.

I have moved from a desktop to a laptop and the HW error rate has dropped to less than 5%. It has been running for 4 hours at 256MHz, hashing stably at 3.0~3.2GH/s on eligius.st. Here are the chip statistics:

Quote
           "Errors / Chip 0": "0000 0000 0035 0037 0042 0036 0041 0019 0041 0027 0027 0000 0034 0035 0027 0033",
            "Nonces / Chip 0": "0005 0000 0742 0682 0776 0684 0741 0739 0706 0725 0737 0010 0726 0732 0749 0772",


It shows that in 4 hours one chip returned only 5 nonces, one returned 10 and another returned 0. The remaining 13 chips are all working as expected (256M * 13 * 0.95 = 3.162G).
I think the 3 problem ASICs may also be working, but their reports are not being fully captured by the PIC. Since I soldered the chips and tested them in cgminer one by one, I am pretty sure the problem ASICs are located on the half of the board far from the PIC.
This may indicate that the PCB routing of the report pins for the ASICs far from the PIC needs some attention.
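
For anyone who wants to repeat the check on their own stats, here is a small Python sketch of how I counted the weak chips and got the 3.162G figure. It assumes the values in the two strings are decimal counts and that one working chip does one hash per clock; the 10%-of-median threshold and the function names are my own choices. Note that I don't know whether the position in the string matches the chain order or the U numbers on the board.

```python
# Stats strings copied from the quote above (ASSUMPTION: decimal counts).
ERRORS = "0000 0000 0035 0037 0042 0036 0041 0019 0041 0027 0027 0000 0034 0035 0027 0033"
NONCES = "0005 0000 0742 0682 0776 0684 0741 0739 0706 0725 0737 0010 0726 0732 0749 0772"


def parse_counts(field):
    """Split a space-separated stats string into a list of integers."""
    return [int(tok) for tok in field.split()]


def find_weak_chips(nonces, fraction=0.1):
    """Return indices of chips whose nonce count is below `fraction` of the median."""
    ordered = sorted(nonces)
    median = ordered[len(ordered) // 2]
    return [i for i, n in enumerate(nonces) if n < fraction * median]


nonces = parse_counts(NONCES)
errors = parse_counts(ERRORS)
weak = find_weak_chips(nonces)        # positions in the stats string
working = len(nonces) - len(weak)

# 256 MHz clock, one hash per clock per working chip, 5% lost to HW errors.
expected_hps = 256e6 * working * 0.95
print("weak chip indices:", weak)
print("working chips:", working)
print("expected rate: %.3f GH/s" % (expected_hps / 1e9))
```

The weak entries are at positions 0, 1 and 11 of the string (counting from 0), which leaves 13 working chips.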

I will use your method to locate the problem ASICs and hopefully the real reason can be identified soon.