Board Mining support
Re: Ubuntu 11.04: miners hang without reporting any error
by sunbird on 27/10/2011, 02:10:28 UTC
Ah ha!

So, after about 36 hours of running, I went to tweak the settings on one of the miners, but Ctrl-C didn't do anything, and I noticed that all the Mhash/sec figures were frozen. I rebooted and tried to start the miner again; the machine froze up for a bit, and then syslog had this to say:

Code:
Oct 26 15:34:03 kernel: [  690.520008] [fglrx] ASIC hang happened
Oct 26 15:34:03 kernel: [  690.520020] Pid: 9047, comm: clinfo Tainted: P            2.6.38-12-generic #51-Ubuntu
Oct 26 15:34:03 kernel: [  690.520026] Call Trace:
Oct 26 15:34:03 kernel: [  690.520117]  [] ? KCL_DEBUG_OsDump+0xe/0x10 [fglrx]
Oct 26 15:34:03 kernel: [  690.520196]  [] ? firegl_hardwareHangRecovery+0x1c/0x50 [fglrx]
Oct 26 15:34:03 kernel: [  690.520325]  [] ? _ZN4Asic9WaitUntil15ResetASICIfHungEv+0x9/0x10 [fglrx]
Oct 26 15:34:03 kernel: [  690.520448]  [] ? _ZN4Asic9WaitUntil15WaitForCompleteEv+0x6c/0xb0 [fglrx]
Oct 26 15:34:03 kernel: [  690.520572]  [] ? _ZN19mmEngineR600_DRMDMA4idleEv+0x72/0xc0 [fglrx]
Oct 26 15:34:03 kernel: [  690.520693]  [] ? _ZN14CMMHeapManager22freeAllExpiredTSMemoryEj+0x64/0xe0 [fglrx]
Oct 26 15:34:03 kernel: [  690.520816]  [] ? _ZN18mmEnginesContainer4idleEv+0x46/0x60 [fglrx]
Oct 26 15:34:03 kernel: [  690.520935]  [] ? _ZN15QS_PRIVATE_CORE7idleAllE15idle_WaitMethod+0x2d/0x40 [fglrx]
Oct 26 15:34:03 kernel: [  690.521050]  [] ? _ZN3MSF19doGarbageCollectionEv+0x35/0x260 [fglrx]
Oct 26 15:34:03 kernel: [  690.521061]  [] ? down+0x2e/0x50
Oct 26 15:34:03 kernel: [  690.521127]  [] ? KCL_SPINLOCK_STATIC_Release+0x16/0x20 [fglrx]
Oct 26 15:34:03 kernel: [  690.521213]  [] ? firegl_cmmqs_ProcessTerminate+0x32/0xc0 [fglrx]
Oct 26 15:34:03 kernel: [  690.521287]  [] ? firegl_release_helper+0x3a8/0x6c0 [fglrx]
Oct 26 15:34:03 kernel: [  690.521362]  [] ? firegl_release+0x60/0x1c0 [fglrx]
Oct 26 15:34:03 kernel: [  690.521426]  [] ? ip_firegl_release+0x11/0x20 [fglrx]
Oct 26 15:34:03 kernel: [  690.521436]  [] ? __fput+0xbe/0x200
Oct 26 15:34:03 kernel: [  690.521444]  [] ? fput+0x25/0x30
Oct 26 15:34:03 kernel: [  690.521451]  [] ? filp_close+0x60/0x90
Oct 26 15:34:03 kernel: [  690.521461]  [] ? put_files_struct+0x88/0xf0
Oct 26 15:34:03 kernel: [  690.521469]  [] ? exit_files+0x54/0x70
Oct 26 15:34:03 kernel: [  690.521477]  [] ? do_exit+0x175/0x410
Oct 26 15:34:03 kernel: [  690.521549]  [] ? drm_free+0xf3/0x180 [fglrx]
Oct 26 15:34:03 kernel: [  690.521558]  [] ? do_group_exit+0x58/0xd0
Oct 26 15:34:03 kernel: [  690.521566]  [] ? get_signal_to_deliver+0x247/0x410
Oct 26 15:34:03 kernel: [  690.521650]  [] ? firegl_cmmqs_CWDDE32+0x0/0x100 [fglrx]
Oct 26 15:34:03 kernel: [  690.521658]  [] ? do_signal+0x56/0x180
Oct 26 15:34:03 kernel: [  690.521723]  [] ? ip_firegl_unlocked_ioctl+0xe/0x20 [fglrx]
Oct 26 15:34:03 kernel: [  690.521733]  [] ? do_vfs_ioctl+0x8f/0x360
Oct 26 15:34:03 kernel: [  690.521740]  [] ? do_notify_resume+0x65/0x80
Oct 26 15:34:03 kernel: [  690.521748]  [] ? sys_ioctl+0x91/0xa0
Oct 26 15:34:03 kernel: [  690.521754]  [] ? int_signal+0x12/0x17
Oct 26 15:34:03 kernel: [  690.521765] pubdev:0xffffffffa09b03c0, num of device:3 , name:fglrx, major 8, minor 86.
Oct 26 15:34:03 kernel: [  690.521772] device 0 : 0xffff880144430000 .
Oct 26 15:34:03 kernel: [  690.521778] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011140000.
Oct 26 15:34:03 kernel: [  690.521784] FB phys addr: 0xc0000000, MC :0xf00000000, Total FB size :0x40000000.
Oct 26 15:34:03 kernel: [  690.521791] gart table MC:0xf0f91f000, Physical:0xcf91f000, size:0x3e0000.
Oct 26 15:34:03 kernel: [  690.521798] mc_node :FB, total 1 zones
Oct 26 15:34:03 kernel: [  690.521803]     MC start:0xf00000000, Physical:0xc0000000, size:0xfd00000.
Oct 26 15:34:03 kernel: [  690.521811]     Mapped heap -- Offset:0x0, size:0xf91f000, reference count:16, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521818]     Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521825]     Mapped heap -- Offset:0xf91f000, size:0x3e1000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521832] mc_node :INV_FB, total 1 zones
Oct 26 15:34:03 kernel: [  690.521837]     MC start:0xf0fd00000, Physical:0xcfd00000, size:0x30300000.
Oct 26 15:34:03 kernel: [  690.521844]     Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521851] mc_node :GART_USWC, total 2 zones
Oct 26 15:34:03 kernel: [  690.521856]     MC start:0x3e750000, Physical:0x0, size:0x4d800000.
Oct 26 15:34:03 kernel: [  690.521863]     Mapped heap -- Offset:0x30000, size:0x2000000, reference count:14, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521869] mc_node :GART_CACHEABLE, total 3 zones
Oct 26 15:34:03 kernel: [  690.521875]     MC start:0x10400000, Physical:0x0, size:0x2e350000.
Oct 26 15:34:03 kernel: [  690.521881]     Mapped heap -- Offset:0x2600000, size:0x100000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521889]     Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521897]     Mapped heap -- Offset:0xb00000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521904]     Mapped heap -- Offset:0x200000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521911]     Mapped heap -- Offset:0x0, size:0x200000, reference count:7, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521919]     Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.521928] GRBM : 0x3828, SRBM : 0x200000c0 .
Oct 26 15:34:03 kernel: [  690.521937] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x19dc0 , CP_RB_WPTR :0x19dc0.
Oct 26 15:34:03 kernel: [  690.521946] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x3eaa8000.
Oct 26 15:34:03 kernel: [  690.521953] last submit IB buffer -- MC :0x3eaa8000,phys:0x131ebc000.
Oct 26 15:34:03 kernel: [  690.521961] device 1 : 0xffff880145b14000 .
Oct 26 15:34:03 kernel: [  690.521967] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011180000.
Oct 26 15:34:03 kernel: [  690.521973] FB phys addr: 0xb0000000, MC :0xf00000000, Total FB size :0x40000000.
Oct 26 15:34:03 kernel: [  690.521979] gart table MC:0xf0f91f000, Physical:0xbf91f000, size:0x3e0000.
Oct 26 15:34:03 kernel: [  690.521985] mc_node :FB, total 1 zones
Oct 26 15:34:03 kernel: [  690.521990]     MC start:0xf00000000, Physical:0xb0000000, size:0xfd00000.
Oct 26 15:34:03 kernel: [  690.521997]     Mapped heap -- Offset:0x0, size:0xf91f000, reference count:10, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522004]     Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522010]     Mapped heap -- Offset:0xf91f000, size:0x3e1000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522017] mc_node :INV_FB, total 1 zones
Oct 26 15:34:03 kernel: [  690.522022]     MC start:0xf0fd00000, Physical:0xbfd00000, size:0x30300000.
Oct 26 15:34:03 kernel: [  690.522028]     Mapped heap -- Offset:0x302f4000, size:0xc000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522034] mc_node :GART_USWC, total 2 zones
Oct 26 15:34:03 kernel: [  690.522039]     MC start:0x3e750000, Physical:0x0, size:0x4d800000.
Oct 26 15:34:03 kernel: [  690.522045]     Mapped heap -- Offset:0x30000, size:0x2000000, reference count:10, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522052] mc_node :GART_CACHEABLE, total 3 zones
Oct 26 15:34:03 kernel: [  690.522057]     MC start:0x10400000, Physical:0x0, size:0x2e350000.
Oct 26 15:34:03 kernel: [  690.522063]     Mapped heap -- Offset:0x1d00000, size:0x900000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522070]     Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522077]     Mapped heap -- Offset:0xb00000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522084]     Mapped heap -- Offset:0x200000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522091]     Mapped heap -- Offset:0x0, size:0x200000, reference count:4, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522098]     Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522106] GRBM : 0x3828, SRBM : 0x20000ac0 .
Oct 26 15:34:03 kernel: [  690.522113] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x5b0 , CP_RB_WPTR :0x5b0.
Oct 26 15:34:03 kernel: [  690.522121] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x3e8df000
Oct 26 15:34:03 kernel: [  690.522127] last submit IB buffer -- MC :0x3e8df000,phys:0x12ffd9000.
Oct 26 15:34:03 kernel: [  690.522135] device 2 : 0xffff880145b08000 .
Oct 26 15:34:03 kernel: [  690.522140] Asic ID:0x9440, revision:0x2, MMIOReg:0xffffc900111c0000.
Oct 26 15:34:03 kernel: [  690.522146] FB phys addr: 0xd0000000, MC :0xf00000000, Total FB size :0x40000000.
Oct 26 15:34:03 kernel: [  690.522152] gart table MC:0xf0fc1f000, Physical:0xdfc1f000, size:0x3e0000.
Oct 26 15:34:03 kernel: [  690.522158] mc_node :FB, total 1 zones
Oct 26 15:34:03 kernel: [  690.522162]     MC start:0xf00000000, Physical:0xd0000000, size:0x10000000.
Oct 26 15:34:03 kernel: [  690.522169]     Mapped heap -- Offset:0x0, size:0xfc1f000, reference count:11, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522176]     Mapped heap -- Offset:0x0, size:0x1000000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522183]     Mapped heap -- Offset:0xfc1f000, size:0x3e1000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522189] mc_node :INV_FB, total 1 zones
Oct 26 15:34:03 kernel: [  690.522194]     MC start:0xf10000000, Physical:0xe0000000, size:0x30000000.
Oct 26 15:34:03 kernel: [  690.522201]     Mapped heap -- Offset:0x2fffd000, size:0x3000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522207] mc_node :GART_USWC, total 2 zones
Oct 26 15:34:03 kernel: [  690.522211]     MC start:0x3e750000, Physical:0x0, size:0x4d800000.
Oct 26 15:34:03 kernel: [  690.522218]     Mapped heap -- Offset:0x30000, size:0x2000000, reference count:6, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522224] mc_node :GART_CACHEABLE, total 3 zones
Oct 26 15:34:03 kernel: [  690.522229]     MC start:0x10400000, Physical:0x0, size:0x2e350000.
Oct 26 15:34:03 kernel: [  690.522235]     Mapped heap -- Offset:0x1d00000, size:0x900000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522242]     Mapped heap -- Offset:0x1400000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522249]     Mapped heap -- Offset:0xb00000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522256]     Mapped heap -- Offset:0x200000, size:0x900000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522263]     Mapped heap -- Offset:0x0, size:0x200000, reference count:2, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522270]     Mapped heap -- Offset:0xef000, size:0x11000, reference count:1, mapping count:0,
Oct 26 15:34:03 kernel: [  690.522278] GRBM : 0x3028, SRBM : 0x200000c0 .
Oct 26 15:34:03 kernel: [  690.522284] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x330 , CP_RB_WPTR :0x330.
Oct 26 15:34:03 kernel: [  690.522291] CP_IB1_BUFSZ:0x0, CP_IB1_BASE_HI:0x0, CP_IB1_BASE_LO:0x3e8bc000.
Oct 26 15:34:03 kernel: [  690.522297] last submit IB buffer -- MC :0x3e8bc000,phys:0x12daa5000.
Oct 26 15:34:03 kernel: [  690.522303] Dump the trace queue.
Oct 26 15:34:03 kernel: [  690.522307] End of dump
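
The dump above is long, so a small script to boil it down to one line per device may help when comparing hangs. This is only a rough sketch in Python: the function name `summarize_dump` and the regular expressions are my own assumptions, written against the exact line shapes in this one dump, and other fglrx versions may print things differently. It pulls out each device's Asic ID and the CP_RB_RPTR / CP_RB_WPTR values, and it does not try to decide which GPU actually hung.

```python
import re

# Patterns written against the line shapes in the dump above (assumption:
# other driver versions may format these lines differently).
DEVICE_RE = re.compile(r"device (\d+) : 0x[0-9a-f]+")
ASIC_RE = re.compile(r"Asic ID:(0x[0-9a-f]+)")
RING_RE = re.compile(r"CP_RB_RPTR : (0x[0-9a-f]+) , CP_RB_WPTR :(0x[0-9a-f]+)")


def summarize_dump(lines):
    """Return one dict per 'device N' section found in the dump lines."""
    devices = []
    for line in lines:
        match = DEVICE_RE.search(line)
        if match:
            # A new per-device section starts here.
            devices.append({"device": int(match.group(1))})
            continue
        if not devices:
            continue  # still in the call trace, before the first device
        match = ASIC_RE.search(line)
        if match:
            devices[-1]["asic_id"] = match.group(1)
            continue
        match = RING_RE.search(line)
        if match:
            devices[-1]["rptr"] = int(match.group(1), 16)
            devices[-1]["wptr"] = int(match.group(2), 16)
    return devices


# A few lines copied from the dump above, enough to exercise the parser.
SAMPLE_DUMP = """\
Oct 26 15:34:03 kernel: [  690.521765] pubdev:0xffffffffa09b03c0, num of device:3 , name:fglrx, major 8, minor 86.
Oct 26 15:34:03 kernel: [  690.521772] device 0 : 0xffff880144430000 .
Oct 26 15:34:03 kernel: [  690.521778] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011140000.
Oct 26 15:34:03 kernel: [  690.521937] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x19dc0 , CP_RB_WPTR :0x19dc0.
Oct 26 15:34:03 kernel: [  690.521961] device 1 : 0xffff880145b14000 .
Oct 26 15:34:03 kernel: [  690.521967] Asic ID:0x689c, revision:0x2, MMIOReg:0xffffc90011180000.
Oct 26 15:34:03 kernel: [  690.522113] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x5b0 , CP_RB_WPTR :0x5b0.
Oct 26 15:34:03 kernel: [  690.522135] device 2 : 0xffff880145b08000 .
Oct 26 15:34:03 kernel: [  690.522140] Asic ID:0x9440, revision:0x2, MMIOReg:0xffffc900111c0000.
Oct 26 15:34:03 kernel: [  690.522284] CP_RB_BASE : 0x3e7800, CP_RB_RPTR : 0x330 , CP_RB_WPTR :0x330.
"""

for dev in summarize_dump(SAMPLE_DUMP.splitlines()):
    print(dev["device"], dev["asic_id"], hex(dev["rptr"]), hex(dev["wptr"]))
```

For this dump it lists three devices, two with Asic ID 0x689c and one with 0x9440, and the read and write pointers are equal on each of them.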

I've got a Gigabyte board with 4 GB of RAM, 1x 5970, and 1x 4830. The motherboard, RAM, and CPU are all new. I am also using a new SSD for this box.

I think I'll pull the 4830 and see if the problem persists.
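
To tell whether the problem persists after a hardware change, it would help to count hang events in the log instead of waiting to notice frozen Mhash/sec figures. Below is a minimal sketch, again in Python, that scans log lines for the "[fglrx] ASIC hang happened" message and records the timestamp plus the Pid/comm from the line that follows it. The name `find_asic_hangs` is hypothetical, and the assumption that the Pid line comes directly after the hang line is based only on the single dump in this post.

```python
import re

HANG_RE = re.compile(r"\[fglrx\] ASIC hang happened")
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")


def find_asic_hangs(lines):
    """Return one dict per ASIC hang message found in the given log lines."""
    hangs = []
    expect_pid = False
    for line in lines:
        if HANG_RE.search(line):
            # The first 15 characters of a classic syslog line are the
            # timestamp, e.g. "Oct 26 15:34:03".
            hangs.append({"stamp": line[:15], "pid": None, "comm": None})
            expect_pid = True
        elif expect_pid:
            # Assumption: the Pid/comm line follows the hang line directly,
            # as it does in the dump in this post.
            match = PID_RE.search(line)
            if match:
                hangs[-1]["pid"] = int(match.group(1))
                hangs[-1]["comm"] = match.group(2)
            expect_pid = False
    return hangs


# The first lines of the dump from this post, used as test input.
SAMPLE_LOG = """\
Oct 26 15:34:03 kernel: [  690.520008] [fglrx] ASIC hang happened
Oct 26 15:34:03 kernel: [  690.520020] Pid: 9047, comm: clinfo Tainted: P            2.6.38-12-generic #51-Ubuntu
Oct 26 15:34:03 kernel: [  690.520026] Call Trace:
"""

for hang in find_asic_hangs(SAMPLE_LOG.splitlines()):
    print(hang["stamp"], hang["pid"], hang["comm"])
```

To run it against a real log, pass the lines of the syslog file instead of the sample (the exact path depends on the syslog configuration). One thing it makes visible for the dump above: the process named on the Pid line is clinfo, not the miner.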

I "fixed" it again by rebooting, running aticonfig -f --initial --adapter=all, and rebooting once more. It plugged away for a bit longer before freezing again.

One other thought: I'm running this with the latest updates in the Ubuntu 11.04 tree, including the most recent kernel.

Thoughts?

*Edit*

I should have listed my clock settings, which I pulled from the mining hardware page: both the 5970 and the 4830 are set to 850/300 for core/memory. In addition to pulling the 4830, I'll try mining at stock clocks to see if I can reproduce the hang.