Post
Topic
Board Mining (Altcoins)
Re: [ANN] TeamRedMiner 0.7.17 - Nimiq/Kawpow/Ethash/Cryptonight and More
by
kerney666
on 18/11/2020, 15:57:12 UTC
I have been using TRM for over 6 months with very happy results on my Ethereum mining farm. The most recent release has been causing random GPU mining crashes on rigs that have been stable for months. The problem with the crashes is that 0.7.17 frequently hangs and just sits there not mining until the user issues a system restart. I have had 10-20 rigs not mining for hours due to this.

I recently went back to 0.7.15 and all the issues went away. I have been running a farm for about four years now and know when there is an issue with mining software versus GPU crashes due to OC, riser, voltage issues, etc...

Here is the SMOS log of one of the many crashes after the 0.7.17 upgrade on all my machines.
Code:
[2020-11-10 04:36:13] Pool us2.ethermine.org received new job. (job_id: 0x238cea67918072b4b145002a593cb77015079123ffb74ce84a47d8ff1f78aafc)
[2020-11-10 04:36:14] Watchdog triggering miner shutdown after restart script execution.
[2020-11-10 04:36:14] Shutting down...
[2020-11-10 04:36:14] Watchdog thread exiting.
[2020-11-10 04:36:14] GPU10 thread exiting.
[2020-11-10 04:36:14] GPU 9 thread exiting.
[2020-11-10 04:36:14] GPU12 thread exiting.
[2020-11-10 04:36:14] GPU 2 thread exiting.
[2020-11-10 04:36:14] GPU 1 thread exiting.
[2020-11-10 04:36:14] GPU11 thread exiting.
[2020-11-10 04:36:14] GPU 3 thread exiting.
[2020-11-10 04:36:14] GPU 6 thread exiting.
[2020-11-10 04:36:14] GPU 7 thread exiting.
[2020-11-10 04:36:14] GPU 0 thread exiting.
[2020-11-10 04:36:14] GPU 8 thread exiting.
[2020-11-10 04:36:14] GPU 5 thread exiting.
[2020-11-10 04:36:24] GPU 4 thread 0 shutdown timed out.
[2020-11-10 04:36:24] Successful clean shutdown.
Miner ended or crashed. Restarting miner in 30 seconds...


Hi! Any chance you can hunt me down on discord to do some one-on-one troubleshooting? I would love to get more data here; a full log as produced by --log_file would be a great start. There are zero kernel changes between these two versions, so gpu stability isn’t really expected to be affected. I also wonder what watchdog/restart script is executed above. Afaik SMOS normally runs its own script, but since you don’t even get a proper reboot above, something is weird. Also, it looks like gpu 4 is stuck above; it would be interesting to hear if there are any kernel/dmesg logs of interest, or if this could even be a host-side hang.
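
For anyone checking their own logs for the same pattern, here is a rough Python sketch (not part of TRM) that lists which GPU threads exited and which ones timed out on shutdown. The regexes and the function name summarize_shutdown are assumptions guessed from the excerpt above, so other versions may need different patterns.

```python
import re

# Patterns inferred from the log excerpt quoted above (assumption):
# "GPU10 thread exiting." / "GPU 4 thread 0 shutdown timed out."
EXITED = re.compile(r"GPU\s*(\d+) thread exiting\.")
TIMED_OUT = re.compile(r"GPU\s*(\d+) thread \d+ shutdown timed out\.")


def summarize_shutdown(lines):
    """Return (exited, timed_out) as sorted lists of GPU indices."""
    exited, timed_out = set(), set()
    for line in lines:
        m = TIMED_OUT.search(line)
        if m:
            timed_out.add(int(m.group(1)))
            continue
        m = EXITED.search(line)
        if m:
            exited.add(int(m.group(1)))
    return sorted(exited), sorted(timed_out)


# Three lines copied from the excerpt above, used as sample input.
sample = [
    "[2020-11-10 04:36:14] GPU10 thread exiting.",
    "[2020-11-10 04:36:14] GPU 5 thread exiting.",
    "[2020-11-10 04:36:24] GPU 4 thread 0 shutdown timed out.",
]
print(summarize_shutdown(sample))  # prints ([5, 10], [4])
```

Feeding it the full log file line by line (for example via open(path)) would work the same way, since the function only needs an iterable of strings.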

I've noticed on multiple occasions that the miner, which is able to allocate properly and work normally after a restart, will just sit idle around the gpu reset. The miner can be closed gracefully, it is not active in the driver, and the app can be started again, and sometimes it may even work...
Multiple restarts/power cycles usually do the trick, but it's no way to use it... so for now I'm on PM5.2c - it allocates 4023 easily, but unlike TRM it won't hold the allocation and switches back and forth between ETH and ZIL dags. I plan to set up linux on one 4gb rig with 11 gpus, will see how it goes.
 
...Also Claymore's Miner Manager is very useful, any thoughts to update the API to support at least the reporting part?


Yeah, the win allocation state for 4GBs is a mess right now. It seems to depend a lot on tiny details in the allocation strategy, ending up with something that's just random. Some users say TRM works and other miners don't, for others it's vice versa. It's a shitshow :D.

And yes, support for Claymore's Miner Manager is on the TODO list. Will probably be read-only.