kano/ckolivas,
I'm working with the guys at NiceHash.com to try and resolve a few issues with how they are managing miners with their stratum pool. In debugging this, I've been reading through the code that implements the stratum client - util.c - and trying to better understand the stratum protocol.
What I'm not quite clear on is the loop/state machine that you use in a mining client to "poll" a pool and attempt to connect to get work.
The NiceHash.com pool uses the password parameter of mining.authorize to pass in a requested "threshold" for mining payouts. (For example, I might only want to mine if there are contracts paying a minimum of 0.07 BTC/TH/day, so my password becomes "p=0.07".) When the mining.authorize arrives, they accept or reject my auth based on that threshold comparison.
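For reference, this is roughly what I understand the authorize exchange to look like on the wire. It's only a sketch in Python: the worker name and the two helper functions are made up for illustration, and the request shape is the usual newline-terminated stratum JSON-RPC message as I understand it, not something taken from NiceHash's or cgminer's code.

```python
import json


def build_authorize(worker, threshold, msg_id=2):
    """Build a newline-terminated mining.authorize request.

    The threshold is carried in the password field as "p=<value>",
    which is the NiceHash convention described above.
    """
    password = f"p={threshold}"
    request = {
        "id": msg_id,
        "method": "mining.authorize",
        "params": [worker, password],
    }
    return json.dumps(request) + "\n"


def auth_accepted(reply_line):
    """Return True only if the reply has no error and result is true."""
    reply = json.loads(reply_line)
    return reply.get("error") is None and reply.get("result") is True


# "myworker.1" is a placeholder worker name.
line = build_authorize("myworker.1", 0.07)
print(line, end="")
```

A miner would then treat anything other than an accepted reply as "not authorized" and should not consider itself ready for work from that pool.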
This is working ... mostly.
The problem we are seeing is that at some later point in time, the miner seems to "hang", or at least keeps believing the pool is alive. I'm running kano's builds on my AntMiners, and periodically during the day I'll find them attached to the pool but not getting any work. They think the pool is alive ... the pool says it returned an error on the auth.
I am debugging to see if this is pool related, miner related, or protocol related. (Or a little of several?)
What I wanted to know is:
Within cgminer, once a pool is determined to be "dead", what is the delay before the pool is tried again, and what operations (Requests) are retried to "test" the pool for aliveness? Where would I find the code that handles that delay and retrying?
I'm writing here, as I tried to locate and study the protocol documentation to determine what the "proper" process ought to be ... but didn't find too much on the subject. :-)
I'm wondering if a miner like cgminer would be doing: mining.subscribe -> mining.authorize -> error/delay/start over at subscribe?
If this is the case, is there any reason to believe that after looping like this - getting errors for a half day or so - the miner might hang?
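To make the loop I'm asking about concrete, here is a rough sketch in Python. This is only my guess at the sequence, not what cgminer actually does: connect_loop, the two callbacks, and the 30-second delay are all placeholders I made up.

```python
import time


def connect_loop(subscribe, authorize, retry_delay=30.0,
                 max_attempts=None, sleep=time.sleep):
    """Retry subscribe -> authorize until both succeed.

    subscribe and authorize are callables returning True on success.
    Returns the number of attempts used, or None if max_attempts
    was reached without a successful authorize.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        # Start over at subscribe on every pass, as in the question.
        if subscribe() and authorize():
            return attempt
        # Either step failed: wait, then try the whole sequence again.
        sleep(retry_delay)
    return None
```

For example, with a pool that rejects the auth twice (threshold not met) and then accepts it, the loop should return on the third attempt after two delays. What I can't tell from reading util.c is whether the real client always falls back to a state like this, or whether an auth error can leave it connected and "alive" without ever retrying.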
I'm also glad to take this conversation elsewhere if this is not the appropriate thread.