I believe the setting is quite pool-specific. I used to use --queue 1 but found I got a lower DOA rate & slightly better performance using --queue 0 with the CPU at ~80% - but I'm pointing mine at p2pool so maybe that's why? Load is currently: 1.58 1.62 1.40 @ 237.5
Thanks for the feedback. I'm pointing them at Slush, with Bitminter as a backup (failover).
Don't see how the CPU% is dependent on the OC, but I ran tests on 3 different clocks, with 3 different queue settings. The load was pretty high on all of them.
Every time there is a new block, the CPU has to make 4097, 129, 3, 2, or 1 work units to send out immediately, depending on the queue setting. There is always at least 1: with a queue of 0 you have 1 work unit prepped to send to the next chip needing work, and with 4096 you have 4097 work units ready to go out. At the end of a block you still have 1 to 4097 work units ready to go, so the CPU will have far smaller load spikes with a lower number. I started with 2 (3 prepped in total) but noticed that 1 (2 ready at all times) used a bit less CPU. I then moved to 1 and got a load of just under 2 UNLESS I go to the Realtime Graphs tab. The big advantage of something slightly above 0 but less than maybe 16 is that if most of your hashing chips suddenly ran out of work, there is enough work queued up for half of them, at least for one work unit each.
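To make the counting above concrete, here is a minimal sketch of the "queue + 1" model as described in this post. It is only an illustration of that description, not cgminer's actual work-generation code, and the function name work_units_prepped is made up for the example.

```python
def work_units_prepped(queue: int) -> int:
    """Work units kept ready under the model described above.

    Assumption: a queue setting of N means N + 1 units are prepped,
    so a queue of 0 still leaves 1 unit ready for the next chip.
    """
    if queue < 0:
        raise ValueError("queue must be non-negative")
    return queue + 1


# Queue settings mentioned in the thread, largest first.
for q in (4096, 128, 2, 1, 0):
    units = work_units_prepped(q)
    print(f"--queue {q}: {units} work unit(s) to regenerate on a new block")
```

Under this model the work the CPU has to redo at each block change grows directly with the queue setting, which is why the lower settings should give smaller load spikes.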
I really did prefer 2 ish but 0 seems to be less loaded and pool results aren't significantly different for me.
I understand how the queue works; I've been using cgminer since v 1.4 or something silly like that.
When I dropped down to queue=1, my hashrate dropped from ~480 to ~460. I'm running it with a queue of 16 for now, but my load is still 2+.
But to answer my original question: this sounds like something that other people are living with, and is normal?
I didn't intend to imply you hadn't been using cgminer or that you didn't know how the queue worked. I only wanted to point out that I find it unlikely that spending even more processor cycles up front to reach a certain level of queued work would help improve speed. I do see that an increase from 440 to 480 would represent about a 9.1% increase in speed (40/440). The processor requirements to keep up with that improvement in hashing speed are likely fairly linear: if you are running 90% at 440, I would expect about 98% at 480. That having been said, I still wonder how bogging the CPU down for longer initially and maintaining a higher average load would improve hashing speed. I suppose that since new work can be queued up farther down the line and cgminer wouldn't run out of work, maybe it makes sense. I do see that you tested it, and I don't want to imply you have made a mistake; I don't think you have. I just find it very curious.
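Working the numbers through, as a rough sketch: going from 440 to 480 is a gain of 40 over a base of 440, and the CPU estimate assumes load scales linearly with hashrate. That linearity is only my guess, not something measured, and the helper names are made up for the example.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def scaled_cpu(cpu_percent: float, old_rate: float, new_rate: float) -> float:
    """CPU usage at new_rate, ASSUMING usage grows linearly with hashrate."""
    return cpu_percent * new_rate / old_rate


inc = percent_increase(440, 480)   # 40 / 440
cpu = scaled_cpu(90, 440, 480)     # 90 * 480 / 440
print(f"{inc:.2f}% faster, CPU about {cpu:.1f}%")
# prints "9.09% faster, CPU about 98.2%"
```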
I do wish BITMAIN would publish the device driver details so that a current version of cgminer could be used. I know at least 2 releases in 4 had notes about queue improvements, and at least one specifically stated it generated work much faster. Also, one of the last 3 had an improvement to how work is loaded into devices. I know for sure 3.12 is very old, and I am positive that many work generation improvements have been added since then.
This analysis also ignores the serial load, as that is separate unless BITMAIN actually has them running direct USB. That means the load attributable to cgminer will actually be higher than it appears in top or the process tree.