First - quick results for cn-trtl, w/ efficiency-focused settings, using timings mostly borrowed from others here, w/ some minor tweaks:
Vega 64 air, ubuntu 18.04 + amdgpu-pro 18.50, TRM 0.4.3 (L18+18), 852 cclock (p0)/1107 mclock/818mv, power readings at the wall
stock timings:
--CL 20 --RAS 33 --RCDRD 16 --RCDWR 10 --RC 47 --RP 14 --RRDS 4 --RRDL 6 --RFC 260 (--REF 3900)
18.5 kh/s @ 135w (137 h/w)
modded timings 1
--CL 19 --RAS 28 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --RRDS 3 --RRDL 3 --RFC 248
19.75 kh/s @ 137w (144 h/w)
modded timings 2
same as above, plus --REF 15600
20.71 kh/s @ 137w (151 h/w)
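For anyone who wants to redo the h/w math with their own numbers, here is a minimal sketch. It only uses the three hashrate / wall-power pairs quoted above; the function and variable names are made up for illustration.

```python
# Hashes per watt for the three cn-trtl runs quoted above.
# (label, hashrate in kh/s, power at the wall in watts)
runs = [
    ("stock timings", 18.5, 135),
    ("modded timings 1", 19.75, 137),
    ("modded timings 2", 20.71, 137),
]


def hashes_per_watt(khs, watts):
    """Convert kh/s and wall watts into hashes per second per watt."""
    return khs * 1000 / watts


stock_eff = hashes_per_watt(runs[0][1], runs[0][2])

for label, khs, watts in runs:
    eff = hashes_per_watt(khs, watts)
    gain_pct = (eff / stock_eff - 1) * 100  # efficiency change vs stock timings
    print(f"{label}: {eff:.0f} h/w ({gain_pct:+.1f}% vs stock)")
```

Swap in your own readings to compare; keep in mind wall readings fluctuate by a watt or two, so small differences are within noise.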
Second - notes on power... I don't see any appreciable power differences - nor would I expect to. Clocks and voltages are untouched, we simply have a bit more data being transferred. Even the 2w difference I'm reporting here is conservative - taking natural fluctuations in my readings into account, my actual increase could be closer to <= 1w. People seeing large power increases (at least on vega 64) seem to have something else going on.
Last - some conjecture / educated guessing re: THAT --REF THO!!! I'm assuming --REF is the refresh interval, in nanoseconds, so unlike most timings, a higher value (meaning less frequent refreshing) is better. Refreshes steal bandwidth, and AMD seems to have gone majorly conservative (aggressive?) on this, probably due to the super high temps of the HBM during normal/gaming use. As leakage increases w/ temps, more refreshes are required when running your GPU/HBM at high clocks/voltages. Since (efficient) miners tend to run cooler, the crazy high default refresh rate is really unnecessary. I found 4x the default to be around where returns quickly diminish, at least at my clocks - I can get maybe another 50 h/s (turtle) going to 4.5x. HOWEVER - if you run super aggressive for max h/r, or just aren't effectively cooled in general, you may want to dial this back, or you may start seeing mem errors / bad shares from data corrupted by leakage between too-infrequent refreshes.
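A rough back-of-envelope sketch of why --REF could matter this much. It uses the usual DRAM approximation that the share of time lost to refreshing is about tRFC / tREFI. The units are assumptions, not confirmed anywhere in this thread: --RFC is taken to be in memory clock cycles at the 1107 MHz mclock, and --REF in nanoseconds as guessed above. If either guess is wrong, the numbers are meaningless.

```python
def refresh_overhead(rfc_cycles, ref_ns, mclk_mhz):
    """Approximate fraction of time the memory is busy refreshing (tRFC / tREFI).

    ASSUMPTIONS: rfc_cycles is in memory clock cycles, ref_ns is in nanoseconds.
    """
    rfc_ns = rfc_cycles * 1000.0 / mclk_mhz  # cycles -> ns at the given mclock
    return rfc_ns / ref_ns


MCLK_MHZ = 1107  # mclock used in the cn-trtl runs above

stock = refresh_overhead(260, 3900, MCLK_MHZ)    # stock --RFC / --REF
modded = refresh_overhead(248, 15600, MCLK_MHZ)  # modded timings 2

print(f"stock:  {stock * 100:.1f}% of time refreshing")
print(f"modded: {modded * 100:.1f}% of time refreshing")
print(f"freed:  {(stock - modded) * 100:.1f} percentage points")
```

Under those assumptions the estimate lands in the same ballpark as the jump from modded timings 1 to modded timings 2, but treat it as a sanity check on the conjecture, not a measurement.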
Nice test. Can you test the CNR algo? Is it necessary to flash a 56 to 64 for this algo?
Ok - now that the latest and greatest TRM is public... here are my results for vega 64s (power is ATW)
Base GPU settings: 1375 (effective) cclock / 1107 mclock / 837mv (range across 8x64s is 825-843mv)
TRM 0.4.3, 15+15, no timing mods:
2120 h/s @ 165w or 12.8 h/w
TRM 0.4.3, 15+15, --RAS 28 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --RRDS 3 --RRDL 3 --REF 15600 --RFC 248 --FAW 14:
2140 h/s (did not measure power)
TRM 0.4.4, 15*15, --RAS 28 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --RRDS 3 --RRDL 3 --REF 15600 --RFC 248 --FAW 14:
2440 h/s @ 178w or 13.7 h/w
TRM 0.4.4, 15*15, --RAS 32 --RCDRD 12 --RCDWR 5 --RC 44 --RP 12 --REF 15600:
2420 h/s (did not measure power - assume same as previous)
As you can see:
1. Timing mods aren't doing much for CNR w/o the latest TRM '*' mode.
2. Many of the timing mods are just pushing at the edges... You can get ~95% of the gain w/ just 6 (maybe fewer) parameter mods.
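To put the CNR numbers side by side, here is a small sketch that expresses each run as a percent change over the no-mods TRM 0.4.3 run. The labels and names are made up for illustration. Using that run as the baseline means the TRM 0.4.4 rows mix the miner upgrade together with the timing mods, since no 0.4.4 run without mods was measured.

```python
BASELINE_HS = 2120  # TRM 0.4.3, 15+15, no timing mods

# label -> hashrate in h/s, as reported above
cnr_runs = {
    "TRM 0.4.3 15+15, full timing set": 2140,
    "TRM 0.4.4 15*15, full timing set": 2440,
    "TRM 0.4.4 15*15, 6 parameters": 2420,
}


def pct_gain(hs, base):
    """Percent change of hs relative to base."""
    return (hs / base - 1) * 100


for label, hs in cnr_runs.items():
    print(f"{label}: {hs} h/s ({pct_gain(hs, BASELINE_HS):+.1f}% vs baseline)")

# Efficiency for the two runs where wall power was measured.
for label, hs, watts in [("0.4.3 no mods", 2120, 165), ("0.4.4 full set", 2440, 178)]:
    print(f"{label}: {hs / watts:.1f} h/w")
```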