It turned out to be impossible to tune memory timings by hand for on-the-fly memory timing mods, so I ended up implementing a fully automated optimizer for memory timings, overclocking, and algorithm parameters such as intensity and global work size. I was pulling my hair out over this stuff, but it should be almost over...
Great work, sir! Can you say anything about the PoW algorithm change for Monero? Will you implement it before the hard fork at the end of the month?