This makes no sense to me. When all your memory banks are already busy switching rows on every
(random) memory access, then every additional PoW instance you run will just slow things down.
The bolded statement is not correct in any case. Threads are cheap on the GPU; memory bandwidth is the bound. Adding more instances and adding more per-instance parallelism (if the PoW proving function exhibits any) are both valid ways to increase throughput until the memory bandwidth limit is reached. Adding instances doesn't slow down each instance unless that limit has already been reached, regardless of whether the memory spaces of separate instances are interleaved.
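
To make the argument concrete, here is a toy model of it in Python. It is only a sketch of the claim, not a measurement: the function names, the per-instance rate, and the bandwidth cap are all made-up assumptions, and it treats memory bandwidth as a single hard ceiling that instances share evenly once it is hit.

```python
def aggregate_throughput(instances, per_instance_rate, bandwidth_cap):
    """Total memory accesses per second across all instances.

    Assumption: each instance issues `per_instance_rate` accesses/sec when
    unconstrained, and the memory system serves at most `bandwidth_cap`.
    """
    return min(instances * per_instance_rate, bandwidth_cap)


def per_instance_throughput(instances, per_instance_rate, bandwidth_cap):
    """Accesses per second that each instance gets (even sharing assumed)."""
    total = aggregate_throughput(instances, per_instance_rate, bandwidth_cap)
    return total / instances


if __name__ == "__main__":
    rate, cap = 10.0, 80.0  # hypothetical units, chosen only for illustration
    for n in (1, 2, 4, 8, 16):
        print(n,
              aggregate_throughput(n, rate, cap),
              per_instance_throughput(n, rate, cap))
```

In this model the per-instance rate stays flat up to 8 instances, and only beyond that point does each extra instance take bandwidth away from the others while the total stays pinned at the cap.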