For the "bad" card(s), have you tried a smaller thread concurrency with two GPU threads? That is, --thread-concurrency=8192, -g2, -I13.
I have two Gigabyte 7970 cards (GV-R797OC-3GD) and can't get the high-concurrency, single-thread settings to work. With every 2.11.x release of cgminer I've used, the settings above give about 750 kH/s.
I know this isn't exactly what you were asking, but from what I've seen, few people are trying settings like these. I'm curious whether they work for anyone else.