What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to decrease excessive getwork).
I am also working on plugin support for it, so you can use various added features (such as built-in gui, Web interface, logger, autotune, variable aggression for when computer is idle, overclocking support, etc...)
This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.
As for cgminer support, I haven't tried it, are there any benefits over phoenix? I may fork that instead of phoenix and make the plugin support via command-line, lua or javascript, although I find that python is much easier to code than c (especially for cross platform support).
In most cases you won't see much if any decrease in the number of getwork requests by running multiple kernels behind the same work queue. The reason for having a work queue in the first place is so that the miner only needs to ask for more work when the queue falls below a certain size. During normal operation Phoenix won't request more work than absolutely necessary. There might be a small benefit to doing this when the block changes, but aside from that the getwork count for a single instance running 2 kernels compared to 2 instances will be very close.
That said, I am interested to see the results of the other changes you mentioned. Feel free to PM me if you have any questions.