Just did a test:
Rig setup:
Linuxcoin v0.2b (Linux version 2.6.38-2-amd64)
Dual HD5970 (4 GPU cores in the rig)
Mem clock @ 300 MHz
Core clock @ 800 MHz
VCore @ 1.125 V
AMD SDK 2.5
Phoenix r100
Phatk v2.2
-v -k phatk BFI_INT VECTORS WORKSIZE=256 AGGRESSION=11 FASTLOOP=false
Result:
Overall Rig rate: 1484 MH/s
Rate per core: 371 MH/s
This is ~4 MH/s faster than Diapolo's latest.
On the 5970, phatk 2.2 is the current king of the hill.
For the world to be perfect, this kernel needs to be integrated into cgminer.

The last kernel releases show that it takes a bit of trial and error to find THE perfect kernel for a specific setup. Phaetus and I use the KernelAnalyzer and our own setups as a first measurement of whether a new kernel got "faster". But many other factors come into play, such as OS, driver, SDK, miner software and so on.
I would suggest that we create a kernel based on the same kernel parameters for phatk and phatk-Diapolo, so that users are free to choose which kernel is used. One difference is that the CGMINER kernel uses the switch VECTORS2, where Phoenix used only VECTORS (which I changed to VECTORS2 in my last kernel releases). The kernels don't even need to use the same variable names (in fact they sometimes differ), as long as the main miner software passes the expected values to the kernel in a defined sequence.
Dia
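The point about passing values in a defined sequence can be sketched roughly as follows. This is only an illustration: the slot names in ARG_ORDER and the pack_args helper are made up for the example and are not taken from phatk, phatk-Diapolo, Phoenix or cgminer.

```python
# Hypothetical calling convention: the slot names are illustrative only and
# do not come from any of the kernels or miners discussed in this thread.
ARG_ORDER = ("state", "base_nonce", "target")


def pack_args(values, order=ARG_ORDER):
    """Return the kernel arguments as a tuple in the agreed positional order."""
    missing = [name for name in order if name not in values]
    if missing:
        raise KeyError("missing kernel arguments: " + ", ".join(missing))
    return tuple(values[name] for name in order)


# The miner can keep its own names and ordering internally; only the order
# in which the values are handed to the kernel has to be fixed.
miner_values = {"target": 0xFFFF, "state": (1, 2, 3), "base_nonce": 0}
packed = pack_args(miner_values)
```

With a convention like this, each kernel is free to name its parameters however it likes, since only the position of each value matters.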
A good idea.
A further improvement: I'd like to have an option in my miner that spends ~2 min
benchmarking all the kernels available in the current directory (without talking to
a pool, i.e. doing pure SHA256 on bogus nonces), and picking the fastest for the
current rig.
For people with lots of different rigs/setups, that would save them the headache
of having to hand-tune each instance.
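A rough sketch of what such a benchmark option could look like, assuming each kernel can be wrapped in a callable that hashes a batch of nonces and returns how many it processed. Here cpu_kernel is just a CPU stand-in for a real GPU kernel, and all function names are hypothetical rather than part of any existing miner.

```python
import hashlib
import struct
import time


def double_sha256(header76, nonce):
    """Bitcoin-style double SHA-256 over a 76-byte header plus a 4-byte nonce."""
    data = header76 + struct.pack("<I", nonce & 0xFFFFFFFF)
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def cpu_kernel(header76, start, count):
    """Stand-in for a GPU kernel: hash `count` bogus nonces on the CPU."""
    for nonce in range(start, start + count):
        double_sha256(header76, nonce)
    return count


def benchmark(kernels, seconds=0.2, batch=2000):
    """Run each kernel for roughly `seconds`; return (best_name, rates in H/s)."""
    header = bytes(76)  # bogus work, so no pool is involved
    rates = {}
    for name, kernel in kernels.items():
        done = 0
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            done += kernel(header, done, batch)
        rates[name] = done / (time.perf_counter() - start)
    best = max(rates, key=rates.get)
    return best, rates
```

Something like `best, rates = benchmark({"phatk": phatk_wrapper, "diapolo": diapolo_wrapper}, seconds=60)` would then pick a kernel per rig, where the two wrappers are whatever glue code launches the actual kernels.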
What I am currently working on is a modified version of phoenix which runs multiple kernels with a single instance and a single work queue (to cut down on excessive getwork requests).
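The single-work-queue idea might look roughly like this. It is a minimal sketch using threads and the standard queue module, not how phoenix is actually structured; run_workers and the kernel callables are assumptions made for the example.

```python
import queue
import threading


def run_workers(work_units, kernels):
    """Feed every kernel from one shared queue; return {kernel_name: [results]}."""
    work = queue.Queue()
    for unit in work_units:  # one fetch fills the queue for all kernels
        work.put(unit)
    results = {name: [] for name in kernels}

    def worker(name, kernel):
        # Each kernel pulls the next free unit, so no unit is processed twice.
        while True:
            try:
                unit = work.get_nowait()
            except queue.Empty:
                return
            results[name].append(kernel(unit))

    threads = [
        threading.Thread(target=worker, args=(name, kernel))
        for name, kernel in kernels.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because all kernels drain the same queue, a faster kernel simply ends up taking more units, and work only has to be requested once for the whole instance.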
I am also working on plugin support for it, so you can use various added features (such as a built-in GUI, web interface, logger, autotune, variable aggression for when the computer is idle, overclocking support, etc.).
This would make it tremendously easier for anyone to add features, and you could still use whichever kernel works best for you.
As for cgminer support, I haven't tried it. Are there any benefits over phoenix? I may fork that instead of phoenix and make the plugin support via command-line, Lua or JavaScript, although I find that Python is much easier to code than C (especially for cross-platform support).
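One possible shape for such plugin support is a small event registry that features hook into. This is a sketch only: PluginHost and the "rate_update" event name are invented for the example and are not part of phoenix or cgminer.

```python
class PluginHost:
    """Minimal event registry: plugins subscribe callbacks to named events."""

    def __init__(self):
        self._hooks = {}

    def register(self, event, callback):
        """Attach `callback` to `event`; several plugins may share an event."""
        self._hooks.setdefault(event, []).append(callback)

    def emit(self, event, **data):
        """Call every callback registered for `event`; return their results."""
        return [callback(**data) for callback in self._hooks.get(event, [])]


# Example: a logger plugin that records hash rate updates (in MH/s).
host = PluginHost()
log = []
host.register("rate_update", lambda rate: log.append(rate))
host.emit("rate_update", rate=371)
```

A GUI, web interface or autotuner could subscribe to the same events without the miner core knowing about any of them.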