What I am currently working on is a modified version of phoenix that runs multiple kernels from a single instance with a single work queue (to cut down on excessive getwork requests).
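To make the single-work-queue idea concrete, here is a rough sketch, not the actual phoenix code. The names `run_miners`, `fetch_work` and `kernel_worker` are made up for illustration, and the "kernel" here just records the work it was handed instead of hashing it. The point is only that getwork is called once per work unit, no matter how many kernels are consuming from the queue.

```python
import queue
import threading


def run_miners(fetch_work, num_kernels, num_units):
    """Feed num_kernels worker threads from ONE shared work queue.

    fetch_work is a hypothetical stand-in for a getwork call; it is
    invoked exactly num_units times, regardless of num_kernels.
    """
    work_queue = queue.Queue(maxsize=num_kernels * 2)
    done = []                      # (kernel_id, work) pairs, for illustration
    done_lock = threading.Lock()

    def kernel_worker(kernel_id):
        while True:
            work = work_queue.get()
            if work is None:       # sentinel: no more work for this kernel
                break
            # A real kernel would hash the work unit here.
            with done_lock:
                done.append((kernel_id, work))

    workers = [
        threading.Thread(target=kernel_worker, args=(i,))
        for i in range(num_kernels)
    ]
    for w in workers:
        w.start()

    # Single producer: one getwork per unit, shared by every kernel.
    for _ in range(num_units):
        work_queue.put(fetch_work())

    # One sentinel per worker so each thread exits cleanly.
    for _ in workers:
        work_queue.put(None)
    for w in workers:
        w.join()
    return done
```

With one queue per kernel, each kernel would be issuing its own getwork requests; with a shared queue, whichever kernel is idle simply takes the next unit.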
I am also working on plugin support for it, so you can use various added features (such as a built-in GUI, web interface, logger, autotune, variable aggression for when the computer is idle, overclocking support, etc.).
This would make it tremendously easier for anyone to add features and you can still use whichever kernel works best for you.
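As a rough idea of what plugin support could look like, here is a minimal sketch using an event/callback registry. This is only an assumption about the design, not the real interface: `PluginManager`, `register`, `emit` and the `share_accepted` event name are all hypothetical.

```python
class PluginManager:
    """Tiny event hub: plugins register callbacks, the miner emits events."""

    def __init__(self):
        self._hooks = {}           # event name -> list of callbacks

    def register(self, event, callback):
        """Attach a plugin callback to a named event."""
        self._hooks.setdefault(event, []).append(callback)

    def emit(self, event, *args, **kwargs):
        """Call every callback registered for the event.

        A plugin that raises must not take the miner down, so the
        exception is captured and returned in place of a result.
        """
        results = []
        for callback in self._hooks.get(event, []):
            try:
                results.append(callback(*args, **kwargs))
            except Exception as exc:
                results.append(exc)
        return results


# Example: a "logger" plugin that only collects messages in a list.
log_lines = []
manager = PluginManager()
manager.register(
    "share_accepted",
    lambda kernel_id, nonce: log_lines.append(
        "kernel %d found nonce %08x" % (kernel_id, nonce)
    ),
)
manager.emit("share_accepted", 0, 0xDEADBEEF)
```

The miner core only needs to know event names, so a GUI, a web interface or an autotuner could each hook in without touching the kernel code.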
As for cgminer support, I haven't tried it. Are there any benefits over phoenix? I may fork that instead of phoenix and provide the plugin support via the command line, Lua or JavaScript, although I find that Python is much easier to code in than C (especially for cross-platform support).
Would definitely be interested in a cgminer fork. Don't get me wrong, phoenix is great and has always given me the best performance overall, but it does lack some of the more refined features that the other poster listed above, such as failover and a nice static but updated command-line "UI". Seems like you and diapolo are hitting the ceiling with phoenix anyway.
I will release a version that will work with cgminer early next week (looks like he has already implemented diapolo's old version).
We are hitting a ceiling with OpenCL in general (and perhaps with the current hardware). In one of the mining threads, vector76 and I were discussing the theoretical limit on hashing speeds. Unless there is a way to make the Maj() operation take 1 instruction, or we are missing something, we are within about a percent of the theoretical minimum number of instructions in the kernel.
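On the Maj() point, a quick sketch for anyone following along. The textbook SHA-256 definition of Maj uses five bitwise operations, and a well-known rewrite gets it down to one XOR plus one bit-select. The Python below only checks that the two forms agree; `bitselect` here is a plain-Python model of the OpenCL built-in of the same name, and whether that built-in turns into a single hardware instruction depends on the device and compiler.

```python
MASK = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def maj(x, y, z):
    """Textbook SHA-256 majority: each bit is the majority of x, y, z."""
    return (x & y) ^ (x & z) ^ (y & z)


def bitselect(a, b, c):
    """Model of OpenCL bitselect: take bits of b where c is 1, else a."""
    return ((a & ~c) | (b & c)) & MASK


def maj_bitselect(x, y, z):
    """Maj as one XOR plus one bit-select.

    Where x and z agree, that shared bit is already the majority, so
    keep x. Where they differ, y breaks the tie, so take y.
    """
    return bitselect(x, y, x ^ z)
```

So even in the best case this form is two operations, which is why a one-instruction Maj would be needed to move the instruction count any further.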
Now that doesn't mean that there is NO room for improvement, just that any further improvement will probably have to come from faster hardware, a more efficient OpenCL implementation by AMD, or figuring out a better way to finagle the current OpenCL implementation to reduce its overhead. But, unless there is a problem with PyOpenCL, C and Python should give equivalent speeds as long as they are just calling the OpenCL interface (the miner itself uses negligible resources). I suppose it could be possible to access the hardware drivers directly and run the kernel that way, but I don't see that as being feasible.
But, with all of that said, I have looked through some of his code, and it is some really clean code. Part of the reason I want to add these features is to learn more Python (this is the first thing I have programmed in Python), but it will probably just be easier to modify the cgminer code. Thanks for pointing out cgminer to me.
