Seems to me like you've got it all under control, so I'll leave you to finish up. Thanks for your involvement. However I don't want multiple phatk kernels so just replace the current one in-situ and don't bother enumming a different kernel. As for the output code, I prefer to use 4k so feel free to do it your way, but be aware I plan to change it back.
Ok, the source is up... I am trying to figure out how to compile this for windows without the cygwin layer (I really haven't done any of this before... I am soooo lost)...
https://github.com/Phateus/cgminerckolivas... if you want to merge this into your code at some point, let me know what I have to do... I literally installed git yesterday, and there is only so much you can learn on the internet in a day ;-)
As for the buffer, my kernel only uses WORKSIZE+1 parts of your buffer, but I left the buffer size intact.