I took a look at the comparison between version 2.2 and version 2.1
could it because __constant uint ConstW[128] change that broke VECTORS4?
That change is inconsequential (I was trying some things that required the change but did not keep them).. the compiler doesn't use those values, so they code should be exactly the same doing it either way (you can try and replace the code with the old code if you want to check).
You keep saying that it is broken.. if it does not run, post the errors.
I have found that on my card, VECTORS4 is much slower in version 2.2 than 2.1, but this is not a bug... it seems to be because openCL does not like allocating that many registers... Version 2.1 uses around 99.7% of instruction slots with VECTORS4 and I have tried many many ways to make it faster and more reliable (in 2.1), but I have given up on it. It is still in the release because I don't see any point in taking it out... but getting 2.2 to run as fast as 2.1 with VECTORS4 is not going to happen. Also, the differences between 2.1 and 2.2 with VECTORS are very tiny anyway (less than .5%)...
Getting into more detail about it: If you look at the graph on the main page of the thread, you can see the graph of VECTORS4 in version 2.1... in version 2.2 for some reason, the spike (and corresponding valley) is located higher (somewhere around 500), this could mean that it would be just as fast if you had 1500 Mhz memory, but I have no idea why openCL reacts this way to changing the memory speed. There are way to many GPU architecture/GPU bios/PCIe bus/CPU-GPU transfer/driver/openCL implementation unknowns to try to predict this behavior.
-Phateus