As of version 2.1, phatk has a command line option, "VECTORS4", which can be used instead of "VECTORS".
This option works on 4 nonces per thread instead of 2. It may increase speed, mainly if you do not underclock your memory, but feel free to try it out either way. Note that if you use it, you will more than likely have to decrease your WORKSIZE to 128 or 64.
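To make "4 nonces per thread instead of 2" concrete, here is a rough Python sketch of the bookkeeping. It is an illustration only: the function names are made up, and it assumes each thread takes a contiguous block of nonces, which may not match how the actual phatk kernel lays them out.

```python
# Hypothetical illustration; the real kernel may map nonces to threads differently.

def nonces_for_thread(base, thread_id, width):
    """Nonces one thread would test, assuming contiguous blocks of `width` nonces."""
    start = base + thread_id * width
    # Nonces are 32-bit values, so wrap around at 2**32.
    return [(start + i) & 0xFFFFFFFF for i in range(width)]


def threads_needed(nonce_count, width):
    """Number of threads required to cover `nonce_count` nonces (rounded up)."""
    return -(-nonce_count // width)


if __name__ == "__main__":
    # Same nonce range, half as many threads when going from width 2 to width 4.
    print(threads_needed(2**20, 2), threads_needed(2**20, 4))
    print(nonces_for_thread(0, 3, 4))
```

Under that assumption, a width of 4 halves the thread count for the same nonce range, while each thread holds twice as many values at once.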
I'm using a 6770 @ 1.01 GHz with phatk 2.2. When I run the memory clock at 300 MHz with the VECTORS option, I get 234.5 MH/s. However, I can't seem to reap the benefits of VECTORS2 or VECTORS4 at a higher memory clock (i.e. 1.2 GHz). I've reduced the WORKSIZE from 256 to 128 and 64, and with these options I can only achieve between 204 and 213 MH/s, peaking at 213.
I have found that VECTORS4 is extremely unreliable... even tiny changes in the kernel and other factors affect the hashrate tremendously... OpenCL gets really weird when you use a lot of registers. I added it in 2.1 because it was comparable to VECTORS in some situations, but changing the kernel slightly in 2.2 seems to have broken it (even though the kernel analyzer says it uses fewer registers and fewer ALU ops... *sigh*)
For anyone wondering about new kernel improvements: I seem to be at a standstill... I have tried the following:
- Removing all control flow operations (about 1 MH/s slower)
- Sending all kernel arguments in a buffer (about 1 MH/s slower)
- Using an atomic counter for the output so that the output buffer is written sequentially (about the same speed, and only works on ATI xxx cards and newer)
- Using an internal loop in the kernel to process multiple nonces (either significantly slower or massive desktop lag)
- Calling set_arg only once per getwork instead of once per kernel call (only faster when using very low aggression and FASTLOOP; I will add this to my next kernel release)
-Phateus
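The atomic-counter idea from the list above can be pictured with a small host-side simulation. This is plain Python with threads, not OpenCL, and is not taken from the phatk source; `AtomicCounter` and `run_workers` are hypothetical names, and a lock stands in for the device's atomic increment. Each worker that finds a nonce reserves the next free slot, so results end up packed at the front of the output buffer.

```python
import threading


class AtomicCounter:
    """Stand-in for a device atomic counter; a lock makes fetch-and-increment atomic."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def fetch_inc(self):
        with self._lock:
            old = self.value
            self.value += 1
            return old


def run_workers(found_nonces, slots):
    """Each worker writes one found nonce into the next free slot of the output."""
    output = [None] * slots
    counter = AtomicCounter()

    def report(nonce):
        idx = counter.fetch_inc()  # reserve a unique slot index
        if idx < slots:            # results that do not fit are dropped
            output[idx] = nonce

    threads = [threading.Thread(target=report, args=(n,)) for n in found_nonces]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output, min(counter.value, slots)


if __name__ == "__main__":
    out, used = run_workers([5, 9, 42], slots=8)
    print(used, sorted(out[:used]))
```

The order in which results land in the slots depends on thread scheduling; only the fact that they occupy the first `used` slots is guaranteed in this sketch.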