I did some benchmarking and tweaking of the ccminer code and squeezed roughly 6% more hashrate out of my 750 Ti rig when mining HVC (+800 khash/s per card). Not a big advancement, but I think it's pretty good for a guy whose name is not Christian.

I had initially removed support for all compute capability versions except CC5.0, but I was later able to get the code compiling for CC2.0+. Alas, I have no way to test whether this fork works on CC3.5 and below, or on Windows for that matter, so I can make no guarantee of success if your rig uses either or both.
If you're brave, you can check out my fork or view the full diff of changes.
Summary of changes:
* Compiled with CUDA 6 RC
* Made modest changes to hefty1 kernel. Honestly not sure these even made a difference; the original code from the C+C hash factory was already damn near perfect
* Changed code compilation:
  * relocatable device code support
  * explicit linking via nvlink
* Removed maxrregcount to let the compiler choose the register count
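For anyone wondering what the compilation changes above look like in practice, here is a rough sketch of a separate-compilation build with nvcc. It is not the fork's actual build script: the file names (`kernel_a.cu`, `kernel_b.cu`, `miner_sketch`), the `sm_50` target, and the CUDA library path are all assumptions for illustration. By default it only prints the commands; set `DRY_RUN=0` to run them.

```shell
#!/bin/sh
# Sketch only: file names, ARCH and the library path are hypothetical,
# not taken from the fork's Makefile.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them
ARCH="sm_50"              # CC5.0, e.g. a 750 Ti

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_steps() {
    # 1. Compile each .cu file with relocatable device code
    #    (-dc is shorthand for --relocatable-device-code=true --compile).
    #    No --maxrregcount flag is passed, so the compiler picks the register count.
    run nvcc -arch="$ARCH" -dc kernel_a.cu -o kernel_a.o
    run nvcc -arch="$ARCH" -dc kernel_b.cu -o kernel_b.o

    # 2. Explicit device-link step: nvcc calls nvlink here to resolve
    #    device symbols across the object files.
    run nvcc -arch="$ARCH" -dlink kernel_a.o kernel_b.o -o device_link.o

    # 3. Ordinary host link, pulling in the CUDA runtime.
    run g++ kernel_a.o kernel_b.o device_link.o -o miner_sketch \
        -L/usr/local/cuda/lib64 -lcudart
}

build_steps
```

Dropping the register cap means the compiler may use more registers per thread, which can help or hurt occupancy depending on the kernel, so it is worth benchmarking both ways on your own cards.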
My testbed specs:
| Component | Details |
| --- | --- |
| OS | Ubuntu 13.10 x64, 3.13.6 kernel, NVIDIA 334.21 driver |
| CPU | Intel Pentium G3220 @ 3.00GHz (2 core) |
| Motherboard | MSI Z87-GD65 |
| RAM | 4GB DDR3-1333 |
| GPUs | (6) PNY 750 Ti OC (stock, no mods, all 1x risers) |
| Risers | (6) 1x PCIe via USB 3.0 risers (slim) |
And some performance metrics:
| Metric | Before | After |
| --- | --- | --- |
| Hashrate/card | 13400 khash/s | 14200 khash/s |
| GPU RAM usage | 186MiB | 200MiB |
| GPU Temp (avg) | 55C | 56C |
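To put the hashrate numbers in perspective, here is a small sketch that works out the per-card gain and the rig-wide totals from the table. The helper name `summarize` is made up for this example, and the rig totals assume all six cards hit the same per-card figure, which was not measured separately.

```shell
#!/bin/sh
# summarize BEFORE AFTER CARDS (hashrates in khash/s per card)
# Hypothetical helper; rig totals assume every card performs identically.
summarize() {
    awk -v b="$1" -v a="$2" -v n="$3" 'BEGIN {
        printf "gain per card: %d khash/s (%.1f%%)\n", a - b, (a - b) / b * 100
        printf "rig total: %d -> %d khash/s\n", b * n, a * n
    }'
}

summarize 13400 14200 6
# prints:
# gain per card: 800 khash/s (6.0%)
# rig total: 80400 -> 85200 khash/s
```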