Whoops, sorry... one last update.
I forgot a change: floating point precision is now set to fast, and I enabled /MP (multi-processor compilation), which compiles at least twice as fast. Do with it what you want, it's just something to fiddle with.

Anyway, it seems to give only a tiny / negligible hash rate increase, but it compiles way faster... a lot faster.
http://d-h.st/AWY

Nice, will have to check it out. BTW, what version of the CUDA toolkit do you have installed? (I haven't looked at VS yet.)