Hi Christian. I've tried your new repo under Windows. I just needed to #undef HAVE_ALLOCA_H and HAVE_SYSLOG_H in cpuminer-config.h to get it to compile under VC2010, and then changed max_warps_per_block() in fermi_kernel.h to return 16 to make use of the extra WARPS_PER_BLOCK.
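In case it helps anyone else building on Windows, the two edits look roughly like this. It is only a sketch: the #undef lines are what I described above, but the function is written as a plain C function for illustration, and the real declaration of max_warps_per_block() in fermi_kernel.h may well look different.

```c
/* cpuminer-config.h: these headers are not available under VC2010,
   so make sure the corresponding feature macros are not defined. */
#undef HAVE_ALLOCA_H
#undef HAVE_SYSLOG_H

/* fermi_kernel.h: hypothetical simplified form of the function.
   Only the return value (16) reflects the change described above;
   the signature here is an assumption. */
static int max_warps_per_block(void)
{
    return 16;
}
```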
There is some increase in reported speed. Before, I was using 28x8 with a GTX 560 Ti (448-core) and peaking around 210 khash/s. Now, using F14x16, I'm getting up to 218 khash/s.
Not much more but still more. Great work, thanks.
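For what it's worth, here is the arithmetic behind those two launch configs as a small sketch. It assumes the "BxW" notation means blocks times warps per block; the helper names total_warps and percent_gain are made up for this example and are not from the repo.

```c
/* Total warps launched for a "blocks x warps-per-block" config
   (assumed meaning of the BxW notation). */
static int total_warps(int blocks, int warps_per_block)
{
    return blocks * warps_per_block;
}

/* Relative speed-up in percent between two hash rates. */
static double percent_gain(double before_khash, double after_khash)
{
    return (after_khash - before_khash) / before_khash * 100.0;
}
```

Under that assumption, total_warps(28, 8) and total_warps(14, 16) both come out to 224, so the total warp count is unchanged and only the grouping into blocks differs. percent_gain(210.0, 218.0) is a little under 4%.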