PCIe bandwidth usage dropped from ~100 MB/s to ~500 kB/s per GPU! This should really help those with PCIe x1 risers. MAX_SOLS has been reduced from 2000 to 10. CPU usage should also now be close to zero (except on Nvidia, whose OpenCL implementation busy-waits, but I'll check in a workaround soon).
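The two numbers line up if per-GPU PCIe traffic scales linearly with MAX_SOLS: 2000 / 10 is a 200x reduction, and 100 MB/s / 200 = 500 kB/s. The quick sanity check below assumes that the host copies back a solution buffer sized by MAX_SOLS after every run and that this copy dominates PCIe traffic. That is an assumption for illustration, not a measurement.

```python
# Sanity check, assuming per-GPU PCIe traffic is dominated by reading back
# a MAX_SOLS-sized solution buffer each run, so it scales linearly with MAX_SOLS.

OLD_MAX_SOLS = 2000
NEW_MAX_SOLS = 10
OLD_BANDWIDTH_BYTES_PER_S = 100e6  # ~100 MB/s per GPU, as reported above


def scaled_bandwidth(old_bw: float, old_max: int, new_max: int) -> float:
    """Expected bandwidth if traffic is proportional to MAX_SOLS (assumption)."""
    return old_bw * new_max / old_max


new_bw = scaled_bandwidth(OLD_BANDWIDTH_BYTES_PER_S, OLD_MAX_SOLS, NEW_MAX_SOLS)
print(f"reduction factor: {OLD_MAX_SOLS // NEW_MAX_SOLS}x")  # 200x
print(f"expected bandwidth: {new_bw / 1e3:.0f} kB/s")        # 500 kB/s
```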
I'm seeing a 7-9% overall speed improvement across 2 GPUs, one in an x16 slot and the other on an x1 riser. The card on the x1 riser gained ~10%, and the card in the x16 slot ~5%.