Hurrah! A jump of over 30%, from ~31.5 Mkeys/s to ~41.6 Mkeys/s, on a 7700K with a 1080 Ti.
If your system is more or less like this:
https://bitcointalk.org/index.php?topic=1573035.msg18373053#msg18373053
or like this:
https://bitcointalk.org/index.php?topic=1573035.msg18446472#msg18446472
then we are now faster than oclvanitygen on fast GPUs too. Remember that 41.6 Mkeys/s means 41.6 M compressed keys plus 41.6 M uncompressed keys per second, i.e. over 83 Maddresses/s! Could you test oclvanitygen on your machine?
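For anyone who wants to check the arithmetic behind those figures, here is a minimal sketch in Python. It only uses the two rates quoted in this post; the variable names are made up for illustration and nothing here comes from the program itself.

```python
# Rates quoted in this post, in Mkeys/s (approximate figures).
old_rate = 31.5
new_rate = 41.6

# Relative speedup of the new version over the old one.
speedup_pct = (new_rate / old_rate - 1) * 100

# Each key is checked as both a compressed and an uncompressed address,
# so the address rate is twice the key rate.
addresses_rate = new_rate * 2

print(f"speedup: {speedup_pct:.1f}%")
print(f"address rate: {addresses_rate:.1f} Maddresses/s")
```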
GPU usage went from ~83% to ~98%.
I'm afraid that almost all systems are now GPU-limited.
Only multi-GPU systems can take advantage of further ECC improvements (if there are any) and of the n-k symmetry. I think a GPU version of the ECC library would not be very useful at the moment. Above all, we need to speed up sha256/ripemd160 on the GPU.
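A small sketch of what the n-k symmetry buys, assuming it refers to the standard secp256k1 property that the private key n-k has the same public x coordinate as k, with y replaced by p-y. This is slow pure Python for illustration only (it needs Python 3.8+ for the modular inverse), not the ECC library's code, and the helper names are hypothetical.

```python
# secp256k1 domain parameters: field prime P, group order N, generator G.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two affine points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    x1, y1 = a
    x2, y2 = b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a + (-a) = infinity
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mul(k, pt):
    """Double-and-add scalar multiplication (illustrative, not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result

k = 123456789                      # arbitrary example private key
pub_k = scalar_mul(k, G)           # the expensive step, done once
pub_nk = (pub_k[0], P - pub_k[1])  # public key of N - k, almost free

# Check the shortcut against a full multiplication.
assert pub_nk == scalar_mul(N - k, G)
```

So one scalar multiplication yields two candidate keys, but each of them still has to go through sha256/ripemd160, which is why the trick only helps where the hashing side has headroom.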