Your settings look pretty close to me. YAC favors cards with lots of memory, and relative to how many threads you have (2560), 4GB is actually not much in the way of memory.
Your ideal settings (no HW errors) will be one of these (based on a buffer-size of 3600):
LG    rI
 6    5296
 7    6227
 8    7200
 9    7680
10    8694
11    9804
12    10240
You may need to round down to the nearest multiple of --worksize (depending on your driver). Obviously, the more you can allocate with buffer size, the higher your rI can go.
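If it helps, the rounding step can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the worksize of 256 is an assumed example value, so substitute whatever you actually pass to --worksize.

```python
def round_down_to_multiple(raw_intensity, worksize):
    """Return the largest multiple of worksize that does not exceed raw_intensity."""
    if worksize <= 0:
        raise ValueError("worksize must be positive")
    return (raw_intensity // worksize) * worksize


# rI candidates from the table above, keyed by lookup gap (LG).
CANDIDATES = {6: 5296, 7: 6227, 8: 7200, 9: 7680, 10: 8694, 11: 9804, 12: 10240}


def rounded_candidates(worksize):
    """Round every rI candidate down to a multiple of the given worksize."""
    return {lg: round_down_to_multiple(ri, worksize) for lg, ri in CANDIDATES.items()}


if __name__ == "__main__":
    # 256 is an assumed worksize, used here purely as an example.
    for lg, ri in rounded_candidates(256).items():
        print(f"LG {lg:>2}  rI {ri}")
```

With a worksize of 256, for example, 7200 rounds down to 7168, while 7680 and 10240 are already exact multiples and stay as they are.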
Thanks - I'm just so used to my R9 290s significantly outperforming my GTX 780s that I assumed I must be missing something! I also got a bit excited when I got over 10k on my first attempt, before noticing all the hardware errors...

What's the relationship between buffer size and raw intensity at the various lookup gaps? Besides rI at LG 8 being double the buffer size, of course. That one I can manage myself!