It's a GTX 670. It has 1344 CUDA cores. 1344x32 works at -L 512. No matter what I specify for -L, doubling doesn't work: out of memory every time.
But can't you just ignore the out-of-memory error? It's mostly for scrypt-related buffers, which Keccak doesn't need.
Maybe -L 1024 -m 1 does the trick?