I will try with 2 GPUs to see if something is wrong there...
There is no limit on finding the key of an address, only the time needed.
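
To put "only the time needed" into numbers, here is a rough sketch. It assumes a plain brute-force search over a keyspace of 2^n candidates, and the search rate is a made-up figure, not a measurement from this tool:

```python
def expected_search_seconds(key_bits: int, keys_per_second: float) -> float:
    """Average brute-force time to hit one key in a space of 2**key_bits keys."""
    # On average the key turns up after checking half of the keyspace.
    return (2 ** key_bits) / 2 / keys_per_second


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Hypothetical rate (assumption): one billion keys per second.
ASSUMED_RATE = 1e9

for bits in (40, 64, 128):
    years = expected_search_seconds(bits, ASSUMED_RATE) / SECONDS_PER_YEAR
    print(f"{bits}-bit range: about {years:.3g} years on average")
```

Each extra bit doubles the expected time, so the search is never blocked outright, it just stops being practical very quickly.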

I tried everything I can think of. Here is what I found, maybe it helps:
Removing -t 0 and/or -o ... and/or -b ..: it then runs for exactly 3 sec. with 133 addresses.
Using a different grid size, down to 64,128, made it run a few seconds longer, about 5 sec.
Using only GPU 0, it runs for 30-60 sec., BUT not when using GPU 1. I swapped them (PCIe slots) but got the same result, same error.
And still, when no addresses are loaded, everything works fine.