Interesting list you have. I would have given merits if I had any.
I don't think they can be searched effectively in parallel. You have to divide each pubkey and check the result against the baby steps. So for every key you not only need a very expensive global memory lookup (a GPU has slow global memory and very fast local memory), you also have to load that key first.
So if you searched multiple keys at once, you would effectively divide the performance by the number of keys. 10 keys in parallel means 10 times slower per key, and 2800 keys means 2800 times slower.
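
To make the scaling concrete, here is a rough CPU sketch of baby-step giant-step with several targets. It is only an illustration under my own assumptions: it uses a toy multiplicative group modulo a small prime instead of secp256k1, a Python dict stands in for the baby-step table in GPU global memory, and the names `P`, `G`, `build_baby_steps` and `search_keys` are made up for this example. The point is just that the baby-step table is shared, but every giant step costs one table lookup per key.

```python
from math import isqrt

# Toy parameters (assumptions for illustration, not secp256k1).
P = 1_000_003  # small prime modulus
G = 2          # base element


def build_baby_steps(m):
    """Map G^j mod P -> j for j in [0, m). Built once, shared by all keys."""
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)
        value = value * G % P
    return table


def search_keys(targets, bound):
    """Look for x < bound with G^x == target, for every target.

    Returns (found, lookups): found maps the target's index to an exponent,
    lookups counts baby-step table probes (a stand-in for global memory reads).
    """
    m = isqrt(bound) + 1
    table = build_baby_steps(m)
    giant = pow(G, -m, P)  # G^(-m) mod P, needs Python 3.8+
    current = list(targets)
    found = {}
    lookups = 0
    for i in range(m):
        for idx, value in enumerate(current):
            if idx in found:
                continue
            lookups += 1  # one table probe per key per giant step
            j = table.get(value)
            if j is not None:
                found[idx] = i * m + j
            current[idx] = value * giant % P  # giant step for this key
        if len(found) == len(targets):
            break
    return found, lookups


if __name__ == "__main__":
    target = pow(G, 54_321, P)
    for n in (1, 10):
        _, lookups = search_keys([target] * n, 100_000)
        print(n, "keys:", lookups, "lookups")
```

With the same lookup budget per second, n keys each advance at 1/n of the single-key rate, which is where the "10 keys means 10 times slower" comes from.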