I can't speak for other sites, but as an example, https://www.moneypot.com/ is currently monitoring a few million addresses. The approach is rather simple. In my case, I pre-generated all the addresses (using BIP32, though the derivation method is irrelevant here). Next, I stored all those addresses in a hashmap, which effectively allows constant-time lookup by address.
Obviously, this approach only works if you know all your addresses in advance. If you had to import a new, unfamiliar address, you would still have the same problem.
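To make the hashmap idea concrete, here is a minimal sketch in Python. It is not the actual moneypot code: `derive_address` is a hypothetical stand-in that just hashes an index (real BIP32 derivation is more involved), and `build_watch_table` and `match_outputs` are names I made up for illustration. The only point is that checking each transaction output against the table is a single dictionary lookup, no matter how many addresses are being watched.

```python
import hashlib


def derive_address(index):
    """Hypothetical placeholder for real address derivation (NOT BIP32).

    Returns a fake but deterministic address string for the given index.
    """
    digest = hashlib.sha256(("demo-seed/%d" % index).encode()).hexdigest()
    return "addr_" + digest[:32]


def build_watch_table(count):
    """Pre-generate `count` addresses and map each one to its index."""
    return {derive_address(i): i for i in range(count)}


def match_outputs(watched, outputs):
    """Return (index, address, amount) for every output paying a watched address.

    `outputs` is an iterable of (address, amount) pairs, e.g. taken from the
    outputs of an incoming transaction.
    """
    hits = []
    for address, amount in outputs:
        index = watched.get(address)  # average-case constant-time lookup
        if index is not None:
            hits.append((index, address, amount))
    return hits


if __name__ == "__main__":
    watched = build_watch_table(10000)
    outputs = [(derive_address(42), 150000), ("addr_unknown", 999)]
    for index, address, amount in match_outputs(watched, outputs):
        print(index, address, amount)
```

In a real deployment the dictionary value would more likely be a user or account id than a bare index, and the table would need to be rebuilt or persisted across restarts, but the lookup pattern stays the same.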
Cheers, Paul.