I can't speak for other sites, but as an example, https://www.moneypot.com/ is currently monitoring a few million addresses. The approach is rather simple. In my case, I pre-generated all the addresses (using BIP32, but that's irrelevant). I then store all those addresses in a hashmap, which effectively allows constant-time lookup by address.
Next, I listen to the bitcoin network and the blockchain. Whenever I see a new transaction, I look at all of its outputs and check whether any of them pays to an address in the hashmap. If one does, I record the details in a database that has an index on the address column. In fact, an approach like this would easily allow me to monitor *every* address in the blockchain. Really, the only slow part of my scheme was generating the initial address set.
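
To make the idea concrete, here is a minimal sketch in Python. It is not the site's actual code. The function names, the `deposits` table, the placeholder addresses, and the shape of the transaction dict (a `txid` plus a list of outputs with `address` and `amount`) are all assumptions made up for illustration; in practice the transactions would come from whatever node or library you use to follow the network.

```python
import sqlite3


def build_watch_map(addresses):
    """Map each watched address to an internal id (here: its position)."""
    return {addr: idx for idx, addr in enumerate(addresses)}


def open_db(path=":memory:"):
    """Create the (hypothetical) deposits table with an index on address."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS deposits ("
        " address TEXT NOT NULL,"
        " txid TEXT NOT NULL,"
        " vout INTEGER NOT NULL,"
        " amount INTEGER NOT NULL,"
        " UNIQUE (txid, vout))"
    )
    db.execute(
        "CREATE INDEX IF NOT EXISTS deposits_address ON deposits (address)"
    )
    return db


def process_transaction(db, watched, tx):
    """Record every output of tx that pays to a watched address.

    Returns the list of (address, vout) pairs that matched.
    """
    matches = []
    for vout, output in enumerate(tx["outputs"]):
        address = output.get("address")  # None for non-standard outputs
        if address in watched:  # average O(1) dict lookup
            # OR IGNORE: the same output may be seen more than once,
            # e.g. first unconfirmed and later inside a block.
            db.execute(
                "INSERT OR IGNORE INTO deposits (address, txid, vout, amount)"
                " VALUES (?, ?, ?, ?)",
                (address, tx["txid"], vout, output["amount"]),
            )
            matches.append((address, vout))
    db.commit()
    return matches


# Placeholder strings stand in for real pre-generated addresses.
watched = build_watch_map(["addr-0", "addr-1", "addr-2"])
db = open_db()
example_tx = {
    "txid": "tx-1",
    "outputs": [
        {"address": "addr-1", "amount": 50000},
        {"address": "someone-else", "amount": 1200},
        {"amount": 0},  # output with no decodable address
    ],
}
found = process_transaction(db, watched, example_tx)
```

The cost per transaction depends only on its number of outputs, not on how many addresses are being watched, which is why the size of the watch set hardly matters once it fits in memory.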