The short answer is that I'm not sure. The long answer is that there is a company called Chainalysis that does this kind of address clustering. I think they also know which addresses belong to exchanges. So if anybody who uses the Chainalysis service (which is limited to governments and large institutions) finds a transaction linking one of your wallets to a known exchange wallet, and manages to see your documents, they can store your personal information along with your wallet addresses in whatever database they are using.
It doesn't need to be as in-depth as that. There are hundreds of potential ways to link an address to a real-life identity.
Looking at things like common inputs, change addresses, address reuse, exact-amount transactions, and so forth allows an observer to build up a picture of which addresses are linked to which other addresses. If I know you own address A, I can use these methods to infer that you probably also own addresses B, C, and D, for example.
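To make the "common inputs" idea concrete, here is a minimal sketch in Python. It assumes a simplified, made-up transaction format (a dict with an `inputs` list of address strings) and only implements the common-input heuristic: addresses spent together in one transaction are assumed to share an owner. The function name and the sample data are hypothetical, and this is not how Chainalysis or any other real tool is known to work.

```python
from collections import defaultdict

def cluster_addresses(transactions):
    """Group addresses that appear together as inputs (union-find)."""
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)
        # Assumption: every input of one transaction has the same owner.
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return list(clusters.values())

# Hypothetical sample data, not real transactions.
transactions = [
    {"inputs": ["A", "B"]},
    {"inputs": ["B", "C"]},
    {"inputs": ["C", "D"]},
    {"inputs": ["X"]},
]
clusters = cluster_addresses(transactions)
```

With this sample data, A, B, C, and D end up in one cluster and X in another. The heuristic is only a guess: transactions such as CoinJoins combine inputs from several owners, so a real analysis would have to treat these links as probable rather than certain.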
Once I know a bunch of addresses you own, it only takes a single mistake to link one of those addresses to your real identity. Perhaps one of those addresses is in a web wallet, which will know your email address and IP address. Perhaps you sent one of those addresses to someone in an email, or posted it on a forum, or on Twitter, or on any other medium where your account is linked to your email or other personal data. Perhaps one of those addresses received coins from an exchange or service you completed KYC on. Perhaps you paid in person for a good or service from one of those addresses, or paid on a site which has your email and IP addresses. Perhaps your wallet is using a server I control, and I can see your IP address that way.
See this page for more info:
https://en.bitcoin.it/wiki/Privacy