How did this yield a 2.5 MB list with 100000 entries?
We gathered these addresses by assuming that if two addresses both appear as inputs in the same transaction, then they are from the same wallet.
So, I presume something like this:
  Take the list of addresses you know belonged to MBC.
  For each address in the list (including ones added along the way):
    For each outgoing transaction from that address:
      If the transaction has more than 1 input:
        Add its input addresses to the list if they aren't already there.
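A rough Python sketch of that loop, just to show why it snowballs: each newly added address gets expanded in turn, so the list keeps growing until no transaction links in anything new. The input format is an assumption. `outgoing_txs` is a hypothetical mapping from an address to the transactions spending from it, each given as a list of input addresses. `cluster_from_seeds` is likewise a made-up name.

```python
from collections import deque


def cluster_from_seeds(seeds, outgoing_txs):
    """Grow a list of addresses using the common-input heuristic.

    outgoing_txs is a hypothetical dict: address -> list of transactions
    spending from that address, each transaction being a list of its
    input addresses.
    """
    order = []      # addresses in the order they were added
    known = set()   # fast "already in the list?" check
    for s in seeds:
        if s not in known:
            known.add(s)
            order.append(s)

    queue = deque(order)  # addresses still waiting to be expanded
    while queue:
        addr = queue.popleft()
        for tx_inputs in outgoing_txs.get(addr, []):
            if len(tx_inputs) < 2:
                continue  # single-input tx links no other address
            for other in tx_inputs:
                if other not in known:
                    known.add(other)
                    order.append(other)
                    queue.append(other)  # new address is expanded later
    return order


txs = {
    "A": [["A", "B"]],
    "B": [["B", "C", "D"]],
    "E": [["E", "F"]],
}
print(cluster_from_seeds(["A"], txs))  # ['A', 'B', 'C', 'D']
```

E and F never get pulled in because nothing connects them to A. The queue is what makes "for each address in the list" also cover addresses appended mid-loop. Running the same heuristic over every transaction at once with union-find would give the same cluster.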
Seems quite reasonable.