Get the public lists of public keys that exist for most of the physicals, and search the internet, and this forum in particular, for mentions of them.
I still haven't finished downloading all posts, but if it helps, I can share what I have. It's currently around 60 GB, and it takes about half a year to collect all data from the forum (with 1 request per forum).
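For the search step, something along these lines might work once the posts are on disk. It is only a rough sketch: `find_mentions`, the `(post_id, text)` layout and the token pattern are my own assumptions, not how the dump is actually stored. It pulls out long alphanumeric runs (long enough to cover Base58 addresses and hex-encoded public keys) and checks them against the known list.

```python
import re

# Runs of 26 to 130 alphanumeric characters: wide enough for legacy
# Base58 addresses and for hex public keys (66 or 130 hex digits).
TOKEN_RE = re.compile(r"[0-9A-Za-z]{26,130}")

def find_mentions(posts, known_keys):
    """Return {key: [post_id, ...]} for every known key found in a post.

    posts      -- iterable of (post_id, text) pairs (hypothetical layout)
    known_keys -- iterable of addresses / public keys from the public lists
    """
    known = set(known_keys)
    hits = {}
    for post_id, text in posts:
        # De-duplicate per post so each post is listed once per key.
        for token in set(TOKEN_RE.findall(text)):
            if token in known:
                hits.setdefault(token, []).append(post_id)
    return hits
```

A set lookup per token keeps this linear in the size of the dump, which matters at 60 GB; matching is case-sensitive on purpose, since Base58 is.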
Once all the existing blocks have been processed, we'll have the state of all "in-use" addresses and their "balances" on BTC's blockchain.
We will process new blocks on the fly to always have an up-to-date state of addresses.
Have you considered what's going to happen to your database if you process a new block that is orphaned afterwards?
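One possible way to cope with that, as a minimal sketch rather than a description of anyone's actual database: keep the balance changes of each connected block on an undo stack, so the tip can be disconnected again if it gets orphaned. The `AddressState` class and the `(address, signed_amount)` delta format are hypothetical names chosen for illustration.

```python
from collections import defaultdict

class AddressState:
    """In-memory address balances with per-block undo records."""

    def __init__(self):
        self.balances = defaultdict(int)  # address -> balance in satoshis
        self.undo = []                    # stack of (block_hash, deltas)

    def connect_block(self, block_hash, deltas):
        """Apply a block. deltas is a list of (address, signed_amount):
        positive for outputs received, negative for outputs spent."""
        for addr, amount in deltas:
            self.balances[addr] += amount
        # Remember exactly what was applied so it can be reversed later.
        self.undo.append((block_hash, list(deltas)))

    def disconnect_tip(self):
        """Reverse the most recently connected block and return its hash.
        Addresses first seen in that block stay in the map with balance 0."""
        block_hash, deltas = self.undo.pop()
        for addr, amount in deltas:
            self.balances[addr] -= amount
        return block_hash
```

On a reorg you would call `disconnect_tip()` until you reach the fork point, then `connect_block()` for each block of the new branch. In practice the undo stack only needs to cover as many recent blocks as the deepest reorg you want to survive.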