Can you also scrape all the Bitcoin addresses used here in the forum, along with the users who use them?
I actually can

I found this regexp on Stack Overflow:
egrep --regexp="^[13][a-km-zA-HJ-NP-Z1-9]{25,34}$" filename
With some slight changes it stops matching parts of Eth-addresses:
egrep -w --regexp="[13][a-km-zA-HJ-NP-Z1-9]{25,34}" *
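For anyone who prefers a script over egrep, here is a rough Python equivalent. It is only a sketch: `\b` approximates what `-w` does, the Eth-style string in the sample is made up, and the pattern checks the shape of an address only, not its checksum.

```python
import re

# Same character class as the egrep pattern above.
# \b (word boundary) stands in for egrep's -w, so a candidate
# buried inside a longer alphanumeric string is not matched.
BTC_ADDRESS = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

def find_addresses(text):
    """Return all legacy-looking (1... or 3...) address candidates in text."""
    return BTC_ADDRESS.findall(text)

# The 0x... string is a made-up Eth-style address without zeros,
# i.e. the worst case for the unanchored pattern.
sample = (
    "Pay to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa please, "
    "not to 0x3a1b2c4d5e6f7a8b9c1d2e3f4a5b6c7d8e9f1a2b."
)
print(find_addresses(sample))
```

Without the `\b` anchors, the same pattern would pick a 35-character slice out of the 0x string.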
I could run this code on 53 million archived posts, but the main problem will be excluding quotes. That's annoying and slow to do, and if I don't exclude them, it will completely mess up the data. On the other hand, quotes may still contain information that was deleted by the user who posted it.
Even without quotes, users still post Bitcoin addresses that aren't theirs, for instance when providing evidence on a scammer.
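Excluding quotes could look roughly like the sketch below. It assumes the archived posts are stored as HTML and that quoted text sits inside `div` elements with class `quote` or `quoteheader`; those class names are an assumption and would need checking against the real archive. `QuoteStripper` and `strip_quotes` are names made up for this example.

```python
from html.parser import HTMLParser

class QuoteStripper(HTMLParser):
    """Collect the text of a post, skipping everything inside quote divs."""

    def __init__(self, quote_classes=("quote", "quoteheader")):
        super().__init__()
        self.quote_classes = set(quote_classes)  # assumed class names
        self.skip_depth = 0  # > 0 while we are inside a quote div
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a quote, count every nested div so that the
        # matching closing tags bring us back out at the right point.
        if self.skip_depth or self.quote_classes.intersection(classes):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)

def strip_quotes(post_html):
    """Return the post text with quoted sections removed."""
    parser = QuoteStripper()
    parser.feed(post_html)
    parser.close()
    return "".join(parser.parts)

post = (
    '<div class="post">'
    '<div class="quoteheader">Quote from: someone</div>'
    '<div class="quote">Send it to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa</div>'
    "My own address is 3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"
    "</div>"
)
print(strip_quotes(post))
```

Parsing every post this way is what makes it slow on 53 million posts. Keeping the stripped quote text in a separate column would preserve addresses that only survive in quotes.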
I think it would be possible if and only if you scraped the following boards:
- Services
- Bounties
- Marketplace in general (both BTC and Alt)
- And Marketplaces of all local boards if applicable/available
With that, posts that cite addresses as evidence of a scam wouldn't be much of a problem. And yes, it would be hard, especially where threads or posts were deleted. But that needn't be a problem as long as the list simply serves as a reference for which users have used or mentioned which addresses throughout their post history.
I think it would help with labeling users and alt accounts throughout the entire forum, and would make it easier to detect which accounts are linked to each other.
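A minimal sketch of that linking idea: collect the users seen with each address and keep only the addresses that show up under more than one username. The `(username, text)` input format, the function name `shared_addresses` and the usernames are hypothetical. A shared address is only a hint, since anyone can post someone else's address.

```python
import re
from collections import defaultdict

BTC_ADDRESS = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

def shared_addresses(posts):
    """posts: iterable of (username, text) pairs.

    Return {address: sorted list of users} for every address that
    appears in posts by two or more different users.
    """
    users_by_address = defaultdict(set)
    for user, text in posts:
        for address in BTC_ADDRESS.findall(text):
            users_by_address[address].add(user)
    return {
        address: sorted(users)
        for address, users in users_by_address.items()
        if len(users) > 1
    }

# Made-up posts for illustration only.
posts = [
    ("alice", "My address: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"),
    ("bob", "Please tip 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa thanks"),
    ("carol", "Payment goes to 3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy"),
]
print(shared_addresses(posts))
```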
A smart user would simply use different addresses. An even smarter user would use different wallets, so they don't create a blockchain trail when they make a payment.
As a quick test, 51 out of 9999 posts contain at least one Bitcoin address (starting with 1 or 3, ignoring Bech32).
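A count like that can be reproduced along these lines. This is a sketch, assuming the posts are available as plain strings; `count_posts_with_address` is a made-up name and the toy posts are not real data. For reference, 51 out of 9999 is about 0.51% of posts.

```python
import re

BTC_ADDRESS = re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b")

def count_posts_with_address(posts):
    """Return (posts containing at least one address, total posts)."""
    posts = list(posts)
    hits = sum(1 for text in posts if BTC_ADDRESS.search(text))
    return hits, len(posts)

# Toy input: one post with an address, two without.
toy_posts = [
    "Send it to 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa",
    "No address in this one",
    "bc1 style addresses are ignored by this pattern",
]
hits, total = count_posts_with_address(toy_posts)
print(f"{hits} out of {total} posts ({hits / total:.2%})")
```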
For now I won't continue this search. If I ever do, I'll move this discussion to Reputation.
I'm looking forward to making it happen. Have I already mentioned my project to build an app (a BPIP ripoff)? Such data would be helpful for it. I'm still in the planning stage, deciding what to tackle first, and with all the data you've already scraped, I could do less scraping myself and instead build an API that just looks things up in your data.