I'm only going to filter things that are widely accepted to be illegal (like CP, malware, etc.).
Allow me to turn this around: so you're building a database of CP, malware and more, basically highlighting the bad parts inside the blockchain. That sounds much worse than having them lost in large amounts of data.
Very interesting. There is a similar post up regarding OP_RETURN, about filtering this kind of data before block propagation (https://bitcointalk.org/index.php?topic=5559355.msg65815592#msg65815592).
Same conclusion there: building a database of "bad" data is just impossible imo. Most social media platforms and data centers have similar issues with moderating user-generated content, and I don't think any of those giants have it figured out.
I don't think there is a global consensus on what data is right or wrong, and I think Bitcoin's database should NOT be censored.