I read it, and I read it again. I can't see why anyone would want this. When syncing Bitcoin Core, you sync everything. When using an API, I assume you're looking for specific parts, and the parts you're looking for shouldn't be altered. So I see absolutely no use for this.
I'm only going to filter things that are widely accepted to be illegal (like CP, malware, etc.).
Allow me to turn this around: so you're building a database of CP, malware and more, basically highlighting the bad parts inside the blockchain. That sounds much worse than having them lost in large amounts of data.