Allow me to turn this around: so you're building a database of CP, malware and more, basically highlighting the bad parts inside the blockchain. That sounds much worse than having them lost in large amounts of data.
I can definitely see this causing a Streisand effect if this goes wrong, yeah. Right now we at least do not know where in the blockchain this data is being stored, if at all.
But my counterargument is that CEXes and large institutions might build their own solution: a whitelist of transactions and blocks, subject to the same KYC scrutiny that we are already seeing applied to addresses.
To put it simply, they might invent "taint" for raw transactions and blocks, and then make it really hard to use their services unless you have coins from "clean transactions/blocks".
If this happens it will make the whole of Bitcoin even less pseudonymous.
So this work is to hopefully create a blueprint for avoiding that fate.
Personally, I don't see the point. Even if you use an RPC call such as decoderawtransaction, the arbitrary data isn't human-readable or viewable. It still takes additional effort from a developer to decode it and display it properly.
Which would make the developer liable, but there aren't strong legal protections for node runners yet.
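To make the "additional effort" point concrete, here is a rough sketch of what a developer would have to write just to pull raw bytes out of an OP_RETURN output. It assumes the input is a dict shaped like the decoderawtransaction result (a "vout" list whose entries carry scriptPubKey.hex). The function names are made up for illustration, and it only covers OP_RETURN pushes, not data hidden elsewhere in a transaction.

```python
OP_RETURN = 0x6A


def op_return_payloads(script_hex: str) -> list[bytes]:
    """Return the data pushes that follow OP_RETURN in a script.

    Hypothetical helper: a truncated script yields short or empty
    payloads instead of raising.
    """
    script = bytes.fromhex(script_hex)
    if not script or script[0] != OP_RETURN:
        return []
    payloads = []
    i = 1
    while i < len(script):
        opcode = script[i]
        i += 1
        if 0x01 <= opcode <= 0x4B:
            # Direct push: the opcode itself is the byte count.
            size = opcode
        elif opcode in (0x4C, 0x4D, 0x4E):
            # OP_PUSHDATA1/2/4: a little-endian length field follows.
            width = {0x4C: 1, 0x4D: 2, 0x4E: 4}[opcode]
            size = int.from_bytes(script[i:i + width], "little")
            i += width
        else:
            # Anything else is not a data push; stop parsing.
            break
        payloads.append(script[i:i + size])
        i += size
    return payloads


def embedded_data(decoded_tx: dict) -> list[bytes]:
    """Collect OP_RETURN payloads from a decoderawtransaction-style dict."""
    found = []
    for vout in decoded_tx.get("vout", []):
        script_hex = vout.get("scriptPubKey", {}).get("hex", "")
        found.extend(op_return_payloads(script_hex))
    return found
```

Even after this step you only have bytes. Turning them into text, an image or a file is yet another layer the developer has to add on purpose.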
I would say it's an index/long list of content that is deemed illegal.
It's going to be a hash table of SHA256 checksums. Exactly which parts of the transactions or blocks are going to be hashed, I haven't figured out yet.
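As a minimal sketch of what such a table could look like: the class below stores only SHA256 checksums, never the content itself. Since the hashed unit isn't decided yet, this simply assumes "some payload bytes" as input. The class and method names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA256 checksum of the given bytes."""
    return hashlib.sha256(data).hexdigest()


class ChecksumBlocklist:
    """Set of SHA256 checksums; the flagged content itself is not kept."""

    def __init__(self, checksums=()):
        # Normalise to lowercase so lookups are case-insensitive.
        self._checksums = {c.lower() for c in checksums}

    def add_payload(self, payload: bytes) -> str:
        """Hash a payload, remember the checksum, and return it."""
        digest = sha256_hex(payload)
        self._checksums.add(digest)
        return digest

    def is_flagged(self, payload: bytes) -> bool:
        """True if this exact byte string was flagged before."""
        return sha256_hex(payload) in self._checksums
```

Note that an exact-match checksum only catches byte-identical data, so the choice of what gets hashed (whole script, single push, whole transaction) decides what the table can and cannot detect.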
I think the main risk here is not just altering data but the precedent it sets. Once you create an API that edits blockchain responses, you’ve basically introduced a middle layer of truth that people have to trust. Bitcoin’s whole foundation is that the data is verifiable and final; no one edits it for you.
If you're creating a service to sanitize for legal reasons, that's fair, but it should be clear to the end user that they’re interacting with a filtered view. That way, you keep the trust model intact without creating confusion or accidental reliance on a modified dataset.
I specifically said that it's not suitable for verifying blockchain data, i.e. for use by nodes.
The API specifications will be exactly the same as Bitcoin Core JSON-RPC.
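One way the two points above could fit together, purely as a sketch: keep the response shape identical to what decoderawtransaction returns, blank only the flagged script fields, and report separately whether anything was changed so the caller can be told it is seeing a filtered view. The function name, the is_flagged callback and the empty-string placeholder are all assumptions, not part of any existing API.

```python
import copy
from typing import Callable

REDACTED = ""  # hypothetical placeholder for a blanked field


def filter_decoded_tx(
    decoded_tx: dict, is_flagged: Callable[[bytes], bool]
) -> tuple[dict, bool]:
    """Return (filtered copy, changed) for a decoderawtransaction-style dict.

    The input is never modified; only scriptPubKey "hex" and "asm" of
    flagged outputs are blanked, so every key stays where clients expect it.
    """
    filtered = copy.deepcopy(decoded_tx)
    changed = False
    for vout in filtered.get("vout", []):
        spk = vout.get("scriptPubKey", {})
        script = bytes.fromhex(spk.get("hex", ""))
        if script and is_flagged(script):
            spk["hex"] = REDACTED
            spk["asm"] = REDACTED
            changed = True
    return filtered, changed
```

The changed flag could then be surfaced out of band, for example in an HTTP response header, so the JSON body stays compatible while users still know the data was filtered.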
I don't think there is a global consensus on what data is right or wrong, and I think Bitcoin's database should NOT be censored.
I am not censoring blockchain data, I am merely filtering it. This project depends on unfiltered nodes so there will always be many of these kinds of regular nodes running.