Is there some mechanism for preventing a single node storing a single copy of the data, and then spoofing multiple identities and claiming payment as-if the data were stored across multiple nodes?
No and yes, that's kind of the point though. If miners are honest and pick only the N closest nodes (using the Kademlia distance metric), then there's high chance that the data will be distributed amongst at least >1 nodes. The more requested nodes in the storereq-tx the greater the chance the data is actually distributed. Metrics that show how dense the identities around your data are can give you an indicator of how likely it is that multiple nodes store the data instead of just a few. The more nodes you're willing to pay to store your data, the more secure your data is.
The obvious solutions are proof-of-work or proof-of-stake, but I can't see how proof-of-work alone would suffice.
If the expected cost of performing the proof-of-work is greater than the reward for storing the block, then no one stores it. Otherwise, what's to stop an attacker from generating multiple possession-txs in order to claim all the available payments for a given storereq-tx?
I imagine they would try, but like I said earlier, payments
should go to only the closest nodes to data. As the network grows, there will be too many identities that trying to get closer to a given data block wouldn't be worth it when you can just store some other data that you are actually close to. Managing a lot of identities and data blocks quickly becomes an exponential problem.
Proof-of-stake would at least provide some capital reserve to allocate to each identity, then you could add a field for minimum proof-of-stake amount into the storereq-tx, and/or you could use it as an ordering criteria instead of the Kademlia distance.
I will look into it, and TBH I haven't considered proof of stake for this. Sounds intriguing.