Blockchains are not decentralized data storage

People are used to saying and thinking that blockchains provide immutable data storage. Often they add the caveat that blockchains are very expensive, so we can’t really store much data on them, but we can still store some if we really want to and are OK with paying for it.

But the fact is that blockchains cannot be used to store anything. The purpose of a blockchain is to keep track of some state that everybody must agree upon at all times. Arbitrary data that someone wanted to back up there is not relevant to anyone else[1], so no one else has an incentive to keep track of it. In other words: if you back up your personal pictures as OP_RETURN outputs on Bitcoin, people may delete them and your backup will be void[2].
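A minimal sketch of why such data is disposable from a node’s point of view, assuming a simplified model in which the state a node tracks is just a dictionary of unspent outputs. The names `is_provably_unspendable`, `apply_outputs` and `utxo_set` are made up for illustration; real node software is more involved and differs in detail.

```python
OP_RETURN = 0x6A  # opcode that marks an output as unspendable


def is_provably_unspendable(script_pubkey: bytes) -> bool:
    """An output whose script starts with OP_RETURN can never be spent."""
    return len(script_pubkey) > 0 and script_pubkey[0] == OP_RETURN


def apply_outputs(utxo_set: dict, txid: str, outputs: list) -> None:
    """Record only the outputs that future transactions could spend.

    `outputs` is a list of (amount_in_sats, script_pubkey) pairs.
    """
    for index, (amount, script) in enumerate(outputs):
        if is_provably_unspendable(script):
            # Nothing anyone will need to agree on later, so nothing to keep.
            continue
        utxo_set[(txid, index)] = (amount, script)


utxo_set = {}
apply_outputs(
    utxo_set,
    "aa" * 32,  # hypothetical transaction id
    [
        (50_000, bytes.fromhex("76a914" + "11" * 20 + "88ac")),  # ordinary payment
        (0, bytes([OP_RETURN, 0x0B]) + b"my pictures"),  # data carrier
    ],
)
print(len(utxo_set))  # only the payment output was kept
```

The payment output stays in the state because someone may spend it later and everybody has to be able to check that spend. The data-carrying output is dropped from the state without affecting consensus at all.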

Another thing blockchains supposedly do is “broadcast” something. For example, nodes may delete the OP_RETURN outputs, but at least they have to verify them first and spread them over the network, so you can broadcast your data and be sure everybody will get it. About this we can say two things. First, if this happens, it’s not a property of blockchains, but of the Bitcoin transaction-sharing network that operates outside of the blockchain. Second, if you try to use that network for purposes that are irrelevant to the functioning of the Bitcoin protocol, there is no incentive for other nodes to cooperate and they may ignore you.

The above points may sound weird, and you may be prompted to answer: “but you can do all that today, and there is no actual mechanism to stop anyone from broadcasting irrelevant crap!” That is true. My point here is only that if you’re thinking about blockchains as a data-broadcast-storage mechanism, you’re thinking about them wrong: that is not an essential part of any blockchain. In other words, the incentives are not aligned for blockchains to be used like that (unless you come up with a scheme that makes everyone’s data relevant to everybody). In the long term such things are not expected to work, and insisting on doing them will result either in the crash of your application or protocol that stores data on the blockchain, or in the death of the given blockchain (I hope Bitcoin haters don’t read this).

(This is a counterpoint to myself on idea: Rumple, which was a protocol idea that relied on a blockchain storing irrelevant data.)

  1. For example, all Bitcoin transactions are relevant to all Bitcoin users because, as a user, you care about the total supply and the absence of double-spends, and also because any of these transactions may end up being ancestors of transactions that you might receive in the future.

  2. Of course you can still back up your pictures as invalid P2PKH outputs or something like that, and then it will be harder for people to spot your data as irrelevant. But this is not a feature; it’s a bug of Bitcoin that enables someone to spam other nodes in a way they can’t detect. If people started doing this a lot it would break Bitcoin.
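To illustrate the second footnote, here is a small sketch of why data disguised as a P2PKH output is hard to spot. It builds the standard 25-byte P2PKH script around 20 arbitrary bytes. The helper names are hypothetical, and the “genuine” hash is only a stand-in (a truncated SHA-256), not a real HASH160 of a public key.

```python
import hashlib

# OP_DUP OP_HASH160 <push 20 bytes> ... OP_EQUALVERIFY OP_CHECKSIG
P2PKH_PREFIX = bytes([0x76, 0xA9, 0x14])
P2PKH_SUFFIX = bytes([0x88, 0xAC])


def p2pkh_script(twenty_bytes: bytes) -> bytes:
    """Wrap 20 bytes in the P2PKH script template."""
    assert len(twenty_bytes) == 20
    return P2PKH_PREFIX + twenty_bytes + P2PKH_SUFFIX


def looks_like_p2pkh(script: bytes) -> bool:
    """The only check a node can make: does the script match the template?"""
    return (
        len(script) == 25
        and script[:3] == P2PKH_PREFIX
        and script[23:] == P2PKH_SUFFIX
    )


# Stand-in for the hash of a real public key (illustrative only).
genuine = hashlib.sha256(b"stand-in for a public key").digest()[:20]
# Twenty bytes of arbitrary data pretending to be a public key hash.
smuggled = b"20 bytes of my photo"

print(looks_like_p2pkh(p2pkh_script(genuine)))
print(looks_like_p2pkh(p2pkh_script(smuggled)))
```

Both scripts pass the same structural check. A node has no way to tell that no key exists for the second one, so it has to keep that output around as if it were spendable.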