The network has just started and the size of the folder with blocks is already 2.64 GB.
This is a lot.
Will there be any work to optimize the size of stored blocks?
Hi, I am from the Kaspa research team.
All in all, we store three components: the full header data above the pruning block, the UTXO set of the pruning block, and a proof of correctness for the UTXO set.
We have a fancy pruning mechanism (cf.
https://research.kas.pa/t/some-of-the-intuition-behind-the-design-of-the-invalidation-rules-for-pruning/95) that allows us to remove old block data.

At full capacity, the size of a block payload is bounded by 100kB and the size of a block header is bounded by (100 + 32*log_2(past size))B. In the distant future, when we have a trillion blocks in the network (this would take about 30 thousand years at one block per second), we will have log_2(past size) = 40, so let us assume that log_2(past size) <= 40. This means that the header size is bounded by (100 + 32*40)B, which is just below 1.5kB. For simplicity, assume for now that the entire block size is 100kB. We store three days' worth of full block data which, at a rate of one block per second, accumulates to about 26GB (note that this bound assumes that all blocks are at maximum capacity; no assumptions are made on the average number of transactions per block).
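To make the arithmetic concrete, here is a minimal back-of-envelope sketch in Python (the constants are the bounds quoted above, not values read from the node implementation):

```python
import math

# Bounds quoted above (illustrative assumptions, not protocol code)
HEADER_FIXED_BYTES = 100        # fixed part of a header
HASH_BYTES = 32                 # bytes added per log_2(past size) step
PAYLOAD_CAP_BYTES = 100_000     # 100kB payload bound at full capacity
BLOCKS_PER_SECOND = 1

def header_bound_bytes(past_size: int) -> float:
    """Header size bound: 100 + 32 * log_2(past size) bytes."""
    return HEADER_FIXED_BYTES + HASH_BYTES * math.log2(past_size)

# A trillion blocks (~30 thousand years at one block per second)
print(header_bound_bytes(10**12))            # ~1375 B, just below 1.5kB

# Three days of full blocks at one block per second, each at the 100kB cap
blocks = 3 * 24 * 60 * 60 * BLOCKS_PER_SECOND
print(blocks * PAYLOAD_CAP_BYTES / 1e9)      # ~25.9 GB of retained block data
```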
The UTXO correctness proof (cf.
https://github.com/kaspanet/research/issues/3) requires that we keep an additional log_2(number of blocks in the network) headers (not full blocks). Using again the assumption log_2(past size) <= 40, this adds about 60kB of data, which is completely negligible. Currently we store all block headers, as it requires some care to remove them without accidentally removing headers required for the proof, and our dev team hasn't gotten around to this yet; this is a purely technical issue which will be resolved in the near future. (There is another detail I swept under the rug, which is that we also have to store the headers of all pruning blocks. This means one header per day. While this technically grows at a rate of O(n*log n), the constant is ridiculously small: it is bounded by 1.5kB/day, which is about 550kB a year.)
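For concreteness, the same kind of back-of-envelope arithmetic for the header overhead (a sketch; the 1.5kB figure is the rounded header bound from above, and the per-day growth ignores the slowly growing log factor):

```python
HEADER_BOUND_BYTES = 1500       # ~1.5kB rounded header bound from above

# UTXO correctness proof: ~log_2(number of blocks) extra headers
log2_past_size = 40             # assumption used throughout
print(log2_past_size * HEADER_BOUND_BYTES / 1_000)   # ~60 kB in total

# Pruning block headers: one per day, accumulating very slowly
for years in (1, 10, 100):
    print(years, 365 * years * HEADER_BOUND_BYTES / 1e6, "MB")   # ~0.55, 5.5, 55 MB
```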
The only thing that grows linearly is the pruning block UTXO set itself. Storing it currently requires a fixed-size field for every unspent output in the network. It is hard to predict how fast this set grows, as it heavily depends on user behavior. We will resolve this in the future by means of cryptographic accumulators (cf.
https://en.wikipedia.org/wiki/Accumulator_(cryptography)). An accumulator is a way to represent a large set succinctly, such that it is impossible to recover the set itself (due to information-theoretic compression bounds), but it is possible to verify that an element is in the set given a proof. This means that every user will only need to store the proofs for their own unspent outputs, and the nodes will only have to verify these proofs against the accumulator, which is much smaller than the full UTXO set. The sizes of the accumulator and the proofs depend on the exact solution we choose.
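As a purely illustrative sketch of the accumulator idea (using a simple Merkle tree, which is only one possible construction and not necessarily the one Kaspa will adopt; all names here are hypothetical): the node keeps only a short commitment, each user keeps the membership proof for their own outputs, and verification needs nothing but the two.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Build a Merkle tree over the leaves; return (root, per-leaf membership proofs)."""
    level = [h(leaf) for leaf in leaves]
    proofs = [[] for _ in leaves]
    positions = list(range(len(leaves)))     # each leaf's node position at the current level
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # duplicate the last node on odd-sized levels
        next_level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        for leaf, pos in enumerate(positions):
            # record whether this node is a right child, plus its sibling hash
            proofs[leaf].append((pos % 2 == 1, level[pos ^ 1]))
            positions[leaf] = pos // 2
        level = next_level
    return level[0], proofs

def verify(root, leaf, proof):
    """A node storing only the root can check membership from the user's proof."""
    node = h(leaf)
    for is_right_child, sibling in proof:
        node = h(sibling + node) if is_right_child else h(node + sibling)
    return node == root

# Toy "UTXO set": the node keeps only `root`; each user keeps their own proof.
utxos = [b"utxo-1", b"utxo-2", b"utxo-3", b"utxo-4"]
root, proofs = build_tree(utxos)
print(verify(root, b"utxo-2", proofs[1]))    # True: verified without storing the set
print(verify(root, b"utxo-9", proofs[1]))    # False: not in the set
```

Real accumulator candidates differ in proof size, update cost, and whether proofs need refreshing as the set changes, which is exactly the trade-off behind the choice of solution mentioned above.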