Board: Development & Technical Discussion
Topic: Revised swarm node proposal
Posted by Realpra on 04/01/2016, 06:43:37 UTC
Let me know what you think.

---BITCOIN SCALING---
-- 1 ABOUT ME--
Martin Clemens Bloch. CEO of BlochsTech and creator of an easy-to-use Bitcoin smart card hardware wallet.
(www.BlochsTech.com)

-- 2 GOAL--
We want to design a partial node that, together with other nodes, can do full validation of the chain.
By having many such 'swarm' nodes the chain will remain secure and duplicated many times.

Because each single swarm node would only store and validate a part of the chain a CRITICAL scaling issue would be solved:
Keeping nodes diverse.

If there are many full nodes but all of them are run by big companies, it becomes easy for governments to target them.

If there are very few full nodes the network can easily fail.

With small swarm nodes the network could be supported entirely by a little spare capacity on users' normal PCs.
Such a network might have a few full nodes, but would not require them and still the entire chain would be duplicated
and validated thousands of times.

Each time the amount of users increased the network would also gain more resources.

-- 3 WHY--
Because if the cost of running a validating node is trivial, many, many people will do it.

This is not a postulation; all torrent files are shared free of charge by people who are done downloading them
and have every incentive to stop sharing.

However, because the cost of sharing a file a few times is trivial, they - or enough of the users anyway - keep
sharing.

This is the ONLY true solution. The debates of 1 MB vs. 8 MB vs. XT are misleading and can never satisfy the demands for price and decentralization at the same time.
Segregated witness does absolutely nothing to solve the issue.

The lightning network is a beautiful idea, and while it has great merit it gives too much power to the payment channel operators. Larger payment channel operators can
pay greater fees than smaller ones, which together with a 1 MB limit means total centralization.

Bitcoin has only 3 real choices:
1. Do nothing and die/be replaced.
2. Increase the limit and accept centralization towards big miners and big node hosting companies.
This would still be much more decentralized than our central banks so I wouldn't see that as a total failure.
3. Split up validation work among multiple nodes - swarm nodes.

This example goes through how to do option 3, assuming a standard PC with less than 1 MB/s available and 1 million transactions per block - roughly 1600 TX per second.

-- 4 TO BE VALIDATED--
1. Nothing can be withheld from a validated block.
2. Valid scripts and signatures.
3. Correct transaction fee total (coinbase).
4. No double spends.
5. Correct mining reward/mining difficulty (trivial will not discuss).
6. Block size.

-- 5 SOLUTION--
- 5.1 REQUIRED CHANGES-
TXs in blocks should be sorted by their TX hashes. Today the order is essentially arbitrary (the only rule is that a TX spending an output created in the same block must come after it) and carries no meaning.

After sorting a TX hash starting with 000 may be first in the merkle tree leaf index and a hash
starting with FFF last.

This will be needed in order to easily request data from other swarm nodes. (See section 5.7)

Swarm nodes could be made without this change, but it makes data handling easier.

- 5.2 SWARM NODE CHUNKS-
Swarm nodes would break the Bitcoin transactions into chunks based on their index in the block.
For this example we will use chunks of 512 transactions. (The current block limit corresponds to about 3000 TXs.)

The block will thus be broken into some number of chunks with exactly 512 TXs each and one last chunk
of 1-512 transactions.

We will go through this example assuming our swarm node validates just one chunk.

Chunk sizes like this would mean we would have 1954 chunk ranges for 1 million TX per block - VISA
levels or 1600 TX/s.

A swarm node would only need 15.6 GB per year to keep up at these levels (see section 6.1).

At even higher volume bigger chunks would be better to reduce connection count, but at that market cap
it would not be a problem if running a swarm node required a single dedicated PC full time.
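
A minimal sketch of the sorting and chunking described in 5.1 and 5.2, assuming txids are the raw 32-byte transaction hashes and using the example chunk size of 512:

Code:
CHUNK_SIZE = 512   # example chunk size from section 5.2

def make_chunks(txids):
    ordered = sorted(txids)              # section 5.1: TXs ordered by their hashes
    return [ordered[i:i + CHUNK_SIZE]    # every chunk holds exactly 512 TXs,
            for i in range(0, len(ordered), CHUNK_SIZE)]  # except possibly the last one

# 1,000,000 TXs per block -> 1954 chunk ranges (1953 full chunks plus one partial chunk).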

- 5.3 SWARM NODE SCRIPT RANGES-
In addition to chunks we will also partition data along claim script hashes.
All Bitcoin outputs are in the form of a script, and the hashes of those scripts are unique per script.

Swarm nodes would subscribe to certain ranges. So if hashes could only be 1 or 0, node A might subscribe
to the 0 hashes and node B to the 1 hashes.

If node A sees a 1 hash in its chunk range 0-512 it will send it to B. If B sees a 0 hash in its chunk range
513 to 1024 it will send it to A.
B will give its signature to A that it has sent all 0 hashes - for example by A asking B "Do Tx534 and Tx567
contain all the 0-hash outputs in chunk range 513-1024 for block xyz?" and B signing that + "yes".

It can thus easily be proven later if B withheld data.
How this helps us will become clear later.
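
A sketch of the range test, with the 0/1 example above generalised to an interval of the script-hash space (the hash function and helper names are assumptions of this sketch, not part of the proposal):

Code:
import hashlib

# A script range is simply an interval of the script-hash space; with two nodes
# A and B, A could take [0x00.., 0x7f..] and B [0x80.., 0xff..].

def script_hash(script):
    return hashlib.sha256(script).digest()   # any fixed hash works; SHA-256 assumed here

def in_my_range(script, range_start, range_end):
    h = script_hash(script)
    return range_start <= h <= range_end

# If a TX in our chunk pays to a script outside our range we forward it to the
# node(s) subscribed to that range, and they do the same for us.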

- 5.4 NO WITHHOLDING-
Our swarm node gets its chunk from the network as well as a list of the other chunks' merkle hashes.

Using this information it can determine:
1. That the transactions it received are in the block and that nothing is missing from its own chunk.
Otherwise the merkle root would not match.

2. That no entire chunk has been withheld - again the merkle root would not match.

In this way, even with 1 million transactions per block, this only means 62.5 kilobytes of merkle hashes
(1954 chunk hashes * 32 bytes) to validate.
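
A minimal sketch of that check, assuming full 512-TX chunks so each chunk maps to one subtree: we recompute the sub-root of our own chunk and combine it with the other chunks' sub-roots; if the result differs from the block header's merkle root, something was withheld or altered. (Bitcoin's real tree uses double SHA-256 with specific byte ordering; those details are simplified here.)

Code:
import hashlib

def h2(a, b):
    # Internal nodes: double SHA-256 of the two child hashes (byte order simplified).
    return hashlib.sha256(hashlib.sha256(a + b).digest()).digest()

def merkle_root(hashes):
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # duplicate the last hash on odd levels
        level = [h2(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def chunk_matches_block(my_txids, other_chunk_roots, my_index, header_root):
    my_root = merkle_root(my_txids)        # sub-root of our own 512 TXs
    roots = other_chunk_roots[:my_index] + [my_root] + other_chunk_roots[my_index:]
    return merkle_root(roots) == header_root   # 1954 * 32 bytes = ~62.5 kB of sub-roots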

- 5.5 NO WITHHOLDING - DISHONEST NODES-
Now while this takes care of our own chunk, the other nodes we connect to could cheat.
Other swarm nodes can do two things:
1. Simply sign an invalid chunk.
2. Withhold TX data that we subscribe to from their chunk.

To limit this risk we rank all the nodes we can find for each chunk range. Each node will have an identity such
as an ECDSA-AES public key over which all communication and chunk signing occurs.

We rank their public keys by the following and in this order:
1. We permanently blacklist* nodes that have signed script ranges that turn out to be invalid,
or that have withheld** TXs from us. Ranking of nodes that have recommended a blacklisted node within the
past 2 years is reduced by 95%.
2. Proof of work burn done for that specific key. (top 10 we can connect to)
3. Latency. (top 5)
4. How long we have known that node and gotten messages from it. (top 5)
5. Whether other nodes we trust "recommend" that node. (top 5)
(ie. if someone ran a node on two separate devices they would of course recommend their own nodes)

25 peers for 10^6/512 ≈ 1954 chunk ranges with 2000 bytes of ID data each would be a fixed 100 MB cost.

Before a block would be considered valid the node would require 15 of its 25 peers per chunk to sign off on their
chunk* and at least one peer per peer type group (ie. top history, top burn, top ping and top recommends).

* Blacklisting due to invalid script range would be based on a merkle branch proof showing that a TX is being
double spent and that a node signed its script range as valid.

** Blacklisting due to withholding would be based on a signature by the dishonest node that all information
was shared even though it was not. (The chunk range includes relevant TXs 1, 2 and 3 while the dishonest node signed
that 1 and 3 were the only relevant ones.)
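
A rough sketch of this ranking bookkeeping, assuming one peer record per public key (the field names are illustrative; the top-10/top-5 cut-offs are the ones listed above):

Code:
from dataclasses import dataclass, field

@dataclass
class Peer:
    pubkey: bytes             # identity key the peer signs chunks with
    burn: float = 0.0         # proof-of-work burn done for this key
    latency_ms: float = 1e9
    first_seen: float = 0.0   # when we first got messages from it
    recommenders: set = field(default_factory=set)
    blacklisted: bool = False

def select_peers(peers, trusted):
    ok = [p for p in peers if not p.blacklisted]             # rule 1: drop blacklisted keys
    top_burn  = sorted(ok, key=lambda p: -p.burn)[:10]       # rule 2: top 10 by burn
    top_ping  = sorted(ok, key=lambda p: p.latency_ms)[:5]   # rule 3: top 5 by latency
    top_known = sorted(ok, key=lambda p: p.first_seen)[:5]   # rule 4: top 5 longest known
    top_rec   = sorted(ok, key=lambda p: -len(p.recommenders & trusted))[:5]  # rule 5
    return top_burn, top_ping, top_known, top_rec            # 25 peers per chunk range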

- 5.6 BLACKLIST REPORTS-
To validate an error report you would only have to get the offending chunk, either to check the merkle hashes,
to check for missing data or to check invalid scripts.

To validate a double spend would only require the merkle branch of the earlier spending transaction.

If an error report turns out to be false the false reporter is blacklisted.

Error reports are accepted from the top 95% of nodes by hash burn/proof of work.

This prevents cheap DDOS spam while still allowing error reports to quickly propagate through the network
from so much as a single honest node.

- 5.7 VALIDATING SCRIPTS-
To validate scripts we need the output scripts that the spending TX references in its inputs section.

When the referenced outputs are not known to the node it will request those TXs from a node known to process that
chunk range.

Since TXs are sorted by their hashes in the block, each chunk will have a highest hash, so we can
somewhat easily keep track of which chunk a given hash is in.
We do this by maintaining a table of the highest chunk hashes per block number (3.3 GB/year).

Using the table and another node we request and get the unknown TX.
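
A sketch of that lookup, using Python's bisect for the binary search over the sorted highest-chunk-hash table (the table layout is an assumption of this sketch):

Code:
from bisect import bisect_left

# highest_hashes[block_height] is the sorted list of each chunk's highest TX hash
# (1954 entries of 32 bytes per block, ~3.3 GB/year as noted above).

def chunk_index_of(txid, highest_hashes_for_block):
    # The TX lives in the first chunk whose highest hash is >= txid.
    return bisect_left(highest_hashes_for_block, txid)

# With the chunk index known we ask a peer that processes that chunk range
# for the referenced output TX, then run the input scripts against it.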

If the node refuses to give the data the block validation will simply be delayed. In extreme cases
blacklisting of the withholding node is a possibility.

Once we have all input TXs we can validate the scripts and either sign our script range for the block or reject the block.

- 5.8 VALIDATING BLOCK SIZE -
To validate block sizes each node checks the size of its chunk's transactions and signs off on it, ie. "chunk 45, 1.7 kB, signed A".
All the nodes can then quickly take the 1954 sizes, add them up and see if the total is below the limit.

If a node sees different sizes for the same chunk the offending node can be blacklisted after getting the chunk and checking its
size.

- 5.9 VALIDATING TOTAL FEES -
The same mechanism is used as for size in section 5.8.

Total fees are then also compared with the coinbase transaction.
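
A sketch of the aggregation used in 5.8 and 5.9: each chunk's signed (size, fee) report is summed, then checked against the block size limit and the coinbase output. The limit constant and names are illustrative only.

Code:
MAX_BLOCK_BYTES = 330 * 1_000_000   # example limit: ~1M TXs of ~330 bytes each

def block_within_limits(chunk_reports, coinbase_value, subsidy):
    # chunk_reports: list of (chunk_index, size_bytes, fee_satoshi) tuples,
    # each one signed by a node responsible for that chunk range.
    total_size = sum(r[1] for r in chunk_reports)
    total_fees = sum(r[2] for r in chunk_reports)
    size_ok = total_size <= MAX_BLOCK_BYTES
    fees_ok = coinbase_value <= subsidy + total_fees   # coinbase may not claim more
    return size_ok and fees_ok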

- 5.10 CHECKING FOR DOUBLE SPENDS -
Each node will store a table indexed by script hash containing a list of unspent transaction hashes and spending TX hashes per index.
Ie. a list of tuples ordered by a script hash index.

To save hard disk space nodes will not store the full information about transactions in their script range. Instead they store what is
in their chunk range and ask for all other TX information from other nodes.

Since each swarm node processes a small subset of the blockchain, bandwidth is a cheaper resource than disk space and not a big worry.
(A 1 MB/s connection can fill 31 TB of disk in a single year - our swarm nodes use only 15.6 GB.)
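
A sketch of that table as a plain dictionary keyed by script hash; the helper names are illustrative:

Code:
# Spend-tracking table for our script range.
# spends maps script hash -> list of [funding_txid, spending_txid or None] pairs.
spends = {}

def add_output(script_hash, funding_txid):
    spends.setdefault(script_hash, []).append([funding_txid, None])

def record_spend(script_hash, funding_txid, spending_txid):
    for entry in spends.get(script_hash, []):
        if entry[0] == funding_txid:
            if entry[1] is not None:
                return False          # double spend: this output is already spent
            entry[1] = spending_txid
            return True
    return False                      # unknown output; fetch its chunk from a peer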

-- 6 NUMBER CRUNCHING--
- 6.1 STORAGE REQUIREMENTS PER YEAR -
These numbers are for 1,000,000 transactions per block, with each swarm node processing 512 transactions.
Each node will store exactly 512 transactions from its chunk range per block and will track (but not store)
approximately 512 transactions from its script range (see below).

We do NOT store script range TXs, only chunk range TXs. Storing script range data would cost almost 30 GB/yr. Instead we store only a table of the TX hashes in our script range
and the TX hashes of their spending transactions.
When needed we fetch script range TXs from our peers, which means extra bandwidth usage. This however is not a problem, as our yearly bandwidth usage tracking a small subset
of the blockchain is quite low.

8.9 GB for chunk transactions (avg. 330 bytes/TX * 512 TXs * 6 * 24 * 365 blocks/year).
3.4 GB for spending TX hashes ((32-byte TX hash + 3 avg. outputs * 32-byte spending TX hashes) * 512 * 52,560 blocks/year).
(log2(10^6 TXs) * 32 ≈ 640 bytes extra per TX in merkle branch information.)
3.2 GB for TX hash to block and chunk indexing (1954 highest chunk hashes in sorted order * 32 bytes per hash * 52,560 blocks/year).
~0.1 GB for blacklisting and node reputation information (+ ~1 GB fixed one-time cost).

Connection usage will likely be up to 10 times this amount (mostly polling for unknown outputs), so 156 GB per year
or 0.005 MB/s - plenty to spare there.
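
A quick sanity check of the figures above (10-minute blocks, so 52,560 blocks per year; GB counted as 10^9 bytes):

Code:
BLOCKS_PER_YEAR = 6 * 24 * 365            # 52,560 ten-minute blocks
GB = 1e9

chunk_txs   = 330 * 512 * BLOCKS_PER_YEAR / GB            # ~8.9 GB of chunk range TXs
spend_table = (32 + 3 * 32) * 512 * BLOCKS_PER_YEAR / GB  # ~3.4 GB of spending TX hashes
chunk_index = 1954 * 32 * BLOCKS_PER_YEAR / GB            # ~3.3 GB highest-chunk-hash table
reputation  = 0.1                                         # ~0.1 GB/yr blacklist/reputation data

storage   = chunk_txs + spend_table + chunk_index + reputation  # ~15.7 GB/yr (the ~15.6 GB figure above)
bandwidth = 10 * storage                                        # ~156 GB/yr, i.e. ~0.005 MB/s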

While the nodes will need many other nodes they need not be connected at all times. Nodes can be notified or
polled at need and then the connection closed again.

In other words a normal PC with a $100 1 TB HD and negligible broadband usage could run a swarm node and help run
the Bitcoin network at 1600 TPS for 32(!) years while only using half the disk.

It would only take 1954 such computers to equal one full node - so a small group of enthusiasts could run the
entire Bitcoin network at VISA levels even if all else failed.

- 6.2 ESTIMATED FULL COPIES AT 1600 TPS -
At such a market cap it would be fair to assume that only a few big companies and miners would be running normal full
nodes while a much larger assortment of smaller companies and individuals would be running swarm nodes.

If there are 2 million bitcoiners today and a resulting 6000 full nodes - which today require about the same
resources as a swarm node would at 1600 TPS - we can try to extrapolate.
At VISA levels one would expect something like 0.7 billion users and, if the ratio holds, 0.5-2 million swarm nodes.

At 1600 TPS this would equate to 255-1023 full copies of the blockchain hosted in a very diverse, robust and decentralized
way.

-- 7 QUICK SUMMARY--
- 7.1 WHAT SWARM NODES SIGN/VALIDATE -
1. Script validity of script range and chunk range.
2. No withholding claims for chunks.
3. No double spends in script range (based on no withholding signatures by other nodes).
4. Chunk size and total fees.

- 7.2 SWARM NODES ARE DIVIDED BY -
1. Chunk ranges.
2. Script ranges.

- 7.3 HOW TO DO IT -
A first version could be done where all nodes trust each other. This would eliminate the need for blacklists and other code dealing with
trust of peers.

The trust handling could then be added later at leisure.