Your claim about "transaction packages" not being relevant here is totally false and laughable.
There is no such thing as 'transaction packages' in blocks from the perspective of validation.
Packages are a concept used in selecting transactions to mine, so that low-fee ancestor transactions will be selected in order to mine their high-fee children. They don't exist elsewhere.
The verification of transactions runs in parallel with everything else. One thread loads transactions from the block into a queue of transactions that need to be validated, other threads pull transactions from the queue and validate them. When the main thread is done loading the queue, it too joins the validation, which has been in progress the whole time. There is nothing particularly fancy about this.
Absolutely false and misleading. The code does not "load" transactions into a "queue of transactions"; you are deliberately misrepresenting the code for some mysterious purpose that I don't understand.
Sure it does. That is exactly how it works. The validation loop iterates over each transaction and each input in each transaction, and loads each one into a queue. Concurrently, background threads take work from the queue and validate it.
When the master thread is done loading work, it also joins the processing until the queue is empty (if it isn't already empty by the time it gets there).
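To make the shape of that concrete, here is a rough sketch of the pattern. The names and details are illustrative, not Bitcoin Core's actual CCheckQueue (which batches work and uses condition variables rather than spinning), but the division of labour is the same: the master loads per-input checks, workers drain the queue concurrently, and the master joins the draining once it has finished loading.

[code]
// Master pushes one check per input onto a shared queue; workers drain it
// concurrently; the master joins the draining once it is done loading.
#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class CheckQueue {
public:
    void Push(std::function<bool()> check) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push(std::move(check));
    }

    // Run checks until no work is left and the master has stopped adding it.
    void Drain() {
        for (;;) {
            std::function<bool()> check;
            {
                std::lock_guard<std::mutex> lock(m_mutex);
                if (!m_queue.empty()) {
                    check = std::move(m_queue.front());
                    m_queue.pop();
                }
            }
            if (check) {
                if (!check()) m_all_ok = false;  // any failure invalidates the block
            } else if (m_loading_done) {
                return;                          // nothing left and master is finished
            } else {
                std::this_thread::yield();       // more work may still arrive
            }
        }
    }

    void LoadingDone() { m_loading_done = true; }
    bool AllOk() const { return m_all_ok; }

private:
    std::mutex m_mutex;
    std::queue<std::function<bool()>> m_queue;
    std::atomic<bool> m_loading_done{false};
    std::atomic<bool> m_all_ok{true};
};

// Workers start first, the master loads the script checks, then joins in itself.
bool RunChecksInParallel(const std::vector<std::function<bool()>>& checks,
                         unsigned num_workers) {
    CheckQueue queue;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_workers; ++i)
        workers.emplace_back([&queue] { queue.Drain(); });
    for (const auto& check : checks) queue.Push(check);  // "loading the queue"
    queue.LoadingDone();
    queue.Drain();                                       // master joins the validation
    for (auto& w : workers) w.join();
    return queue.AllOk();
}
[/code]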
In the real world, the real Bitcoin Core client does NOT validate transactions in parallel,
It does. Since you're non-technical, I don't expect you to read the code to check for yourself, but you can simply run a reindex on a node with -assumevalid set to false. You'll see that once it gets up to 2014 or so it is using all the cores.
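Concretely, that's something like:

[code]
# Reindex with assumevalid disabled so the historical scripts actually get
# verified, then watch per-core CPU usage during the sync.
bitcoind -reindex -assumevalid=0
[/code]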
because in Bitcoin transactions can be chained within a single block, forming a transaction package; hence you can't simply "dispatch" the transactions of a block between threads and wait for them to join, which is why block verification was implemented single-threaded.
Transactions in a block are required by the consensus rules to be topologically ordered. That means that all the ancestors of a transaction come first. There is no concept of a 'package' in a block.
When the validation is iterating through the block to load the validation queues, it saves the new outputs created by each (as-yet-unvalidated) transaction so that they're available when dispatching the work for any later transactions that consume them. They don't have to be validated before other transactions can consume them, because if there is any invalidity anywhere the whole block will be invalid.
So you can have, e.g., an invalid TxA whose outputs are spent by valid TxB, whose outputs are spent by valid TxC, and it's perfectly fine that the validation accepts TxB and TxC before later detecting that TxA is invalid. TxA's invalidity will trigger the rejection of the block.
Extracting the outputs for other transactions to use does require a linear pass through the transactions, but it's fairly inexpensive and doesn't require any validation. It is required in any case because the block serialization can only be decoded in order anyway (you can't tell where transaction 2 begins until you've parsed transaction 1; the format doesn't have explicit lengths).
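Roughly, that sequential pass looks like the following sketch (hypothetical names, a plain in-memory map standing in for the UTXO view, coinbase handling omitted). Note that nothing expensive happens here; the script checks are only collected for the parallel queue.

[code]
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

struct OutPoint {
    std::string txid;
    uint32_t index;
    bool operator<(const OutPoint& o) const {
        return std::tie(txid, index) < std::tie(o.txid, o.index);
    }
};
struct TxIn  { OutPoint prevout; std::vector<unsigned char> script_sig; };
struct TxOut { int64_t value; std::vector<unsigned char> script_pubkey; };
struct Tx    { std::string txid; std::vector<TxIn> vin; std::vector<TxOut> vout; };

// Cheap linear pass: gather the script checks and expose each transaction's
// outputs to later transactions in the block, without validating anything.
bool ConnectBlockOutputs(const std::vector<Tx>& block_txs,
                         std::map<OutPoint, TxOut>& utxo_view,
                         std::vector<std::pair<TxOut, TxIn>>& script_checks) {
    for (const Tx& tx : block_txs) {
        for (const TxIn& in : tx.vin) {
            auto it = utxo_view.find(in.prevout);
            if (it == utxo_view.end()) return false;     // unknown coin: block invalid
            script_checks.emplace_back(it->second, in);  // expensive part deferred to workers
        }
        // The still-unvalidated outputs become spendable by later transactions;
        // topological order guarantees the in-block ancestors were handled above.
        for (uint32_t i = 0; i < tx.vout.size(); ++i) {
            utxo_view[OutPoint{tx.txid, i}] = tx.vout[i];
        }
    }
    return true;
}
[/code]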
Similarly, checking if a UTXO has already been consumed is also inherently somewhat sequential (e.g. consider when the 5th and 50th txn both spend the same input), but these checks are cheap. Often attempts to make processes like that more parallel
just slow them down because of the synchronization overheads. That's why it's not possible to give much in the way of useful parallelism advice without testing.
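For illustration only (hypothetical types; in Bitcoin Core the spent-tracking lives in the coins view rather than a separate set), the order dependence amounts to this: whichever spend is processed first wins, and a set insert per input is trivial next to a signature check.

[code]
#include <set>
#include <string>
#include <utility>
#include <vector>

// An outpoint is a (txid, output index) pair; real code uses a richer type.
using OutPoint = std::pair<std::string, unsigned>;

// Returns false if any outpoint is spent twice within the same block,
// e.g. when both the 5th and the 50th transaction spend the same input.
bool NoInBlockDoubleSpends(const std::vector<std::vector<OutPoint>>& inputs_per_tx) {
    std::set<OutPoint> spent;
    for (const auto& tx_inputs : inputs_per_tx) {
        for (const OutPoint& prevout : tx_inputs) {
            if (!spent.insert(prevout).second) return false;  // already consumed earlier
        }
    }
    return true;
}
[/code]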
PR 2060 was an improvement, based on "deferring" the CPU-intensive part of the task, i.e. script verification, by queuing that part for later parallel processing.
It's not deferred; it's queued to different threads and run in parallel.
It was good but not a complete scaling solution, because the main block validation process remains single-threaded and occasionally waits for UTXO checks, so we don't get linear improvement in block processing times by installing more CPUs/cores. Period.
You essentially never get linear improvement with more cores, as memory bandwidth and disk bandwidth don't scale with them and there are synchronization overheads. ... so that statement is essentially empty.
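To put a rough number on that (a textbook Amdahl's law bound, not a measurement of Bitcoin Core): if a fraction p of the work parallelizes and the rest stays serial, the best possible speedup on N cores is

S(N) = 1 / ((1 - p) + p/N)

so even with p = 0.9 you top out around 1/(0.1 + 0.9/16) ≈ 6.4x on 16 cores, before memory bandwidth, disk, and lock contention take their additional cut.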
You've now moved the goalposts entirely. In the original thread NotATether misadvised ETFbitcoin that to benchmark validation he only needed to use a single core
because 'verification is single threaded'. I stepped in to point out that it's been parallel since 2012. Benchmarking just a single core isn't representative as a result.
Your reply -- basically doubling down on and defending misinformation you spread in the past -- is essentially off-topic. If you want to argue that the parallelism isn't perfect, sure, it's not; it pretty much never is except for embarrassingly parallel tasks. And surely there are opportunities to improve it, but they're not likely to be correctly identified from the armchair of someone who hasn't tried implementing or benchmarking them. Your response is full of criticisms essentially copied and pasted from the benchmarks and the author's comments about the limitations back in 2012, but the code in Bitcoin has continued to advance since 2012.
But absolutely none of that has anything to do with a person saying that it's single-threaded (it isn't) so you can just benchmark on a single core (you can't, at least not with representative results).
Ironically, if the parallelism actually were perfect then benchmarking on a single thread would again be a more reasonable thing to do (because you could just multiply up the performance).
When blocks contain only a few inputs this limits parallelism, but once blocks have a couple times more inputs than you have cores it achieves full parallelism.
Absolutely false and misleading. I have not checked, but I suppose in 2012 most blocks had way more than "a couple times more inputs" than an average node's CPU cores.
In 2012 most blocks were created in 2010 or before and had very few transactions at all. Today most blocks were created in 2015 or later, and the nearly empty blocks early in the chain are a rounding error in the sync time, so most blocks have a great many inputs and make good use of multiple cores.
Your interactions have been cited to me multiple times by actual experts as the reason they don't post here at all. It really makes it unenjoyable to post here, knowing that, so predictably, a simple correction of a factual error with a cite will generate a rabid, substantially off-topic gish-gallop of a response defending some bit of misleading info some poster had previously provided.