Board: Bitcoin Technical Support
Re: Advice Requested on Full Node Build for Advanced Analytics
by JuxtaposeLife on 31/10/2024, 08:09:40 UTC
It's the writing to disk (utilization runs close to 80% every time a block reaches the insert step, which surprises me given write speeds around 7,500 MB/s)... I suspect the way I'm batching everything for each block all at once is what is causing the slowdown (there can be quite a few UTXOs per block).
Shouldn't disk writes be handled by the file system cache? For comparison: even on an old HDD, writing 10 million small files (yes, I do crazy things like that) is almost as fast as sustained writes. Reading them back is very slow, because the disk head has to seek for each file, but writing is fast, straight from the file system cache onto the disk.
I'm not sure how this translates to a database, but if writing 3,500 transactions takes 4 seconds, that seems slow to me.

You're right. I was focused on the inserts, but it must be the updates to the UTXO entries that are causing this slowdown. While processing each new block I look for UTXOs that have been spent and, once they are, mark them as spent in the database along with a reference to the transaction that spent them. I think I can store these pairings in a file as I go, defer all updates until the end, and then run one batch update, which should be much faster than updating each UTXO one by one during ingestion. I'll have to think about that. My problem now is that I've been running for almost a week, and I don't want to miss something and end up in a state where I'd have to start over haha
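For what it's worth, here's a minimal sketch of that deferred-update idea, assuming a SQLite-style database. The file name, database name, and the utxos(txid, vout, spent_by) schema are all made up for illustration, so adapt them to whatever you're actually running:

Code:
import csv
import sqlite3

SPENT_LOG = "spent_utxos.csv"  # hypothetical log file, appended to during ingestion

def log_spent(writer, txid, vout, spending_txid):
    # During block processing, just record the pairing; no DB round-trip.
    writer.writerow([txid, vout, spending_txid])

def apply_spent_log(conn):
    # After ingestion, replay the whole log as one batched UPDATE instead of
    # touching each UTXO row while blocks are still streaming in.
    with open(SPENT_LOG, newline="") as f:
        rows = [(spender, txid, int(vout)) for txid, vout, spender in csv.reader(f)]
    conn.executemany(
        "UPDATE utxos SET spent_by = ? WHERE txid = ? AND vout = ?",
        rows,
    )
    conn.commit()

conn = sqlite3.connect("chain.db")  # hypothetical database file
apply_spent_log(conn)

The appeal of the log file is that it doubles as a checkpoint: if ingestion dies mid-run, the pairings collected so far aren't lost.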

I think I'm getting closer to solving this... you were right, it shouldn't take this long. It would also explain why ingestion is slowly getting slower: the update search across the growing UTXO table takes linearly longer and longer, and I'm up to 700 million UTXOs.
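If the table really is being scanned on every update, an index on the outpoint should flatten that curve. A one-liner, again assuming the hypothetical schema from the sketch above:

Code:
# Turns each spent-UTXO lookup from a full-table scan into a B-tree probe,
# so per-block update time stops growing with the size of the table.
conn.execute(
    "CREATE UNIQUE INDEX IF NOT EXISTS idx_utxos_outpoint ON utxos (txid, vout)"
)

Building the index over 700M existing rows will take a while and a fair amount of scratch disk, but it should be a one-time cost.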

Shouldn't disk writes be handled by the file system cache? For comparison: even on an old HDD, writing 10 million small files (yes, I do crazy things like that) is almost as fast as sustained writes. Reading them back is very slow, because the disk head has to seek for each file, but writing is fast, straight from the file system cache onto the disk.
I'm not sure how this translates to a database, but if writing 3,500 transactions takes 4 seconds, that seems slow to me.

That depends entirely on the filesystem.

Most of them, like ext3 and ext4, use journalling, so when you batch all that data to write to the disk, it actually goes into the journal first.

Usually the journal's default setting is to write deltas of the changed bytes to the disk. This is more reliable than just doing a write-back, but it's slightly slower.

You can change the filesystem settings to use the disk cache more aggressively, but the change only takes effect on the next mount, i.e. after a reboot.
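To make that concrete, on ext4 those knobs live in the mount options. This is just an illustrative /etc/fstab line (the device and mount point are placeholders), and data=writeback trades crash-consistency for speed, so test it on a scratch volume first:

Code:
# noatime        - skip the metadata write on every read
# data=writeback - journal only metadata, not file data (faster, less safe)
# commit=60      - flush the journal every 60s instead of the default 5s
/dev/nvme0n1p2  /data  ext4  defaults,noatime,data=writeback,commit=60  0  2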

Interesting. I'll look into this.

====

Thanks again for all the thoughts and ideas. This has been extremely helpful!