I would not use a Ryzen CPU for this if you are going to be dealing with large datasets / databases and searches. An EPYC is the better choice if you want to stick with AMD, and if you want to go Intel, use a good Xeon.
Same with RAM: if you are manipulating large data sets to analyze, you start with the largest one that you may want to look at (at this point, from your last post, that's the utxos) and double it, so you would want about 262GB of RAM. You could probably get away with 256GB; it's still not ideal, but you would be able to load everything into RAM and look at it there instead of pulling from the drive. If you are going to do it, do it right.
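As a rough worked example (the ~131GB starting figure is just what the doubling implies from your numbers, not something I've measured):

```python
# Rule-of-thumb RAM sizing: take the largest dataset you expect to load
# and double it for headroom (indexes, the OS, whatever tooling sits on top).
largest_dataset_gb = 131          # assumed utxo set size implied by the 262GB figure above
recommended_ram_gb = 2 * largest_dataset_gb
print(f"Recommended RAM: ~{recommended_ram_gb} GB")   # ~262GB, so a 256GB kit is the practical target
```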
I spend a lot of time telling customers 'I told you so' when they try to do things with lower spec hardware and then complain it's slow. For a desktop PC, having to wait a few extra seconds here and there because you got an i3 instead of an i5 or i7 is one thing. Depending on what kind of analysis you are doing, that becomes hours instead of minutes.
-Dave
Good points. (Un)fortunately this is just a hobby/interest for now; if I actually want to do real things with this I will definitely need to scale up.
It's definitely not the CPU (23% capacity) that's the bottleneck, or the RPC commands. It's the disk IO, running close to 80% every time a block gets to the insert part, which surprises me given the drive's write speeds of around 7500MB/s... I suspect the cause is the way I'm batching everything for each block into one insert (there can be quite a few utxos per block), combined with the indexing I'm using to ensure data integrity (maybe not necessary now that it's running really stable... I just didn't want partially processed blocks re-writing on restarts). I'm ingesting about 15,000-20,000 blocks a day currently... I may try changing this so I add sets of 1000 at a time instead of inserting the entire block all at once after it's read. But at this pace, it'll get done one way or another within a couple of weeks.

I'm up to block 481,000... and I'm just past 800GB for the database, but on average it's growing about 80GB per 12,000 blocks now (in 2017 things really picked up). I estimate, based on some assumptions and a test script run on sections ahead of me, that this will end up being approximately 3.4TB when I'm done, so I'm about a third of the way there.
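For reference, roughly what I have in mind for the chunked inserts, sketched here with PostgreSQL and psycopg2 purely for illustration; the utxos table, its columns, and the ON CONFLICT guard are made-up stand-ins, not my actual schema:

```python
import psycopg2
from psycopg2.extras import execute_values

# Hypothetical schema for illustration: utxos(txid TEXT, vout INT, value BIGINT, height INT)
CHUNK = 1000  # rows per INSERT statement, instead of one giant statement per block

def insert_block_utxos(conn, rows):
    """rows: list of (txid, vout, value, height) tuples extracted from one block."""
    with conn.cursor() as cur:
        for i in range(0, len(rows), CHUNK):
            execute_values(
                cur,
                "INSERT INTO utxos (txid, vout, value, height) VALUES %s "
                "ON CONFLICT DO NOTHING",  # so a re-run after a restart can't double-insert
                rows[i:i + CHUNK],
                page_size=CHUNK,
            )
    conn.commit()  # one commit per block keeps the block all-or-nothing on restart

# usage sketch:
# conn = psycopg2.connect(dbname="chain")
# insert_block_utxos(conn, rows_for_block)
```

Committing once per block keeps each block atomic across restarts; committing per chunk instead would shrink each transaction further, but then a crash could leave a partially inserted block behind.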
I may move some of the tables onto an external drive once I index the things I'm really interested in. Slow and steady, I'll get there eventually.