Really curious how that test works out. I do hope it does a bit more than just merge the files without sorting them.
It merges all lines from both sorted files in sorted order. After several tests (on my old desktop with an HDD), these are the relevant results:
Old process:
time cat <(gunzip -c addresses_sorted.txt.gz) daily_updates/*.txt | sort -uS80% | gzip > test1.txt.gz
real 90m2.883s
Faster new process:
time sort -mu <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | gzip > test2.txt.gz
real 51m26.730s
The output is the same.
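If anyone wants to double-check that for themselves: zcmp (it ships with gzip) decompresses both archives and compares the contents, so something along these lines should confirm it (sketch, not the exact command I ran):
zcmp test1.txt.gz test2.txt.gz && echo identical   # exit status 0 means the decompressed contents match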
Interestingly, when I tell sort -m to use up to 40% of my RAM, it actually uses that much (even though it doesn't need it), which slows it down by 7 minutes.
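(For anyone curious, that memory limit is GNU sort's -S/--buffer-size option; the slower run would look roughly like the command above with the limit added, e.g.:)
time sort -muS40% <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | gzip > test2.txt.gz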
Most CPU time is spent compressing the new gzip file.
That's a significant improvement. You could give pigz a try; see https://unix.stackexchange.com/a/88739/314660. I'm not sure what the drawbacks would be; I've never tried pigz myself.
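As far as I can tell, pigz reads stdin and writes stdout just like gzip, so only the last stage of your pipeline would change; something like this (untested on my side, and the -p value is only an example):
time sort -mu <(gunzip -c addresses_sorted.txt.gz) <(sort -u daily_updates/*.txt) | pigz -p 4 > test2.txt.gz   # -p sets the number of compression threads
By default pigz uses all available cores, so -p is only needed if you want to limit it.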
I think you wrote that you'd need about 256GB of RAM for that operation, right? Sorry... can't help you out there. However, a Bloom filter might be nice to implement if you have a 'bit' of RAM (a lot less than 256GB).
That's going over my head, and probably far too complicated for something this simple.
Honestly, the Bloom filter was a silly suggestion. It probably wouldn't be much of an improvement (if any) over your current code.
Thanks! Hoping to do some experimenting soon (if I have the time...)