Post
Topic
Board Project Development
Re: List of all Bitcoin addresses ever used
by
NotATether
on 21/08/2020, 09:59:52 UTC
If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):

Instead of the awk one-liner, I suggest you look at gz-sort. It is a small Linux program that sorts gzip-compressed files on disk while using a very small memory buffer, as low as 4 megabytes.

You sort the file using
Code:
gz-sort -u addresses.txt.gz addresses_sorted.txt.gz

The -u switch removes duplicate lines from the sorted output. You can give it a larger buffer with -S, but this isn't necessary. I used -S 1G to give it a 1 gigabyte buffer and it took around 7 hours to complete, so not much shorter than the advertised completion time of 9 or 10 hours. This program will therefore run well in your VM; the amount of RAM isn't important.
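For a small test file, the same sort-and-deduplicate step can be sketched with standard tools, which is handy for checking what to expect before committing to a multi-hour run. This is not gz-sort itself: it assumes GNU sort and gzip are installed, and dedup_gz and is_sorted_unique are hypothetical helper names I made up for illustration.

```shell
# Sketch only: sort + deduplicate a gzip file with GNU coreutils, not gz-sort.
# dedup_gz and is_sorted_unique are hypothetical helper names.

dedup_gz() {
    # $1 = gzip-compressed input, $2 = gzip-compressed output
    # LC_ALL=C compares lines byte by byte; -u drops duplicate lines;
    # -S caps sort's memory buffer (it spills to temp files on disk beyond that).
    gzip -dc "$1" | LC_ALL=C sort -u -S 64M | gzip > "$2"
}

is_sorted_unique() {
    # Exit status 0 only if the decompressed lines are strictly increasing,
    # i.e. sorted in byte order with no duplicates.
    gzip -dc "$1" | LC_ALL=C sort -c -u 2>/dev/null
}

# Example usage (file names are placeholders):
#   dedup_gz sample.txt.gz sample_sorted.txt.gz
#   is_sorted_unique sample_sorted.txt.gz && echo "sorted and unique"
```

The second helper can also be pointed at gz-sort's output as a sanity check, assuming gz-sort orders lines by byte value, which is what the address list in this post suggests.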

You need to compile it yourself using make, but it has minimal dependencies: only zlib and GNU headers.

I used it to find the smallest address in the dump using
Code:
zcat ~/addresses_sorted.txt.gz | head -n 55405 | uniq

This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)
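Note that head piped into uniq only collapses to a single line when the sorted file still contains the duplicates, so this works on a file sorted without -u. To read the counts directly instead of guessing the head size, something like the following sketch should do; count_smallest is a hypothetical helper name, and it assumes the input is already sorted with duplicates kept.

```shell
# Sketch only: show how many times each of the smallest addresses occurs.
# Assumes the gzip file is sorted and still contains duplicate lines.
# count_smallest is a hypothetical helper name.

count_smallest() {
    # $1 = sorted gzip-compressed file, $2 = number of distinct addresses to show
    # uniq -c prefixes each distinct line with its number of consecutive repeats;
    # head stops the pipeline after the first $2 distinct addresses.
    gzip -dc "$1" | uniq -c | head -n "$2"
}

# Example usage (path is a placeholder):
#   count_smallest ~/addresses_sorted_with_dups.txt.gz 5
```

Each output line is the count followed by the address, so the first line gives the number of uses of the smallest address without knowing it in advance.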

Here are some of the other smallest addresses:

Code:
1111111111111111111114oLvT2
111111111111111111112BEH2ro
111111111111111111112xT3273
1111111111111111111141MmnWZ
111111111111111111114ysyUW1
1111111111111111111184AqYnc
11111111111111111111BZbvjr
11111111111111111111CJawggc
11111111111111111111HV1eYjP
11111111111111111111HeBAGj
11111111111111111111QekFQw
11111111111111111111UpYBrS
11111111111111111111g4hiWR
11111111111111111111jGyPM8
11111111111111111111o9FmEC
11111111111111111111ufYVpS
111111111111111111121xzjPWX1
111111111111111111128gzo7iT
11111111111111111112AmVxQeF
11111111111111111112Fr3DURyz
11111111111111111112GvNtZ1K
11111111111111111112VUYD4wA
1111111111111111111313xyAwW
111111111111111111137vGPgFbT
11111111111111111113aT9ZSLG
111111111111111111168xDACCG
11111111111111111116B8w87yU



Maybe you can also make a list of addresses sorted by balance, now that you have an efficient way to deduplicate them.
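If you ever have the balances next to the addresses, the sorting part is simple. Below is a rough sketch that assumes a hypothetical gzip file with one "address<TAB>balance" pair per line (balance as a plain integer); that file format and the helper name sort_by_balance are my assumptions, and getting the balances in the first place is a separate problem.

```shell
# Sketch only: order addresses by balance, largest first.
# Assumes a hypothetical input format: address<TAB>integer balance, one per line.
# sort_by_balance is a hypothetical helper name.

sort_by_balance() {
    # $1 = gzip-compressed file in the assumed address<TAB>balance format
    tab=$(printf '\t')
    # -k2,2nr sorts on the balance column numerically, descending;
    # -k1,1 breaks ties by address in byte order.
    gzip -dc "$1" | LC_ALL=C sort -t "$tab" -k2,2nr -k1,1
}

# Example usage (file names are placeholders):
#   sort_by_balance balances.tsv.gz | gzip > balances_sorted.tsv.gz
```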