If someone has enough RAM to experiment, I'd love to see the result of this (on the 31 GB file):
Instead of the awk one-liner, I suggest you look at gz-sort. It is a small Linux program that sorts gzip-compressed files on disk while using a very small memory buffer, as low as 4 megabytes.
I checked, but it does what I'm doing already. The awk command removes duplicate lines
without sorting the lines. I'd like to do it, but I can't run it.
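For anyone who wants to try the same idea outside awk, here is a minimal Python sketch of removing duplicate lines without sorting them. It is not the awk command from this thread. The function names `unique_in_order` and `dedupe_gzip` are made up for illustration, and it assumes the input is a gzip-compressed text file with one address per line. Like the awk approach, it keeps every distinct line in memory, which is why RAM is the limiting factor on a file this size.

```python
import gzip
from typing import Iterable, Iterator


def unique_in_order(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it appears, keeping the original order."""
    seen = set()  # holds every distinct line, so memory grows with the number of unique lines
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def dedupe_gzip(src_path: str, dst_path: str) -> None:
    """Stream a gzip text file and write its deduplicated lines to a new gzip file.

    Both paths are supplied by the caller; nothing here is specific to this thread's files.
    """
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(unique_in_order(src))
```

The input is read as a stream, so only the set of unique lines has to fit in RAM, not the whole file. Writing to a separate output path also avoids overwriting the original if the run fails halfway.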
This prints 1111111111111111111114oLvT2. This address was used 55405 times (!)
I'd be interested to see which real address is the shortest. The 111111111-addresses are all burn addresses. I'm not entirely sure what determines address length, but from what I've seen, shorter addresses are much harder to find. A while ago, I was looking for short addresses created from mini-private-keys, and they were quite rare.
To find a real short address, it needs to have sent funds too.
Maybe you can also make a list of addresses sorted by balance
See List of all Bitcoin addresses with a balance.
Updating addresses.txt.gz failed, and the original got overwritten. I'm currently uploading a backup, but it takes a while.