Board: Project Development
Topic: Re: LoyceV's small Linux commands for handling big data
Post by DeepComplex on 19/04/2022, 18:03:08 UTC
Thank you very much for this...



Get pubkeys out of Bitcoin block data (this was requested here).

Note: this list is not meant for verbatim copy/pasting, it's my own notes of what I did (more or less).

Get outputs data (currently 148 GB)
Code:
wget -r blockdata.loyce.club/outputs/

Get all addresses with pubkey
Code:
for day in outputs/*gz; do echo "$day"; gunzip -c "$day" | grep -v is_from_coinbase | grep -v pubkeyhash | grep pubkey | cut -f 1,2,4-11 >> output.txt; done
(output.txt includes more columns than strictly needed)
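To see what the filter chain keeps, here is a small sketch on made-up data. The 11-column header is my assumption about the outputs dump layout, and the rows, hashes and addresses are invented placeholders. Under that assumption, `grep -v is_from_coinbase` only drops the header row, `grep -v pubkeyhash | grep pubkey` keeps rows whose type is exactly `pubkey`, and after `cut` the address is in column 6 and the script in column 8.

```shell
# Toy input: one gzipped "day" file with an assumed header and three fake rows.
dir=$(mktemp -d)
{
  printf 'block_id\ttransaction_hash\tindex\ttime\tvalue\tvalue_usd\trecipient\ttype\tscript_hex\tis_from_coinbase\tis_spendable\n'
  printf '1\ttxA\t0\t2009-01-09\t5000000000\t0\t1AAA\tpubkey\t41aa\t1\t-1\n'
  printf '2\ttxB\t0\t2012-01-01\t100\t0\t1BBB\tpubkeyhash\t76a9\t0\t-1\n'
  printf '3\ttxC\t1\t2020-01-01\t200\t0\tbc1q\twitness_v0_keyhash\t0014\t0\t-1\n'
} | gzip -c > "$dir/day1.tsv.gz"

# Same filter chain as above, reading each file explicitly.
for day in "$dir"/*gz; do
  gunzip -c "$day" | grep -v is_from_coinbase | grep -v pubkeyhash | grep pubkey | cut -f 1,2,4-11
done > "$dir/output.txt"

filtered=$(cat "$dir/output.txt")
# Columns 6 and 8 of the result are the address and the script/pubkey field.
printf '%s\n' "$filtered" | cut -f 6,8
rm -r "$dir"
```

Only the `pubkey` row survives; the `pubkeyhash` and `witness_v0_keyhash` rows are dropped.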

Get currently funded Bitcoin addresses and their balance (1 GB)
Code:
wget addresses.loyce.club/blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz

Get all unique addresses that are in both lists
Code:
comm -12 <(cat output.txt | cut -f6 | sort -u -S40%) <(gunzip -c blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | grep -v balance | cut -f1 | grep "^1" | sort -S40% ) > list
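For anyone unfamiliar with `comm -12`: it prints only the lines that occur in both inputs, and both inputs must be sorted the same way. A minimal sketch with invented addresses and temporary file names (none of these are from the real data); forcing `LC_ALL=C` is my own habit here, to keep the sort order consistent between `sort` and `comm`.

```shell
# Two small sorted lists; the duplicate 1AAA is removed by sort -u.
tmp=$(mktemp -d)
printf '%s\n' 1BBB 1AAA 1CCC 1AAA | LC_ALL=C sort -u > "$tmp/with_pubkey"
printf '%s\n' 1CCC 1DDD 1AAA | LC_ALL=C sort > "$tmp/funded"

# -1 hides lines unique to the first file, -2 hides lines unique to the second,
# so only the lines present in both files remain.
common=$(LC_ALL=C comm -12 "$tmp/with_pubkey" "$tmp/funded")
echo "$common"
rm -r "$tmp"
```

This prints 1AAA and 1CCC, the two addresses that appear in both lists.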

Get list of balances, addresses and pubkeys
Code:
cat <(gunzip -c blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | grep "^1") list | sort -rS60% | uniq -w30 -d > address_and_balance
cat <(cat output.txt | cut -f 6,8) list | sort -rS40% | uniq -w30 -d > address_and_pubkey
for addy in `cat list`; do balance="`grep $addy address_and_balance | cut -f2`"; pubkey="`grep $addy address_and_pubkey | cut -f2`"; echo "$balance $addy $pubkey"; done | sort -nr > balance_addy_pubkey.txt
(the last for-loop is quick and dirty, slow and inefficient, but considering there's not that much data, I didn't bother improving it)
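If the list ever gets bigger, the loop could be replaced by `join`, which merges sorted files in one pass instead of running two greps per address. This is only a sketch of that alternative, not what was actually run: the function name `join_balances` and the toy files are made up, and it assumes `address_and_balance` and `address_and_pubkey` are tab-separated with the address in the first column.

```shell
# join_balances LIST ADDRESS_AND_BALANCE ADDRESS_AND_PUBKEY
# Prints "balance address pubkey", highest balance first.
join_balances() {
    jb_tab=$(printf '\t')
    jb_tmp=$(mktemp -d)
    # join needs all inputs sorted on the join field in the same collation.
    LC_ALL=C sort -u "$1" > "$jb_tmp/list"
    LC_ALL=C sort -t "$jb_tab" -k1,1 -u "$2" > "$jb_tmp/bal"
    LC_ALL=C sort -t "$jb_tab" -k1,1 -u "$3" > "$jb_tmp/pub"
    # First join adds the balance, second join adds the pubkey.
    LC_ALL=C join -t "$jb_tab" "$jb_tmp/list" "$jb_tmp/bal" |
        LC_ALL=C join -t "$jb_tab" - "$jb_tmp/pub" |
        awk -F "$jb_tab" '{ print $2, $1, $3 }' |
        sort -nr
    rm -r "$jb_tmp"
}

# Toy data with invented addresses, balances and pubkeys.
demo=$(mktemp -d)
printf '1AAA\n1CCC\n' > "$demo/list"
printf '1AAA\t500\n1CCC\t90000\n1DDD\t7\n' > "$demo/address_and_balance"
printf '1CCC\t04cc\n1AAA\t02aa\n1BBB\t03bb\n' > "$demo/address_and_pubkey"
result=$(join_balances "$demo/list" "$demo/address_and_balance" "$demo/address_and_pubkey")
echo "$result"
rm -r "$demo"
```

On the real files the call would be `join_balances list address_and_balance address_and_pubkey > balance_addy_pubkey.txt`. Addresses missing from either file are silently dropped, which is the default behaviour of `join`.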