Re: LoyceV's small Linux commands for handling big data
by LoyceV on 19/04/2022, 11:08:31 UTC
Get pubkeys out of Bitcoin block data.

Note: this list is not meant for verbatim copy/pasting; these are my own notes of what I did (more or less).

Get outputs data (currently 148 GB)
Code:
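# -np: don't climb to the parent directory; -nH: save to outputs/ (as used below) instead of blockdata.loyce.club/outputs/
wget -r -np -nH blockdata.loyce.club/outputs/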
Bitcoin_addresses_LATEST.txt.gz

Get currently funded Bitcoin addresses and their balance (1 GB)
Code:
wget http://addresses.loyce.club/blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz
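Downloading 148 GB can leave a truncated file behind, and that only shows up hours later in the processing steps. A quick sanity check is `gzip -t`, which tests a compressed file's integrity without writing anything. The sketch below wraps it in a small helper (`check_gz` is a hypothetical name, not part of these notes) and demonstrates it on a made-up intact file and a deliberately truncated one:

```sh
#!/bin/sh
set -eu
# check_gz (hypothetical helper): report every file that fails gzip's integrity test.
check_gz() {
  for f in "$@"; do
    gzip -t "$f" 2>/dev/null || echo "corrupt or incomplete: $f"
  done
}

# Toy demo: one intact file and one cut off after its first 10 bytes.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'some data\n' | gzip > "$tmp/good.gz"
head -c 10 "$tmp/good.gz" > "$tmp/truncated.gz"
report=$(check_gz "$tmp/good.gz" "$tmp/truncated.gz")
printf '%s\n' "$report"

# On the real downloads (paths as used in these notes):
# check_gz outputs/*gz blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz
```

Only broken files are printed, so an empty result means every file passed.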

Get all addresses with pubkey
Code:
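# echo shows progress; grep -v is_from_coinbase drops each file's header line, then P2PKH is dropped and P2PK (pubkey) outputs are kept
for day in outputs/*gz; do echo "$day"; gunzip -c "$day" | grep -v is_from_coinbase | grep -v pubkeyhash | grep pubkey | cut -f 1,2,4-11 >> output.txt; done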
(output.txt includes more columns than strictly needed)
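To see what the grep chain keeps, and why later steps use `cut -f6` and `cut -f 6,8`, here's a toy run on three made-up lines. It assumes the outputs dumps are tab-separated with the column order block_id, transaction_hash, index, time, value, value_usd, recipient, type, script_hex, is_from_coinbase, is_spendable (the later field numbers imply at least the recipient and script_hex positions; check the header of a real file). `grep -v is_from_coinbase` drops the header line, `grep -v pubkeyhash` drops P2PKH outputs, `grep pubkey` keeps P2PK outputs, and removing column 3 shifts every later column one position to the left:

```sh
#!/bin/sh
set -eu
# Made-up, tab-separated sample: a header, one P2PK output, one P2PKH output.
# ASSUMPTION: column order block_id, transaction_hash, index, time, value,
# value_usd, recipient, type, script_hex, is_from_coinbase, is_spendable.
sample=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  block_id transaction_hash index time value value_usd recipient type script_hex is_from_coinbase is_spendable \
  100 txA 0 2010-01-01 5000000000 0 1ExampleP2PK pubkey 41aa11ac 1 1 \
  200 txB 1 2011-01-01 100000 0 1ExampleP2PKH pubkeyhash 76a914bbac 0 1)

# Same filter chain as the loop above, applied to the sample.
filtered=$(printf '%s\n' "$sample" | grep -v is_from_coinbase | grep -v pubkeyhash | grep pubkey | cut -f 1,2,4-11)

# Dropping column 3 moves recipient from field 7 to 6 and script_hex from 9 to 8.
address=$(printf '%s\n' "$filtered" | cut -f6)
script=$(printf '%s\n' "$filtered" | cut -f8)
echo "address=$address script=$script"
# prints: address=1ExampleP2PK script=41aa11ac
```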

Get all unique funded addresses from the above list
Code:
comm -12 <(cut -f6 output.txt | sort -u -S40%) <(gunzip -c blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | grep -v balance | cut -f1 | grep "^1" | sort -S40%) > list
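`comm -12` prints only the lines that appear in both inputs (`-1` suppresses lines unique to the first input, `-2` lines unique to the second). Both inputs must be sorted with the same collation, which is why each side goes through `sort`. A toy run with hypothetical addresses, using temporary files instead of bash process substitution:

```sh
#!/bin/sh
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
# Hypothetical addresses seen with a pubkey; duplicates are possible, hence sort -u.
printf '%s\n' 1Bob 1Alice 1Bob 1Carol | sort -u > "$tmp/with_pubkey"
# Hypothetical currently funded addresses.
printf '%s\n' 1Dave 1Carol 1Alice | sort > "$tmp/funded"
# -1 hides lines only in the first file, -2 hides lines only in the second:
# what's left is the intersection.
common=$(comm -12 "$tmp/with_pubkey" "$tmp/funded")
printf '%s\n' "$common"
# prints:
# 1Alice
# 1Carol
```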

Get list of balances, addresses and pubkeys
Code:
gunzip -c blockchair_bitcoin_addresses_and_balance_LATEST.tsv.gz | grep "^1" > LATEST.tsv
cat LATEST.tsv list | sort -rS60% | uniq -w30 -d > address_and_balance
cat <(cut -f 6,8 output.txt) list | sort -rS40% | uniq -w30 -d > address_and_pubkey
for addy in `cat list`; do balance="`grep $addy address_and_balance | cut -f2`"; pubkey="`grep $addy address_and_pubkey | cut -f2`"; echo "$balance $addy $pubkey"; done | sort -nr > balance_addy_pubkey.txt
(the last for-loop is quick and dirty, slow and inefficient, but considering there's not that much data, I didn't bother improving it)
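If the loop ever becomes a bottleneck, `join` can do the same merge in one pass over sorted files. This is a swapped-in technique, not what the notes above use, and `merge_balances` is a hypothetical helper name. It also sidesteps a quirk of `uniq -w30`: that compares only the first 30 characters, so an address shorter than 30 characters is compared together with its tab and part of its balance and never pairs up with the bare address from `list`, whereas `join` matches the whole address field. Because `join` only prints addresses present in both files, the separate `list` file isn't needed. A sketch assuming `LATEST.tsv` and `output.txt` have the layout produced above (the sample data is made up):

```sh
#!/bin/sh
set -eu
export LC_ALL=C                 # same byte-order collation for sort and join
tab=$(printf '\t')

# merge_balances (hypothetical helper)
#   $1: address<TAB>balance file, like LATEST.tsv
#   $2: file whose fields 6 and 8 are address and script_hex, like output.txt
# Prints "balance address pubkey" lines, highest balance first.
merge_balances() {
  work=$(mktemp -d)
  sort -t "$tab" -k1,1 "$1" > "$work/bal"
  # keep one line per address (repeated outputs to one address collapse)
  cut -f 6,8 "$2" | sort -t "$tab" -k1,1 -u > "$work/pub"
  # join keeps only addresses present in both files (the role of "list" above)
  join -t "$tab" "$work/bal" "$work/pub" | awk -F "$tab" '{ print $2, $1, $3 }' | sort -nr
  rm -rf "$work"
}

# Toy demo with made-up data.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\t%s\n' 1Alice 500 1Carol 20 1Dave 7 > "$tmp/LATEST.tsv"
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  100 txA 2010-01-01 5000000000 0 1Alice pubkey 41aa11ac 1 1 \
  101 txB 2010-01-02 5000000000 0 1Carol pubkey 41cc22ac 1 1 \
  102 txC 2010-01-03 100 0 1Alice pubkey 41aa11ac 0 1 > "$tmp/output.txt"
result=$(merge_balances "$tmp/LATEST.tsv" "$tmp/output.txt")
printf '%s\n' "$result"
# prints:
# 500 1Alice 41aa11ac
# 20 1Carol 41cc22ac
```

On the real files this would be `merge_balances LATEST.tsv output.txt > balance_addy_pubkey.txt`. For data this size, GNU sort's `-S` buffer option (as used in the notes) can be added to the `sort` calls.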