Clearly the first cluster, 2 through P (less I), appears evenly distributed (I will run a Grubbs' test to confirm). Can anyone explain why the characters (1, I, Q, R, S, T, U, V, W, X, Y, Z) are so much less likely to appear in the first position? Thanks
The explanation is straightforward. A legacy bitcoin address is a base-58 encoding of a 200-bit (25-byte) number, except that before base-58 encoding takes place, each leading byte with the value 0 is encoded as a '1'. Since the first byte of a legacy address is always 0, most of the time the address is a '1' followed by the base-58 encoding of a 24-byte, effectively random number.
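To make the encoding rule concrete, here is a minimal sketch in Python. The function names `base58_encode` and `legacy_address` are my own for illustration, not taken from any library, and I am assuming the usual legacy layout of version byte 0x00, a 20-byte hash, and a 4-byte double-SHA-256 checksum.

```python
import hashlib

# The base-58 alphabet: no '0', 'O', 'I' or 'l'.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58_encode(data: bytes) -> str:
    """Encode bytes as base-58, writing each leading zero byte as a '1'."""
    n = int.from_bytes(data, "big")
    digits = ""
    while n > 0:
        n, r = divmod(n, 58)
        digits = B58_ALPHABET[r] + digits
    # Leading zero bytes vanish from the integer, so they are added back as '1's.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + digits

def legacy_address(hash160: bytes) -> str:
    """Hypothetical helper: version byte 0x00 + 20-byte hash + 4-byte checksum."""
    payload = b"\x00" + hash160
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return base58_encode(payload + checksum)
```

Calling `legacy_address` on any 20-byte value gives a string that starts with '1', because the version byte is always a leading zero byte.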
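Now, $256^{24} \approx 58^{32.7758}$, so an address is typically a '1' followed by about 32.7758 base-58 digits, which means 33 characters most of the time. The leading digit of that number therefore has a value in the range 1 to 23 (since $58^{0.7758} \approx 23.3$), which corresponds to the characters '2' through 'Q'. The values 1 to 22 ('2' through 'P') each cover a full slice of the range, so they are roughly equally likely at about 4.3% each, while 23 ('Q') covers only about a third of a slice. Characters from 'R' onward can only come first when the number is small enough to need 32 digits or fewer, which happens about 1 time in 23 and is then spread across many possible leading digits, so each of them is rare. 'I' never appears at all because it is not in the base-58 alphabet.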
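The arithmetic can be checked exactly with a short Python sketch. It assumes the 24 bytes after the version byte behave like a uniform random number, which is only approximately true since the last 4 bytes are a checksum. The names `digit_count` and `second_char_probability` are made up for this example.

```python
import math
from fractions import Fraction

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
TOTAL = 256 ** 24          # number of possible 24-byte values
NONZERO_START = 256 ** 23  # smallest value whose top byte is nonzero

def digit_count() -> float:
    """Number of base-58 digits needed for 24 bytes, as a real number."""
    return 24 * math.log(256, 58)

def second_char_probability(ch: str) -> Fraction:
    """Exact chance that `ch` follows the initial '1' (uniformity assumed)."""
    if ch == "1":
        # A second '1' needs the top byte of the 24-byte number to be zero.
        return Fraction(NONZERO_START, TOTAL)
    d = B58_ALPHABET.index(ch)
    count = 0
    # Numbers with k+1 digits and leading digit d lie in [d*58^k, (d+1)*58^k).
    for k in range(34):
        lo = max(d * 58 ** k, NONZERO_START)
        hi = min((d + 1) * 58 ** k, TOTAL)
        count += max(0, hi - lo)
    return Fraction(count, TOTAL)

if __name__ == "__main__":
    print("digits:", digit_count())
    for ch in "12PQRz":
        print(ch, float(second_char_probability(ch)))
```

Using `Fraction` with Python's big integers avoids floating-point rounding, so the 58 probabilities sum to exactly 1.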
Remember that leading '1's are a special case. The first character is always '1' because the first byte is set to 0, but the second byte is effectively random, so it will be 0 only 1/256 of the time, leading to the rare cases where an address starts with '11'.
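A quick simulation can illustrate the 1/256 figure. This is only a sketch: it feeds random 20-byte values in place of real public-key hashes, and `base58check_address` and `count_double_one` are hypothetical names of my own.

```python
import hashlib
import random

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check_address(hash160: bytes) -> str:
    """Hypothetical helper: legacy address for a 20-byte hash."""
    payload = b"\x00" + hash160
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    data = payload + checksum
    n = int.from_bytes(data, "big")
    chars = []
    while n:
        n, r = divmod(n, 58)
        chars.append(B58_ALPHABET[r])
    # Each leading zero byte becomes one '1' character.
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + "".join(reversed(chars))

def count_double_one(samples: int, seed: int = 0) -> int:
    """Count simulated addresses that start with '11'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Random stand-in for a real RIPEMD-160(SHA-256(pubkey)) hash.
        hash160 = rng.getrandbits(160).to_bytes(20, "big")
        if base58check_address(hash160).startswith("11"):
            hits += 1
    return hits

if __name__ == "__main__":
    samples = 20000
    print(count_double_one(samples), "of", samples, "start with '11'")
```

With 20,000 samples the count should land somewhere near 20000/256, or about 78, though the exact figure depends on the seed.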