Chance are negligible. If collision occurs with a funded address, attacker you can transfer funds elsewhere.
Chances are still negligible when 1 billion people are using it? also can't I just run some kind of bots, that randomly generate addresses to see if
they have funds in them?
Yes, chances remain negligible. You could run your bot, but it'd be a waste of electricity. Chances are you'd wait the lifetime of the universe before finding a collision.
But probability also says, I could have a success on my first run? isn't it?
Sure, you could absolutely find a success on your first run, but let's apply probability to your scenario.
Let's say there are a billion people using 10 addresses each for 10 billion total addresses.
This means that each address you generate has a (1/2^160)*10,000,000,000 possibility of holding a balance, giving your first attempt a 0.0000000000000000000000000000000000000684% chance of finding a collision on your first attempt.
You are correct in stating that with each try it will either happen or it won't, there is no in-between state, and you're correct in stating that it's possible. It's also bad news for the account holder that a collision would give you control of those funds.
Comparatively speaking, your odds of being struck by lightning in a given calendar year are about 1 in 280,000. The odds of winning my local lottery are about 1 in 176,000,000. So finding a collision on your first try is roughly equivalent to being hit by lightning 16,540,000,000,000,000,000,000,000 times per second for an entire year or winning the lottery 830,000,000,000,000,000,000,000,000,000 times.If you find a collision I would stay indoors and play the lottery.
The highlighted part is wrong. The odds of getting struck by lightning n times is not p*n, it's p^n. Ie. the probability of throwing two sixes in a row with a dice is not 1/12, it's 1/36.
What I'd like to know is why some letters have higher probability than others? Is that because some letters are more likely on the elliptic curve vs not? If you plug sequences into vanitygen it will give different difficulties depending on letter, eg.
11... > 1A...
1Q... > 1D...
1R... > 1Q... and so on.
I made a table of the first three chars after 1 to help me in a project with fast lookup of difficulty calculation. There are broad classes of probabilities depending on what letters are involved. I guess there is some non 1-1 mapping from 2^160 possible addresses and base 58 representations.
It's not related to elliptic curves, as an address is created from a hash, which is basically just random data.
The reason is that the data that is encoded as base58 (and thus becomes an address) is ||<20 byte RIPEMD160 public key hash>||<4 byte checksum> where || denotes concatenation.
This version byte is 0 on the main network. So the binary data of an address will always start with 0000 0000. Since this data is converted into a large number, the probability that some of the leading zero bits will be included in the leftmost bits of the large number is greater than none of the leading zero bits being included (since there are eight of them). This results in the probability of the number starting with a small digit being greater.
This large number is basically just continually divided by 58, and the *remainder* of this division is then used to determine which character is added to the address string (based on it's index position in the string "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"). This is repeated in a loop until the large number has been reduced to zero. After this *the string is reversed*. This means that the last remainder will be the first character in the address (after the '1' which will always be there because a '1' is prepended to the address for every leading zero-byte, of which there is always one because the "version byte" is always zero on the main network). So since there is a greater probability that the first part of the number is a small digit, there is a greater chance that the first character of the address will be one of the first characters of the string from which characters are selected based on their index position in that string.