Re: Probabilistic search of prefixes vs random+sequential

Quote from: DannyHamilton on Today at 08:21:33 PM

Quote from: mcdouglasx on April 21, 2025, 05:13:49 PM

- snip -
it first checks if the hash of any element begins with the same first three characters as the hash we're searching for.
- snip -

While I see a few things about this technique that are ignored and not included in the "findings", and I believe significantly better search algorithms already exist, I'm most curious why you chose THREE, specifically, as the number of "characters" to check? Is there something about THREE that makes it better than TWO or FOUR (or any other number)? Have you tried your simulations with other prefix sizes to find the optimal prefix size?

The probabilistic formula for the distribution of prefixes states that a prefix of length N has a probability of 1/16**N. Therefore, if you divide the range to be explored into blocks of size 16**N (or close), once you find a prefix, the probability of finding another within the same block will be very low/rare.

Consequently, you can stop scanning and save the remaining range. (If you use it for puzzle searches, you can explore it in a second pass if necessary).

Example: If you use a prefix "aabbccde" you should use blocks close to 16**8 or 4,294,967,296.

If you find the prefix at 2,000,000,000, it is very unlikely (though not impossible) to find another in the rest of the block.

Statistically, leaving that segment for a second pass if necessary, is the most convenient approach, saving computational power that would be used for searching the remaining block, where it would be very rare to find the prefix again.

Quote

Also, it seems very inefficient to take a hash (which is a large binary number), and first convert it to a string representation of its value as represented in hexadecimal, and then use string manipulation to compare "characters"? Wouldn't it be much faster and simpler to perform a bitwise XOR between some predetermined prefix of bits of the hash computed from the list element and the same prefix of bits on the target hash.

And, if you ARE going to use XOR, is there a benefit to taking the extra steps to generate a bitwise prefix since XOR is so incredibly fast as a comparison method? In the time you spend separating the prefix from the initial hash, the XOR between the full hash values would likely already have completed, wouldn't it?

Yes, but since my intention was to compare both methods, the important thing is that the conditions remain the same. However, it is good to take your advice if one intends to implement it in a real scenario.

Quote

I believe significantly better search algorithms already exist

Out of curiosity, which algorithms are you referring to?