Post
Topic
Board Development & Technical Discussion
Re: PointsBuilder - fast CPU range points generator
by
kTimesG
on 14/05/2025, 14:17:18 UTC
So if we store sqrt2**30 , a space of 2**31 will need 2(sqrt2**30).
But if where are we storing them , ram or disk ?

If you have 2**30 stored items as the square root, then N is 2**60, not 2**31.

If you can't fit those 40 GB in RAM, you can use a bloom filter (only takes 128 MB, but has the hashing overhead) and do the disk/whatever lookup only for filter hits; or you can use those 8 GB of Phase 1 lookup in RAM, and read from disk/whatever for Phase 2, if the Phase 1 returns a positive hit.

You have all options on the table now, it should be clear what you can use. It's not a surprise that these options simply trade time with space. These options are ordered by speed (fast to slow):

1. O(1) lookup: not enough RAM in the Universe.
2. Binning and 2-phase lookup (40 GB of storage, O(1) average lookup, O(16) worst case)
3. Bloom filter (128 MB + key hashing overhead) followed by O(logN) lookup (or option 2 above).

Choose whatever firs well with your resources. You can also make the binning use more memory to reduce the number of false positives (less hits for doing Phase 2). That's what I'd do anyway.

Big warning here: binning can only work if you know before hand the maximum amount of items that an entry can have (to know how many bits are needed for the length). That's why the precomputation is required before building the lookup tables.