The most amusing part: using mutex locks and creating bloom filters from the same inputs twice in a row.
alexander@alexander-home:~/Documents/Test_Dir/Point_Search_GMP$ diff bloom1B.bf bloom1.bf
Binary files bloom1B.bf and bloom1.bf differ
What did you expect? I looked at your update: you are simply creating multiple mutexes, one for each thread that runs process_chunk, and locking the entire loop. That protects nothing, since each thread only ever locks its own mutex.
That's not what mutexes are for. You only need a single mutex, shared by all threads, and you only need to lock the "bf.insert" call, not the entire loop (or else the entire loop becomes exclusive and the threads run one after another).
I'd personally move the mutex into the bloom filter code, and lock only the code that actually accesses data which can potentially be shared (for example, the hashing part probably doesn't need exclusive access).
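To illustrate, here is a minimal sketch of a filter that owns a single mutex and holds it only while writing bits. The class LockedBloom, the helper build_filter, the two std::hash-based hash functions and the "point-N" keys are all made up for the example; the actual bloom class from this thread is not shown, so treat this as an assumption about its shape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical toy filter; the bloom class used in the thread is not shown.
class LockedBloom {
public:
    explicit LockedBloom(std::size_t num_bits)
        : bits_((num_bits + 63) / 64, 0), num_bits_(num_bits) {
        assert(num_bits > 0);
    }

    void insert(const std::string& key) {
        // Hashing reads only the key, so it stays outside the lock.
        const std::size_t h1 = std::hash<std::string>{}(key);
        const std::size_t h2 = std::hash<std::string>{}(key + "#salt");
        // One mutex, shared by every thread, held only while bits are written.
        std::lock_guard<std::mutex> guard(mutex_);
        set_bit(h1 % num_bits_);
        set_bit(h2 % num_bits_);
    }

    // Copy of the bit array, e.g. for writing it to a file afterwards.
    std::vector<std::uint64_t> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return bits_;
    }

private:
    void set_bit(std::size_t bit) {
        bits_[bit / 64] |= std::uint64_t{1} << (bit % 64);
    }

    std::vector<std::uint64_t> bits_;
    std::size_t num_bits_;
    mutable std::mutex mutex_;
};

// Inserts the keys "point-0" .. "point-(total_keys-1)", split across threads.
std::vector<std::uint64_t> build_filter(unsigned num_threads, unsigned total_keys) {
    LockedBloom bf(1u << 16);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&bf, t, num_threads, total_keys] {
            for (unsigned i = t; i < total_keys; i += num_threads) {
                bf.insert("point-" + std::to_string(i));
            }
        });
    }
    for (auto& w : workers) w.join();
    return bf.snapshot();
}
```

Since setting bits is order-independent, two runs over the same inputs should produce identical bit arrays no matter how the threads interleave, which is the property the diff above is checking.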
But I'm glad that at least you got to a case where you can clearly see that the output is wrong when synchronization is missing. So which one of those 2 outputs is the right one? You'll never know, since the threads were basically in a race condition, running in parallel under different mutexes (so, the same as not having a mutex at all).
If you wanna go fancy you can implement a multi-mutex scheme, with one mutex per memory region of some fixed size, and only lock the specific mutex for the region the bloom filter writes to. This may increase throughput, or it may not; the right balance needs to be found by trial and error. But this is not a programming thread, after all.
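A rough sketch of that multi-mutex idea, assuming a plain bit array of 64-bit words. StripedBloom, kStripes, the mix function and build_striped are hypothetical names for the example, and the stripe count of 16 is an arbitrary starting point, not a recommendation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical filter with one mutex per contiguous region of the bit array.
class StripedBloom {
public:
    static constexpr std::size_t kStripes = 16;  // tuning knob, find by experiment

    explicit StripedBloom(std::size_t num_bits)
        : bits_((num_bits + 63) / 64, 0),
          num_bits_(num_bits),
          words_per_stripe_((bits_.size() + kStripes - 1) / kStripes) {
        assert(num_bits > 0);
    }

    void insert(std::uint64_t point) {
        // Two bit positions per point; each write takes only its region's lock.
        set_bit(mix(point) % num_bits_);
        set_bit(mix(point ^ 0x9e3779b97f4a7c15ULL) % num_bits_);
    }

    // Call only after all writer threads have been joined.
    const std::vector<std::uint64_t>& words() const { return bits_; }

private:
    // Stand-in hash (a splitmix64-style finalizer), not the thread's hash.
    static std::uint64_t mix(std::uint64_t x) {
        x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
        x ^= x >> 27; x *= 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    void set_bit(std::size_t bit) {
        const std::size_t word = bit / 64;
        // Only one lock is held at a time, so lock ordering is not an issue.
        std::lock_guard<std::mutex> guard(locks_[word / words_per_stripe_]);
        bits_[word] |= std::uint64_t{1} << (bit % 64);
    }

    std::vector<std::uint64_t> bits_;
    std::size_t num_bits_;
    std::size_t words_per_stripe_;
    std::array<std::mutex, kStripes> locks_;
};

// Inserts the points 0 .. total_points-1, split across threads.
std::vector<std::uint64_t> build_striped(unsigned num_threads, std::uint64_t total_points) {
    StripedBloom bf(1u << 16);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&bf, t, num_threads, total_points] {
            for (std::uint64_t i = t; i < total_points; i += num_threads) {
                bf.insert(i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return bf.words();
}
```

Threads writing to different regions no longer wait on each other, but every bit write now pays for a lock, so whether this beats the single mutex depends on the filter size and the thread count.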

LE: another option is to compute the points in parallel and queue them in a producer-consumer fashion, then consume the queue in a single thread that only does the BF insertions. Of course, this simply moves the synchronization to the queue itself, but it helps if you don't want to mess with the bloom class.
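A minimal sketch of the producer-consumer variant, under the assumption that a point can be represented as a 64-bit value. PointQueue, compute_point and build_with_queue are hypothetical names, compute_point is a cheap stand-in for the real point computation, and the "filter" is a bare bit array with one bit per point to keep the example short.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical thread-safe queue; all synchronization lives here.
class PointQueue {
public:
    void push(std::uint64_t p) {
        { std::lock_guard<std::mutex> g(m_); q_.push(p); }
        cv_.notify_one();
    }
    void close() {  // called once all producers are done
        { std::lock_guard<std::mutex> g(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false once closed and drained.
    bool pop(std::uint64_t& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::uint64_t> q_;
    bool closed_ = false;
};

// Stand-in for the expensive point computation (assumption for the example).
std::uint64_t compute_point(std::uint64_t i) {
    i ^= i >> 30; i *= 0xbf58476d1ce4e5b9ULL;
    i ^= i >> 27; i *= 0x94d049bb133111ebULL;
    return i ^ (i >> 31);
}

std::vector<std::uint64_t> build_with_queue(unsigned producers,
                                            std::uint64_t total_points,
                                            std::size_t num_bits) {
    assert(producers > 0 && num_bits > 0);
    std::vector<std::uint64_t> bits((num_bits + 63) / 64, 0);
    PointQueue queue;

    // Single consumer: the only thread that touches the bit array, so no lock.
    std::thread consumer([&bits, &queue, num_bits] {
        std::uint64_t p;
        while (queue.pop(p)) {
            const std::size_t bit = p % num_bits;
            bits[bit / 64] |= std::uint64_t{1} << (bit % 64);
        }
    });

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < producers; ++t) {
        workers.emplace_back([&queue, t, producers, total_points] {
            for (std::uint64_t i = t; i < total_points; i += producers) {
                queue.push(compute_point(i));
            }
        });
    }
    for (auto& w : workers) w.join();
    queue.close();
    consumer.join();
    return bits;
}
```

Pushing one point per queue operation keeps the sketch simple; in practice it would probably pay to push whole batches so the queue lock is taken far less often.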
for (int i = 0; i < POINTS_BATCH_SIZE; i++) { // inserting all batch points into the bloomfilter