Any time you throw away hashes because they fall outside your memory range, you move the computational complexity from the p(n) curve to the q(n) curve in my white paper. You save on memory, but the cost is far greater. Memory and CPU power must always be paired proportionally, or you will have idle memory or wasted CPU cycles.
I don't want to pollute your thread with many small replies. If you answer what hashing power we're looking at in Mhash/s, I could make a more thorough reply with numbers. Also, is 32-bit output for BirthdayHash right? That would only take storing about 77,000 entries for a 50% chance of finding a collision.
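(That 77,000 figure is just the standard birthday bound: for a hash with d equally likely outputs, a 50% collision chance needs roughly sqrt(2 · d · ln 2) samples, so with d = 2^32:

    n ≈ sqrt(2 · 2^32 · ln 2) ≈ 77,163.)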
There is a trade-off between latency and the rate of finding collisions. I am actually using something closer to 36 bits; beyond 36 bits, the time it takes a Core i7 to find a single collision would be too long. Remember, the collision rate is only the rate at which lottery tickets are found; difficulty is the rate at which blocks could be found. So this algorithm could be tuned to require no more than 4 GB of RAM, after which it produces a collision for every single BirthdayHash... however, there are still benefits to storing and comparing all matches, and not just single collisions, because every potential match is a lottery ticket. Therefore the number of bits in the collision is almost irrelevant beyond a certain point.
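To make that concrete, here is a minimal sketch of the kind of collision search described above. Everything in it is illustrative: birthday_hash is a stand-in (SHA-256 truncated to 36 bits), and find_tickets is a hypothetical name, not the actual implementation.

    import hashlib

    COLLISION_BITS = 36                       # ~36-bit collision space, per the post above
    MASK = (1 << COLLISION_BITS) - 1

    def birthday_hash(header: bytes, nonce: int) -> int:
        # Stand-in for BirthdayHash: SHA-256 truncated to COLLISION_BITS.
        digest = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
        return int.from_bytes(digest[:8], "little") & MASK

    def find_tickets(header: bytes, max_nonces: int):
        # Store every hash seen; every nonce pair that lands in the same
        # bucket is one "lottery ticket" to test against the block difficulty.
        seen = {}                             # hash value -> nonces that produced it
        for nonce in range(max_nonces):
            h = birthday_hash(header, nonce)
            for prior in seen.get(h, []):     # keep ALL matches, not just the first
                yield (prior, nonce)
            seen.setdefault(h, []).append(nonce)

Capping the table at your memory budget (e.g. 4 GB) and widening or narrowing COLLISION_BITS is exactly the latency-versus-ticket-rate knob described above.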