I had considered the potential for such a design. DHT will not perform well enough, you would need 100BT ethernet or perhaps gigabit before the hash table could be filled quickly enough. From what I can tell a SUPER COMPUTER with a ton of RAM and high-speed interconnects would give you the best performance. Here is the difference, such a super computer could be built from commodity parts.
Well, if the hash is so fast you could just collaborate by splitting the birthday space in N ranges and each computer just throws away any result that isn't in its range. This trades hashing speed for memory, so works with rather than against your pow.
What range do you suggest for N and how fast is the hash?
This should be possible to test.