My Scrypt-like hashing function runs on a single core at about 1 khash/s with relatively few optimizations.
OK. At that speed, nodes in a DHT would only need a few kb/s of internet bandwidth to reach the DHT's full potential. How much of a gain that gives would depend on the difficulty of finding a match. I think it would be cool if you made it even slower (if only by repeating it enough times) to make it suitable for one huge DHT. I don't know if it's possible, but my doubts are for other reasons than the ones you brought up. Anyway, good luck. I think you could have a great proof-of-work function with some modifications. I'm outta here!
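
For what it's worth, the "repeating it enough times" idea could look roughly like the sketch below. It is only an illustration: stock scrypt from Python's hashlib stands in for the actual Scrypt-like function, and the salt, the scrypt parameters, and the names base_hash and slowed_hash are all made up here rather than taken from the posts above.

```python
import hashlib


def base_hash(data: bytes) -> bytes:
    """Stand-in for the Scrypt-like function (assumption: stock scrypt).

    The salt and the n/r/p parameters are arbitrary demo values.
    """
    return hashlib.scrypt(data, salt=b"pow-demo", n=2**10, r=8, p=1, dklen=32)


def slowed_hash(data: bytes, rounds: int) -> bytes:
    """Feed the digest back into base_hash `rounds` times.

    Each round must finish before the next can start, so the cost of one
    slowed hash is roughly `rounds` times the cost of one base hash.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = data
    for _ in range(rounds):
        digest = base_hash(digest)
    return digest


if __name__ == "__main__":
    # Hypothetical input; a real proof of work would hash a header plus nonce.
    print(slowed_hash(b"block header || nonce", rounds=10).hex())
```

Since the rounds are chained, the slowdown is roughly linear in the round count: with rounds=10, a function that manages about a thousand hashes per second would drop to around a hundred, and the data a node has to exchange per second would shrink by about the same factor.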