That's a problem. Is there any existing well-known hashing algorithm which favors generic CPUs and has some properties preventing it from being implemented in a highly effective GPU/FPGA/ASIC at acceptable cost?
You don't need a raw hashing algorithm to impose these kinds of requirements on the system. You just need an unparallelizable process that ultimately leads to a random string of bits; i.e., only the last step needs to be hashing. This is a silly, off-the-top-of-my-head idea, but it should illustrate the point:
The current process for POW looks like this:
    a = Header
    POW = Hash256(a)
Instead you could make it:
    a = Header
    b = [Hash(a) Hash^2(a) Hash^3(a) ... Hash^N(a)]
    c = Gzip(b)
    POW = Hash256(c)
The idea here is that each thread must compute N sequential hashes and save all the intermediate results so it can concatenate them into one huge string and gzip it (I believe gzip needs the entire string in memory to operate; if not, pick a different operation that does require it). Then the result is hashed down to 32 bytes to get the final proof-of-work.
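To make that concrete, here is a minimal Python sketch of the pipeline, assuming SHA-256 stands in for both Hash and Hash256 and zlib's DEFLATE stands in for gzip; the function name memory_hard_pow and the tiny default N are purely illustrative:

    import hashlib
    import zlib

    def memory_hard_pow(header: bytes, n: int = 1000) -> bytes:
        # Hypothetical sketch: each digest feeds the next, and every
        # intermediate digest is kept so the whole chain can be
        # concatenated and compressed at the end.
        digest = header
        intermediates = []
        for _ in range(n):
            digest = hashlib.sha256(digest).digest()  # Hash^i(a)
            intermediates.append(digest)              # must keep every result
        b = b"".join(intermediates)                   # ~32*n bytes in memory
        c = zlib.compress(b)                          # the gzip step
        return hashlib.sha256(c).digest()             # final 32-byte POW

    print(memory_hard_pow(b"example header").hex())

The sequential dependency (step i needs the digest from step i-1) is what keeps a single proof attempt from being split across many threads, and keeping every intermediate digest is what forces the memory footprint.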
The reason something like this works is that N can be chosen to enforce any kind of per-thread memory requirement. If N is 10,000,000, then each thread has to hold 320 MB of data at once (10,000,000 intermediate digests at 32 bytes each) to do the gzip. That means a Radeon HD 6990 would only be able to run about 10 threads at once: possibly faster than your 4-core CPU, maybe not. Most GPUs with 1 GB of memory could only run 2 or 3 threads. At best, your GPU would be like an extra CPU, and a modern quad-core computer with 4 GB of RAM could probably run this fine in the background (or the user could scale it down to 1-2 cores when the resources are needed).
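As a sanity check on that arithmetic, with 32-byte digests the per-thread footprint is simply 32 * N bytes, so N can be derived directly from a target memory requirement (n_for_memory is a hypothetical helper, not part of any existing API):

    def n_for_memory(target_bytes: int) -> int:
        # Each thread must hold 32 bytes per SHA-256 digest.
        return target_bytes // 32

    print(n_for_memory(320_000_000))  # 10000000, the N used above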