To answer the question directly: I don't think it would be too difficult to optimize an algorithm specifically for x86 hardware, such that an ASIC would effectively be an x86 chip with a few instructions removed. The question is whether that's actually useful.
I think a better approach might be a scheme that requires several proof-of-work algorithms, e.g. one selected pseudo-randomly based on the previous block. A general-purpose CPU would be well suited to that kind of task, since it can switch algorithms cheaply while dedicated hardware can't.
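To make the idea concrete, here's a minimal sketch (in Python, purely illustrative; the algorithm list, selection rule, and difficulty encoding are all my own assumptions, not any existing coin's design). The previous block's hash deterministically picks which hash function the next block must be mined with, so a miner can't commit to single-algorithm hardware:

```python
import hashlib

# Hypothetical pool of interchangeable PoW hash functions.
ALGORITHMS = [hashlib.sha256, hashlib.sha3_256, hashlib.blake2b]

def select_algorithm(prev_block_hash: bytes):
    # Pick an algorithm from the previous block's hash, so the choice
    # is unpredictable until that block exists but verifiable by anyone.
    return ALGORITHMS[prev_block_hash[-1] % len(ALGORITHMS)]

def mine(prev_block_hash: bytes, payload: bytes, difficulty: int) -> int:
    algo = select_algorithm(prev_block_hash)
    target = 2 ** (256 - difficulty)  # smaller target = harder
    nonce = 0
    while True:
        digest = algo(prev_block_hash + payload
                      + nonce.to_bytes(8, "big")).digest()
        # Compare only the first 32 bytes so blake2b's longer
        # digest is judged against the same 256-bit target.
        if int.from_bytes(digest[:32], "big") < target:
            return nonce
        nonce += 1
```

An ASIC would need efficient circuits for every algorithm in the pool (and any added later by a fork), while a CPU just calls a different function.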