If you're willing to take my word for it, I can vouch that mtrlt's GPU miner is significantly more efficient than any current CPU miner for scrypt.
From what I know of the GPU miner, option 3 (modifying the scrypt parameter) will have minimal impact. The pad size did not seem to matter much: the pad can be "compressed", for lack of a better word, with values reconstructed on the fly instead of being stored. So any increase in pad size will hit CPU miners about as hard as it hits GPU miners until it exceeds the CPU's cache size, at which point GPUs may become even more efficient.
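To make the "compressed pad" idea concrete, here is a rough Python sketch of the general trade-off. It is not mtrlt's code and it is not real scrypt: `H` is a SHA-256 stand-in for scrypt's Salsa20/8 mixing, `integerify` is simplified, and the `gap` parameter and function names are made up for illustration. It only shows that keeping every `gap`-th pad entry and recomputing the rest gives the same result as keeping the whole pad.

```python
import hashlib

def H(block: bytes) -> bytes:
    # Stand-in mixing function (assumption); real scrypt uses Salsa20/8 BlockMix.
    return hashlib.sha256(block).digest()

def integerify(block: bytes, n: int) -> int:
    # Simplified index derivation (assumption): first 8 bytes, little-endian, mod n.
    return int.from_bytes(block[:8], "little") % n

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def romix_full(seed: bytes, n: int) -> bytes:
    # Phase 1: fill the whole pad with n sequentially derived values.
    pad = []
    x = H(seed)
    for _ in range(n):
        pad.append(x)
        x = H(x)
    # Phase 2: n data-dependent lookups into the pad.
    for _ in range(n):
        j = integerify(x, n)
        x = H(xor(x, pad[j]))
    return x

def romix_compressed(seed: bytes, n: int, gap: int) -> bytes:
    # Phase 1: keep only every gap-th value, so memory use is about n / gap.
    kept = []
    x = H(seed)
    for i in range(n):
        if i % gap == 0:
            kept.append(x)
        x = H(x)
    # Phase 2: rebuild a missing entry by hashing forward from the
    # nearest stored entry at or below index j.
    for _ in range(n):
        j = integerify(x, n)
        v = kept[j // gap]
        for _ in range(j % gap):
            v = H(v)
        x = H(xor(x, v))
    return x

if __name__ == "__main__":
    seed = b"example block header"
    print(romix_full(seed, 1024) == romix_compressed(seed, 1024, 4))
```

With a gap of `g`, memory drops by roughly a factor of `g`, while each lookup costs on average `(g - 1) / 2` extra hash calls. Hardware with plenty of spare compute and little fast memory per thread can come out ahead on that trade, which is why a bigger pad alone may not help.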
I think you will be stuck with option 2, finding a completely different hashing algorithm.