And in general, GPUs aren't as memory limited as you are thinking. The typical highish end video card that we use has more memory on it than a generic server from a couple of years ago. Any change that makes hashing memory intensive enough to stop GPU mining will kill the network by forcing nodes to drop out.
I don't agree. I don't see how GPUs can be "not as memory limited as you are thinking." We know the total RAM on the card, and we can count on a little extra for local memory and registers, but not on the order of hundreds of MB. I program CUDA on NVIDIA GPUs, and you're pretty much limited by the RAM amount stated on the box (and OpenCL isn't much different). You can use CPU RAM if you want, but there are massive latencies associated with transferring data over the PCIe bus. Perhaps the latencies aren't all that important in this application, but the point is to make the computation memory-limited, not computation-limited.
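For what it's worth, you can see that ceiling directly from host code. Here's a minimal sketch using the CUDA runtime API (device 0 and the exact printout are just for illustration):

    // Minimal host-side sketch: query how much global memory the card has.
    // Assumes the CUDA runtime (cuda_runtime.h, link against cudart).
    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
            std::fprintf(stderr, "No CUDA device found\n");
            return 1;
        }

        size_t freeBytes = 0, totalBytes = 0;
        cudaMemGetInfo(&freeBytes, &totalBytes);

        std::printf("%s: %zu MB total, %zu MB free\n",
                    prop.name, totalBytes >> 20, freeBytes >> 20);

        // Whatever prints here is the hard ceiling -- every thread resident on
        // the GPU shares it.  Spilling to host RAM over PCIe is possible, but
        // each access then pays the bus latency mentioned above.
        return 0;
    }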
Right now the computation requires virtually no RAM beyond a few round constants and the original 80-byte header. A GPU with 1600 shaders might have 2 GB of RAM -- that's 1-2 MB per core. A CPU, by contrast, typically has 250-1000 MB per core. If each thread required just 100 MB of RAM, even semi-older computers could apply all their cores to the computation without impacting system usability dramatically (if at all), but it would limit a GPU with 1600 shaders to only 20 simultaneous threads (about 1-2% of its capacity).
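To make that concrete, here's a rough sketch of the kind of construction I mean. The 100 MB figure is the same one used above, and the mixer is just a placeholder for a real hash like SHA-256 -- the names and constants are made up for illustration, not a proposal for the actual algorithm:

    // Hypothetical memory-hard hash: each hashing thread must hold a large
    // scratchpad filled from the 80-byte header and read back in a
    // data-dependent order.
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    static const size_t SCRATCH_BYTES = 100u << 20;   // ~100 MB per thread
    static const size_t SCRATCH_WORDS = SCRATCH_BYTES / 8;

    // splitmix64-style mixer; NOT cryptographic, stand-in for a real hash.
    static uint64_t mix(uint64_t x) {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

    uint64_t memoryHardHash(const uint8_t header[80]) {
        // 1. Seed from the header.
        uint64_t seed = 0;
        for (int i = 0; i < 80; ++i) seed = mix(seed ^ header[i]);

        // 2. Fill the scratchpad sequentially; each word depends on the last.
        std::vector<uint64_t> pad(SCRATCH_WORDS);
        uint64_t v = seed;
        for (size_t i = 0; i < SCRATCH_WORDS; ++i) pad[i] = v = mix(v);

        // 3. Walk it in a data-dependent order so the whole 100 MB must stay
        //    resident; a 2 GB GPU could keep only ~20 of these alive at once.
        uint64_t acc = v;
        for (size_t i = 0; i < SCRATCH_WORDS; ++i) {
            size_t idx = acc % SCRATCH_WORDS;
            acc = mix(acc ^ pad[idx]);
        }
        return acc;
    }

    int main() {
        uint8_t header[80] = {0};   // dummy 80-byte block header
        std::printf("digest: %016llx\n",
                    (unsigned long long)memoryHardHash(header));
        return 0;
    }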
As for GZIP... the point was to apply a process that needs the entirety of the data in order to run and can't be broken into smaller pieces. I'm sure there are a billion existing algorithms that can do this; I just guessed (incorrectly) that gzip was one of them.
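As a toy illustration of that property (not gzip itself), here's a chained digest where the result depends on every byte of the input, in order, so the work can't be split into independent chunks and farmed out:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Every step folds the previous state into the next byte, so no chunk can
    // be processed without the state left by everything before it.
    uint64_t chainedDigest(const std::vector<uint8_t>& data) {
        uint64_t state = 0x6A09E667F3BCC908ULL;      // arbitrary start constant
        for (uint8_t b : data) {
            state = (state ^ b) * 0x100000001B3ULL;  // fold in the byte
            state = (state << 13) | (state >> 51);   // rotate to spread bits
        }
        return state;
    }

    int main() {
        std::vector<uint8_t> data(1 << 20, 0xAB);    // 1 MB of dummy input
        std::printf("%016llx\n", (unsigned long long)chainedDigest(data));
        return 0;
    }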
Personally though, I think the biggest threat is actually botnets, so the dominance of GPUs is actually preferable to me. You don't have to agree, though.