The algorithm proposed by Gavin is very simple:
"Let's kick the can far enough each time so it's always far ahead of us."

There are probably reasons for that. Maybe adding the block size to the header is too difficult, or there are some other concerns. I haven't yet found a detailed explanation of why this particular algorithm should be selected, but that's probably an argument from ignorance on my part.

Granted, anything even remotely more complex has a very slim chance of being implemented and agreed upon. So we probably won't be able to put any additional pressure on miners and will have to hope that everything will eventually work out somehow.
But are there really any good reasons to prefer the exponential curve to the simple sliding window algorithm, which stays reasonably ahead of the current block size?
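
To make the comparison concrete, here is a rough Python sketch of the two approaches. Everything in it is an assumption for illustration: the function names, the 1 MB starting point, the doubling interval, the 2x headroom and the use of a median are all made up, not taken from Gavin's proposal or from any actual implementation.

```python
from statistics import median

def exponential_limit(height, base_limit=1_000_000, blocks_per_doubling=105_000):
    """Limit that doubles on a fixed schedule, regardless of actual usage.

    base_limit and blocks_per_doubling are hypothetical values.
    """
    return int(base_limit * 2 ** (height / blocks_per_doubling))

def sliding_window_limit(recent_sizes, headroom=2.0, floor=1_000_000):
    """Limit that tracks actual usage: a multiple of the median recent block size.

    recent_sizes holds the sizes (in bytes) of the last N blocks; headroom and
    floor are hypothetical values. The floor keeps the limit from collapsing
    when recent blocks are small.
    """
    if not recent_sizes:
        return floor
    return max(floor, int(headroom * median(recent_sizes)))
```

The first function depends only on the block height, so the limit keeps growing whether or not blocks actually fill up. The second depends only on what miners have recently produced, so it stays a fixed factor ahead of real demand, which is the "reasonably ahead" behavior I have in mind.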