So I would suggest a block size limit set once every 2016 blocks at the same time as difficulty retargeting, at 12 times the average for the previous 2016 blocks.
Why once? Why can't it be a sliding window of 2016 blocks?
So here are some changes I propose:
1. A sliding window of, say, 2016 blocks.
2. We track the median block size in this window. (The median is impossible to game unless you can generate 51% of the blocks, and it is also more robust against occasional random spikes.)
3. We set the block limit to something a bit smaller than VISA, again to be conservative, because blockchain has a cost. So, say, x10 or x8, not x12.
4. We track the median transaction fee in each block, then take the median of those values across the window to get the Median Tx Fee.
5. We impose a minimum tx fee line for each new block. It starts at 10% of the Median Tx Fee (to protect against spam and unimportant txs) and rises slowly to something like x2 at the block limit. The exact parameters can be tweaked, but the general idea is to put additional pressure on miners not to blow up block size.
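To make points 1-4 concrete, here's a rough Python sketch. The class and method names are made up for illustration, the x10 factor is just one of the values mentioned above, and skipping blocks with no fee-paying txs is my own assumption, not part of the proposal.

```python
from collections import deque
from statistics import median

WINDOW = 2016      # blocks in the sliding window (point 1)
LIMIT_FACTOR = 10  # "x10 or x8" above; 10 picked here only for illustration


class SlidingWindow:
    """Hypothetical helper tracking the last `window` blocks."""

    def __init__(self, window=WINDOW):
        # deque(maxlen=...) drops the oldest entry automatically,
        # so the window slides by one block on every append.
        self.sizes = deque(maxlen=window)
        self.block_median_fees = deque(maxlen=window)

    def add_block(self, size_bytes, tx_fees):
        self.sizes.append(size_bytes)
        # Assumption: a block without fee-paying txs contributes no fee sample.
        if tx_fees:
            self.block_median_fees.append(median(tx_fees))

    def block_size_limit(self, factor=LIMIT_FACTOR):
        # Points 2 and 3: a multiple of the median block size in the window.
        return factor * median(self.sizes)

    def median_tx_fee(self):
        # Point 4: median of the per-block median fees.
        return median(self.block_median_fees)
```

Note how a single huge block barely moves the limit: with sizes 100, 200 and 10000 in the window, the median is still 200.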
It would also be nice to put both block size and median tx fee for this block into the block header, so it's easier to validate chains or calculate min tx fee required.
Here's a graph to illustrate.

If a miner wants to add Sample Tx at that point, he must make sure its fee is above the minimum tx fee imposed by the red line.
The farther into the block you go, the larger the minimum fee must be.
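In code, the red line could look something like the sketch below. I'm assuming a straight line from 10% of the Median Tx Fee at the start of the block to x2 at the block limit; the real shape and parameters are open to tweaking. The function names are hypothetical, and treating a fee exactly on the line as acceptable is also my assumption.

```python
def min_fee(bytes_used, block_limit, median_tx_fee, start=0.10, end=2.0):
    """Minimum fee for a tx placed at offset `bytes_used` in the block.

    Assumes linear growth from `start` x Median Tx Fee (empty block)
    to `end` x Median Tx Fee (block at its size limit).
    """
    if not 0 <= bytes_used <= block_limit:
        raise ValueError("offset must be within the block limit")
    fill = bytes_used / block_limit  # 0.0 = empty block, 1.0 = full block
    return median_tx_fee * (start + (end - start) * fill)


def can_add(tx_fee, tx_size, bytes_used, block_limit, median_tx_fee):
    """Check whether a miner may append a tx at the current offset."""
    if bytes_used + tx_size > block_limit:
        return False  # tx would push the block past the size limit
    # Assumption: a fee exactly on the line is accepted.
    return tx_fee >= min_fee(bytes_used, block_limit, median_tx_fee)
```

So with a Median Tx Fee of 100 and a block that is half full, a tx needs a fee of roughly 105 to get in, while the first tx in the block only needs about 10.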