EDIT: Oh, and of course, there must be tolerance levels too (if I'm X blocks behind the chain I once rejected, I'll give up and start building on top of it). You don't want to create that many chain forks!

Absolutely. Of course, that sadly means that we won't be able to ever trust a block until it gets past that point (which I think should be 2-4 blocks). So, to mitigate the damage that will cause to the practical confirmation time...
Rationale: we should use time-to-verify as the metric, because everything revolves around the 10-minutes-per-block constant.
...let's lower that constant. Additionally, by lowering the block-creation-time constant, you increase the chances of there being natural orphans by a much larger factor than you are lowering the constant (5 minute blocks would on average have 4x as many orphans as 10 minute blocks over the same time period). Currently, we see that as a bad thing since it makes the network weaker against an attacker. So, the current block time was set so that the block verification time network-wide would be mostly negligible. Let's make it so that it's not.
To miners, orphans are lost money, so instead of using such a large constant for the block time so that orphans don't happen much in the first place,
force the controlling of the orphan rate onto the miners. To avoid orphans, they'd then be forced to use such block-ignoring features. In turn, the smaller the constant for block time that we pick, the exponentially smaller the blocks would have to be. Currently, I suspect that a 50 MB block that was made up of pre-verified transactions would be no big deal for the current network. However, a .2 MB block on a 2.35 seconds per block network (yes, extreme example)
absolutely would be a big deal (especially because at that speed even an empty block with just a coinbase is a problem).
There are also some side benefits: because miners would strongly avoid transactions most of the network hasn't seen, only high-fee transactions would be likely to make it into the very next block, but many transactions would make it eventually. It might even encourage high-speed relay networks to appear, who will require a cut of the transaction fees the miners make in order to let them join this network.
In summary, I propose that to avoid the tragedy of the commons problem, instead of limiting the available space, we limit the available time allowed for the block to propagate instead. Now THAT is a Bitcoin 2.0 (or rather, 1.0)