If the off-chains or sidechains are centralized in order to raise performance, then there is a central point of failure (and a central point for surveillance, and for a government kill switch) for 90% of the bitcoin transaction volume. This defeats the whole point of a decentralized peer-to-peer network. If the off-chains/sidechains are decentralized, then they need a large network of full nodes. Where are those nodes coming from?
Gavin's testing of 20 MB blocks shows that loading them (disk I/O) is 4x faster than loading twenty 1 MB blocks. So there are economies of scale that are not easily achieved with lots of sidechains. The off-chains/sidechains may have a role for micro-tx, faster block times, or 2.0 data storage, but they don't help with handling tx volumes while still maintaining a fully decentralized payments system and currency.
It's true that the off-chain transactions would be easier for governments to diddle. So? The blockchain is still there if you're willing to pay for it. You don't just get the nice feature of secure anonymous money transfer without paying for it. Sure, one 20 MB block writes faster than twenty 1 MB blocks, but that 20 MB block is followed by another 20 MB block ten minutes later. If your argument is read/write speed, then off-chain methods are certainly faster. The blockchain is not meant to be fast or efficient; it's just the only way to do what it does, which is keep track of a decentralized ledger.
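To put rough numbers on the "another 20 MB block ten minutes later" point, here is a back-of-the-envelope sketch. It assumes every block is completely full and that blocks arrive exactly every ten minutes; both are simplifications, and the function name is just for illustration.

```python
# Worst-case chain growth, assuming full blocks and one block every
# 10 minutes (both are simplifying assumptions, not measurements).
BLOCKS_PER_DAY = 24 * 60 // 10  # 144 blocks per day


def chain_growth_mb(block_size_mb, days):
    """Return the data added to the chain, in MB, over `days` days."""
    return block_size_mb * BLOCKS_PER_DAY * days


for size_mb in (1, 20):
    per_day = chain_growth_mb(size_mb, 1)
    per_year = chain_growth_mb(size_mb, 365)
    print(f"{size_mb:>2} MB blocks: {per_day} MB/day, {per_year / 1000:.1f} GB/year")
```

Under those assumptions, every full node has to download, validate, and store roughly twenty times as much data per year with 20 MB blocks as with 1 MB blocks, however fast a single block loads.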
If I am a miner, can't I create artificial scarcity by enforcing a soft limit? By simply not including transactions below my profitability threshold?
Sure, but you still have to validate, distribute, and store the large blocks of other miners. Also, full node operators who aren't miners will have no control over this soft limit.
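As a rough illustration of the miner-side soft limit discussed in the last two posts, here is a minimal sketch of a miner filling its own block by fee rate and dropping anything below its profitability threshold. The `Tx` class, `select_transactions`, and both parameters are hypothetical names made up for this example; this is not how any particular mining software actually selects transactions.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    # Hypothetical, simplified transaction record for illustration only.
    txid: str
    size_bytes: int
    fee_satoshis: int

    @property
    def fee_rate(self):
        """Fee per byte, in satoshis."""
        return self.fee_satoshis / self.size_bytes


def select_transactions(mempool, min_fee_rate, soft_limit_bytes):
    """Pick transactions for the miner's own block.

    Transactions are taken highest fee rate first. Anything below
    `min_fee_rate` is left out, and the block is never filled past
    `soft_limit_bytes`, even if the protocol would allow a larger block.
    """
    chosen, used = [], 0
    for tx in sorted(mempool, key=lambda t: t.fee_rate, reverse=True):
        if tx.fee_rate < min_fee_rate:
            break  # everything after this pays even less
        if used + tx.size_bytes > soft_limit_bytes:
            continue  # does not fit; a smaller transaction still might
        chosen.append(tx)
        used += tx.size_bytes
    return chosen


mempool = [
    Tx("a", 250, 5000),  # 20 sat/byte
    Tx("b", 250, 250),   # 1 sat/byte
    Tx("c", 500, 5000),  # 10 sat/byte
]
block = select_transactions(mempool, min_fee_rate=5, soft_limit_bytes=1000)
print([tx.txid for tx in block])
```

Note that this policy only shapes the blocks this one miner produces. It does nothing about the size of blocks built by other miners, which is the point made in the reply above.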