It is possible that there is a fundamental misunderstanding here.
I don't think anyone ever claimed that segwit was a way to expand capacity in a more (or even equally) efficient way than simply increasing the block size.
The advantage of segwit is that it elegantly fixes a couple of other hard problems (malleability, O(n^2) sigops issue) while ALSO allowing more transactions per block without requiring a hard fork for the block size. The amount of data in the blockchain for fully-validating nodes will definitely increase, just as it would if there were a 2MB block-size hard-fork.
Am I misunderstanding the concern here?