Backward compatibility is ensured by additional information included specifically for older nodes, so that they can still process both segwit and non-segwit transactions together. This definitely increases size, but still allows for greater efficiency.
Sort of the opposite: for pre-segwit peers, the witness data is simply stripped out by the peer that provides the transaction or block to them. Normally, stripping data out wouldn't work for backward compatibility, since you generally can't take data out of a transaction and still have it match its txid... but the whole idea of segwit is that it's important that signature data *not* be part of the txid (to protect against malleability attacks). For segwit, the signature data is protected in blocks by the witness commitment in the coinbase transaction, which pre-segwit nodes ignore.
I say the opposite because it's made compatible by simply removing the incompatible part when speaking to older peers.
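To make the stripping argument concrete, here is a minimal Python sketch, assuming the BIP144 serialization as I understand it (marker `0x00`, flag `0x01`, per-input witness stacks before the locktime). The dict layout and the names `serialize`, `txid`, `wtxid` and `witness_commitment` are hypothetical helpers for illustration, not any library's API, and all transaction bytes are placeholders. It shows that changing the witness leaves the txid (which old nodes and the block header's merkle root see) untouched, while the wtxid, and therefore the coinbase witness commitment, does change. Hashes are kept in internal byte order, not the reversed hex that block explorers display.

```python
import hashlib
import struct

def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def varint(n: int) -> bytes:
    """CompactSize encoding used for counts and lengths."""
    if n < 0xfd:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)

def serialize(tx: dict, with_witness: bool) -> bytes:
    """with_witness=False yields the stripped form a pre-segwit peer receives."""
    # A transaction with no witness data is always serialized without marker/flag.
    has_witness = with_witness and any(tx["witness"])
    out = struct.pack("<i", tx["version"])
    if has_witness:
        out += b"\x00\x01"  # BIP144 marker and flag
    out += varint(len(tx["inputs"]))
    for prev_txid, vout, script_sig, sequence in tx["inputs"]:
        out += prev_txid + struct.pack("<I", vout)
        out += varint(len(script_sig)) + script_sig + struct.pack("<I", sequence)
    out += varint(len(tx["outputs"]))
    for value, script_pubkey in tx["outputs"]:
        out += struct.pack("<q", value) + varint(len(script_pubkey)) + script_pubkey
    if has_witness:
        for stack in tx["witness"]:  # one stack of items per input
            out += varint(len(stack))
            for item in stack:
                out += varint(len(item)) + item
    out += struct.pack("<I", tx["locktime"])
    return out

def txid(tx: dict) -> bytes:
    """Hash of the stripped serialization: what old nodes and the header commit to."""
    return sha256d(serialize(tx, with_witness=False))

def wtxid(tx: dict) -> bytes:
    """Hash that also covers the witness data."""
    return sha256d(serialize(tx, with_witness=True))

def merkle_root(hashes: list) -> bytes:
    layer = list(hashes)
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])  # Bitcoin duplicates the last hash on odd layers
        layer = [sha256d(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def witness_commitment(non_coinbase_wtxids: list, reserved_value: bytes = bytes(32)) -> bytes:
    """BIP141: the coinbase's wtxid counts as 32 zero bytes; the result goes in a coinbase output.
    Assumes the all-zero witness reserved value."""
    root = merkle_root([bytes(32)] + list(non_coinbase_wtxids))
    return sha256d(root + reserved_value)

# Hypothetical one-input, one-output native segwit spend (placeholder bytes).
tx = {
    "version": 2,
    "inputs": [(bytes(32), 0, b"", 0xffffffff)],  # native segwit: empty scriptSig
    "outputs": [(50_000, bytes.fromhex("0014") + bytes(20))],
    "witness": [[b"\x30" * 71, b"\x02" * 33]],    # fake signature and pubkey
    "locktime": 0,
}
tampered = dict(tx, witness=[[b"\x31" * 71, b"\x02" * 33]])

assert txid(tx) == txid(tampered)    # witness change is invisible to the txid
assert wtxid(tx) != wtxid(tampered)  # ...but not to the wtxid
assert witness_commitment([wtxid(tx)]) != witness_commitment([wtxid(tampered)])
```

The design point is that the same transaction has two serializations: old peers get the shorter one, which hashes to the same txid, so nothing they check is broken by the witness being missing.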
This approach is also nice for SPV clients: they can't do anything with scriptSigs anyway (scriptSigs can't be verified without the ancestor transactions, which SPV clients don't have), so it was just a waste of bandwidth that scriptSigs previously had to be sent for the SPV client to check that the transaction was present in a block.
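As a rough illustration of why an SPV client only needs txids, here is a minimal Python sketch of checking a merkle branch against a header's merkle root. It assumes a plain list of sibling hashes rather than the BIP37 partial-merkle-tree encoding that real `merkleblock` messages use, and the names `merkle_branch` and `verify_branch` are hypothetical. Since the header's root is built from txids, i.e. from the witness-stripped transactions, nothing signature-related is needed for the check.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Bitcoin's double SHA-256."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def _next_layer(layer: list) -> list:
    if len(layer) % 2:
        layer = layer + [layer[-1]]  # duplicate the last hash on odd layers
    return [sha256d(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]

def merkle_root(txids: list) -> bytes:
    layer = list(txids)
    while len(layer) > 1:
        layer = _next_layer(layer)
    return layer[0]

def merkle_branch(txids: list, index: int) -> list:
    """Sibling hashes from leaf `index` up to, but not including, the root."""
    branch, layer = [], list(txids)
    while len(layer) > 1:
        padded = layer + [layer[-1]] if len(layer) % 2 else layer
        branch.append(padded[index ^ 1])  # sibling on this level
        layer = _next_layer(layer)
        index //= 2
    return branch

def verify_branch(leaf: bytes, index: int, branch: list, root: bytes) -> bool:
    """What the SPV client runs: hash up the branch and compare with the header's root."""
    h = leaf
    for sibling in branch:
        h = sha256d(sibling + h) if index & 1 else sha256d(h + sibling)
        index //= 2
    return h == root

# Placeholder txids for a five-transaction block.
txids = [sha256d(bytes([i])) for i in range(5)]
root = merkle_root(txids)
branch = merkle_branch(txids, 3)
assert verify_branch(txids[3], 3, branch, root)
assert not verify_branch(txids[2], 3, branch, root)
```

The proof grows with the logarithm of the block's transaction count, and its leaf is a txid, so the full transaction the client asks for can be the stripped one.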