The reality is that optimally nodes should already have all (or at least most) of the txns that make up a new block. Txns propagate the network independently of blocks. The current block message is the simplest aproach but it is also the most efficient. Most of the time when your node receives a new block it is 99%+ information you already have. The good news is that the protocol can be made more efficient. Blocks stored on disk consists of the header and an array of txns. Today the block message is essentially the same format but it doesn't need to be. Block messages for new blocks could consist of just the header, the coinbase txn, and an array of TxIds. The average txn is ~600 bytes and the TxId is 32 bytes so that would reduce the bandwidth requirements of the block message by up to 95%.
Nodes would process new blocks in a slightly different manner:
1) Receive the new block message.
2) Validate header information
3) Validate the block hash meets difficulty target.
4) Construct and validate the merkle tree (only hashes are needed).
5) Scan the local memory pool for the TxIds in the merkle tree.
a) If it is missing txns it requests those by TxId from other nodes (at a minimum any peer the node received the blockheader from should have the missing txn(s)).
b) Validate the new txns.
c) Relay the new txns to peers.
6) The node constructs the block and write it to the local blockchain and relays the block message to its peers.
At any point if the validation fails the node stops. This prevents malicious nodes from making up bogus blocks without completing the Pow (which would be pointless). Optimally the protocol would allow miners to probabilistically include the full txn of txns that are unlikely to be in the memory pool of peers. So a block message could consist of all txns or all TxIds or any combination in between. All this would require changes to the protocol but they could be done in a backwards compatible manner.