Block messages for new blocks could consist of just the header, the coinbase txn, and an array of TxIds.
P2Pool already does this for its own blocks. The problem more generally is that you don't know what other transactions the far end knows (at least not completely) and if you need an extra RTT to fetch the transactions the end result is that the block takes longer to relay than just sending the data in the first place. So in terms of reducing block orphaning and such, it wouldn't be a win. (It works better in the p2pool case, since the nodes themselves are mining and can be sure to have pre-forwarded any transactions they'll mine)
Perhaps as a general optimization for nodes that don't care about getting the block as soon as possible it might be interesting, though it is possible to optimize latency and bandwidth at the same time through more complicated protocols (see link in gavin's post).