I believe the VERY FIRST response on the punks mailing list to Satoshi's announcement was "It does not scale". And that person was effing smart, because they could see far enough down the road to realize that the architecture of bitcoin was flawed if it was to keep all of its important attributes. I have assumed that person was someone familiar with networks who saw that bitcoin was a "broadcast network". I am not a math nerd, but I get why continuing as a broadcast network is not really possible without centralization.
Imagine if cell phones worked this way. While you were talking, every cell tower on the entire network would be transmitting your conversation, as well as every other conversation. This cannot work. Not yet. Possibly not ever.
The issue is not whether hard drives can get big enough to store the blockchain, but whether the network itself can sustain the amount of traffic produced when every person on the planet is using it.
That guy back in 2009 was right. It does not scale. Not that way.
So how will it?
I believe we are seeing the answer to that question being decided in real time in the arena of production software. It really is quite exciting, isn't it?
I think you're conflating a couple of issues here. Unfortunately, the Bitcoin Core that Satoshi put forth was very monolithic and would have benefited greatly from a more modular design. The wallet, for example, should have been a separate module. Likewise, though less obviously so, the networking should have been separated. While the gossip network is not a bad way to implement the networking for Bitcoin in a decentralized manner, it is by no means the only way that a node could receive transactions and blocks, and many other more efficient and context-specific schemes could be implemented. Exhibit 1: Blue Matt's Fibre relay network, which allows miners to ensure they receive blocks in the timeliest manner. Exhibit 2: the --connect option on the core software, which ensures that a node receives its data from another specific node rather than from the gossip network. Additionally, one could imagine nodes running in a similar way to the Gnutella network, with ultrapeers and leaf nodes.
Your phone call analogy is not such a good one either. Nodes aren't shouting at each other all the time; they first check whether transactions have already been received. There are also some schemes to cut the amount of block data that needs to be transmitted, since most nodes should already have most of a block's transactions in their mempool. I'm not sure if Core is working on that, but BU has something working.
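To make those last two points a bit more concrete, here is a rough Python sketch. It is not how Core or BU actually implement any of this; the `Node` class, the `wanted` and `rebuild_block` methods, and the simplified `txid` helper are all names I made up for illustration. The idea is just that a peer announces IDs first, and the receiver only asks for the data it does not already hold, both for loose transactions and for the contents of a new block.

```python
import hashlib


def txid(raw_tx: bytes) -> str:
    """Simplified, hypothetical ID: double SHA-256 of the raw bytes."""
    return hashlib.sha256(hashlib.sha256(raw_tx).digest()).hexdigest()


class Node:
    """Toy node that only tracks which transactions it already holds."""

    def __init__(self):
        self.mempool = {}  # txid -> raw transaction bytes

    def accept(self, raw_tx: bytes) -> str:
        """Store a transaction and return its ID."""
        tid = txid(raw_tx)
        self.mempool[tid] = raw_tx
        return tid

    def wanted(self, announced_ids):
        """Given announced IDs, return only the ones this node still lacks."""
        return [t for t in announced_ids if t not in self.mempool]

    def rebuild_block(self, block_ids):
        """Split a block's ID list into txs held locally and IDs to fetch."""
        have = [self.mempool[t] for t in block_ids if t in self.mempool]
        missing = [t for t in block_ids if t not in self.mempool]
        return have, missing


# Alice holds five transactions; Bob has already seen four of them.
alice, bob = Node(), Node()
ids = [alice.accept(f"tx-{i}".encode()) for i in range(5)]
for i in range(4):
    bob.accept(f"tx-{i}".encode())

# Alice announces IDs only; Bob requests just the one he is missing.
to_fetch = bob.wanted(ids)

# A "block" made of those same five IDs needs only one tx sent in full.
have, missing = bob.rebuild_block(ids)
print(len(to_fetch), len(have), len(missing))  # prints: 1 4 1
```

In this toy run Bob pulls one transaction over the wire instead of five, which is the whole point: the cost of relaying depends on what the receiver is missing, not on the total volume being announced.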
So scaling really isn't about the networking and, you're correct, it isn't really about the blockchain size either, since disk space is cheap and pruning and other schemes could ameliorate that issue for most users. It becomes more about the UTXO set, though how urgent that issue is could be a matter of discussion.