Ah, that's interesting. Contrast that with Satoshi's November 2008 e-mail, where he clearly explained how 100 MB blocks were no problem and how users would run SPV clients. Given that, according to some, Hal Finney was the one pushing for the 1 MB limit, it looks as if Hal Finney's position ultimately won out over Satoshi's. Hal Finney is raising here exactly the same objection that Satoshi had already answered in November 2008: "of course we don't send all transactions to all users".
From the beginning, Satoshi never doubted that scaling was a non-problem. Most users simply didn't need the block chain, and that's exactly why he introduced the SPV possibility with the Merkle tree. Otherwise there's no need for a Merkle tree structure in Bitcoin! The one and only reason Satoshi arranged the transactions of a block in a Merkle tree is that this allows SPV. If blocks are only ever used as a whole, you can simply calculate a single hash of the entire block. Nowhere else do you need a Merkle tree. The Merkle tree minimizes the number of steps needed to verify that a piece of data is present in a block, and it only really becomes useful when blocks are very large.
Otherwise you could even resort to a flat list: a block is a linear list of transactions, each transaction has a hash, and those hashes are themselves chained together in a hashed linked list of "hash blocks" all the way up to the block header, which contains the hash of the last "hash header". The problem is that the length of this list grows as N, where N is the number of transactions in a block. A Merkle tree does the same job, but its depth grows as log2(N). This becomes significant when N is very large, that is, when blocks are very big. For 1 MB blocks, with some 2000 transactions in them, it is not yet very significant. If, to check that a given transaction T is in a given block, you need to fetch that "linked list" of 2000 entries and see that T is indeed the K-th entry, that's still very feasible (the Merkle path would be about 11 steps). However, for a 100 MB block, looking through a list of 200 000 entries versus following a Merkle path only 18 steps deep is a hell of a difference.
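To make the N versus log2(N) comparison concrete, here is a minimal Python sketch of a Merkle path check. It is only an illustration under stated assumptions: the helper names (`dsha256`, `merkle_levels`, `merkle_branch`, `verify_branch`, `demo`) are made up for this post, the "transactions" are dummy byte strings, and it ignores Bitcoin's real serialization and byte-order details. It does follow the usual convention of double SHA-256 and duplicating the last hash when a level has an odd number of entries.

```python
import hashlib

def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin transaction and Merkle hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_levels(leaves):
    """Return every level of the tree, from the leaf hashes up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2 == 1:
            cur = cur + [cur[-1]]  # odd level: pair the last hash with itself
        levels.append([dsha256(cur[i] + cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels

def merkle_branch(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    branch = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        branch.append(level[index ^ 1])  # sibling is the other half of the pair
        index //= 2
    return branch

def verify_branch(leaf, index, branch, root):
    """What an SPV-style client would do: rehash up the path, compare to root."""
    h = leaf
    for sibling in branch:
        h = dsha256(sibling + h) if index & 1 else dsha256(h + sibling)
        index //= 2
    return h == root

def demo(n_tx, index):
    """Build a tree over n_tx dummy transactions; return the path length."""
    leaves = [dsha256(b"tx-%d" % i) for i in range(n_tx)]  # hypothetical data
    levels = merkle_levels(leaves)
    root = levels[-1][0]
    branch = merkle_branch(levels, index)
    assert verify_branch(leaves[index], index, branch, root)
    return len(branch)

if __name__ == "__main__":
    for n in (2000, 200_000):
        print(f"{n} transactions: {demo(n, n // 3)} hashes in the Merkle path, "
              f"versus {n} entries in a linear list")
```

With these inputs the path is 11 hashes for 2000 transactions and 18 hashes for 200 000, which at 32 bytes per hash is a few hundred bytes either way, whereas the flat list grows in proportion to the block.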
So from the very start, Satoshi designed Bitcoin as a very-big-block system in which only mining nodes carry the full data burden, while all other users run SPV clients and connect to one of these nodes.
The SPV system that Satoshi described relies on fraud proofs, that is, proofs that let a light client learn when miners have committed fraud by producing an invalid block. However, we have no such thing today. From the paper (emphasis mine):
"While network nodes can verify transactions for themselves, the simplified method can be fooled by an attacker's fabricated transactions for as long as the attacker can continue to overpower the network. One strategy to protect against this would be to accept alerts from network nodes when they detect an invalid block, prompting the user's software to download the full block and alerted transactions to confirm the inconsistency."
Satoshi recognized that SPV on its own is not secure, and that some mechanism is needed for SPV nodes to know they are not being defrauded, e.g. full nodes sending them an alert. But the Bitcoin network does not support such a thing, so Satoshi's "SPV vision" does not work until such proofs can be constructed and are provably sound (i.e. you can't fake a proof).