Second, even with an "infinite" block size, there are still practical limits on how large blocks can be. For example, my full node (with idle token hash-power) is currently set to broadcast 500 kB blocks. The reason is that my ADSL upstream bandwidth is limited to 5 Mbps. If I want to send a newly found block to 16 hosts at once, that is 500 kB × 8 bits/byte × 16 / 5 Mbps, a delay of about 12.8 seconds. With a 600-second block time, that corresponds to an orphan rate of at least 2.1% (12.8 / 600, one hop). If I had a 1 Gbps connection and wanted to limit my orphan rate to 5%, I would have 600 × 0.05 = 30 seconds to work with: 1 Gbps × 30 s / (say) 64 connections / 8 bits/byte = 58.6 MB block size (again assuming one hop). At about 300 bytes per transaction (many transactions are larger), that works out to about 195 thousand transactions per block.
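For anyone who wants to plug in their own numbers, here is a rough Python sketch of the back-of-envelope arithmetic above. The function names are made up for illustration. It assumes the uplink is split evenly across all peers, only one hop, decimal units (1 kB = 1000 bytes), and the same linear "delay / block time" approximation for the orphan rate; it is not a model of real propagation.

```python
BLOCK_INTERVAL_S = 600  # target block time in seconds


def broadcast_delay_s(block_bytes, peers, uplink_bps):
    """Seconds to push one block to `peers` hosts at once over a shared uplink."""
    return block_bytes * 8 * peers / uplink_bps


def orphan_rate(delay_s, block_interval_s=BLOCK_INTERVAL_S):
    """Linear approximation from the post: delay as a fraction of the block time."""
    return delay_s / block_interval_s


def max_block_bytes(uplink_bps, peers, target_orphan_rate,
                    block_interval_s=BLOCK_INTERVAL_S):
    """Largest block that can reach `peers` hosts within the orphan-rate budget."""
    budget_s = target_orphan_rate * block_interval_s
    return uplink_bps * budget_s / peers / 8


# ADSL case: 500 kB block, 16 peers, 5 Mbps up
adsl_delay = broadcast_delay_s(500_000, 16, 5_000_000)
print(f"delay: {adsl_delay:.1f} s")                    # delay: 12.8 s
print(f"orphan rate: {orphan_rate(adsl_delay):.1%}")   # orphan rate: 2.1%

# 1 Gbps case: 64 peers, 5% orphan-rate budget, ~300 bytes per transaction
gig_block = max_block_bytes(1_000_000_000, 64, 0.05)
print(f"max block: {gig_block / 1e6:.1f} MB")          # max block: 58.6 MB
print(f"transactions: {int(gig_block // 300)}")        # transactions: 195312
```

Changing the peer count or the uplink speed shows how quickly the practical block-size ceiling moves, even with no protocol limit at all.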
Ever heard of "headers first"?
Yes, I have. All "headers first" does is save bandwidth on re-transmission. For other nodes to build on your block in a trust-free manner, they still need the whole block. Edit: there is some provision for blindly trusting the header, then banning misbehaving hosts later.
Upon review, I did make one error: when mentioning "orphan" blocks above, I was actually referring to "stale" blocks (orphan blocks have no parent that you know about).