As I said, the block time needs to be at least several times the maximum end-to-end propagation delay. Otherwise, competing nodes can generate blocks at nearly the same time, and each can convince a large part of the network that its block is the winning block. When that happens, you risk fragmentation of the blockchain.
Blockchain fragmentation is not the issue. It's an overhead for sure, but it's the "double spend attack" from competing chains that we are worried about, which BTW doesn't mean "51% attack". You can try to double spend with less than 50% of the hash power, but you may not be successful. The more blocks generated since the transaction, the smaller your chance. The probability of success with a <50% attack depends on the number of blocks and not the amount of time. This is an advantage of shorter block times.
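To put numbers on that, here is a minimal sketch of the catch-up calculation from section 11 of the Bitcoin whitepaper. The function name and arguments are my own: q is the attacker's share of the hash power and z is the number of blocks generated since the transaction. Note that block time does not appear anywhere in it.

```python
import math

def attacker_success_probability(q, z):
    """Chance that an attacker with hash share q ever catches up from z blocks behind.

    Follows the calculation in section 11 of the Bitcoin whitepaper.
    """
    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        return 1.0  # with half or more of the hash power, the attacker always catches up
    lam = z * (q / p)  # expected attacker progress while the honest chain adds z blocks
    total = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total

if __name__ == "__main__":
    for z in (0, 1, 2, 5, 10):
        print(z, attacker_success_probability(0.10, z))
```

With q = 0.10 the chance falls from 1.0 at z = 0 to roughly 0.2 at z = 1 and under 0.001 at z = 5, regardless of how long those blocks took to generate.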
Yes, but like most things in life, an important balance between too much and too little exists. Shorter block times make it harder for <50% attacks to succeed, true, but by increasing the number of orphans, they decrease the effective hash rate of the network, making a >50% attack potentially much easier. Bitcoin considers the potential risks of a successful >50% attack to be very high, so Satoshi chose a long block time.
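A rough sketch of that trade-off, under assumptions that are mine and not measured: block discovery is a Poisson process, an honest block is orphaned whenever another block is found during one propagation delay, and the attacker mines privately so none of its own work is wasted. The function names and the 10-second delay are hypothetical.

```python
import math

def orphan_rate(propagation_delay, block_time):
    """Approximate fraction of honest blocks orphaned (toy Poisson model)."""
    return 1.0 - math.exp(-propagation_delay / block_time)

def attack_threshold(propagation_delay, block_time):
    """Hash share at which a private attacker outpaces the honest chain.

    The attacker wins the race when q > (1 - q) * (1 - s), where s is the
    orphan rate, which gives q > (1 - s) / (2 - s).
    """
    s = orphan_rate(propagation_delay, block_time)
    return (1.0 - s) / (2.0 - s)

if __name__ == "__main__":
    delay = 10.0  # seconds, an assumed end-to-end propagation delay
    for block_time in (600.0, 60.0, 10.0):
        print(block_time, orphan_rate(delay, block_time), attack_threshold(delay, block_time))
```

In this toy model the threshold stays close to 50% while the block time is many times the propagation delay, and drops well below it as the two become comparable.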