Each node works on whichever of the competing blocks it received first. Whichever miner finds the next block then determines which branch of the chain gets used. Statistically, that will be the branch whose block propagated to the most nodes. Another tie at that point would be resolved the same way.
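To make the tie-break concrete, here is a rough toy sketch in Python. It is not how the real client is written: the `Node` class, the `receive` method, and the string block names are all made up for illustration, and "longest branch" stands in for the proof-of-work comparison. It only shows the rule described above: a node stays on the block it saw first, and switches only when another branch becomes strictly longer.

```python
class Node:
    """Toy node (hypothetical): tracks competing branches as lists of block names."""

    def __init__(self, genesis="genesis"):
        self.branches = [[genesis]]
        self.best = self.branches[0]  # the branch this node is mining on

    def receive(self, parent, block):
        """Attach `block` to `parent`, then re-evaluate which branch to work on."""
        extended = None
        for branch in self.branches:
            if branch[-1] == parent:
                # Block extends the tip of a known branch.
                branch.append(block)
                extended = branch
                break
            if parent in branch:
                # Block builds on an older block: this starts a competing branch.
                extended = branch[: branch.index(parent) + 1] + [block]
                self.branches.append(extended)
                break
        if extended is None:
            return  # unknown parent; this toy model just ignores it
        # Switch only if strictly longer, so on a tie the first-seen block wins.
        if len(extended) > len(self.best):
            self.best = extended


# Two miners find blocks A and B at about the same time.
n1, n2 = Node(), Node()
n1.receive("genesis", "A")  # n1 hears A first
n1.receive("genesis", "B")
n2.receive("genesis", "B")  # n2 hears B first
n2.receive("genesis", "A")
tips_during_tie = (n1.best[-1], n2.best[-1])

# The next block, C, happens to be found on top of B.
n1.receive("B", "C")
n2.receive("B", "C")
```

During the tie the two nodes disagree (one is working on A, the other on B). Once C shows up on top of B, both end up on the same branch and A is left behind.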
So I'm guessing the 10-minute average block generation time is a balance between transaction processing time and minimizing network splits, yes?
So is this saying that some nodes in the network (say, the top-tier nodes) would need a copy of every transaction ever made?