One more question regarding the block header.
Of all the 6 elements, is it the Block Time alone that guarantees everyone is working on a random hash? To ask another way, if Block Time wasn't part of the header, would everyone be working on the same hash making it a contest of which miner is the fastest?
Version --> has to be at least a certain value (currently >= 4) but beyond that can be any value that fits in the 4-byte field
Previous Block Hash --> same for everyone
Merkle Root --> different for each miner depending on transactions in that block
Block Time --> different for each miner depending on the miner's computer clock and how often the miner updates it in the header
Bits --> same for everyone
Hash Nonce --> different for each miner
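To make the list above concrete, here is a minimal Python sketch of how the six fields are packed into the 80-byte header and hashed with double SHA-256. All field values are made-up placeholders (an all-zero previous hash, stand-in Merkle roots, an arbitrary timestamp), not real chain data. It is only meant to show that two miners who share the same version, previous block hash, time and bits still end up hashing different headers as soon as their Merkle roots differ.

```python
import hashlib
import struct


def header_hash(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Serialize the six header fields (80 bytes, little-endian) and double-SHA-256 them."""
    header = struct.pack(
        "<i32s32sIII", version, prev_hash, merkle_root, timestamp, bits, nonce
    )
    assert len(header) == 80
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    # Block hashes are conventionally displayed byte-reversed.
    return digest[::-1].hex()


# Placeholder values, for illustration only.
PREV_HASH = bytes(32)                                            # same for everyone
MERKLE_A = hashlib.sha256(b"miner A's transactions").digest()    # stand-in Merkle root
MERKLE_B = hashlib.sha256(b"miner B's transactions").digest()    # stand-in Merkle root
VERSION = 4
TIMESTAMP = 1_700_000_000                                        # deliberately identical
BITS = 0x1D00FFFF

hash_a = header_hash(VERSION, PREV_HASH, MERKLE_A, TIMESTAMP, BITS, 0)
hash_b = header_hash(VERSION, PREV_HASH, MERKLE_B, TIMESTAMP, BITS, 0)
hash_a_next = header_hash(VERSION, PREV_HASH, MERKLE_A, TIMESTAMP, BITS, 1)

print("miner A, nonce 0:", hash_a)
print("miner B, nonce 0:", hash_b)
print("miner A, nonce 1:", hash_a_next)
```

Even with the timestamp held identical, the three hashes differ, because any change in any header field changes the result.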
How many transactions can fit into a single block?
There isn't a limit on the tx count as such; the actual limit is on the weight of the block (the sum of all transactions' weights), which is capped at 4 million weight units. Each byte of non-witness data counts as 4 weight units and each byte of witness data counts as 1, so a block without any witness data tops out at 1 MB. A block can therefore contain a single transaction that takes up the whole limit, a couple of very large ones, or thousands of smaller ones that add up to the same total weight. Usually blocks are full and contain around 2000 transactions.
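As a rough illustration of the weight limit, the sketch below uses the weight formula from BIP141 (weight = 3 * base size + total size, where base size excludes witness data) and a cap of 4,000,000 weight units. The transaction sizes are hypothetical, and the sketch ignores the space taken by the block header and the coinbase transaction, so treat the numbers as upper bounds rather than exact counts.

```python
MAX_BLOCK_WEIGHT = 4_000_000  # weight units


def tx_weight(base_size, total_size):
    """Weight of a transaction: base_size excludes witness data, total_size includes it."""
    return 3 * base_size + total_size


def max_tx_count(base_size, total_size):
    """How many identical transactions of this size fit under the weight cap."""
    return MAX_BLOCK_WEIGHT // tx_weight(base_size, total_size)


# Hypothetical 250-byte transaction with no witness data: every byte counts 4 units.
legacy_weight = tx_weight(250, 250)
# Hypothetical transaction with 110 bytes of base data and 110 bytes of witness data.
segwit_weight = tx_weight(110, 220)

print("legacy weight:", legacy_weight, "fits:", max_tx_count(250, 250))
print("segwit weight:", segwit_weight, "fits:", max_tx_count(110, 220))
```

This is why the number of transactions per block varies so much: it depends on how large the transactions are and how much of their data is witness data.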
How does the miner decide which transactions will go into his mined block, is it automatic like an auction where the highest fees get in first?
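Mostly based on fees. More precisely, miners tend to prefer the transactions that pay the most fee for the block space they take up (the fee rate), since block space is limited and that is how they maximize their profit.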
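A simplified way to picture this is a greedy selection: sort the waiting transactions by fee per weight unit and keep adding them while they still fit. The sketch below does exactly that. The `Tx` class, the `select_transactions` function and the example numbers are all made up for illustration. Real mining software also has to account for transactions that depend on each other, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    txid: str
    fee: int     # satoshis
    weight: int  # weight units


def select_transactions(mempool, max_weight=4_000_000):
    """Greedy pick: highest fee per weight unit first, skipping anything that no longer fits."""
    chosen = []
    used = 0
    for tx in sorted(mempool, key=lambda t: t.fee / t.weight, reverse=True):
        if used + tx.weight <= max_weight:
            chosen.append(tx)
            used += tx.weight
    return chosen


# Hypothetical waiting transactions and a tiny weight budget to keep the example readable.
mempool = [
    Tx("b", fee=9000, weight=3000),  # 3 sat per weight unit
    Tx("c", fee=1000, weight=500),   # 2 sat per weight unit
    Tx("a", fee=5000, weight=1000),  # 5 sat per weight unit
]
picked = select_transactions(mempool, max_weight=3500)
print([tx.txid for tx in picked])
```

Note that "b" pays the largest absolute fee but is left out here: after "a" is added it no longer fits in the remaining space, so the smaller "c" gets in instead.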