Collecting transactions from the mempool and adding them to a block takes time, and the longer a miner spends doing that, the less time they can spend calculating the proof of work, which gives their competition (other miners) the edge. Game theory suggests miners would naturally settle on a Nash equilibrium between including as many transactions as possible and leaving enough time for the proof of work, so as to maximize their profitability. Thus we can assume that even without a block size limit (i.e. with unlimited block size), block sizes would stay roughly the same.
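To make that trade-off concrete, here is a minimal sketch of a toy model, not anything taken from real mining software. It assumes each extra transaction adds a fixed fee but also scales the chance of winning the block by a constant factor. The parameters `subsidy`, `fee_per_tx`, and `penalty_per_tx` are hypothetical and chosen only for illustration.

```python
import math

def expected_reward(n_tx, subsidy, fee_per_tx, penalty_per_tx):
    """Expected reward for a block holding n_tx transactions (toy model).

    Assumption: every included transaction multiplies the chance of
    winning the block race by exp(-penalty_per_tx).
    """
    return (subsidy + fee_per_tx * n_tx) * math.exp(-penalty_per_tx * n_tx)

def optimal_tx_count(subsidy, fee_per_tx, penalty_per_tx):
    """Transaction count that maximizes expected_reward, clamped at zero.

    Setting the derivative of expected_reward to zero gives
    n = 1 / penalty_per_tx - subsidy / fee_per_tx.
    """
    return max(0.0, 1.0 / penalty_per_tx - subsidy / fee_per_tx)
```

With a hypothetical subsidy of 12.5, a fee of 0.001 per transaction, and a penalty of 0.00005 per transaction, `optimal_tx_count` gives roughly 7,500 transactions, so under these assumptions the best block is finite even with no cap. If the subsidy dominates the fees, the same formula clamps to zero, meaning an empty block is the best choice in this model.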
I don't run a mining pool, nor do I have access to their code. However, I'm under the impression that constructing the Merkle root should not take much time. If the miner is focused on fees, they should be able to just grab the transactions with the highest fees immediately after they verify the new block and remove the transactions it already confirmed. Either way, I feel that the block size limit serves as a gauge for controlled growth, so that people can still run a full node and estimate how much storage it will need in the future.
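As a rough illustration of how cheap that selection step can be, here is a minimal sketch of a greedy picker. It is not taken from any pool's code: the `fee` and `size` fields are hypothetical, it ranks by fee per byte rather than raw fee, and it ignores dependencies between transactions.

```python
def select_transactions(mempool, max_block_bytes):
    """Greedily fill a block with the highest fee-rate transactions.

    mempool: list of dicts with hypothetical keys "id", "fee", "size".
    Returns the chosen transactions in the order they were picked.
    """
    chosen = []
    used_bytes = 0
    # Highest fee per byte first, so scarce block space earns the most.
    ranked = sorted(mempool, key=lambda tx: tx["fee"] / tx["size"], reverse=True)
    for tx in ranked:
        if used_bytes + tx["size"] <= max_block_bytes:
            chosen.append(tx)
            used_bytes += tx["size"]
    return chosen
```

The whole pass is one sort plus one loop over the mempool, which supports the impression that picking transactions is not where the time goes.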
That was clearly wrong, because for that block to be incorporated into the chain, other miners would have to agree with it. There's no reason why honest miners would mine on top of a crazy block. In other words, miners would implicitly set a rough maximum size, and that maximum would grow dynamically.
I agree with you on the 1 GB part. But the issue isn't whether the miners would agree with it; it's that propagation would be so slow that the chance of the block being orphaned is incredibly high.
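A back-of-the-envelope sketch of that orphan risk follows. It assumes blocks arrive as a Poisson process with Bitcoin's 600-second target interval, that propagation delay is simply block size divided by an effective bandwidth, and that the miner's own share of the hash rate is negligible. The bandwidth figure is hypothetical, and real relay optimizations are ignored.

```python
import math

BLOCK_INTERVAL_S = 600.0  # Bitcoin's target average time between blocks

def orphan_probability(block_size_bytes, bandwidth_bytes_per_s, base_latency_s=0.0):
    """Chance that a competing block is found while this one propagates.

    Assumption: delay = base latency + size / bandwidth, and competing
    blocks arrive as a Poisson process with mean BLOCK_INTERVAL_S.
    """
    delay_s = base_latency_s + block_size_bytes / bandwidth_bytes_per_s
    return 1.0 - math.exp(-delay_s / BLOCK_INTERVAL_S)
```

At a hypothetical effective rate of 1 MB/s, a 1 MB block has about a 0.17% chance of being orphaned under this model, while a 1 GB block comes out at about 81%.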
I feel that the main motivation behind this is that, at that time, the internet connections of most nodes wouldn't have been able to handle big blocks.