@OP Empty blocks didn't just start popping up a couple of years ago. It's only logical that the earliest blocks were the most likely to be empty, since there couldn't be many transactions to fill them when there were just a handful of users! As the network grew and more users transacted, empty blocks became less common, and txs in fact started to queue up, to the point that empty blocks got noticed, hence the criticism/debate/discussions surrounding them.
Your ASIC, which does the actual mining, doesn't validate blocks or construct the next ones; it just hashes the 80-byte header that it receives. While it is hashing, your node can easily construct the next candidate before the ASIC runs through all the nonces without finding a solution and needs a fresh header (a changed extranonce or something similar). And while you are at it, you might as well add new transactions and recompute the merkle root too. Since all of this is done elsewhere (not in the ASIC), there is no difference in the time it takes to mine an empty block or a block with 3000 transactions!
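To make the 80-byte point concrete, here is a rough Python sketch. It is my own illustration, not code from any mining software: the txids, timestamp, bits and version values are made-up placeholders, and the helper names (`sha256d`, `merkle_root`, `build_header`) are just names I picked. It only shows that the header handed to the hasher has the same size whether the block carries one transaction or 3000; only the 32-byte merkle root inside it changes.

```python
import hashlib
import struct

def sha256d(data):
    """Double SHA-256, as used for Bitcoin block and transaction hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(txids):
    """Merkle root over a non-empty list of 32-byte txids.
    On levels with an odd count, the last hash is paired with itself."""
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def build_header(version, prev_hash, root, timestamp, bits, nonce):
    """Serialize the header: 4 + 32 + 32 + 4 + 4 + 4 = 80 bytes, little-endian."""
    return struct.pack("<I32s32sIII", version, prev_hash, root, timestamp, bits, nonce)

# Placeholder data (assumptions for illustration, not real chain values).
prev_hash = bytes(32)
fake_txids = [sha256d(i.to_bytes(4, "little")) for i in range(3000)]

# "Empty" block: only the first txid, standing in for the coinbase.
empty_header = build_header(2, prev_hash, merkle_root(fake_txids[:1]), 1700000000, 0x1d00ffff, 0)
# "Full" block: all 3000 txids.
full_header = build_header(2, prev_hash, merkle_root(fake_txids), 1700000000, 0x1d00ffff, 0)

print(len(empty_header), len(full_header))  # prints: 80 80
```

The merkle root for 3000 txids takes a few thousand extra hashes on the node side, which is nothing compared to the work the ASIC does per header.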
I've always understood that you can't check whether block N's txs are valid without seeing block N first... and you still need to validate N before you can include txs in N+1. That is why miners leave them out at first: they start working on an empty N+1 right away and add txs as soon as they can. I think the time difference is minuscule, but enough for an advantage. Perhaps that advantage is negligible today?
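The way I picture that logic, as a very rough Python sketch. This is only my mental model, not how any particular pool or node software is written; `Template`, `next_template` and the `tip_validated` flag are hypothetical names, and I'm assuming the mempool has already had block N's txs removed once validation finishes.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Template:
    prev_hash: str
    txs: List[str] = field(default_factory=list)

def next_template(new_tip_hash, tip_validated, mempool):
    """Pick what to mine on top of the newly announced block N."""
    if not tip_validated:
        # Block N is not fully checked yet, so we cannot know which mempool
        # txs it already spent. Mine a coinbase-only block in the meantime.
        return Template(prev_hash=new_tip_hash, txs=["coinbase"])
    # Block N is validated: safe to fill N+1 from the (updated) mempool.
    return Template(prev_hash=new_tip_hash, txs=["coinbase"] + list(mempool))

early = next_template("hash_of_N", False, ["tx_a", "tx_b"])
later = next_template("hash_of_N", True, ["tx_a", "tx_b"])
print(len(early.txs), len(later.txs))  # prints: 1 3
```

If a block happens to be found during that short window before validation finishes, it comes out empty.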