Empty blocks are usually found very soon after a "normal" block. The explanation I have heard most often is that miners start mining an empty block while they are still assembling the transactions and Merkle root for the next "normal" block. Sometimes they get lucky in this short time.
Yes, if the blocks are relayed at almost the same time.
The frequency of empty blocks would suggest this "short time" to be around 5 seconds, plus/minus 50% or so. This sounds like an awfully long time to get the normal block preparation done.
Is there something I am missing? Is the chance to catch the block reward with an empty block good enough to always mine on it for several seconds?
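To put a rough number on that chance, here is a minimal sketch. It assumes block discovery is a Poisson process with Bitcoin's 10-minute target as the network-wide mean interval, and that every miner works on an empty block during the window. Both are simplifications, and the function names are just illustrative.

```python
import math

# Assumption: the network-wide mean block interval equals the 10-minute target.
MEAN_BLOCK_INTERVAL_S = 600.0


def prob_block_within(window_s, mean_interval_s=MEAN_BLOCK_INTERVAL_S):
    """Probability that the network finds a block within window_s seconds."""
    return 1.0 - math.exp(-window_s / mean_interval_s)


def window_for_fraction(fraction, mean_interval_s=MEAN_BLOCK_INTERVAL_S):
    """Inverse: window length (seconds) implied by a fraction of empty blocks."""
    return -mean_interval_s * math.log(1.0 - fraction)


if __name__ == "__main__":
    # The question's estimate: 5 seconds, plus or minus 50%.
    for t in (2.5, 5.0, 7.5):
        print(f"{t:4.1f} s window -> {prob_block_within(t):.2%} of blocks")
```

Under these assumptions a 5-second window corresponds to a bit under 1% of all blocks being empty. An individual miner's share of that chance scales with its fraction of the total hash rate.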
Miners normally have to download and validate the entire block before moving on to the next one. Certain pools would connect to each other with zombie users in order to get new block headers as soon as possible, without downloading and validating the blocks themselves. This was the case back in 2015, but block relaying has become so much more efficient since then that it really isn't that necessary anymore.
I'd say there's really nothing to lose for the miners. It is worth it if relaying and validation take a few seconds: if you were to wait and only assemble a new set of transactions and Merkle root in the meantime, the time in between would simply be wasted. The only caveat is that the previous block isn't fully validated before the pool starts on the new block, which is what resulted in the 2015 Bitcoin fork fiasco.