Thanks for your replies, much appreciated. I'd imagine that recomputing a Merkle root is fast enough that it wouldn't interrupt the main business of grinding much. Even so, I'd expect most software implementations to run this in a separate thread on the CPU, leaving the ASICs to get on with the grinding uninterrupted. If the algorithm decides it's time to include a new TX, it can prepare the new root on the CPU, then pass it over to the ASICs seamlessly. Sound about right?
If so, then it sounds fair to assume that a new TX with a reasonable fee can expect to get into the current block under normal circumstances.
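To make the "recomputing a Merkle root" step concrete, here's a rough Python sketch of what I have in mind. It assumes the standard Bitcoin construction (double SHA-256 over pairs, duplicating the last hash when a level has an odd count) and works on txids in internal byte order. The names `dsha256` and `merkle_root` are just mine for illustration, not taken from any real mining software.

```python
import hashlib


def dsha256(data):
    """Double SHA-256, as used for Bitcoin tx and block hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids):
    """Compute a Merkle root from a list of 32-byte txids.

    Assumption: txids are in internal byte order (not the reversed
    hex form that block explorers display).
    """
    if not txids:
        raise ValueError("need at least one txid")
    level = list(txids)
    while len(level) > 1:
        # An odd-sized level pairs its last hash with itself.
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    # Hypothetical txids, made up purely for the example.
    txids = [dsha256(str(i).encode()) for i in range(2000)]
    old_root = merkle_root(txids)

    # A new TX arrives: the CPU builds the new root while the ASICs
    # carry on grinding with the old one.
    txids.append(dsha256(b"new tx"))
    new_root = merkle_root(txids)
    print(old_root.hex())
    print(new_root.hex())
```

A full rebuild over n transactions costs roughly n double-hashes, so even a couple of thousand TXs is only a few thousand hash calls on the CPU. I'd guess real implementations are smarter than this and reuse the unchanged branches, but that's speculation on my part.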