Relatively often is all relative.
Roger that, here's how I see it from my POV:
His hardware solves 3.45 shares per minute, taking on average about 17.4 seconds to find each share.
Statistically, that's just over 34 solved shares per BTC block.
For the sake of argument (and easy arithmetic), let's assume 2 NMC blocks per 1 BTC block.
That translates to over 17 seconds of his work being lost, on average, during each BTC block.
That's a nearly 3% performance hit due to merged mining right there.
There's no escaping that, it's just the nature of the beast.
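To make the numbers in that estimate easy to check, here is a rough sketch of the arithmetic. It assumes a 10-minute average BTC block interval (not stated above, but implied by the "34 shares per block" figure) and simply takes the claim that about one share's worth of time is lost per BTC block at face value. The variable names are only for illustration.

```python
# Assumption: average BTC block interval of 10 minutes (600 s).
BTC_BLOCK_SECONDS = 600.0

shares_per_minute = 3.45  # figure quoted in the post

# Average time to find one share.
seconds_per_share = 60.0 / shares_per_minute

# Shares solved during one average BTC block.
shares_per_btc_block = shares_per_minute * BTC_BLOCK_SECONDS / 60.0

# The post's claim: roughly one share's worth of time is lost per BTC block.
lost_seconds = seconds_per_share
loss_fraction = lost_seconds / BTC_BLOCK_SECONDS

print(f"seconds per share:    {seconds_per_share:.2f}")
print(f"shares per BTC block: {shares_per_btc_block:.1f}")
print(f"claimed loss:         {loss_fraction:.2%}")
```

Whether that much work is really lost is a separate question; this only checks that the quoted figures are consistent with each other.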
That isn't how it works. That assumes that the GPU will hash an entire nonce range at one time (2^32 hashes), but it doesn't. The number of hashes performed at one time is a fraction of that. It depends on difficulty; maybe ckolivas can give us the formula. So when an LP occurs, the amount of work lost is much less.
Another way to look at it: the hypothetical miner above processes 34 shares per block. If your analysis were right, it would have 1 stale out of 34 even when not merged mining, and thus could never have a stale rate of <3%.
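A rough sketch of the difference between the two views, under several loud assumptions: shares are difficulty 1 (so one share takes about 2^32 hashes on average), an LP arrives at a uniformly random point inside whatever batch the GPU is working on (so half a batch is discarded on average), and the batch size of 2^24 hashes is a made-up placeholder, not the actual formula any miner uses. The function name and parameters are hypothetical.

```python
NONCE_RANGE = 2 ** 32  # full nonce range; also ~hashes per difficulty-1 share (assumption)


def lost_fraction(shares_per_minute, batch_hashes, long_polls_per_block,
                  block_seconds=600.0):
    """Fraction of hashing time discarded to long polls during one BTC block.

    Assumes each LP throws away, on average, half of the batch in flight.
    """
    hashrate = shares_per_minute * NONCE_RANGE / 60.0   # hashes per second
    lost_seconds_per_lp = (batch_hashes / 2.0) / hashrate
    return long_polls_per_block * lost_seconds_per_lp / block_seconds


# View 1: the GPU works through a whole 2^32 nonce range in one go.
whole_range = lost_fraction(3.45, NONCE_RANGE, long_polls_per_block=2)

# View 2: the GPU works in much smaller batches (2^24 is a hypothetical size).
small_batch = lost_fraction(3.45, 2 ** 24, long_polls_per_block=2)

print(f"whole nonce range per batch: {whole_range:.2%}")
print(f"2^24 hashes per batch:       {small_batch:.4%}")
```

With these assumptions the whole-range model lands near the 3% figure, while the small-batch model loses 256 times less (the ratio of 2^32 to 2^24). The real number depends on the actual batch size, which is exactly the formula asked about above.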