If the attacker started to create his double-spending subtangle a long time ago, then the initial tx's of this subtangle reference some rather old tx's, with not-so-big cumulative weight. While the attacker waits, the cumulative weight of the legit tangle continues to grow, so he won't be able to catch up.
Of course, this assumes that the attacker's max possible tx rate is much less than the "usual" tx rate of the rest of the network.
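To put rough numbers on the catch-up argument, here is a minimal sketch. It assumes (my simplification, not something stated above) that every tx has the same own weight and that each side's cumulative weight grows linearly with its tx rate. The function name and the example rates are made up for illustration.

```python
def weight_gap(honest_rate, attacker_rate, seconds, own_weight=1):
    """Lead of the legit tangle's cumulative weight over the attacker's subtangle.

    Assumption: both sides accumulate weight linearly, at
    (tx rate) * (own weight per tx) per second.
    """
    honest = honest_rate * seconds * own_weight
    attacker = attacker_rate * seconds * own_weight
    return honest - attacker


# Hypothetical rates: the network issues 50 tx/s, the attacker 5 tx/s.
gap = weight_gap(honest_rate=50, attacker_rate=5, seconds=60)
print(gap)  # positive means the attacker is falling behind
```

Under these assumptions the gap keeps growing for as long as the attacker's rate stays below the network's rate, which is exactly the condition stated above.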
The first (legit) transaction references the same old transactions as the doublespend. The attacker doesn't need to compete with the rest of the network.
OK, but the legit tx quickly starts to accumulate weight (as the honest nodes reference it, directly or indirectly), so, by the time the merchant accepts it, most of the tips of the legit tangle are already referencing it. Even if the attacker publishes his subtangle at that moment, why would the honest guys reference it? The tips from the attacker's subtangle have smaller cumulative weight, and, if an honest guy tries to reference both a legit tip and the attacker's tip, he'll detect the contradiction and won't do it. Therefore, the attacker's subtangle will be abandoned.
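The "detect the contradiction" step can be sketched roughly like this. It is only an illustration under my own assumptions: the tangle is a plain dict, each tx spends at most one named output, and the names `past_cone`, `can_approve_both` and the toy tx ids are hypothetical, not taken from any real implementation.

```python
def past_cone(tangle, tip):
    """Return the tip plus every tx it references, directly or indirectly."""
    seen, stack = set(), [tip]
    while stack:
        tx = stack.pop()
        if tx not in seen:
            seen.add(tx)
            stack.extend(tangle[tx]["parents"])
    return seen


def can_approve_both(tangle, tip_a, tip_b):
    """True if the combined history of both tips spends no output twice."""
    spent = set()
    for tx in past_cone(tangle, tip_a) | past_cone(tangle, tip_b):
        out = tangle[tx]["spends"]
        if out is None:
            continue
        if out in spent:
            return False  # two different tx's spend the same output
        spent.add(out)
    return True


# Toy tangle: "legit" and "double" both spend coin1 and reference the same old tx.
tangle = {
    "genesis": {"parents": [], "spends": None},
    "old": {"parents": ["genesis"], "spends": None},
    "legit": {"parents": ["old"], "spends": "coin1"},
    "double": {"parents": ["old"], "spends": "coin1"},
    "honest_tip": {"parents": ["legit"], "spends": None},
    "attacker_tip": {"parents": ["double"], "spends": None},
}

print(can_approve_both(tangle, "honest_tip", "attacker_tip"))
```

An honest node running a check like this would refuse to pair `honest_tip` with `attacker_tip`, so it picks two consistent tips instead and the attacker's branch gains no honest weight.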