Using "the smaller hash" has one benefit: it is unambiguous!
Orphans are the result of a network split (different parts of the network consider different blocks to be the current best block). If a client knows of two competing blocks and has to choose, using the smaller hash will always lead to the same decision, no matter which block it saw first. This works towards the goal of reducing orphans (reducing the occurrence of differing decisions on distinct clients).
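To make the "always the same decision" point concrete, here is a minimal sketch of such a tie-break. The names `block_hash` and `pick_tip`, the placeholder header bytes, and the big-endian reading of the digest are my own assumptions for illustration, not how any real client is written.

```python
import hashlib

def block_hash(header: bytes) -> int:
    """Double SHA-256 of the header, read as an integer.
    The big-endian byte order here is an assumption for illustration."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "big")

def pick_tip(a: bytes, b: bytes) -> bytes:
    """Hypothetical rule: of two competing headers at the same height,
    keep the one with the smaller hash."""
    return a if block_hash(a) <= block_hash(b) else b

# Placeholder headers standing in for two competing blocks.
header_x = b"competing block X"
header_y = b"competing block Y"

# Two nodes see the blocks in opposite order but pick the same one.
node_1 = pick_tip(header_x, header_y)
node_2 = pick_tip(header_y, header_x)
print(node_1 == node_2)  # True
```

The result depends only on the two hashes, not on arrival order, which is the whole argument for the rule.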
No, we're talking about a scenario where there is already an orphan. It cannot be avoided. Even if the whole network immediately focuses on extending only one of the two blocks, the next block still carries the same risk of being orphaned. So it doesn't gain anything. On the other hand, using the lower hash opens the door to new attacks: if I find a block with a super-low hash, I can just hold off broadcasting it until someone else finds a competing block with a higher hash, and then instantly orphan theirs!
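A toy sketch of that withholding scenario, comparing a first-seen rule with a lowest-hash rule. Everything here (the rule functions, the dict layout, the hash values) is made up for illustration and greatly simplified; it only shows which block a node ends up on.

```python
def first_seen(current, incoming):
    """Keep whichever block the node learned about first."""
    return current

def lowest_hash(current, incoming):
    """Hypothetical rule: switch to the incoming block if its hash is lower."""
    return incoming if incoming["hash"] < current["hash"] else current

# Made-up hash values: the attacker found an unusually low hash and sat on it.
attacker_block = {"miner": "attacker", "hash": 0x0000000000000001}
honest_block = {"miner": "honest", "hash": 0x00000000000FFFFF}

def tip_after_withholding(rule):
    tip = honest_block                # the honest block is broadcast first
    tip = rule(tip, attacker_block)   # then the attacker releases the withheld block
    return tip["miner"]

print(tip_after_withholding(first_seen))   # honest
print(tip_after_withholding(lowest_hash))  # attacker
```

Under first-seen, holding a block back just means losing the race; under lowest-hash, the late block wins anyway, so withholding becomes cheap.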