Why was a 95% adoption rate selected for activation, as opposed to 80%, 50%, or even 5%?
So that the soft fork deploys with a supermajority. It will essentially have consensus when it deploys, which ensures that the new rules will be deployed and enforced by the miners. The 95% rule has been used for all previous soft forks, so there is no reason to change that now.
Not all: we've increased it over time in response to prior instability and as we all learned the implications better. BIP30 just used a 'past this time' decree. BIP16 was based on 55% and just used a time for the actual activation, which resulted in months of low levels of orphaned blocks being produced. Satoshi used several hard-cut soft forks that were triggered purely on block heights. BIP34 was the first to use 95%, but it actually started enforcing the rules for a subset of blocks at 75%.
BIP9 changed to a new quorum sensing approach that is MUCH less vulnerable to false triggering, so 95% under it is more like 99.9% under the old approach. But we saw no reason to lower the criteria: basically, when it activates, the 95% will have to be willing to potentially orphan the blocks of the 5% that remain if they happen to mine invalid blocks. If there were some reason why the users of Bitcoin would rather have it activate at 90% (e.g. let's just imagine some altcoin publicly raised money to block an important improvement to Bitcoin), then even with the 95% rule the network could choose to activate it at 90% just by orphaning the blocks of the non-supporters until 95%+ of the remaining blocks signaled activation.
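To make the arithmetic concrete, here is a rough Python sketch, not actual node code. It assumes the BIP9 mainnet parameters (a 2016-block window with a 1916-block threshold, roughly 95%); the function names `locks_in` and `signalling_share` and the `orphan_fraction` parameter are made up for illustration, and the second function is a simplified expected-value model rather than a simulation of real mining.

```python
# Assumed BIP9 mainnet parameters; all names here are illustrative only.
WINDOW = 2016      # blocks in one retarget period
THRESHOLD = 1916   # signalling blocks needed in that period (about 95%)


def locks_in(signals):
    """Return True if one full window of per-block signal flags meets the threshold."""
    if len(signals) != WINDOW:
        raise ValueError("expected exactly one full window of blocks")
    return sum(1 for s in signals if s) >= THRESHOLD


def signalling_share(support, orphan_fraction):
    """Expected share of signalling blocks among the blocks that stay in the chain.

    support:         fraction of blocks mined by signalling miners (0..1)
    orphan_fraction: fraction of non-signalling blocks that the supporters
                     orphan (0 = none, 1 = all of them)
    """
    surviving_non_signalling = (1.0 - support) * (1.0 - orphan_fraction)
    return support / (support + surviving_non_signalling)


if __name__ == "__main__":
    # 90% support with no orphaning stays below the 95% bar.
    print(signalling_share(0.90, 0.0))
    # Orphaning 60% of the non-signalling blocks lifts the share above it.
    print(signalling_share(0.90, 0.6))
```

Under this simplified model, 90% support clears the 95% bar once a bit more than half of the non-signalling blocks are orphaned, which is the scenario described above.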
So basically, setting the criteria high protects the stability of the network in the common case, but never ties anyone's hands when something is failing to activate. By default the safest thing happens, but no one can exploit this in an attack.
Because of this, the trade-offs favor a high threshold.