What I don't understand is that if soft-forks are backward compatible why would other miners reject it? Doesn't backward compatibility mean that new blocks are seen as valid by old clients?
There is more to consider than whether the blocks that are produced will be valid. For example, segwit will likely reduce transaction fees, so a miner may oppose it because they would make less money in the short term if it activates. At the same time, activating segwit could earn them more money in the long term.
And if you wait for everyone to update, what is the benefit of a soft fork?
Soft forks only require miners to upgrade. This makes a soft fork more likely to activate successfully, since fewer people need to upgrade and those who don't upgrade won't be kicked off the network. Most miners need to upgrade so that a chain split does not happen when a block that is invalid under the new rules is mined: as long as upgraded miners hold a majority of the hash power, the chain that follows the new rules stays the longest one, and old clients follow it too.
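To make the backward-compatibility point concrete, here is a minimal sketch. It assumes a toy rule where validity depends only on a block's size; the limits, the Block class, and the function names are all hypothetical and not taken from any real client. The soft fork tightens the rule, so everything upgraded clients accept is also accepted by old clients, but not the other way around.

```python
from dataclasses import dataclass

# Hypothetical toy rule: a block is valid if its size is within a limit.
OLD_LIMIT = 1000  # assumed pre-fork rule
NEW_LIMIT = 500   # assumed soft-fork rule (stricter than the old one)


@dataclass(frozen=True)
class Block:
    miner: str
    size: int


def old_client_accepts(block: Block) -> bool:
    # Clients that did not upgrade still enforce only the old rule.
    return block.size <= OLD_LIMIT


def upgraded_client_accepts(block: Block) -> bool:
    # Upgraded clients enforce the tighter soft-fork rule.
    return block.size <= NEW_LIMIT


from_upgraded_miner = Block(miner="upgraded", size=400)
from_old_miner = Block(miner="not upgraded", size=800)

for block in (from_upgraded_miner, from_old_miner):
    # Prints the miner, then whether old and upgraded clients accept the block.
    print(block.miner, old_client_accepts(block), upgraded_client_accepts(block))
```

The block from the upgraded miner is accepted by both kinds of client. The block from the miner that did not upgrade is accepted by old clients but rejected by upgraded ones, which is the situation that can split the chain if too much hash power is still running the old rules.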
Is there a downside to a hard fork if it has 95% support?
The remaining 5% could, in theory, maintain a separate blockchain that uses the same parameters as the main blockchain but different consensus rules. Also, keep in mind that with a hard fork, basically everyone (miners and users) needs to upgrade, whereas with a soft fork only the miners really need to upgrade.
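As a rough illustration of why the minority ends up on its own chain, the sketch below assumes a toy hard fork that loosens a size limit. The limits, the block sizes, and the helper accepted_prefix are hypothetical. A node follows a chain only up to the first block it considers invalid.

```python
from typing import Callable, List

# Hypothetical toy rule: a block is valid if its size is within a limit.
OLD_LIMIT = 1000  # assumed pre-fork rule
NEW_LIMIT = 2000  # assumed hard-fork rule (looser than the old one)


def old_rules(size: int) -> bool:
    return size <= OLD_LIMIT


def new_rules(size: int) -> bool:
    return size <= NEW_LIMIT


def accepted_prefix(chain: List[int], is_valid: Callable[[int], bool]) -> List[int]:
    """Return the blocks a node follows: everything before the first invalid block."""
    accepted = []
    for size in chain:
        if not is_valid(size):
            break
        accepted.append(size)
    return accepted


# Block sizes on the chain built by the upgraded majority.
majority_chain = [800, 1500, 900]

print(accepted_prefix(majority_chain, new_rules))  # what upgraded nodes follow
print(accepted_prefix(majority_chain, old_rules))  # what non-upgraded nodes follow
```

Upgraded nodes follow the whole chain, while nodes on the old rules stop at the first block that exceeds the old limit. From that point on, the minority can only extend its own chain with its own miners, no matter how much hash power the majority has.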