For #1, if I mine on one fork, doesn't that fork immediately become the one that will most likely get accepted by the network?
If a) the protocol specifies a "first seen" rule that prefers the block you received first, b) everybody abides by this rule (nobody uses modified clients), and c) network latency is evenly distributed among the nodes, then probably yes.
If so, why even bother with mining on both?
As I explained in the probabilistic approach, you don't actually have to mine on both and send two blocks. It's extremely unlikely that you would succeed on both anyway. To maximize your chances, it suffices to check whether you can mine on either of the two blocks, no matter which one you received first. Such behaviour can disrupt consensus.
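To put rough numbers on this, here is a minimal sketch. It assumes your chance of being eligible is the same value `p` on each competing block and that the two checks are independent; both are simplifications of mine, and the function names are made up for illustration.

```python
def chance_on_any_fork(p: float, forks: int = 2) -> float:
    """Probability of being eligible on at least one of `forks` competing blocks.

    Assumes an identical, independent chance `p` per block (a simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1.0 - (1.0 - p) ** forks


def chance_on_all_forks(p: float, forks: int = 2) -> float:
    """Probability of being eligible on every one of the competing blocks."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p ** forks


if __name__ == "__main__":
    p = 0.01  # hypothetical per-block chance, not taken from any real coin
    print(f"honest, one fork only: {p:.4f}")
    print(f"checking both forks:   {chance_on_any_fork(p):.4f}")
    print(f"succeeding on both:    {chance_on_all_forks(p):.6f}")
```

Under these assumptions, checking both blocks nearly doubles a small per-block chance, while succeeding on both at once stays negligible, which is why sending two blocks is not the point.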
Plus don't some proposals punish this multiple fork mining behavior?
Yes, please check out the links in my previous post for further details.
For #2, when is this attack used, during initial block download? Is the idea to use this stake to try to perform a stake grinding attack in advance and send those blocks to a syncing node instead of the real chain?
Ideally, you would try to buy coins from early adopters, from the time when the coin wasn't popular yet. That should make it "easier" to buy keys representing a large percentage of the stake that existed at that early stage. However, as pointed out in the NeuCoin paper I cited, even if you possess a majority of the historic stake, it seems you still have no realistic chance of winning the battle, since you'd still have to compete with 100% of the stake.
For #3, I can't obtain access to 50% of coins without exchanging for other tokens, fiat or goods/services, can I (with the exception of #2)? Those are sunk costs that I can't recover if I cause problems with the PoS chain.
If there is a big enough market for short selling the coins, you could sell at a predefined price without needing to buy the stake beforehand. The subsequent devaluation of the coin caused by your attack then wouldn't affect this price.
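As a back-of-the-envelope illustration of the short-selling case, here is a small sketch. All figures are hypothetical, the function name is mine, and it ignores fees, margin requirements and slippage.

```python
def short_profit(coins_shorted: float, sell_price: float,
                 buyback_price: float, attack_cost: float = 0.0) -> float:
    """Profit from shorting at `sell_price` and covering at `buyback_price`.

    `attack_cost` is whatever the attack itself costs the attacker.
    Fees, margin and slippage are ignored (a simplification).
    """
    return coins_shorted * (sell_price - buyback_price) - attack_cost


if __name__ == "__main__":
    # Hypothetical numbers: short 1000 coins at 2.0, cover at 0.5 after the
    # attack has pushed the price down, with an attack cost of 300.
    print(short_profit(1000, 2.0, 0.5, attack_cost=300.0))
```

The sale price is fixed before the attack, so the lower the price falls afterwards, the larger the attacker's gain.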
Another attack vector (one that works even without the possibility of short selling) is to regularly buy 51% of the coin and launch smaller-scale attacks that remain largely undetected and thus don't have a negative impact on the market price.