1. Would this affect physical merchants who accept 0-confirmation transactions, since transaction propagation will take longer and the merchant/user might have to wait a bit longer while there's a queue?
No. The transaction will take a little longer to propagate than it would without Dandelion, but likely not noticeably. The effect is really just as if you had hesitated a few extra seconds before sending the transaction. It makes no difference to these merchants, as they will still likely receive the transaction at around the same time the vast majority of the network does. There is no queue.
2. Would this affect a block size/weight limit increase in the future?
No. This is not at all related to transaction sizes, transaction formats, or consensus rules. It is purely a network protocol change. It could be deployed right now with no other changes to Bitcoin.
3. Is using Tor/I2P/Kovri a better/simpler solution?
With Tor (I'm not super familiar with the others), you could potentially correlate multiple transactions to the same node. AFAIK, with Dandelion, a new circuit is chosen for each broadcast, so it is much harder to correlate multiple transactions. Also, as mentioned earlier, it would be better for Bitcoin to adopt its own privacy protocols rather than relying on external ones. If the privacy protocol was built into the network protocol itself, that would be better than just a few people choosing to use an external privacy method.
That said, it will obviously be slightly more resource intensive for those choosing to use Dandelion. You'll be maintaining two distinct mempools.
I hadn't thought of that. But since you simply move the transaction from the stempool to the mempool once it is broadcast to the network, IMO the bigger impact is on computational resources.
I could be wrong, but since your stempool will handle other people's Dandelion transactions, I thought it fair to assume that both stempool and mempool would need to be maintained continuously. I doubt it will be particularly demanding on your system, though. I really like the idea.
Resource usage won't be that much worse, as some tricks can be used. The stempool is a superset of the mempool, meaning that everything in the mempool is also in the stempool. So instead of duplicating everything, you can just maintain a separate set of stempool-only transactions. This would take up approximately the same amount of memory (there's some data structure overhead) as if the node didn't have Dandelion, since all of the Dandelion transactions would have happened anyway and gone into the mempool.
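To make the superset trick concrete, here is a minimal Python sketch of the idea, not how any real node implements it. The class name `DandelionPools` and its methods are hypothetical, and transactions are stood in for by raw bytes keyed by txid. The stempool is never stored as a full copy; it is the mempool plus a small overlay of transactions that have not been broadcast yet.

```python
class DandelionPools:
    """Hypothetical sketch: a mempool plus a stempool-only overlay.

    The logical stempool is mempool + stem_only, so nothing in the
    mempool is ever stored twice.
    """

    def __init__(self):
        self.mempool = {}    # txid -> raw tx, transactions already broadcast
        self.stem_only = {}  # txid -> raw tx, Dandelion transactions not yet broadcast

    def add_stem(self, txid, tx):
        """Record a Dandelion transaction that has not been broadcast yet."""
        if txid not in self.mempool:  # already public: nothing extra to store
            self.stem_only[txid] = tx

    def add_broadcast(self, txid, tx):
        """Record a transaction received through normal broadcast."""
        self.stem_only.pop(txid, None)  # drop the overlay entry if we had one
        self.mempool[txid] = tx

    def promote(self, txid):
        """Move a stempool-only transaction into the mempool once it is broadcast."""
        self.mempool[txid] = self.stem_only.pop(txid)

    def in_stempool(self, txid):
        """Membership test against the logical stempool (the superset)."""
        return txid in self.mempool or txid in self.stem_only

    def stempool_size(self):
        """Size of the logical stempool without materializing it."""
        return len(self.mempool) + len(self.stem_only)


pools = DandelionPools()
pools.add_broadcast("a", b"tx-a")  # ordinary transaction
pools.add_stem("b", b"tx-b")       # Dandelion transaction, still private
pools.promote("b")                 # later broadcast: moved, not copied
```

Promoting a transaction moves a single entry between the two dictionaries, so the total stored stays the same as it would be on a node without Dandelion, apart from the overlay's own bookkeeping.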
For CPU usage, it won't be that much more.
Dandelion is BIP 156.