From a miner's perspective, in the short run it is most profitable to include any tx whose fee is greater than the penalty incurred by broadcasting it (a larger block propagates more slowly and so is more likely to be orphaned). As I understand it, once O(1) block propagation is implemented, this penalty is greatly reduced and becomes almost 0. So if a miner is greedy, the best strategy is to include almost every transaction with a non-zero fee as soon as possible. That, however, would destroy the fee market: fees would collapse to 0. This is not a free market. A good, well-known analogy is an unexpected surplus of some good, e.g. food in the wake of international sanctions. It is then impossible to sell all of the goods on the domestic market, because demand is insufficient even at a wholesale price of 0. An artificial limit on supply (destroying crops) is therefore crucial for securing farmers' revenues.
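To make the short-run rule concrete, here is a minimal sketch of it. The linear per-byte penalty, the function name, and all the numbers are my own assumptions for illustration, not anything taken from real mining software:

```python
def select_transactions(mempool, penalty_per_byte):
    """Return the transactions a short-run greedy miner would include.

    mempool: list of (fee, size_bytes) tuples; units are arbitrary.
    penalty_per_byte: assumed marginal cost of broadcasting one more byte
        (a simplification: the penalty is modeled as linear in size).
    """
    return [(fee, size) for fee, size in mempool if fee > penalty_per_byte * size]


# Hypothetical mempool: (fee, size in bytes).
mempool = [(0, 250), (1, 250), (50, 250), (500, 500)]

# With a noticeable propagation penalty, low-fee transactions are left out.
costly = select_transactions(mempool, penalty_per_byte=0.1)

# With a zero penalty, every non-zero-fee transaction gets in.
cheap = select_transactions(mempool, penalty_per_byte=0.0)

print(costly)  # [(50, 250), (500, 500)]
print(cheap)   # [(1, 250), (50, 250), (500, 500)]
```

Once the penalty is close to 0, the only transactions left out are the zero-fee ones, so nothing stops senders from bidding fees down.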
From the miners' perspective, the best limit is the one that brings them the most revenue. That is probably the point at which the fee is competitive with other means of payment for small to medium-sized transactions, since those are the most common ones. The ideal tx fee should therefore probably be somewhere between $0.10 and $1.
If miners cooperate, they have the power to enforce a soft maxblocksize simply by not mining larger blocks. But is this kind of cooperation possible, given that a greedy miner would include every non-zero-fee tx and earn higher revenue? I think Gavin believes it is, in which case the role of the hard maxblocksize should be different: it should be just an upper bound on the soft maxblocksize that miners can impose, one that would guarantee network decentralization.
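The temptation to defect can be shown with a small sketch. It holds the mempool fixed and compares one block's fee revenue with and without a soft cap. The function name, the fee-per-byte ordering, and the numbers are assumptions of mine, and the sketch deliberately ignores the long-run effect the cap is supposed to have on fee levels:

```python
def block_revenue(mempool, max_block_bytes=None):
    """Total fees from filling a block in descending fee-per-byte order.

    mempool: list of (fee, size_bytes) tuples; units are arbitrary.
    max_block_bytes: soft cap the miner respects, or None for no cap.
    """
    total_fee = 0
    used_bytes = 0
    paying = [tx for tx in mempool if tx[0] > 0]
    for fee, size in sorted(paying, key=lambda tx: tx[0] / tx[1], reverse=True):
        # Skip any transaction that would push the block over the soft cap.
        if max_block_bytes is not None and used_bytes + size > max_block_bytes:
            continue
        total_fee += fee
        used_bytes += size
    return total_fee


# Hypothetical mempool: (fee, size in bytes).
mempool = [(500, 500), (50, 250), (10, 250), (1, 250)]

cooperating = block_revenue(mempool, max_block_bytes=750)  # respects the soft cap
defecting = block_revenue(mempool)                         # ignores the soft cap

print(cooperating, defecting)  # 550 561
```

For any given mempool the defector never earns less than the cooperator, which is exactly why the cooperation is in doubt. The open question is whether the higher fees that a respected cap would eventually produce outweigh this one-block gain.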