Let's make a simple example with only 3 miners A, B and C who all have the same hash power 1/3 and the same orphan rate 0.01. The miners don't engage in any form of selfish mining strategies. It's easy to see that under these conditions every miner will build 1/3 of the blocks in the chain, as everybody has the exact same chances.
Now, let's assume that A starts building bigger blocks so that his orphan rate increases to 0.2, while B and C retain their orphan rates of 0.01. To determine the fraction of the blocks (in the chain) built by the respective miners, we can calculate:
A: (0.8*1/3) / (0.8*1/3 + 0.99*1/3 + 0.99*1/3) = 0.288
B and C: (0.99*1/3) / (0.8*1/3 + 0.99*1/3 + 0.99*1/3) = 0.356
And we see that B and C now each build a larger share of the chain (0.356) than their relative hash rate (0.333).
According to Peter R's equation (3), their success rate would be 0.99 * 1/3 = 0.33, which is incorrect.
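The calculation above can be reproduced with a few lines of Python. This is only a sketch of the toy model in this post (no selfish mining, each miner's surviving blocks proportional to hash power times one minus orphan rate); the function name chain_shares is made up for illustration.

```python
def chain_shares(hash_power, orphan_rate):
    """Fraction of the blocks in the chain built by each miner.

    Toy model from the post: a miner's surviving blocks are
    proportional to hash_power * (1 - orphan_rate), and the shares
    are normalised over all miners.
    """
    surviving = {m: hash_power[m] * (1.0 - orphan_rate[m]) for m in hash_power}
    total = sum(surviving.values())
    return {m: s / total for m, s in surviving.items()}


hash_power = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
orphan_rate = {"A": 0.2, "B": 0.01, "C": 0.01}

shares = chain_shares(hash_power, orphan_rate)
for miner, share in shares.items():
    print(f"{miner}: {share:.3f}")
# A: 0.288
# B: 0.356
# C: 0.356
```

The normalisation by the total is the whole point: B and C did nothing, yet their share went up because A's went down.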
Mining is a relative game: what counts is your orphan rate compared with everybody else's!
Yes, this holds with constant block rewards (tail emission). But in the case of rewards proportional to block length (fees), you have to multiply A's share by the extra money his blocks bring in. He has a lower percentage of blocks on the chain, but each of these blocks brings him more rewards as it is bigger.
So if his big blocks bring him 20% more income per block, this is neutral.
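As a rough check of the break-even point, here is a sketch under assumptions that are mine, not stated above: the number of blocks in the chain per unit time stays fixed (difficulty adjusts), B and C earn an unchanged amount per block, and "neutral" means A earns what he earned with a 1/3 share. The function name break_even_premium is hypothetical.

```python
def break_even_premium(hash_power, orphan_rate, miner):
    """Extra income per block the miner needs so that his total income
    equals what his raw hash power share would have given him.

    Assumes (illustration only) a fixed number of chain blocks per
    unit time and unchanged per-block income for the other miners.
    """
    surviving = {m: hash_power[m] * (1.0 - orphan_rate[m]) for m in hash_power}
    share = surviving[miner] / sum(surviving.values())
    return hash_power[miner] / share - 1.0


hash_power = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
orphan_rate = {"A": 0.2, "B": 0.01, "C": 0.01}

premium = break_even_premium(hash_power, orphan_rate, "A")
print(f"A needs about {premium:.1%} more income per block")
```

Under these assumptions the figure comes out somewhat below 20%, so the round number above is in the right ballpark.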
However, the thing to keep in mind is what an orphan rate of 0.2 from network propagation means: on average your blocks take about 0.2 of the block period to get to the others. 0.2 of 10 minutes is 2 minutes. If you have good links, the blocks must be mind-bogglingly HUGE to take 2 minutes.
If it takes 2 minutes to pump a block to another miner with whom you are connected by a 10 Gb/s link, we are talking about blocks of 100 GB or more (10 Gb/s * 120 s = 1200 Gb = 150 GB).
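The back-of-the-envelope number can be checked as follows. This sketch assumes the link is fully saturated for the whole transfer and ignores protocol overhead and validation time; the function name is made up for illustration.

```python
def block_size_gigabytes(link_gbit_per_s, transfer_seconds):
    """Size of a block (in gigabytes) that keeps a link busy for
    transfer_seconds at link_gbit_per_s, with 8 bits per byte."""
    return link_gbit_per_s * transfer_seconds / 8


orphan_rate = 0.2
block_period_s = 10 * 60                          # 10 minute block time
propagation_s = orphan_rate * block_period_s      # about 2 minutes

size_gb = block_size_gigabytes(10, propagation_s)
print(f"{propagation_s:.0f} s on a 10 Gb/s link is about {size_gb:.0f} GB")
# 120 s on a 10 Gb/s link is about 150 GB
```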
As I said earlier, this kind of argument only starts to play a role when the network is already dead. If propagating blocks between miners takes a significant fraction of the block time (10 minutes in bitcoin), enough to get them orphaned, then no "normal node user" can ever keep the block chain up to date, because normal users have a worse network connection to the miners (the source of the block chain) than miners have amongst themselves. Especially if network quality seriously impacts their revenues, miners will have the best possible links between them: this is mutually advantageous, and much less costly than the mining itself (a 10 Gb/s link to Joe MiningPool is less expensive than your mining gear).
If you really want a solution to this problem, then "block length" is not the right parameter, but block income is:
one should cap the "block reward + fee", to, say, 20 btc. As such, miners can make all the blocks they want, long, short, but their TOTAL INCOME (reward + fees) is capped to 20 btc FOREVER (part of the protocol). A block with a total reward larger than 20 btc is simply invalid.
If you do that, you get consensus convergence: blocks will grow until the mem pool is empty, or until fees sum to 20 btc.
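To make the proposal concrete, here is a minimal sketch of the validity rule and of a miner filling a block under it. Everything here is hypothetical: the names, the 12.5 btc subsidy in the example, and the greedy highest-fee-first selection are illustration choices, not part of any existing protocol.

```python
from decimal import Decimal

INCOME_CAP_BTC = Decimal("20")  # cap value proposed in the post


def is_valid_block(subsidy, fees, cap=INCOME_CAP_BTC):
    """A block is invalid if subsidy + fees exceeds the cap."""
    return subsidy + sum(fees, Decimal("0")) <= cap


def fill_block(mempool_fees, subsidy, cap=INCOME_CAP_BTC):
    """Greedily add transactions, highest fee first, skipping any
    transaction that would push the block income over the cap."""
    included = []
    total = subsidy
    for fee in sorted(mempool_fees, reverse=True):
        if total + fee > cap:
            continue  # would make the block invalid, leave it out
        included.append(fee)
        total += fee
    return included, total


mempool = [Decimal(x) for x in ("5", "4", "3", "2", "1")]
included, income = fill_block(mempool, subsidy=Decimal("12.5"))
print(included, income)  # fees 5 and 2 are included, income is 19.5
```

With a nearly empty mem pool every transaction fits and the block simply stops growing when there is nothing left to include.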
If fees sum to close to 20 btc, then users can lower their fees. There is however a dangerous lock-out spiral: if fees reach 20 btc, then some transactions will be excluded, and users could increase their fees in the hope of getting their transactions in. But then, blocks would become even smaller! The right action would be for all users to stop paying large fees, which they will, eventually. So users should start LOWERING their fees from the moment that the block rewards start to approach 20 btc.
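The spiral is easy to see with numbers. This sketch assumes every transaction pays the same fee and the subsidy is zero (fees only), which is a simplification of mine; the function name is hypothetical.

```python
from decimal import Decimal


def max_transactions(fee_per_tx, subsidy=Decimal("0"), cap=Decimal("20")):
    """Number of equal-fee transactions that fit under the income cap."""
    room = cap - subsidy
    return int(room // fee_per_tx)


for fee in (Decimal("0.001"), Decimal("0.002"), Decimal("0.004")):
    print(f"fee {fee} btc -> at most {max_transactions(fee)} transactions")
# fee 0.001 btc -> at most 20000 transactions
# fee 0.002 btc -> at most 10000 transactions
# fee 0.004 btc -> at most 5000 transactions
```

Every time users double their fees to get in, the number of transactions that fit in a block is halved.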