I believe there are significant incentives for miners not to create blocks that are TOO big.
Miners do not all face the same relay limitations. Orphan risk is distributed unequally according to those limitations, which threatens the viability of groups of miners that control only a minority of hash power. That is a miner centralization risk. See Pieter Wuille's simulation:
https://www.reddit.com/r/bitcoin_devlist/comments/3bsvm9/mining_centralization_pressure_from_nonuniform/

What defines "too big"? Without some hard limit that defines this, there is nothing to prevent miners from forking the network.
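
To put rough numbers on the orphan-risk point above, here is a back-of-the-envelope sketch. It is not Pieter Wuille's simulation, just a toy model built on assumptions of my own: block discovery is a Poisson process with a 600-second mean interval, propagation delay is simply latency plus block size divided by bandwidth, and a block is orphaned whenever a rival finds one during that delay. The function name, parameters, and bandwidth figures are all hypothetical.

```python
import math

BLOCK_INTERVAL_S = 600.0  # Bitcoin's target mean time between blocks


def orphan_probability(block_size_mb, bandwidth_mbps, latency_s=0.0,
                       rival_hash_share=1.0):
    """Rough chance that a rival block appears while ours is still propagating.

    Toy assumptions (not taken from the original post):
      - propagation delay = latency + size / bandwidth
      - rivals find blocks as a Poisson process whose rate is scaled by
        rival_hash_share (1.0 means our own hash power is negligible)
    """
    delay_s = latency_s + block_size_mb * 8.0 / bandwidth_mbps
    return 1.0 - math.exp(-rival_hash_share * delay_s / BLOCK_INTERVAL_S)


if __name__ == "__main__":
    # Compare a well-connected miner with a poorly connected one
    # (both bandwidth figures are made up for illustration).
    for size_mb in (1, 8, 20):
        fast = orphan_probability(size_mb, bandwidth_mbps=100.0)
        slow = orphan_probability(size_mb, bandwidth_mbps=2.0)
        print(f"{size_mb:>2} MB  fast link: {fast:.4%}  slow link: {slow:.4%}")
```

Under these assumptions the same block size costs the poorly connected miner far more in orphan risk than the well-connected one, and the gap widens as blocks grow. That is the unequal distribution the argument rests on.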