I think improving block propagation should take priority before block size. Currently block distribution is an O(n) operation, the bigger the block the longer it will take to propagate, which means miners are forced to mine small blocks to reduce orphan rates. Even if the max block size is increased there is no incentive for miners to use it.
This is a possible way to reduce block propagation time if a solution to the drawback is found. Since most nodes will have a nearly identical mempool of transactions a template could be sent that will construct an entire newly mined block. The benefits are block propagation time is no longer directly tied to its size and there is a reduction in bandwidth. The drawback is if some transactions are missing the receiving node will have to send a request for them, this process can end being slower than just receiving an entire block from the start.