It would be nice if someone could weigh in here, because I can't tell whether a block reward penalty would really solve the problem at scale.
It doesn't solve the problem. The block reward penalty applies only to the block subsidy, which is negligible at just 0.6 XMR. Since Monero transactions are at least a few kilobytes each, it takes only a few hundred transactions to make burning the subsidy worthwhile for the miner: 600 transactions paying 0.001 XMR each (which is essentially nothing) already add up to 0.6 XMR in fees. With global adoption in mind, the system should be able to handle more than 600,000 transactions per 2.5 minutes, not 600. Your Monero client would then need to verify 4,000 transactions per second, which is infeasible.
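To make the arithmetic above explicit, here is a minimal sketch. It simply reuses the figures from this post (0.6 XMR subsidy, an assumed fee of 0.001 XMR per transaction, 2.5-minute blocks). These are not pulled from the live network, and the helper names `breakeven_tx_count` and `required_tps` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Figures taken from the post above; assumptions, not live network parameters.
SUBSIDY_XMR = Fraction("0.6")        # block subsidy the penalty can burn
FEE_PER_TX_XMR = Fraction("0.001")   # assumed fee paid by each transaction
BLOCK_TIME_S = 150                   # 2.5 minutes, as used in the post


def breakeven_tx_count(subsidy, fee_per_tx):
    """Number of fee-paying transactions whose total fees equal the subsidy."""
    return subsidy / fee_per_tx


def required_tps(tx_per_block, block_time_s):
    """Transactions per second a verifying node must keep up with."""
    return tx_per_block / block_time_s


# Exact rational arithmetic avoids float rounding in the fee calculation.
print(breakeven_tx_count(SUBSIDY_XMR, FEE_PER_TX_XMR))  # 600
print(required_tps(600_000, BLOCK_TIME_S))               # 4000.0
```

Once the fees in a block exceed the subsidy, the penalty stops being a deterrent. The throughput line shows why raising the per-block count to global-adoption levels pushes verification into the thousands of transactions per second.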