I know that it requires some extra hashes, but if my math is correct we are talking about 12-13 GHash/s for a 400 PHash/s pool. And I don't think 12-13 GHash/s costs millions of dollars per year.
This is how I derived the number:
A 400 PHash/s pool grinds through about 100 million mid-states per second (400 PHash/s divided by the 2^32 nonces tried per mid-state). With 4-way asicboost, it needs about 25 million 4-way collisions per second.
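To make the first step easy to check, here is a small sketch of the arithmetic. It assumes each mid-state is used for exactly one sweep of the 32-bit nonce field and nothing else (no ntime rolling); the function names are mine and only for illustration.

```python
# Back-of-envelope: mid-states and 4-way collisions consumed per second.
# Assumption (not stated in the post): one mid-state = one full 2**32 nonce sweep.

POOL_HASHRATE = 400e15   # 400 PHash/s, the pool size used in the post
NONCE_SPACE = 2 ** 32    # nonces tried per mid-state (assumption)


def midstates_per_second(hashrate, nonce_space=NONCE_SPACE):
    """Mid-states the pool burns through each second."""
    return hashrate / nonce_space


def collisions_per_second(hashrate, ways=4):
    """Each k-way collision feeds k mid-states, so divide by k."""
    return midstates_per_second(hashrate) / ways


if __name__ == "__main__":
    print(f"mid-states/s:        {midstates_per_second(POOL_HASHRATE):.3e}")
    print(f"4-way collisions/s:  {collisions_per_second(POOL_HASHRATE):.3e}")
```

This comes out slightly under the rounded 100 million and 25 million figures, which is close enough for the estimate.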
If one collects collisions on a central server (the most efficient way to find a lot of collisions in a big pool), one needs about 3 billion hashes per second to create 25 million 4-way collisions per second. If one can live with a 5-second delay (i.e. blocks miss some high-fee transactions from the last 5 seconds), one can pool the hashes over that window and reduce this to 5 billion hashes in 5 seconds, or a billion hashes per second.
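The 3 billion and 5 billion figures can be sanity-checked with a simple balls-in-bins model. This is only my rough sketch, not necessarily how the numbers were derived: it assumes the collision is on 32 bits, that candidate hashes are uniformly random, and that bucket occupancy is approximately Poisson. `expected_groups` is a made-up helper name.

```python
import math

# Rough model (assumptions: 32 colliding bits, uniform hashes, Poisson occupancy).
# A bucket holding k candidates yields floor(k / ways) disjoint k-way groups.


def expected_groups(n_hashes, ways=4, bits=32, max_k=64):
    """Expected number of disjoint `ways`-way collision groups among n_hashes."""
    buckets = 2 ** bits
    lam = n_hashes / buckets      # mean candidates per bucket
    pmf = math.exp(-lam)          # P(bucket holds 0 candidates)
    total = 0.0
    for k in range(1, max_k + 1):
        pmf *= lam / k            # P(bucket holds exactly k candidates)
        total += (k // ways) * pmf
    return buckets * total


if __name__ == "__main__":
    # One-second window: 3 billion hashes, target about 25 million groups.
    print(f"3e9 hashes -> {expected_groups(3e9):.3e} groups")
    # Five-second window: 5 billion hashes, target about 5 * 25 million groups.
    print(f"5e9 hashes -> {expected_groups(5e9):.3e} groups")
```

Under these assumptions both cases land near their targets, which is why the larger window is so much cheaper per second: the number of 4-way groups grows roughly with the fourth power of the number of hashes collected.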
Collecting the hashes and storing them to find the collisions is not trivial, but it should be possible. You have to do it anyway for any form of covert asicboost.
Now, the extra commitment in the UTXO makes computing the hashes more expensive. Instead of a single hash, you need to compute 12-13 hashes for the Merkle root (if you still use Merkle grinding to generate the commitment hashes). But it requires no extra storage. So the total extra cost is 12-13 GHash/s for a 400 PHash/s pool. Everything else is unchanged.
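Putting the last step into numbers, as a sketch only: it takes the one billion candidates per second from the 5-second window above and the 12-13 hashes per candidate as given, and compares the result to the pool's total hash rate. The names are illustrative.

```python
# Extra hashing caused by the commitment, using the figures from the post.

POOL_HASHRATE = 400e15        # 400 PHash/s
CANDIDATES_PER_SECOND = 1e9   # 5 billion hashes per 5-second window


def extra_hashrate(hashes_per_candidate, candidates_per_second=CANDIDATES_PER_SECOND):
    """Hashes per second needed when every candidate costs several hashes."""
    return hashes_per_candidate * candidates_per_second


if __name__ == "__main__":
    for depth in (12, 13):
        extra = extra_hashrate(depth)
        share = extra / POOL_HASHRATE
        print(f"{depth} hashes/candidate: {extra / 1e9:.0f} GHash/s "
              f"({share:.2e} of the pool's hash rate)")
```

Even at 13 hashes per candidate, the extra work is a tiny fraction of what the pool already does.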