This is where I'm losing you. Yes, there may be a checkpoint, but the highest total difficulty still wins out. If the highest-difficulty chain conflicts with a checkpointed one, surely the client should go with the higher-difficulty one, as you say.
Even with headers first, checkpoints still mean that old txs don't need to be validated (assuming you trust the reference client programmers).
You could leave the checkpoints in, and just say that all txs before a checkpoint are automatically valid.
If there were a conflict, I think going into some kind of emergency mode is better than saying nothing.
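To make the checkpoint idea concrete, here is a rough sketch of how a client might decide what to do with each block. It is not how the reference client does it; the checkpoint table, the hash strings, and the names `Action` and `classify_block` are all made up for illustration.

```python
from enum import Enum


class Action(Enum):
    SKIP_VALIDATION = "skip"      # txs treated as automatically valid
    FULL_VALIDATION = "validate"  # txs checked as normal
    EMERGENCY = "emergency"       # chain disagrees with a checkpoint


# Hypothetical checkpoint table: height -> expected block hash.
# The hashes are placeholders, not real block hashes.
CHECKPOINTS = {1000: "hash-a", 2000: "hash-b"}


def classify_block(height, block_hash, checkpoints=CHECKPOINTS):
    """Decide how to treat a block at `height` on the candidate chain."""
    expected = checkpoints.get(height)
    if expected is not None and expected != block_hash:
        # The candidate chain conflicts with a checkpoint: warn loudly
        # rather than silently picking one side.
        return Action.EMERGENCY
    if height <= max(checkpoints, default=-1):
        # At or below the last checkpoint: trust the checkpoint and
        # skip validating the txs.
        return Action.SKIP_VALIDATION
    return Action.FULL_VALIDATION
```

With this table, a block at height 1500 is skipped, a block at height 2001 is fully validated, and a block at height 2000 with the wrong hash triggers the emergency state.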
A better approach is detecting large forks. If there is a fork that is 1000 blocks long within the last 2000 blocks, then flag a warning and tell users that their balances could be wrong.
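Something like the following would do the fork check. This is only a sketch: I am assuming "1000 blocks long" means the length of the competing branch measured from the last common block, and "within the last 2000 blocks" means the fork point is at most 2000 blocks below the best tip. The `Branch` type and the function name are hypothetical.

```python
from dataclasses import dataclass

FORK_WARN_LENGTH = 1000  # minimum branch length that triggers a warning
FORK_WINDOW = 2000       # only look at forks starting this close to the tip


@dataclass(frozen=True)
class Branch:
    fork_height: int  # height of the last block shared with the best chain
    tip_height: int   # height of the competing branch's tip


def is_large_fork(best_tip_height, branch,
                  min_length=FORK_WARN_LENGTH, window=FORK_WINDOW):
    """Return True if `branch` is a long, recent fork worth warning about."""
    branch_length = branch.tip_height - branch.fork_height
    fork_depth = best_tip_height - branch.fork_height
    return branch_length >= min_length and fork_depth <= window


def warn_if_forked(best_tip_height, branches):
    """Collect the branches that should raise a user-facing warning."""
    return [b for b in branches if is_large_fork(best_tip_height, b)]
```

A client could run `warn_if_forked` each time a new header arrives and show the balance warning whenever the returned list is non-empty.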
Essentially what I'm trying to figure out is a mechanism for blockchain compression so that we can drop very old txs with minimal to no loss in security.
Pruning is not really a big deal. As long as at least one node keeps everything, the network can recover from forks.
If 10,000 nodes each hold a random 1% of the data, then between them you are highly likely to have everything.
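A quick back-of-the-envelope check, assuming each node picks its 1% independently and uniformly at random (that is my assumption, and real nodes may not behave that way). The function names and the 200,000 piece count are just for illustration.

```python
def prob_piece_missing(nodes, fraction):
    """Chance that no node at all holds one particular piece of data.

    Assumes every node independently keeps each piece with
    probability `fraction`.
    """
    return (1.0 - fraction) ** nodes


def expected_missing_pieces(pieces, nodes, fraction):
    """Expected number of pieces held by nobody (also an upper bound
    on the chance that at least one piece is lost)."""
    return pieces * prob_piece_missing(nodes, fraction)


if __name__ == "__main__":
    p = prob_piece_missing(10_000, 0.01)
    print(f"chance one piece is lost: {p:.3e}")
    print(f"expected lost pieces out of 200,000: "
          f"{expected_missing_pieces(200_000, 10_000, 0.01):.3e}")
```

Under that assumption the chance of any given piece being lost is 0.99^10000, which is on the order of 10^-44, so the whole chain is covered with overwhelming probability.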
It is likely that at least a few nodes will be "archive" nodes that will store everything.