Anyhow, the obvious solution is to disallow rollbacks. Doesn't really matter how you get there.
Do you mean make orphans impossible? I think the only way orphans can be eliminated is if all the blocks are correct. And the only way all the blocks can be assumed to be correct is if the participants are assumed to be honest. But if the participants are honest, you don't need PoW or PoS: you just trust by default.
You could also say, "OK, let's allow orphans, but only a certain way back." But if you do this, then you need checkpoints. And if you use checkpoints, who decides what checkpoints to use?
So I don't think such a system can be Byzantine robust.
I think I've come up with a way to make PoS - at least potentially - work as a good solution to the Byzantine Generals problem. It's still kind of shaky in terms of practical applicability because it's sensitive to when large transactions are made - but if transactions are reasonably 'steady' the math works against an attacker.
With Nakamoto's Proof-of-Work solution, we keep track of the difficulty solved at each block, and from that, when we evaluate two chains to see if one should displace the other we can work out how much hashing work went into each chain. That amounts to the irrevocable consumption of a resource in finite supply, with the result that we can be certain that the hashing was not expended on both of two chains.
The crucial point is that a resource in finite supply can be shown to have been irrevocably consumed. Well, the point at which a TxOut gets spent or transferred is also irrevocable.
The first 'serious' proposal for proof-of-stake essentially said, we can keep track of the amount spent in a version of the block chain, and treat that as the measure that a 'superior' chain has to beat, in the same way that a 'superior' chain in proof-of-work has to have generated more hashing.
The problem with that idea is 'nothing-at-stake.' A TxOut can be spent in both chains, and therefore the priority generated by its expenditure can be counted in favor of both chains. When you're talking about multiple chains each with its own 'consensus reality' you're no longer talking about the irrevocable expenditure of a finite resource.
But my opinion is that you can
make it finite if you are careful about exactly what you compare.
The set of TxOuts at the last block that two chains have in common is identical. That is a finite resource whose expenditure can be measured when you're comparing two divergent chains. It may not be the only one, but it is one. So, when you compare two candidate chains, you should be counting the expenditure of
that set of TxOuts, and counting it only once for both chains. You could count nothing for any coin spent in both chains and full-points for anything spent only in one, and compare the priority on that basis. Expenditures of the set of TxOuts created after the chain's divergence cannot be counted; they're already counted when you count (or refuse to count due to conflict) the expenditures that have gone to make them.
That rule is all about resolving which potential blockchain gets to displace the other; it says nothing about how blocks are formed in the first place. In fact you could go on forming blocks by proof-of-work, but instead of paying a large coinbase to the hasher, you could pay a very small coinbase to the hasher and distribute most of the coins as 'interest' on the txouts spent in that block.
The efficiency argument for proof-of-stake would work fine with that structure. With Bitcoin's blocks worth a little over $17K right now, and ALL of that awarded for proof-of-work, people are spending up to $17K per block on generating proof-of-work; custom ASICS are old news, now there are multimillion dollar 'hash farms' with their own power plants are getting built. But if, instead, the hasher were getting, say, $10 for forming a block, people would not be spending more than $10 for enough electricity to get a block regardless of how much bitcoin were being created/distributed per block in Proof-of-Stake or 'interest' payments.