What do the Bitcoin nodes verify exactly when a user exits the off-chain system?
Exactly what they need to, no less, no more. Or to be more specific, what you want to know is what the witness verifies. The Bitcoin nodes only ever verify that the witness was satisfied.
If that sounds a little circular, it's because it depends on the details of the system in question. What I'm suggesting is, effectively, an application of a replaceable script system: the users provide the script. In general you'd provably destroy the coins in the off-chain system, in a way which binds the destruction to the Bitcoin address you want to pay, or which unbinds them in some other way visible to the off-chain users. The witness's job is to observe the transcript and be convinced by it.
For the anti-replay-oracle example, you'd make a final off-chain transaction with something like H('hello bitcoin!'||nonce||bitcoinaddress) as the destination, instead of a public key. This makes the coin forever unspendable in the off-chain system.
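A minimal sketch of building that destination, assuming SHA-256 as the hash and a 32-byte random nonce (both are illustrative choices, not specified above; the function name is mine):

```python
import hashlib
import os

def exit_commitment(bitcoin_address: bytes) -> tuple[bytes, bytes]:
    """Build an unspendable off-chain destination that commits to a
    Bitcoin address.  The nonce keeps the address hidden from other
    off-chain users; it is placed only in the (private) transcript."""
    nonce = os.urandom(32)  # never published
    digest = hashlib.sha256(b"hello bitcoin!" + nonce + bitcoin_address).digest()
    return digest, nonce

# The digest serves as the final off-chain destination: nothing can
# produce a signature for it, so the coin is dead in the off-chain system.
dest, nonce = exit_commitment(b"example-bitcoin-address")
```

Because the digest is indistinguishable from a normal destination without the nonce, other off-chain users learn nothing about where the coin is headed.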
For the colored-coin example, you might do something similar: you could pay the coin to an unspendable address that was really a hash of the intended Bitcoin destination and a nonce. Or you could have the funds pass through a transaction with a zero-value OP_RETURN {hash like above} output, not discarding the coin at all but signaling to all the other users of your colored coin (by a convention established in the coding of your witness) that it has lost its color in order to move back to Bitcoin. (This latter case means you would need to be using special colored-coin software for your intermediate transactions that watched for this happening, in order to avoid getting cheated.)
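For the OP_RETURN variant, the marker output might be serialized like the sketch below. The OP_RETURN opcode (0x6a) and the direct-push encoding of a 32-byte item are standard Bitcoin script; the convention of treating such an output as a "decoloring" signal is this proposal's, and the function name is hypothetical:

```python
import hashlib

OP_RETURN = 0x6a

def decolor_marker_output(commitment_hash: bytes) -> bytes:
    """Serialize a zero-value txout whose scriptPubKey is
    OP_RETURN <32-byte commitment>.  By the witness's convention,
    a colored coin passing through such a transaction loses its color."""
    assert len(commitment_hash) == 32
    script = bytes([OP_RETURN, len(commitment_hash)]) + commitment_hash
    value = (0).to_bytes(8, "little")             # zero satoshis
    return value + bytes([len(script)]) + script  # value | script len | script

h = hashlib.sha256(b"intended bitcoin destination" + b"nonce").digest()
out = decolor_marker_output(h)
```

Colored-coin software following the convention would scan intermediate transactions for exactly this output shape, which is why participants need aware software to avoid accepting decolored coins.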
In either case the nonce would never be made public, for privacy's sake. It would be placed in the transcript, however, so that the witness could verify that you will be paying the correct Bitcoin destination, which you include as a public input so it can be checked against what you actually provided. (Technically it would be a hash of the masked redemption transaction and sighash flags rather than an address, but... implementation details.)
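The witness-side check could then look like this sketch: the claimed destination is a public input, the nonce is a private input from the transcript, and the witness simply recomputes the commitment (SHA-256 and the function name are assumptions carried over from the earlier example):

```python
import hashlib

def witness_checks_exit(public_destination: bytes,
                        transcript_destination: bytes,
                        private_nonce: bytes) -> bool:
    """Recompute the commitment from the public destination and the
    private nonce, and require it to equal the destination recorded
    in the final off-chain transaction."""
    expected = hashlib.sha256(
        b"hello bitcoin!" + private_nonce + public_destination).digest()
    return expected == transcript_destination

# A matching transcript verifies; a substituted destination does not.
nonce = bytes(32)
addr = b"example-bitcoin-address"
commit = hashlib.sha256(b"hello bitcoin!" + nonce + addr).digest()
```

Since only the hash appears in public data, observers learn nothing about the destination, yet the proof binds the redemption to it.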
But whatever you're doing for double-spending prevention off-chain, you'd effectively use the same thing here to convince the witness that you're not double-spending your redemption. Of course, none of this actually requires you to be secure against double-spending if you don't want to adopt rules which are... but, uh, no one else may want your double-spendable coins.

If so, it's still superior to existing off-chain systems in that funds can only be frozen, not confiscated.
And even there, part of the reason I gave an example using a hash-blinded oracle is that it is one where the trusted party is so deprived of information that it could only prevent spending by shutting down completely (and not even then, if you use an M-of-N set of them), or if the penultimate off-chain holder cooperates with it.
I could easily imagine witness designs that reduce the frozen-funds risk further in exchange for some double-spend risk. For example, there could be an alternative way to redeem the coin by providing an incomplete transcript (e.g. one where you're the final owner, but there is no transfer into Bitcoin at the end) plus Bitcoin headers proving that the current time is sufficiently far in the future. There is a risk that a prior owner of the coin will steal it at this point (by truncating their transcript), but that risk may be better than allowing the coin to be frozen forever if the off-chain system fails, and it would only exist if the holder of the coin failed to exit the off-chain system before their previously agreed-on timeout ran out.
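The two redemption paths could be sketched as an acceptance predicate like the one below. All parameter names are hypothetical, and "proven header height" stands in for whatever header-chain proof-of-elapsed-time the witness actually checks:

```python
def witness_accepts(transcript_valid_to_final_owner: bool,
                    has_bitcoin_exit: bool,
                    proven_header_height: int,
                    timeout_height: int) -> bool:
    """Normal path: a complete transcript ending with a transfer into
    Bitcoin.  Fallback path: final ownership is shown, but the exit is
    missing, and Bitcoin headers prove the agreed timeout has passed."""
    if not transcript_valid_to_final_owner:
        return False
    if has_bitcoin_exit:
        return True
    return proven_header_height >= timeout_height
```

The fallback branch is exactly where the truncation risk lives: a prior owner holds a transcript that was valid up to their ownership, so after the timeout they too can satisfy the predicate.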
Interesting ... are there constraints on the set of programs or is it really 'all programs'?
It's Turing complete, and for the purposes here it really is all programs. The system emulates a specialized Harvard-architecture virtual machine and cannot, as currently defined, run self-modifying code. The execution environment can only do input by reading from two one-way input tapes (one for public inputs, one for private ones), and can only output by "accepting" or failing to accept before the user-specified time limit is up. None of this is problematic for the use we'd want here: e.g., any output you want, you instead make an input, and have the program accept only if its computed result matches. The primary limitation that would confront users doing what I've proposed is that longer execution time results in longer proof-computation time and larger proof-construction memory (and also larger programs, but that growth is well controlled).
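The output-as-input trick can be illustrated with a toy acceptance program (the squaring computation is a stand-in for whatever the real witness computes; the tape representation as iterators is my own simplification):

```python
def accepts(public_tape, private_tape) -> bool:
    """Emulate 'output by acceptance': the program reads a claimed
    output from the public tape, computes the real result from the
    private tape, and accepts only if the two match."""
    claimed = next(public_tape)    # claimed output, publicly visible
    x = next(private_tape)         # private witness value
    computed = x * x               # stand-in for the real computation
    return computed == claimed

# A valid proof convinces a verifier that accepts() returned True,
# so the claimed public value is effectively an authenticated output.
```

Anyone who checks the proof that this program accepted on public input 49 has thereby learned that the (hidden) computation really produced 49.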