I think I may be missing something in the discussion here.
You can never test a program as a black box to see if it is correct except by exhaustively testing all possible inputs, and even then only if the program has no internal state. If you want to trust a bitcoin device that holds your private keys, you won't be able to test all possible inputs.
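To make the "no internal state" caveat concrete, here is a minimal sketch, assuming a toy device whose entire input space is a single byte. `reference`, `StatefulDevice` and `exhaustive_test` are hypothetical names, and SHA-256 merely stands in for whatever the device is supposed to compute. The device answers correctly for its first 256 calls, so an exhaustive test over every possible input passes, and only afterwards does it deviate.

```python
import hashlib


def reference(request: bytes) -> bytes:
    # Stand-in for the specified, correct behaviour of the device.
    return hashlib.sha256(request).digest()


class StatefulDevice:
    """Hypothetical device: correct for its first `trigger` calls, wrong afterwards."""

    def __init__(self, trigger: int = 256):
        self.calls = 0          # internal state the tester cannot see
        self.trigger = trigger

    def respond(self, request: bytes) -> bytes:
        self.calls += 1
        if self.calls <= self.trigger:
            return reference(request)
        return bytes(32)        # placeholder for a malicious output


def exhaustive_test(device) -> bool:
    # Try every possible one-byte input once and compare with the spec.
    return all(device.respond(bytes([b])) == reference(bytes([b]))
               for b in range(256))


if __name__ == "__main__":
    dev = StatefulDevice()
    print(exhaustive_test(dev))                         # True
    print(dev.respond(b"\x00") == reference(b"\x00"))   # False
```

The counter is the only thing that changes between the passing test run and the misbehaving call, which is exactly the state a black-box tester cannot observe.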
The attacks presented by gmaxwell and I go even further. Even if you could ensure that the inputs and outputs are correct, you cannot detect the malicious behavior, because the outputs provably look like regular outputs, i.e., they follow a statistical distribution that no efficient algorithm can distinguish from that of honest outputs.
This complexity is part of why I'd previously proposed the alternative where the online requesting device blinds the signature request, then gives the signing device a ZKP that the blinded message being signed is the message being signed... The result is that the side channel is reduced to 1 bit (sign/don't sign) unless the requesting device and the offline device conspire. (Also the aforementioned fact that it's much easier to verify a proof than to create it.)
I don't see how this would prevent the leakage of private keys at all. My attack does not need to know what the message is, and it does not even need to know what the private key is. It just chooses k in a way that enables the attacker to extract k from two signatures. If one knows how the wallet implementation works, it would be enough for this attack to inject the right random numbers into the wallet's entropy source.
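A minimal sketch of one well-known construction of this kind, assuming plain ECDSA over secp256k1: the first nonce k1 is genuinely random, and the second is derived as k2 = H(k1·A) from a public key A whose secret a only the attacker knows. This is an illustration, not necessarily the exact attack referred to above. Since s = k^-1 (z + r·x) mod n, anyone who learns k recovers the private key as x = r^-1 (s·k - z) mod n. The attacker computes a·R1 = k1·A from the public r1 of the first signature, hashes it to get k2, and solves for x. The device never needs the message contents for this. All names (`MaliciousSigner`, `attacker_recover`, `hash_point`, ...) are hypothetical; the curve arithmetic is a toy (not constant time), it skips low-S normalization, and it assumes R1.x < n, which fails only with negligible probability.

```python
import hashlib
import secrets

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    # Affine point addition; None is the point at infinity.
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def point_mul(k, pt):
    # Double-and-add scalar multiplication (toy, not constant time).
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def msg_hash(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")


def hash_point(pt) -> int:
    # Hypothetical nonce derivation from a shared curve point.
    data = pt[0].to_bytes(32, "big") + pt[1].to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign_with_nonce(x, z, k):
    r = point_mul(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * x) % N
    return r, s


def verify(Q, z, sig):
    r, s = sig
    w = pow(s, -1, N)
    R = point_add(point_mul(z * w % N, G), point_mul(r * w % N, Q))
    return R is not None and R[0] % N == r


class MaliciousSigner:
    """Backdoored signer whose signatures are ordinary, valid ECDSA signatures."""

    def __init__(self, x, attacker_pub):
        self.x, self.attacker_pub, self.prev_k = x, attacker_pub, None

    def sign(self, msg: bytes):
        if self.prev_k is None:
            k = secrets.randbelow(N - 1) + 1                      # honest-looking random k1
        else:
            k = hash_point(point_mul(self.prev_k, self.attacker_pub))  # k2 = H(k1*A)
        self.prev_k = k
        return sign_with_nonce(self.x, msg_hash(msg), k)


def attacker_recover(a, Q, msg2, sig1, sig2):
    r1, _ = sig1
    r2, s2 = sig2
    y = pow((pow(r1, 3, P) + 7) % P, (P + 1) // 4, P)   # lift r1 to a point (P = 3 mod 4)
    for R1 in ((r1, y), (r1, P - y)):                    # try both y candidates
        k2 = hash_point(point_mul(a, R1))                 # a*R1 = a*k1*G = k1*A
        x = (s2 * k2 - msg_hash(msg2)) * pow(r2, -1, N) % N
        if point_mul(x, G) == Q:                          # confirm against the public key
            return x
    return None
```

Usage: create `dev = MaliciousSigner(x, point_mul(a, G))`, take `sig1 = dev.sign(b"tx 1")` and `sig2 = dev.sign(b"tx 2")`. Both pass `verify`, yet `attacker_recover(a, Q, b"tx 2", sig1, sig2)` returns `x`. Without `a`, the nonces are hashes of points the observer cannot compute, which is why the signatures look like regular ones.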