The script being modified is the input script. There are lots of ways the input side of a tx can be modified that don't change the inputs or outputs but do result in a slightly "different" tx, and when that is hashed it produces a different hash.
Here is just one example.
ECDSA signatures require a random nonce (usually called k) for security. If you remember the talk about broken RNGs and stolen bitcoins, this was because a few wallets reused the same k on multiple txs, and this allows one to compute the private key used for signing.
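To make the nonce-reuse problem concrete, here is a rough sketch of the algebra in Python (3.8+ for the modular inverse via pow). It only does the arithmetic modulo the curve order: the value r below is a made-up stand-in for the x-coordinate of k*G, and the key, nonce and message hashes are all hypothetical toy numbers.

```python
# Order of the secp256k1 group (a public constant of the curve).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def sign_s(z, r, k, d):
    """The s part of an ECDSA signature: s = k^-1 * (z + r*d) mod N."""
    return pow(k, -1, N) * (z + r * d) % N

def recover_private_key(z1, s1, z2, s2, r):
    """Recover d from two signatures made with the same nonce (same r)."""
    # s1 - s2 = k^-1 * (z1 - z2), so k = (z1 - z2) / (s1 - s2).
    k = (z1 - z2) * pow((s1 - s2) % N, -1, N) % N
    # s1 * k = z1 + r*d, so d = (s1*k - z1) / r.
    return (s1 * k - z1) * pow(r, -1, N) % N

# Toy values, all hypothetical.
d = 0x1234567890ABCDEF   # private key
k = 0xDEADBEEFCAFE       # nonce, wrongly reused for two messages
r = 0xABCDEF0123456789   # stand-in for the x-coordinate of k*G
z1, z2 = 1111, 2222      # two different message hashes

s1 = sign_s(z1, r, k, d)
s2 = sign_s(z2, r, k, d)
recovered = recover_private_key(z1, s1, z2, s2, r)
print(recovered == d)
```

Anyone who sees both signatures knows z1, z2, s1, s2 and r, which is everything the recovery function needs.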
An ECDSA signature consists of two values, R and S. ECDSA has the property that for a given payload and private key, if (R, S) is a valid signature then so is (R, -S), where -S is taken modulo the curve order. Thus an attacker could take any tx, invert the sign of S, and the signature will still verify. However, when you now hash the two versions of the tx, they produce different hashes.
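A minimal pure-Python sketch of the S / -S property, assuming Python 3.8+ and using the public secp256k1 parameters. The private key, nonce and payload are hypothetical, the "tx" is just a toy string with the signature appended, and this code is far too slow and leaky for real signing.

```python
import hashlib

# secp256k1 domain parameters (public constants).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def scalar_mult(k, pt):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result

def sign(z, d, k):
    """Textbook ECDSA signing of hash z with private key d and nonce k."""
    r = scalar_mult(k, G)[0] % N
    s = pow(k, -1, N) * (z + r * d) % N
    return r, s

def verify(z, sig, pub):
    """Textbook ECDSA verification."""
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    pt = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, pub))
    return pt is not None and pt[0] % N == r

d = 0x1234567890ABCDEF                      # hypothetical private key
pub = scalar_mult(d, G)
payload = b"toy tx payload"                 # hypothetical signed data
z = int.from_bytes(hashlib.sha256(payload).digest(), "big")
original = sign(z, d, k=0xC0FFEE1234)       # fixed k only to keep the demo repeatable
mutated = (original[0], N - original[1])    # flip the sign of S

both_valid = verify(z, original, pub) and verify(z, mutated, pub)
hash_a = hashlib.sha256(payload + repr(original).encode()).hexdigest()
hash_b = hashlib.sha256(payload + repr(mutated).encode()).hexdigest()
print(both_valid, hash_a != hash_b)
```

The attacker never needs the private key: flipping S is pure arithmetic on a signature that is already public.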
Preventing mutability of the tx hash requires creating new rules for Bitcoin that limit the scope of a valid transaction. For example, if going forward clients ONLY used a +S value, then -S values could be considered non-standard (and eventually invalid). This would prevent an attacker from flipping the S value to produce a second valid signature. Bitcoin would be more restrictive than the underlying ECDSA. All Bitcoin signatures would be valid ECDSA signatures, but not all valid ECDSA signatures would be valid Bitcoin signatures.
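As a sketch of what such a rule could look like: since S and -S (that is, N - S) always land in opposite halves of the range, one possible convention is to accept only the S in the lower half. The function names here are hypothetical, and picking the lower half as the "+S" side is an assumption for illustration.

```python
# Order of the secp256k1 group (a public constant of the curve).
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HALF_N = N // 2

def is_canonical_s(s):
    """True if s is the one allowed form under the lower-half convention."""
    return 1 <= s <= HALF_N

def canonicalize(r, s):
    """What a signing client would do: pick whichever of s, N - s is allowed."""
    return (r, s) if is_canonical_s(s) else (r, N - s)

# Toy signature values, hypothetical.
r, s = 0xABCDEF, N - 12345
print(is_canonical_s(s), is_canonical_s(N - s))
print(canonicalize(r, s) == canonicalize(r, N - s))
```

Because N is odd, S and N - S can never be equal, so exactly one of the two forms passes the check and the attacker's flipped version is always the rejected one.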
This is just one example; there are other elements of the tx protocol which allow similar mutability (without changing the core tx itself: inputs used, outputs used, value to each output, and tx fees). All of these will have to be restricted to a single valid version to make tx ids immutable.
Sipa has done a lot of good work in outlining the problem, and based on what I have read I believe it is only a matter of when, not if, tx ids become immutable.
Would it help if the protocol was changed so that, from a certain future block onwards, all transactions must identify any previous transaction whose outpoint(s) they are spending not by the hash of the whole previous tx, as is required today, but instead just by the hash of the core parts of the previous tx (its inputs/outputs/values, not its signatures)? After all, the purpose of a txid is merely to identify a tx, not to "bless" it as valid. That blessing has already happened when either the free-floating transaction was checked (including signature-checked of course!) and accepted into one's mempool, or a block containing the transaction was checked and found to be valid, or both.
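To illustrate the idea, here is a toy sketch comparing the two kinds of txid. The dict layout, the field names and the JSON serialization are all made up for the example and have nothing to do with Bitcoin's real wire format; only the double SHA-256 is borrowed from how txids are hashed today.

```python
import copy
import hashlib
import json

def double_sha256_hex(data):
    """Double SHA-256, hex encoded."""
    return hashlib.sha256(hashlib.sha256(data).digest()).hexdigest()

def old_style_txid(tx):
    """Hash of the whole tx, signatures included (toy serialization)."""
    return double_sha256_hex(json.dumps(tx, sort_keys=True).encode())

def new_style_txid(tx):
    """Hash of the core parts only: outpoints spent, outputs and values."""
    core = {
        "inputs": [{"prev_txid": i["prev_txid"], "index": i["index"]}
                   for i in tx["inputs"]],
        "outputs": tx["outputs"],
    }
    return double_sha256_hex(json.dumps(core, sort_keys=True).encode())

# Hypothetical toy transaction.
tx = {
    "inputs": [{"prev_txid": "aa" * 32, "index": 0, "script_sig": "sig(R, S)"}],
    "outputs": [{"value": 5000, "script_pubkey": "pay-to-alice"}],
}

# A third party re-encodes the signature without touching anything else.
mutated = copy.deepcopy(tx)
mutated["inputs"][0]["script_sig"] = "sig(R, -S)"

print(old_style_txid(tx) == old_style_txid(mutated))
print(new_style_txid(tx) == new_style_txid(mutated))
```

The old-style ids differ while the new-style ids match, so a later tx that refers to this one by its new-style id would not care which version of the signature ended up in a block.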
This would have the wonderful consequence of rendering unnecessary the endless "arms race" of finding sources of malleability and shutting them down one by one - inventing endless new rules (relay ["standardness"] rules or, eventually, validity rules) about how to write an integer, how many OP_DROPs to put here or there, what syntax to use for ECDSA signatures, etc etc ad nauseam. None of this would matter any more. The new-style txid would stay the same.
Validating nodes would need to do a re-indexing of their UTXO database as the switch moment approached, re-labelling all entries as "outpoint #n of [new-style txid for that tx]" rather than "outpoint #n of [old-style txid for that tx]", in order to be ready to do quick lookups from the switch moment onwards. However, this is a purely private matter for each node, to help with its efficiency. Publicly, the stream of new transactions and blocks would simply, seamlessly, switch to using the new-style txids.
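The re-labelling step could be sketched roughly like this, assuming the UTXO set is a simple map keyed by (txid, outpoint number) and that the node has already built a table from old-style to new-style txids, for example by re-hashing each tx from its stored blocks. Both structures and all the ids below are hypothetical placeholders.

```python
def reindex_utxo(utxo, old_to_new):
    """Re-key every unspent output from (old txid, n) to (new txid, n)."""
    return {(old_to_new[txid], n): output
            for (txid, n), output in utxo.items()}

# Toy UTXO set: (old-style txid, outpoint number) -> value.
utxo = {
    ("old_aaa", 0): 5000,
    ("old_aaa", 1): 2500,
    ("old_bbb", 0): 100,
}

# Hypothetical mapping the node computed ahead of the switch moment.
old_to_new = {"old_aaa": "new_111", "old_bbb": "new_222"}

reindexed = reindex_utxo(utxo, old_to_new)
print(reindexed[("new_111", 1)])
```

Nothing about the outputs themselves changes, only the label used to look them up, which is why this can stay a private matter for each node.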
Does this break anything?