What exactly do you expect when the checksum is only 4 bits (for 12 words) or 8 bits (for 24 words)?
I don't know, I guess I expected the probability of a false positive to be on the order of 1 in 2^32. Is that so unreasonable?
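I don't see how that's possible, since a 4-bit or 8-bit checksum has only 2^4 = 16 or 2^8 = 256 possible values, far fewer than 2^32. A random word sequence therefore passes the checksum about 1 time in 16 (12 words) or 1 time in 256 (24 words).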
And that's why some wallets force their users to verify and re-enter some or all of the generated words.
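
For reference, the checksum sizes quoted above follow from the BIP39 layout, where each word encodes 11 bits and the checksum is 1 bit per 32 bits of entropy. Here is a minimal sketch of that arithmetic; the function names are made up for illustration and this is not taken from any wallet's code.

```python
from fractions import Fraction


def checksum_bits(word_count: int) -> int:
    """Checksum length in bits for a BIP39 mnemonic of the given length."""
    if word_count not in (12, 15, 18, 21, 24):
        raise ValueError("BIP39 mnemonics have 12, 15, 18, 21 or 24 words")
    total_bits = word_count * 11  # each word encodes 11 bits
    # total = entropy + entropy/32, so checksum = total / 33
    return total_bits // 33


def false_positive_probability(word_count: int) -> Fraction:
    """Chance that a random word sequence of this length passes the checksum."""
    return Fraction(1, 2 ** checksum_bits(word_count))


for words in (12, 24):
    print(words, checksum_bits(words), false_positive_probability(words))
```

So a mistyped or shuffled 12-word phrase still has roughly a 1 in 16 chance of looking valid, nowhere near 1 in 2^32.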
I'm not sure that completely solves the problem. But I guess it's better than nothing.

You cannot completely solve a problem caused by human error anyway.
I suspect that after some number of tries each cold wallet will do something like:
A) delete its private key(s). Not the best, but at least the thief is not rewarded.
B) Each time a bad sequence is provided, slow down the response. Start with maybe 1 second of additional delay, then double it for each failed attempt. It could write the number of failed attempts to a storage location and reset it upon getting the correct seed.
This idea probably only works for hardware wallets, where a thief or attacker can't copy the encrypted data and brute-force it on their own computer. That's what some hardware wallets do.
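
The doubling delay from option B, with option A as an optional wipe threshold, could be sketched roughly like this. Everything here is hypothetical: the class name, the `wipe_after` parameter, and the in-memory counter (which stands in for the persistent storage location mentioned above) are assumptions for illustration, not how any particular hardware wallet works.

```python
from typing import Optional


class SeedAttemptLimiter:
    """Hypothetical sketch: doubling delay per failed attempt, optional wipe."""

    def __init__(self, base_delay_seconds: float = 1.0,
                 wipe_after: Optional[int] = None):
        self.base_delay_seconds = base_delay_seconds
        self.wipe_after = wipe_after   # option A: wipe keys after N failures
        self.failed_attempts = 0       # a real device would persist this
        self.wiped = False

    def current_delay(self) -> float:
        """Delay to impose before responding: 1s, 2s, 4s, ... per failure."""
        if self.failed_attempts == 0:
            return 0.0
        return self.base_delay_seconds * 2 ** (self.failed_attempts - 1)

    def record_attempt(self, correct: bool) -> float:
        """Update the counter and return the delay for this response."""
        if correct:
            self.failed_attempts = 0   # reset on the correct seed
            return 0.0
        self.failed_attempts += 1
        if self.wipe_after is not None and self.failed_attempts >= self.wipe_after:
            self.wiped = True          # stand-in for deleting the private keys
        return self.current_delay()


limiter = SeedAttemptLimiter(wipe_after=10)
for _ in range(3):
    print(limiter.record_attempt(correct=False))
```

With a 1 second base, the tenth wrong guess already costs 512 seconds, which only helps if the attacker has to go through the device for every guess.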