Another trap of empirical analysis in this kind of discussion is that we can only measure how the system is today, but then we use those measurements to project the future. For example, say we didn't have ECDSA caching today: you might measure that it takes more than 2 minutes to verify a maximum-size block. Yet with 100 lines of code that cost vanishes, which is bad news if you were counting on it to maintain incentives.

Good point, though more accurate would be "we can only measure how we think it is". We don't even have absolute proof that someone isn't running a superior algorithm or hardware solution, although we can often make very strong educated inferences.
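
To make the ECDSA caching example concrete, here is a rough Python sketch of how a small verification cache can make a measured cost disappear. Everything in it is an assumption for illustration: `SigCache`, `toy_digest`, and `verify_block` are hypothetical names, and the repeated SHA-256 loop is only a stand-in for an expensive signature check, not real ECDSA and not how any particular node implements its cache.

```python
import hashlib

ROUNDS = 2000  # arbitrary knob that makes the toy "verification" slow


def toy_digest(pubkey: bytes, msg: bytes) -> bytes:
    """Toy stand-in for a signature: an iterated hash. NOT real cryptography."""
    digest = pubkey + msg
    for _ in range(ROUNDS):
        digest = hashlib.sha256(digest).digest()
    return digest


class SigCache:
    """Remembers checks that already succeeded, e.g. when a tx was first relayed."""

    def __init__(self) -> None:
        self._valid = set()
        self.expensive_calls = 0  # how many times the slow path actually ran

    @staticmethod
    def _key(pubkey: bytes, msg: bytes, sig: bytes) -> bytes:
        # Length-prefix each part so different triples cannot collide by concatenation.
        h = hashlib.sha256()
        for part in (pubkey, msg, sig):
            h.update(len(part).to_bytes(4, "big"))
            h.update(part)
        return h.digest()

    def verify(self, pubkey: bytes, msg: bytes, sig: bytes) -> bool:
        key = self._key(pubkey, msg, sig)
        if key in self._valid:
            return True  # already verified once; skip the expensive work
        self.expensive_calls += 1
        ok = toy_digest(pubkey, msg) == sig
        if ok:
            self._valid.add(key)  # only successful checks are cached
        return ok


def verify_block(cache: SigCache, txs) -> bool:
    """A block is valid here if every (pubkey, msg, sig) triple verifies."""
    return all(cache.verify(*tx) for tx in txs)


if __name__ == "__main__":
    txs = []
    for i in range(50):
        pk, msg = b"pk%d" % i, b"msg%d" % i
        txs.append((pk, msg, toy_digest(pk, msg)))

    cache = SigCache()
    for tx in txs:  # transactions seen before the block arrives
        cache.verify(*tx)
    before = cache.expensive_calls

    verify_block(cache, txs)  # the block only contains already-seen transactions
    print("slow checks during block verification:", cache.expensive_calls - before)
```

The point of the sketch is only that a measurement taken without the cache says little about the cost once someone adds it: block verification here triggers no slow checks for transactions that were already seen.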