I'm not talking about the cheap second point trick; I mentioned that everybody knows about it. But no one has been able to use this trick to improve K at DP>0, and you explained why. I can, and I demonstrated it. If you have any links to sources or papers that do it (improve K at any DP>0), let me know!
Can you explain how that works, in a way that doesn't require being a C++ expert? Once you have P - Q you are at a point that is basically the end of a line; how does that help you further (unless maybe you save it and walk on from there, but then of course your herd size doubles at each step)? The cheap point still costs one extra semi-addition (sort of); are you sure you're counting it correctly?
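
For anyone following along without the C++ background, here is a minimal Python sketch of the well-known cheap second point trick only (not the DP>0 method claimed above, which isn't described in this thread). It assumes affine coordinates on secp256k1; the function names `add` and `add_and_sub` are made up for illustration. The point is that P + Q and P - Q both need the inverse of (x2 - x1), so one inversion can serve both.

```python
# Sketch under assumptions: secp256k1 (y^2 = x^3 + 7), affine coordinates,
# no handling of the point at infinity or of Q == -P.
p = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(P, Q):
    """Plain affine addition (or doubling when P == Q), one inversion."""
    x1, y1 = P
    x2, y2 = Q
    if P == Q:
        lam = 3 * x1 * x1 * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def add_and_sub(P, Q):
    """Return (P + Q, P - Q) using a single shared inversion (needs x1 != x2)."""
    x1, y1 = P
    x2, y2 = Q
    inv = pow((x2 - x1) % p, -1, p)       # the only inversion
    # P + Q: usual slope
    lam_a = (y2 - y1) * inv % p
    xa = (lam_a * lam_a - x1 - x2) % p
    ya = (lam_a * (x1 - xa) - y1) % p
    # P - Q = P + (x2, -y2): same denominator, different numerator
    lam_s = (-y2 - y1) * inv % p
    xs = (lam_s * lam_s - x1 - x2) % p
    ys = (lam_s * (x1 - xs) - y1) % p
    return (xa, ya), (xs, ys)

# Small multiples of G to check against the plain addition.
G2 = add(G, G)
G3 = add(G2, G)
G5 = add(G3, G2)
assert add_and_sub(G3, G2) == (G5, G)     # 3G + 2G = 5G, 3G - 2G = G
```

So the second point is cheap in the sense that it skips the inversion, but it still costs its own multiplications and a squaring, which is the "semi-addition" being asked about.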

In my tests, as DP grows, K grows as well, but less than double, so overall it's an improvement. But then again, it depends what you are counting.
Yes, I'm still sure that your non-looping method with K=1.0 (at DP>0) does not exist.

Remains to be seen

It's debatable what K even means once the inner guts of adding points become intertwined and you only need parts of the final result. We'd need the hyperelliptic equations style of notation to get the complexity (how many field ops we need, grouped by op type).
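
As a rough illustration of counting field ops by type (I = inversion, M = multiplication, S = squaring), here is a small Python sketch. It is only an assumption about how such a tally could be done: the `Fp` wrapper, the `tally` helper and the choice of the secp256k1 prime are made up for this example, and it counts only the plain affine formulas, not any particular kangaroo implementation.

```python
from collections import Counter

P_MOD = 2**256 - 2**32 - 977   # assumed field prime (secp256k1)

class Fp:
    """Field element that tallies inversions (I), multiplications (M), squarings (S)."""
    ops = Counter()
    def __init__(self, v):
        self.v = v % P_MOD
    def __add__(self, o):
        return Fp(self.v + o.v)
    def __sub__(self, o):
        return Fp(self.v - o.v)
    def __neg__(self):
        return Fp(-self.v)
    def __mul__(self, o):
        # x * x with the same object is counted as a squaring
        Fp.ops["S" if self is o else "M"] += 1
        return Fp(self.v * o.v)
    def inv(self):
        Fp.ops["I"] += 1
        return Fp(pow(self.v, -1, P_MOD))

def affine_add(x1, y1, x2, y2):
    lam = (y2 - y1) * (x2 - x1).inv()
    x3 = lam * lam - x1 - x2
    return x3, lam * (x1 - x3) - y1

def add_and_sub(x1, y1, x2, y2):
    inv = (x2 - x1).inv()                  # shared by both results
    out = []
    for num in (y2 - y1, -y2 - y1):        # slopes for P + Q and P - Q
        lam = num * inv
        x3 = lam * lam - x1 - x2
        out.append((x3, lam * (x1 - x3) - y1))
    return out

def tally(fn, *args):
    Fp.ops.clear()
    fn(*args)
    return dict(Fp.ops)

# Arbitrary inputs: only the op counts matter here, not being on the curve.
args = (Fp(3), Fp(5), Fp(11), Fp(13))
single = tally(affine_add, *args)
pair = tally(add_and_sub, *args)
```

In this sketch `single` comes out as 1I + 2M + 1S and `pair` as 1I + 4M + 2S, versus 2I + 4M + 2S for two independent additions. How that maps onto K depends on what an "operation" is taken to mean, which is exactly the open question here.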