The way the client calculates the proof of work for each block is that it does the calculation only once.
Effectively, each block has a number which means "Proof of work for the chain, if this block was the leaf".
It also has a PoW for best chain variable. If you add a block and it is higher than the best chain, then it has to set that chain as the best.
Under your system, when a new block is added, it would be more complex.
A <- B <- C <- D
\ <- C' <- D'
\ <- D''
When comparing to D' to D, D' would get extra weight from D''. However, when comparing D' to D'', it wouldn't get the extra weight. There isn't one "work" number for each node.
You are right, it may not be as simple as just keeping the one number of "proof for the chain as if block was a leaf" but in principle the following approach works (and its refinement below is probably much more efficient):
Approach 1 (very naive):1) For each block, save the total amount of work in its subtree. Let us denote this saved value by W(X)=total work of subtree rooted at block X.
In your example above: the subtree of block C includes blocks C,D. The subtree of block C' includes C',D',D''.
2) When a new block arrives, simply increment W(X) for all blocks X along the path to the genesis by the amount of work represented by this new block.
3) If the new block was off-chain, find its connection to the main chain and check if a change to the chain is needed.
(the 2nd step is computationally expensive so the following refinement can be done)
Approach 2 (Lazy evaluation):The idea here is to still maintain values for total work in the subtree rooted at each block, but to do the updates only when required.
For each block on the main chain we maintain a lower bound on the work in its subtree, for each node off-chain we maintain an exact number. The idea being that off-chain branches die out after a relatively short while, and on-chain blocks that are sufficiently deep in the tree do not need to be examined often.
Invariants: W(X) for any block X that is off chain is exact, W(X) for any block X that is on the main chain is a lower-bound on the real value.
The updates:
When a new block is added:
1) If it extends the current chain, do not update W(X) for any block.
2) Otherwise, if it is off-chain then traverse its path (towards the genesis) until it meets the main chain, and increment the work W(X) of each subtree along the way accordingly. Also traverse the main chain up to this point and update W(X) for each node along the way.
3) check to see if a change of chain is needed. If it is, the off-chain values are exact and can be used to find the new best leaf by following the greedy path towards the leaves.
I hope I explained this well enough for you to at least get the basic idea... Please let me know if I didn't, or if you spot a mistake.