Our proposal includes keeping the hashes of off-chain blocks in each block (only new ones, not already included in any of the block's ancestors). The chain, when read sequentially, therefore contains all off-chain block hashes, and thus any node can know the entire tree (if a node is missing some block, it simply requests it or its header).
Interesting.
That eliminates the need for each sub-tree to be evaluated. The weight for each block can be directly calculated purely from within the block.
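To make that concrete, here is a rough Python sketch. It assumes (my reading, not something the proposal spells out) that a block counts 1 for itself plus 1 for each off-chain hash it newly references, so the chain's total is just a running sum. The names `Block`, `offchain_refs`, `block_weight` and `chain_weight` are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    block_hash: str
    parent_hash: Optional[str]
    # Hashes of off-chain blocks first referenced by this block
    # (hypothetical field; the real encoding is not specified in the thread).
    offchain_refs: List[str] = field(default_factory=list)


def block_weight(block: Block) -> int:
    # Assumption: 1 for the block itself, plus 1 per newly referenced
    # off-chain block. Nothing outside the block is consulted.
    return 1 + len(block.offchain_refs)


def chain_weight(chain: List[Block]) -> int:
    # Reading the chain sequentially, the total is the sum of per-block weights.
    return sum(block_weight(b) for b in chain)


chain = [
    Block("a", None),
    Block("b", "a", ["x", "y"]),  # references two off-chain blocks
    Block("c", "b", ["z"]),       # references one more
]
print(chain_weight(chain))
```

Under those assumptions a node only needs the chain itself to score it, with no separate walk over each sub-tree.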
You would need to embed a sub-header in the coinbase (or in an OP_RETURN tx).
Each node would have to remember all headers from orphaned blocks that are referenced. A block would be invalid if it re-references a header.
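A minimal sketch of that bookkeeping, assuming each block is reduced to a `(block_hash, refs)` pair where `refs` is the list of off-chain header hashes it references. The function name `find_rereference` and the data layout are hypothetical, just to show the check.

```python
from typing import List, Optional, Set, Tuple


def find_rereference(
    chain: List[Tuple[str, List[str]]]
) -> Optional[Tuple[str, str]]:
    """Walk the chain from genesis to tip.

    Returns (block_hash, ref) for the first block that references an
    off-chain header already referenced earlier, or None if the chain is clean.
    """
    seen: Set[str] = set()  # every off-chain header referenced so far
    for block_hash, refs in chain:
        for ref in refs:
            if ref in seen:
                # Re-reference: this block would be invalid.
                return (block_hash, ref)
            seen.add(ref)
    return None


good = [("a", []), ("b", ["x"]), ("c", ["y"])]
bad = [("a", []), ("b", ["x"]), ("c", ["x"])]
print(find_rereference(good))
print(find_rereference(bad))
```

The `seen` set is the state each node has to keep around. It only grows with the number of referenced orphans, and it also catches the same hash listed twice inside one block.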