...Because nodes are very likely to have already seen a transaction when it was first broadcast...
So transactions still need to be distributed from every node to all others. That's not a solution.
Actually, after reading it I think scalability concerns are addressed very poorly there.
They propose all sorts of tactical ways to affect K in the "K*n2" formula, while tip-toeing around the strategic issue of n2.
The K is irrelevant, that's why we have the big-O notation.
It seems to me that so long as there is a single chain of all transactions, the system is not scalable, super-nodes or no super-nodes.
Because workload is not spread between super-nodes, but rather duplicated and in the end every (super)node has to process ALL transactions in the world...
Ok, I don't want to hijack the thread, sorry

I will shut up now.