Do you care to share the raw data? I would like to independently verify your claim.
It should be moderate to do a back of the envelop calculation.
If a node has 24 peers, then it will send/receive 24 invs. That is 36 * 24 = 864 bytes. It also has to receive the transaction at, say 400 bytes.
It will also receive the transaction in the full block for another 400 bytes (invs being negligible).
With blocks only, it is only receives 400 bytes. That gives an efficiency of 400 / (400 + 864) = 0.31.
If inv messages were only 33% efficient (due to headers etc), then that gives 400 / (400 + 864 * 3) = 0.13.
That is pretty close to the stated number.
I wonder if simple Bloom filters would be the way to go (rather than IBLT). The sender sends a Bloom filter (sender_filter) containing all transactions that it added to its memory pool since the last (5 minute) broadcast. The receiver creates an empty Bloom filter (receiver_filter) with the same parameters. It finds the transactions in its memory pool that match sender_filter and adds them to receiver_filter. If both filters end up matching, then it assumes that it has all the transactions.
If not, it XORs the two filters (=rle_filter) and sends the result as a reply. Run length encoding could be used, since the XOR should be mostly strings of zeros.
The sending node generates receiver_filter using (sender_filter XOR rle_filter). It sends on any transactions that were in sender_filter that aren't in receiver_filter.