So am I understanding the speedup is by recognition of signatures by a particular key within a single block?
The signature test requires 2 point multiplications. One of them is a multiplication by the generation point (G) and the other is by the public key (Q).
Assuming you are checking 4 signatures, you normally check:
R1 = a1 * G + b1 * Q
R2 = a2 * G + b2 * Q
R3 = a3 * G + b3 * Q
R4 = a4 * G + b4 * Q
This can be simplified to
(R1 + R2 + R3 + R4) = (a1 + a2 + a3 + a4) * G + (b1 + b2 + b3 + b4) * Q
Point addition and normal integer addition are (relatively) very fast. That reduces the number of multiplies to 2, no matter how many signatures.
However, if Q is different for each signature (different public key), then you can only reduce the complexity of the (a1 + a2 + a3 + a4) * G. So, at most half of the multiplies are eliminated in that case.