Mathematically, it can only be about 3% [although ever so slight variation due to quantum effects, meaning you submit single shares, not partial shares, so there is variation in any given analysis window and also a change in the amount of work dumped or lost as stale when a block changes (the faster the miner, the less stale in a perfect world I think)].
The reason that it has to be about 3% is it was a rearrangement such that fewer computations on the GPU needed to be performed for a given unit of work ... so the performance increase is actually fixed.