I've actually implemented the algorithm, but it doesn't perform as well as expected with a 30-block lookback. It makes no adjustments before block 30, however.
I ran it on testnet and began CPU mining at minimum difficulty at about 9:30 PM. I found 16 blocks in about 30 minutes, then kicked in some GPU power. That multiplied the hashrate by a lot (I didn't note my CPU hashrate, so unfortunately I can't give the exact increase factor, but I went from less than 100 kH/s to 4 MH/s, so let's say at least a 40x increase).
I managed to mine up to block 108 by around midnight. So already, we were ~25 blocks ahead of what should be expected (75 blocks) in 2.5 hours. I went to bed, and at 8:00 AM the next morning I was still at height 108. I was late for work, so I pointed my miners back to my pool and left, unfortunately without taking the time to see why it wasn't mining anymore.
But something definitely was not right. It was apparently my second instance that disconnected, stopping any further mining.
I need to run more tests with different lookback values. I will get back here when I've actually run more tests and found the right balance. The issue is that, with the variance involved in mining, it is really difficult to tell whether a low block time is due to luck or to a hashrate increase. It might be that a 10% variation is too sensitive a threshold for difficulty adjustments, so I also need to test with a higher allowed variation (15%, 20%, 25%?) and combine it with different lookback values.
However, if I want to be efficient, I need to write test cases for that instead of being a lazy bum and test mining the algo...
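A minimal sketch of what such a test case could look like, instead of test-mining. It is not the implemented algorithm. It only simulates block intervals at a constant hashrate that exactly matches the difficulty, and counts how often the lookback average drifts outside the allowed variation by luck alone. The 120-second target is an assumption inferred from "75 blocks in 2.5 hours", and the name `false_trigger_rate` is made up for illustration.

```python
import random

# Assumed target spacing: 75 blocks expected in 2.5 hours -> 120 s per block.
TARGET_SPACING = 120.0


def false_trigger_rate(lookback, tolerance, trials=2000, seed=1):
    """Fraction of lookback windows whose mean block time is off target by
    more than `tolerance`, even though the hashrate never changed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # At constant hashrate, block intervals are exponentially distributed
        # with a mean equal to the target spacing.
        total = sum(rng.expovariate(1.0 / TARGET_SPACING) for _ in range(lookback))
        window_mean = total / lookback
        if abs(window_mean - TARGET_SPACING) > tolerance * TARGET_SPACING:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Sweep the combinations mentioned above: lookback vs. allowed variation.
    for lookback in (30, 60, 120):
        for tolerance in (0.10, 0.15, 0.20, 0.25):
            rate = false_trigger_rate(lookback, tolerance)
            print(f"lookback={lookback:3d} tolerance={tolerance:.0%} "
                  f"luck-only trigger rate={rate:.1%}")
```

For reference, the average of 30 exponential intervals has a standard deviation of about 18% of the target (1/sqrt(30)), so a 10% band should be crossed by luck quite often at that lookback. The sweep makes it easy to see how much a longer lookback or a wider band reduces that.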

I'm not quite sure I understand the problem, but let me suggest some stuff for the variation problem. I'm not a statistician, but it seems like 2 standard deviations from the mean is a good trigger. How about you take the last 30 samples and compute the mean and standard deviation? If the new sample is more than 2 standard deviations from the mean, then force a retarget. In a normally distributed random process (as I remember it), >95% of samples should be within 2 standard deviations of the mean. That means that less than 5% of retargets would be from a random event, and 95% of them would be triggered by changes in hashing power.
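Here is a rough sketch of that trigger, just to make the idea concrete. The function name `should_retarget` and its parameters are made up for illustration, and it assumes the samples are block times in seconds.

```python
from statistics import mean, stdev


def should_retarget(recent_times, new_time, lookback=30, k=2.0):
    """Return True if `new_time` lies more than `k` sample standard
    deviations away from the mean of the last `lookback` block times."""
    window = recent_times[-lookback:]
    if len(window) < 2:
        return False  # stdev() needs at least two samples
    mu = mean(window)
    sigma = stdev(window)
    if sigma == 0:
        # All samples identical: any different value is an outlier.
        return new_time != mu
    return abs(new_time - mu) > k * sigma


# Usage: 30 samples scattered around 120 s.
history = [110, 130, 120, 125, 115] * 6
print(should_retarget(history, 121))  # close to the mean
print(should_retarget(history, 5))    # far below the mean
```

One caveat worth checking before relying on it: block times at constant hashrate are closer to exponential than normal, and for an exponential distribution the standard deviation is about equal to the mean. In that case mean minus 2 standard deviations is negative, so a single fast block could never trip this test and only unusually slow blocks would. Applying the same check to an average over several blocks might behave better, but that is a guess that would need testing.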