Basically, add the __local keyword to this line and it should increase your performance.
Sorry, this doesn't work for me; performance dropped 33% instead.
Are you using an ATI card? I get a similar drop on my Nvidia card, which is the only card I have, so I can't test on ATI. This may only work with ATI; all the reports so far are from ATI cards.