I think the silentarmy improvements we are seeing are an example of open source development working well. I've known for several days that the atomic_inc in ht_store was a speed bottleneck. I thought that moving all the counters into their own table could improve the L2 cache hit rate, although at 4MB that table would still be large compared to a 512KB L2 cache. In my discussions with eXtremal, he suggested packing multiple counters into a single uint and updating them with atomic_add instead of atomic_inc. Those two ideas together (and eXtremal's fast coding) produced the latest optimizations.
If we had both been working on closed-source miners, these improvements would have taken much longer, or might not have happened at all.