I also noticed the CPU load average is way lower now than it was. It was always above 2 before; now it hovers around 1.2. Is there any additional benefit to these devices from running with lower/higher loads?
That's likely the advantage of the newer version. In principle, it will lead to less lag in keeping the device busy and less time to process and send shares. However, cgminer is so heavily multithreaded that it probably won't matter, since the latency-critical parts of the cgminer code are already prioritised.
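If you want to track the load average over time rather than eyeballing the status page, here is a rough sketch. It assumes a Linux system that exposes `/proc/loadavg` and has Python available, which the stock S3 firmware may well not have, so you might need to run it elsewhere or adapt it. The function names `parse_loadavg` and `read_loadavg` are just made up for this example.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    # The file looks like: "1.20 1.35 2.04 1/52 1234"
    # Only the first three fields are load averages.
    fields = text.split()
    return tuple(float(x) for x in fields[:3])


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the load averages (assumes a Linux-style /proc)."""
    with open(path) as f:
        return parse_loadavg(f.read())
```

Calling `read_loadavg()` every minute or so and logging the result should show whether the load really settles around 1.2 or just dips occasionally.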
Still early days since I updated my S3 with the latest cgminer (3 hrs+) but I have noticed an improvement in the pool-side reported hashrate AND I suspect a reduced DiffR. This is on top of a faster UI (which I attribute to the lower load average).
Just a question: would this cgminer run on an S1 (like the earlier Kano cgminer)?
No, it doesn't work on S1. I tried it and it's not hashing.