EDIT: Also, CPU usage by cgminer seems to have increased greatly. I noticed these things on two machines running Windows 7.
I wanted to post the same observation. I've been upgrading every few releases and had been on 3.8-3.9 for a long time with a few BFL units. In 3.9, CPU usage usually averages around 15-16%. After upgrading to 4.0.1, CPU usage slowly increases and levels off at ~35%.
Ubuntu 12.04.4 x64, 3.11 kernel, gcc 4.6.3
CFLAGS="-O2 -Wall -march=native" ./configure --enable-bflsc
Yes, that's almost certainly the queue automatically increasing in size, which I will likely remove in the next version.
The CPU increase is probably due to #define LOCK_TRACKING 1.
At least, disabling it reduces CPU usage for me.

In 3.8-3.9 it was disabled, if I remember correctly.
@@ -690,7 +711,7 @@
* So, e.g. use it to track down a deadlock - after a reproducable deadlock occurs
* ... Of course if the API code itself deadlocks, it wont help

*/
-#define LOCK_TRACKING 1
+#define LOCK_TRACKING 0
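To illustrate why a debugging switch like this can cost CPU even when nothing is deadlocked, here is a minimal sketch in C. It is hypothetical: tracked_mutex_t, tm_init, tm_lock, tm_unlock and track_mutex are invented names, not cgminer's actual code, and recording the file, function, line and acquisition count of each lock is only an assumption about what lock tracking typically does. With LOCK_TRACKING 1, every lock and unlock also takes a global bookkeeping mutex that all threads contend on; with 0, the macros collapse to plain pthread calls.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical compile-time switch modelled on the LOCK_TRACKING define
 * in the diff; build with -DLOCK_TRACKING=0 to compile tracking out. */
#ifndef LOCK_TRACKING
#define LOCK_TRACKING 1
#endif

typedef struct {
	pthread_mutex_t mutex;
#if LOCK_TRACKING
	const char *last_file;      /* where the lock was last taken */
	const char *last_func;
	int last_line;
	unsigned long acquisitions; /* how many times it was taken */
	int held;
#endif
} tracked_mutex_t;

static void tm_init(tracked_mutex_t *m)
{
	pthread_mutex_init(&m->mutex, NULL);
#if LOCK_TRACKING
	m->last_file = NULL;
	m->last_func = NULL;
	m->last_line = 0;
	m->acquisitions = 0;
	m->held = 0;
#endif
}

#if LOCK_TRACKING
/* Assumed global mutex serialising the bookkeeping so a debug dump could
 * read it safely: every lock/unlock pays for a second lock/unlock pair. */
static pthread_mutex_t track_mutex = PTHREAD_MUTEX_INITIALIZER;

static void tm_lock_at(tracked_mutex_t *m, const char *file,
		       const char *func, int line)
{
	pthread_mutex_lock(&m->mutex);
	pthread_mutex_lock(&track_mutex);
	m->last_file = file;
	m->last_func = func;
	m->last_line = line;
	m->acquisitions++;
	m->held = 1;
	pthread_mutex_unlock(&track_mutex);
}

static void tm_unlock_tracked(tracked_mutex_t *m)
{
	pthread_mutex_lock(&track_mutex);
	assert(m->held); /* catches unlocking a lock that is not held */
	m->held = 0;
	pthread_mutex_unlock(&track_mutex);
	pthread_mutex_unlock(&m->mutex);
}

#define tm_lock(m) tm_lock_at((m), __FILE__, __func__, __LINE__)
#define tm_unlock(m) tm_unlock_tracked(m)
#else
/* Tracking compiled out: no extra state, no extra locking. */
#define tm_lock(m) pthread_mutex_lock(&(m)->mutex)
#define tm_unlock(m) pthread_mutex_unlock(&(m)->mutex)
#endif

/* Example critical section: increments a shared counter n times. */
static unsigned long counter;

static void bump(tracked_mutex_t *m, int n)
{
	for (int i = 0; i < n; i++) {
		tm_lock(m);
		counter++;
		tm_unlock(m);
	}
}
```

The cost in this sketch scales with how often locks are taken, not with whether anything goes wrong, which would be consistent with a gradual CPU increase on a busy miner. That link to the observed numbers is an assumption; it has not been measured against cgminer itself.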
Crap, good pickup. I must have enabled it by mistake during debugging and it ended up in one of my commits.