I'm @ 3.53 with the new FW 1.0.
You found the reason behind the performance gain, tolip. I have two Neptunes, one of which has its controller card and two cubes currently stuck in a slower-than-communicated RMA process.
I repurposed a Jupiter controller card to run the orphaned cubes, and this 3-cube machine has seen zero change on the new firmware. I believe that supports your pegged-CPU observation quite nicely.

That 2% was probably getting lost on the wire or flushed from a full buffer.