I think this is wrong. Although I primarily mine using Linux, I have a Windoze box that I use for testing cards. GPU-Z appears to show only external bus bandwidth use (to the GDDR), and not the utilization of the bandwidth between the memory controller and the core. In practical terms, a miner kernel may be using 200GB/s of memory bandwidth, but a significant percentage of that can be served from the L2 cache rather than from GDDR. The collision counter tables in SA5 would be an example of this.
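To put rough numbers on that, here is a minimal sketch of the arithmetic. The 200GB/s figure is the one from above; the 25% L2 hit rate and the function name are made up purely for illustration, not measured on any card.

```python
def external_bandwidth_gbs(total_gbs: float, l2_hit_rate: float) -> float:
    """Estimate the traffic that actually reaches GDDR.

    total_gbs   -- bandwidth the kernel consumes as seen from the core
    l2_hit_rate -- fraction of that traffic served from L2 (assumed value)
    """
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("l2_hit_rate must be between 0 and 1")
    # Only L2 misses go out over the external bus.
    return total_gbs * (1.0 - l2_hit_rate)


# 200 GB/s comes from the post; the 0.25 hit rate is a hypothetical example.
external = external_bandwidth_gbs(200.0, 0.25)
print(f"external bus traffic: {external:.0f} GB/s")
```

Under those assumptions a tool that only watches the external bus would report well under the 200GB/s the kernel is really consuming.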
Any news about SA6?
Been fighting with the amdgpu-pro drivers for the past day and finally got them working with an RX 470 and an R9 380. I still have to swap out a couple of R9 380s from another rig for R7 370s, since the amdgpu-pro drivers don't work with Pitcairn. Then I'll have one rig still running Ubuntu 14.04/fglrx with an R9 380 and a few R7 370s, and another rig running Ubuntu 16.04/amdgpu-pro with a few R9 380s and an RX 470. That will give me the ability to test kernel tweaks on amdgpu-pro myself instead of relying on Marc.
Latest pull for 4.9 has SI support.