Could you give me the config you use? I blind-benched it myself and found the speed to be the same. Maybe the regression only shows up in some specific cases?
I've just rebenched b6 and b7 and got almost the same speed, so I need your config and a confirmation that the regression you observe is between b6 and b7.
As a desperate attempt, I did a formal review of the differences between the b6 and b7 code and reverted all of them for Vega.
If that doesn't fix it, I'm running out of ideas.
Online now is 0.33b12 GPU with that fix, plus an extra little optim for all cards. I benched it at ~+0.2%, within a range of -0.3% to +0.7%, so it should increase the hashrate, but the effect is so subtle that I'm not even sure the gain is positive.

Edit: re-released with the optim restricted to smaller cards.