Finally. Something interesting. You have quite the interesting way of structuring your kernels for CryptoNight. And I must say - the number of waves in flight you've managed by doing so is hot.
Thanks, Sir.
Yes, the scheduling is very unorthodox: it uses 2 command queues which are started with a defined time offset. I'm still searching for an explanation of why this is faster than one larger queue. The scheduler gets the same work, but for some reason it likes it better that way. It's probably very Vega-specific, but I have not yet tried it on different cards.
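For anyone who wants to picture the launch pattern, here is a toy Python sketch. It is not the miner's code and does not touch the GPU. The function name `staggered_launches`, the batch time and the offset are all made up for illustration. It only shows how two queues started with a fixed offset end up interleaving their launches.

```python
from typing import List, Tuple


def staggered_launches(
    batches_per_queue: int, batch_time: float, offset: float
) -> List[Tuple[float, int, int]]:
    """Return (start_time, queue_id, batch_index) tuples sorted by start time.

    Hypothetical model: each queue launches one batch every `batch_time`
    units, and queue 1 starts `offset` units after queue 0.
    """
    launches = []
    for queue_id in (0, 1):
        queue_start = queue_id * offset  # queue 1 begins `offset` later
        for batch in range(batches_per_queue):
            launches.append((queue_start + batch * batch_time, queue_id, batch))
    return sorted(launches)


# Illustrative numbers only; the real offset would have to be tuned per card.
for start, queue_id, batch in staggered_launches(3, 10.0, 5.0):
    print(f"t={start:5.1f}  queue {queue_id}  batch {batch}")
```

With an offset of half the batch time, the launches alternate between the two queues instead of arriving together. Whether that is the reason the scheduler likes it better is still a guess.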
CryptoNight already has a good multithreaded miner - xmr-stak-amd - and your miner doesn't outperform it ...