So if you use the _mm512 intrinsics (and have a processor that supports AVX-512), the speed can reach up to 12 MK/s per core.
So, you use enterprise processors like AMD EPYC to handle AVX-512?

Why do you care what I use? Planning to build a competing setup in your garage?