JayDDee,
Thanks for your comments. I'll leave it at that for now, as many others will be reading this thread.
My pleasure.
off-topic:
I'd also think that using features like NEON SIMD has severe implications for high-core-count ARM chips, e.g. those 'Ampere' chips.
That is correct. SIMD reduces the number of instructions but increases the load of each one, like doubling or quadrupling the load on a truck.
It's more efficient but you need a strong truck. SIMD also increases the load on the memory system to try to keep the CPU fed. The end result
is more heat everywhere.
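
To make the truck analogy concrete, here is a rough sketch in C, not taken from any particular miner. It adds two arrays once element by element and once four lanes at a time with NEON intrinsics, and returns how many add operations each version issued at the source level. The function names add_scalar and add_simd4 are made up for illustration, the vector version assumes n is a multiple of 4, and on a non-NEON build it falls back to a plain inner loop so the file still compiles. A compiler may auto-vectorize the scalar loop too, so the counts describe the code as written, not the exact machine instructions.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar version: one add per element.
 * Returns the number of add operations issued at the source level. */
size_t add_scalar(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    size_t ops = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
        ops++;
    }
    return ops;
}

/* 4-lane version: one vector add per group of 4 elements.
 * Assumption for this sketch: n is a multiple of 4 (any tail is ignored). */
size_t add_simd4(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n)
{
    size_t ops = 0;
    for (size_t i = 0; i + 4 <= n; i += 4) {
#if defined(__ARM_NEON)
        /* Each load and store moves 16 bytes instead of 4. */
        uint32x4_t va = vld1q_u32(a + i);
        uint32x4_t vb = vld1q_u32(b + i);
        vst1q_u32(out + i, vaddq_u32(va, vb));
#else
        /* Portable stand-in so the sketch builds without NEON. */
        for (size_t j = 0; j < 4; j++)
            out[i + j] = a[i + j] + b[i + j];
#endif
        ops++;
    }
    return ops;
}
```

For 8 elements the scalar version issues 8 adds and the 4-lane version issues 2, but each of those 2 works on four values at once and each load or store moves four times the data. That is the heavier truck: fewer trips, more load per trip, and more pressure on the memory system to keep it full.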