ag1233, I have only written i386 and amd64 asm code, with and without SSE2 support. What's in scrypt-arm.S isn't of much importance to NeoScrypt in general, as Salsa20 constitutes a rather small part of it. Sure, NEON can speed things up, even if compiler-generated. Memory bandwidth is another question. When I last checked, an RPi 4B with 32-bit LPDDR4 couldn't reach 5GB/s on memory reads or writes. Although a quad-core 1.5GHz Cortex-A72 with 1MB of L2 cache doesn't seem a poor performer, modern high-end smartphones are much better in this regard.
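
For anyone who wants to check the memory bandwidth on their own board, here is a rough sketch in C. It is not the tool behind the figure above, just a minimal single-threaded test under stated assumptions: the helper names (`now_seconds`, `write_bandwidth_gbs`, `read_bandwidth_gbs`) are made up for illustration, timing uses the C11 `timespec_get` wall clock, writes are measured with `memset` and reads with a plain 64-bit summing loop. The best pass out of `reps` is reported in GB/s (10^9 bytes per second).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds (C11 timespec_get; not a monotonic clock). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: best write rate in GB/s over `reps` memset passes.
   Returns -1.0 if the buffer cannot be allocated. */
static double write_bandwidth_gbs(size_t bytes, int reps) {
    assert(bytes > 0 && reps > 0);
    unsigned char *buf = malloc(bytes);
    if (buf == NULL) return -1.0;
    memset(buf, 0, bytes);                 /* touch every page before timing */
    volatile unsigned char sink = 0;       /* keeps the stores observable */
    double best = 0.0;
    for (int i = 0; i < reps; i++) {
        double t0 = now_seconds();
        memset(buf, i + 1, bytes);
        double dt = now_seconds() - t0;
        sink = buf[bytes - 1];
        if (dt > 0.0 && (double)bytes / dt / 1e9 > best)
            best = (double)bytes / dt / 1e9;
    }
    (void)sink;
    free(buf);
    return best;
}

/* Hypothetical helper: best read rate in GB/s over `reps` summing passes.
   Returns -1.0 if the buffer cannot be allocated. */
static double read_bandwidth_gbs(size_t bytes, int reps) {
    assert(bytes >= sizeof(uint64_t) && reps > 0);
    size_t n = bytes / sizeof(uint64_t);
    uint64_t *buf = malloc(n * sizeof(uint64_t));
    if (buf == NULL) return -1.0;
    for (size_t j = 0; j < n; j++) buf[j] = j;   /* touch every page */
    volatile uint64_t sink = 0;            /* keeps the loads from being dropped */
    double best = 0.0;
    for (int i = 0; i < reps; i++) {
        buf[0] = (uint64_t)i;              /* make each pass depend on fresh data */
        double t0 = now_seconds();
        uint64_t sum = 0;
        for (size_t j = 0; j < n; j++) sum += buf[j];
        double dt = now_seconds() - t0;
        sink = sum;
        double gbs = (double)(n * sizeof(uint64_t)) / dt / 1e9;
        if (dt > 0.0 && gbs > best) best = gbs;
    }
    (void)sink;
    free(buf);
    return best;
}
```

Call it with a buffer much larger than the L2 cache, for example `read_bandwidth_gbs((size_t)64 << 20, 5)`, otherwise you measure cache rather than DRAM. Build with optimisation (`-O2` or higher) so the read loop can be vectorised. One thread may not saturate the memory controller, so treat the result as a lower bound.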