Is it non-pointless even if writing it in assembler?
The necessary double SHA-256 hashing in Bitcoin's address generation process is substantially slower than EC operations, ensuring that even perfect optimization of point additions will not eliminate the hashing bottleneck.
Therefore, whatever assembler optimizations and algorithmic tricks you can imagine to push the boundaries, they do not overcome the inherent speed limits imposed by modular inversion and hashing.
Double SHA-256 hashing is the ultimate bottleneck
