I have a few ideas about how to optimize the Long class, but I need to spend time with a profiler first to see where the time is actually being spent. Has anyone tried this already?
Take a look at my repo. I removed from Long all the code paths that are useless for the long10 implementation. I also added multiplySmall, which is faster when we multiply an int64 by 1, 2, 4, 9, 19, 38, or 76.
Maybe we can optimize further, but only with a loss of readability or for an insignificant gain.
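
To illustrate why a small-multiplier path can be cheaper, here is a minimal sketch. It is not the code from the repo: the `{high, low}` object shape, the function body, and the `toBigInt` helper are assumptions made for illustration. The idea is that when the multiplier is a small integer, each 32-bit half times the multiplier still fits exactly in a double, so no 16-bit chunking is needed.

```javascript
// Hypothetical sketch, not the repo's implementation.
// x is a signed 64-bit value stored as two signed 32-bit halves: { high, low }.
// k is assumed to be a small non-negative integer (e.g. 1, 2, 4, 9, 19, 38, 76).
function multiplySmall(x, k) {
  const lowU = x.low >>> 0;                 // low half as unsigned 32-bit
  const lowProd = lowU * k;                 // exact in a double while k < 2^21
  const carry = Math.floor(lowProd / 4294967296); // bits that overflow into high
  const low = lowProd | 0;                  // ToInt32 keeps the low 32 bits
  const high = (x.high * k + carry) | 0;    // wraps modulo 2^32, as int64 overflow would
  return { high, low };
}

// Helper used only for checking results (BigInt is not needed by multiplySmall).
function toBigInt(x) {
  return (BigInt(x.high) << 32n) + BigInt(x.low >>> 0);
}

const a = { high: 0, low: -1 };             // 0x00000000FFFFFFFF = 4294967295
const b = { high: -1, low: -5 };            // -5 in two's complement
console.log(toBigInt(multiplySmall(a, 19)) === 4294967295n * 19n);
console.log(toBigInt(multiplySmall(b, 38)) === -5n * 38n);
```

A general 64x64 multiply has to split both operands into 16-bit pieces to stay within double precision, so a path like this saves most of those partial products when one operand is known to be tiny.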

As far as I can see now,
https://github.com/rev22/curve255js has good performance for only one reason: it uses simplified math that can't work with negative numbers. That is why "verify" doesn't work when it uses this math.
I see. It looks like you removed the long10 class and optimized the Google Long class. Are you getting good enough performance, or are you thinking of scrapping the Long class altogether?