@just_a_miner
Man, you've obviously got some skills if you are reverse engineering this and making changes to it.
Instead, why don't you grab the ccminer source code and improve the neoscrypt algorithm there? Then you'd have your own fork to do whatever you want with, working from actual source code instead of reverse engineering.
Here's my perspective on it, which may be very different from OP.
Trying to improve the performance of an already successful crypto algorithm implementation is a very difficult task. In fact, it is so difficult that only a few people manage it, even when the source code is fully open and has been around for a while. Programming and optimizing with CUDA is not trivial: one may get lucky and spot a bug or two that is easy to fix for a speedup, but in general each step beyond such simple shortcuts takes far more time and effort than the last.
Another approach is to focus on improving the usability of the code, which is what OP is pursuing. This is also respectable, and quite okay as long as the original HSRminer dev is not around.
If BTC does indeed rebound, I will revisit this crazy idea of getting into coding miners myself.
It's been a long while since I last took up checkers!