Well, it's anticipated that specialized hardware will be able to run parts of the algorithm more efficiently. However, to run the whole hash efficiently you'd need to run all three parts efficiently (four, counting SHA512), and specialized hardware that could do that would look a lot like... a CPU.
I bet you'll find that the 2MB Scrypt step is the dominant time factor.
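One way to check that bet is to time a 2MB scrypt call against a SHA512 call. The sketch below is a minimal Python benchmark, not the coin's actual hash. It assumes scrypt parameters N=16384, r=1, p=1 (scrypt's scratchpad is 128·r·N bytes, so these give 2 MiB), a hypothetical all-zero 80-byte header used as both password and salt, and Litecoin's N=1024, r=1, p=1 (128 KiB) for comparison. The other parts of the combined hash are not modeled, and all helper names are hypothetical.

```python
import hashlib
import time

def scrypt_memory_bytes(n: int, r: int) -> int:
    # scrypt's ROMix step keeps N blocks of 128*r bytes each in memory
    return 128 * r * n

# Assumed parameters: N=16384, r=1, p=1 gives a 2 MiB scratchpad.
BIG_N, BIG_R, BIG_P = 16384, 1, 1
# Litecoin's scrypt parameters, for comparison (128 KiB scratchpad).
LTC_N, LTC_R, LTC_P = 1024, 1, 1

# Hypothetical 80-byte block header; used as both password and salt.
HEADER = b"\x00" * 80

def scrypt_2mb() -> bytes:
    return hashlib.scrypt(HEADER, salt=HEADER, n=BIG_N, r=BIG_R, p=BIG_P, dklen=32)

def scrypt_ltc() -> bytes:
    return hashlib.scrypt(HEADER, salt=HEADER, n=LTC_N, r=LTC_R, p=LTC_P, dklen=32)

def sha512_step() -> bytes:
    return hashlib.sha512(HEADER).digest()

def time_call(fn, repeats: int = 5) -> float:
    """Return the average wall-clock seconds per call of fn()."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

if __name__ == "__main__":
    print(f"2 MiB scratchpad: {scrypt_memory_bytes(BIG_N, BIG_R)} bytes")
    print(f"Litecoin scratchpad: {scrypt_memory_bytes(LTC_N, LTC_R)} bytes")
    for name, fn in [("scrypt 2MB", scrypt_2mb),
                     ("scrypt LTC", scrypt_ltc),
                     ("sha512", sha512_step)]:
        print(f"{name:>10}: {time_call(fn) * 1e3:.3f} ms/call")
```

Absolute timings depend on the machine, so only the relative sizes of the numbers are meaningful here.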
I would have been interested in a detailed discussion of these points two weeks ago, when there was more scope for re-engineering.
Understood. I only gained a deeper insight into Scrypt and Litecoin this week.
I'll get back to you once I have something concrete enough to get your attention, even if it proves me wrong, so that the issue is resolved.