This paper is about a way of implementing GALS (Globally Asynchronous Locally Synchronous) logic. IMO it won't help at all for SHA256D. It could probably work for something like scrypt() which is a sandwich of two layers of PBKDF2 with Salsa20 in the middle. Fixed length SHA-256 like in Bitcoin is too trivial to be susceptible to such advanced optimizations.