OK another question related to the decryption algorithm itself. We make L = ceil(log2(M)) rounds of XOR and cyclic shift encoding. However for small input data this means we're only running a few rounds. In addition to this, in each round we are also searching for the correct w0 and w1 parameters. Won't this be vulnerable to brute-forcing for both small and large L, since step 2 is essentially "find w0 and w1 that satisfy Si XOR w0 and Si XOR w1 respectively"?
Also by making the number of rounds dependent on the input size you are making this vulnerable to timing attacks. AES uses a constant number of rounds with word shuffling and shifting operations in each round so perhaps you can look into doing something like that to bullet-proof it.