I'm not sure how this particular project encodes the images to generate the wallet, but in theory you could train a deep learning model to describe each photo with a word from the seed phrase.
That way you could crop, resize, or slightly change the appearance of a photo and still recover the wallet, because a model that is robust to those changes would still output the correct word for each photo.
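
To make the idea concrete, here's a rough Python sketch. It is not how this project works (I don't know that), and it is not a deep learning model either: a nearest-reference classifier over coarse brightness features stands in for the trained model. The word list, the synthetic "photos", and all function names (`make_photo`, `features`, `describe`, `perturb`) are made up for illustration.

```python
import random

# Hypothetical seed words; a real wallet would use its own word list.
WORDS = ["apple", "river", "stone", "cloud"]

def make_photo(bright_quadrant, size=16, rng_seed=0):
    """Synthetic stand-in for a photo: a grayscale grid with one bright quadrant."""
    rng = random.Random(rng_seed)
    half = size // 2
    photo = []
    for r in range(size):
        row = []
        for c in range(size):
            quadrant = (r >= half) * 2 + (c >= half)
            base = 200 if quadrant == bright_quadrant else 50
            row.append(base + rng.randint(-5, 5))  # small per-pixel noise
        photo.append(row)
    return photo

def features(photo):
    """Coarse features: mean brightness of each of the four quadrants."""
    h, w = len(photo), len(photo[0])
    sums, counts = [0.0] * 4, [0] * 4
    for r in range(h):
        for c in range(w):
            q = (r >= h // 2) * 2 + (c >= w // 2)
            sums[q] += photo[r][c]
            counts[q] += 1
    return [s / n for s, n in zip(sums, counts)]

def describe(photo, references):
    """Stand-in for the model: return the word whose reference features are closest."""
    f = features(photo)
    def dist(word):
        return sum((a - b) ** 2 for a, b in zip(f, references[word]))
    return min(references, key=dist)

def perturb(photo, border=1, brightness=10):
    """Crop a border off every side and brighten the image a little."""
    cropped = [row[border:len(row) - border]
               for row in photo[border:len(photo) - border]]
    return [[min(255, px + brightness) for px in row] for row in cropped]

# "Enroll" one photo per seed word, then try to recover the words
# from cropped and brightened copies of the same photos.
photos = [make_photo(q, rng_seed=q) for q in range(4)]
references = {word: features(p) for word, p in zip(WORDS, photos)}
recovered = [describe(perturb(p), references) for p in photos]
print(recovered == WORDS)
```

The point is only that the word is derived from what the photo looks like rather than from its exact bytes, so small edits don't change the result. A real model would need to be far more robust than this toy, and you'd still have to keep the photos in the right order, since the order of the words in a seed phrase matters.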