I thought about storing only the diff between post versions, but then I would need to rethink how my search engine currently works. Maybe there is another solution...
Have you ever considered having a manual scraping button? For example, a user would visit your page for that post and click the button, which would trigger the scrape.
To prevent abuse, you would put a time limit on the button, so it can't be used again until that time has passed.
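A rough sketch of what that time limit could look like, purely as an illustration. Everything here is an assumption: the names `ScrapeCooldown` and `try_acquire` are made up, the limit is tracked per post rather than globally, the 24-hour value is a placeholder, and the timestamps live in an in-memory dict (a real site would probably persist them).

```python
import time

# Placeholder value; the right limit is an open question in this thread.
COOLDOWN_SECONDS = 24 * 60 * 60


class ScrapeCooldown:
    """Tracks when each post was last rescraped (hypothetical helper)."""

    def __init__(self, cooldown_seconds=COOLDOWN_SECONDS, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock          # injectable so the limit can be tested
        self.last_scrape = {}       # post_id -> time of last allowed scrape

    def try_acquire(self, post_id):
        """Return True and record the time if a rescrape is allowed now."""
        now = self.clock()
        last = self.last_scrape.get(post_id)
        if last is not None and now - last < self.cooldown:
            return False            # still inside the cooldown window
        self.last_scrape[post_id] = now
        return True
```

The button handler would call `try_acquire(post_id)` first and only start the scrape when it returns `True`; otherwise it could tell the user when the button becomes available again.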
Yes, but which time limit? Even if it's every 24 hours or 72 hours, rescraping an entire post can take a lot of space the way it works right now (the entire post is saved again, even if the difference between versions is a single character).
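To make the space problem concrete, here is a minimal sketch of storing only what changed between two versions, using Python's standard `difflib`. The function names `make_delta` and `apply_delta` and the delta format are made up for illustration; this is not how the current scraper works, and character-level matching can get slow on very long posts.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copies from `old` plus the newly added text."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged span: store only its offsets into the old version.
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # Inserted or replaced text: store the new characters themselves.
            delta.append(("insert", new[j1:j2]))
        # Pure deletions need no entry; the old span is simply not copied.
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from the old version and a delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

With a one-character change, the delta holds a single copy range plus that one character instead of a second full copy of the post. One option that might leave the search engine untouched is to keep the latest version in full and store only the older versions as deltas against it, though that is a guess about a setup I haven't seen.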