I thought about storing only the post diff, but then I would need to rethink how my search engine currently works. Maybe there is another solution...
Have you ever considered adding a manual scraping button? For example, a user would visit your page for the post and click the button, which would trigger the scraping.
To prevent abuse, you would enforce a cooldown: after the button is used, it cannot be used again until a set amount of time has passed.
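
The time-limited button could be sketched roughly like this. It is only an illustration, not how your site has to do it: `ScrapeCooldown`, `handle_scrape_click`, and the `scrape` callback are made-up names, the state lives in memory (a real site would probably keep it in a database or cache), and I am assuming the limit is tracked per post. For one site-wide limit you would just use a single fixed key.

```python
import time
from typing import Callable, Dict


class ScrapeCooldown:
    """Allows at most one manual scrape per key within `cooldown_seconds`."""

    def __init__(
        self,
        cooldown_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_run: Dict[str, float] = {}  # key (assumed: post id) -> last scrape time

    def seconds_remaining(self, post_id: str) -> float:
        """Return how long the button stays locked for this post (0 if usable)."""
        last = self._last_run.get(post_id)
        if last is None:
            return 0.0
        return max(0.0, self.cooldown_seconds - (self._clock() - last))

    def try_acquire(self, post_id: str) -> bool:
        """Record a scrape and return True, or return False if still cooling down."""
        if self.seconds_remaining(post_id) > 0:
            return False
        self._last_run[post_id] = self._clock()
        return True


def handle_scrape_click(
    post_id: str,
    cooldown: ScrapeCooldown,
    scrape: Callable[[str], object],
) -> str:
    """Hypothetical button handler: run `scrape` only if the cooldown allows it."""
    if not cooldown.try_acquire(post_id):
        wait = cooldown.seconds_remaining(post_id)
        return f"Please wait {wait:.0f}s before scraping again."
    scrape(post_id)
    return "Scrape started."
```

The check has to happen on the server. Disabling the button in the browser is nice for the user, but it does not stop someone from sending the request directly.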