What you are describing would only detect posts that are copied in their entirety. If someone added a single word, the post would not be flagged as a duplicate. I have also noticed that some plagiarists copy only parts of posts, sometimes from several different sources. Also, sorting as you describe is just another way of checking every post against every other post, because those comparisons are how everything gets put in the correct order.
You could index parts of the text (for example the first word, or the first letter of the first word) to reduce runtime, but checking each post against every other post is still expensive.
What you are describing would > 4081ffba8ba8e9aa702ba47c868c86ab
you are describing would only > a4476131d8f6a76d09daabe2c0e12ff5
are describing would only detect > f805a9b6e8756ec0e7e8d9d9b9f6085e
would only detect entire posts that > 3190387c55974545cf2d2045197ee70b
Only the checksums have to be sorted and searched for duplicates. I've used that before to find matching Bitcoin addresses in very long lists, and it only takes seconds to search through gigabytes of data.
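For anyone who wants to try this, here is a rough Python sketch of the idea as I understand it: hash every 5-word window of each post, sort the (checksum, post) pairs, and look for equal checksums sitting next to each other. The checksums above look like MD5, but that is an assumption on my part, and the function names, the window size parameter and the dict-of-posts input are all made up for illustration.

```python
import hashlib
from collections import defaultdict


def shingles(text, size=5):
    """Yield every run of `size` consecutive words in the text."""
    words = text.split()
    for i in range(len(words) - size + 1):
        yield " ".join(words[i:i + size])


def shingle_checksums(text, size=5):
    """Return the set of MD5 hex digests of all word windows (MD5 is an assumption)."""
    return {hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles(text, size)}


def find_shared_checksums(posts, size=5):
    """Map each checksum that occurs in more than one post to the posts containing it.

    `posts` is a hypothetical dict of post id -> post text.
    """
    # One (checksum, post_id) pair per distinct window per post, sorted by checksum.
    pairs = sorted(
        (checksum, post_id)
        for post_id, text in posts.items()
        for checksum in shingle_checksums(text, size)
    )
    shared = defaultdict(set)
    # After sorting, equal checksums are adjacent, so one linear pass finds them.
    for (c1, p1), (c2, p2) in zip(pairs, pairs[1:]):
        if c1 == c2 and p1 != p2:
            shared[c1].update((p1, p2))
    return dict(shared)
```

The point is that the sort works on short fixed-length checksums rather than whole posts, and a post that copies only a few sentences still shares some windows with its source. Note that this sketch is case- and punctuation-sensitive, so you would probably want to normalize the text before hashing.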