Post
Topic
Board Meta
Merits 1 from 1 user
Re: HOLY FUCK !!!! THE SCAMMER IS ALSO A PLAGIARIST OF GIANT MAGNITUDE - SIG BAN NO
by
PrimeNumber7
on 22/05/2020, 07:14:30 UTC
⭐ Merited by mindrust (1)
What I really wonder is, what the plagiarism bot was doing? It found lots of people with that offense but somehow skipped lauda which had multiple offenses.

Is the bot dumb or was it because the data was too big (26k posts) so it said "fuck it im skippin it"
Checking every post for plagiarism is not scalable. The more posts the userbase writes, the more processing power is required to check the next post against all other posts. It is also expensive to check one person's posts against every other post in the forum, and the cost goes up further if you are checking whether parts of posts are plagiarized. As a result, plagiarism investigations need to be targeted.

To demonstrate the cost of checking for plagiarism:
Suppose you check every group of 5 consecutive words in each post in a user's post history against every group of 5 consecutive words in every other post that exists.
Assume the userbase has made 50 million posts, 40 million of which (80%) have at least 5 words, and that the average post length is 15 words.
A 15-word post contains 11 groups of 5 consecutive words (15 - 5 + 1), so the database would hold about 440 million rows (40 million × 11). Each post you check for plagiarism would cause you to make 11 queries, and each query would check against those 440 million rows. So each post you check would require about 4.84 billion row comparisons.
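
The arithmetic above can be reproduced with a short sketch. This is only an illustration of the estimate, not how any actual plagiarism bot works; the function names `shingles` and `comparison_cost` are made up for this example, and it assumes every eligible post has exactly the average length.

```python
def shingles(text, k=5):
    """Return every group of k consecutive words in text."""
    words = text.split()
    # A post with fewer than k words yields no groups at all.
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def comparison_cost(posts_with_enough_words, avg_words, k=5):
    """Estimate the work to check one average-length post against all others.

    Assumption: every eligible post is exactly avg_words long.
    """
    queries_per_post = avg_words - k + 1              # groups in the post being checked
    rows_in_table = posts_with_enough_words * queries_per_post
    return queries_per_post, rows_in_table, queries_per_post * rows_in_table


queries, rows, comparisons = comparison_cost(40_000_000, 15)
print(queries, rows, comparisons)  # 11 440000000 4840000000
```

Checking a whole post history multiplies this again: a user with 26k posts would need that figure 26,000 times over.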

You can do some things to speed up searching, but these only go so far. You can also randomly skip some of the rows you compare against, and skip some of the queries against the rows you do look at, but then you can no longer be certain of finding every instance of plagiarism.
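
To put a rough number on that loss of certainty, here is a minimal sketch. It assumes rows and queries are skipped independently and at random, which is a simplification, and the name `detection_probability` and its parameters are hypothetical.

```python
def detection_probability(matching_groups, row_keep, query_keep):
    """Chance of catching at least one matching 5-word group.

    matching_groups: number of groups the copied post shares with its source
    row_keep:        fraction of rows you still compare against (0..1)
    query_keep:      fraction of queries you still run (0..1)
    Assumption: each skip decision is independent of the others.
    """
    per_group = row_keep * query_keep          # chance one specific match is seen
    return 1 - (1 - per_group) ** matching_groups


# A verbatim copy of a 15-word post shares all 11 groups with its source.
print(detection_probability(11, 1.0, 1.0))   # nothing skipped
print(detection_probability(11, 0.5, 0.5))   # half the rows, half the queries
print(detection_probability(1, 0.5, 0.5))    # only a single 5-word overlap
```

Under these assumptions, a full copy is still likely to be caught when you skip half of everything, but a post that shares only one 5-word group with its source is missed most of the time.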

None of this would detect the majority of the plagiarism referenced in the OP, where the posts were copied from external sources rather than from other posts on the forum.