Board Meta
Re: "Multiple Accounts" / Copy-pasta detection scripts/bots
by qwk on 19/09/2018, 16:27:46 UTC
⭐ Merited by Initscri (1)
Detecting the text spinners will be a whole different level!
I guess a quick and dirty approach could be something like this:
1. Take every run of 4 consecutive words in the post (a sliding window, so the runs overlap).
2. Hash each run with MD5 (or whatever hash you prefer).
3. Store those hashes in a database.
4. Count how many of the post's hashes collide with hashes from other posts.
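A minimal Python sketch of the four steps, assuming an in-memory SQLite table stands in for "a database". The table layout and the function names (`shingles`, `store_post`, `count_collisions`) are hypothetical, and no text normalization (case, punctuation) is done.

```python
import hashlib
import sqlite3

def shingles(text, n=4):
    """Step 1: every run of n consecutive words (overlapping sliding window)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def shingle_hashes(text, n=4):
    """Step 2: MD5 hex digest of each run."""
    return [hashlib.md5(s.encode("utf-8")).hexdigest() for s in shingles(text, n)]

def init_db(conn):
    # hypothetical schema: one row per (post, hash), indexed on hash for lookups
    conn.execute("CREATE TABLE IF NOT EXISTS shingles (post_id TEXT, hash TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hash ON shingles(hash)")

def store_post(conn, post_id, text, n=4):
    """Step 3: store the post's distinct hashes."""
    conn.executemany(
        "INSERT INTO shingles (post_id, hash) VALUES (?, ?)",
        [(post_id, h) for h in set(shingle_hashes(text, n))],
    )

def count_collisions(conn, post_id, text, n=4):
    """Step 4: how many of this post's distinct hashes appear in other posts."""
    hits = 0
    for h in set(shingle_hashes(text, n)):
        row = conn.execute(
            "SELECT 1 FROM shingles WHERE hash = ? AND post_id != ? LIMIT 1",
            (h, post_id),
        ).fetchone()
        if row is not None:
            hits += 1
    return hits

conn = sqlite3.connect(":memory:")
init_db(conn)
store_post(conn, "a", "The quick brown fox jumps over the lazy dog")
print(count_collisions(conn, "b", "A quick brown fox jumps over the lazy cat"))  # → 4
```

The second post shares 4 of its 6 four-word runs with the first ("quick brown fox jumps" through "jumps over the lazy"), so it reports 4 collisions.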

So, a simple text like:
The quick brown fox jumps over the lazy dog

would result in 6 four-word runs, each hashed individually:
The quick brown fox
quick brown fox jumps
brown fox jumps over
fox jumps over the
jumps over the lazy
over the lazy dog
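A text of W words yields W - n + 1 runs of n words, so the 9-word sentence gives 9 - 4 + 1 = 6. A quick way to reproduce that list and the hash of each run (the `shingles` helper is hypothetical):

```python
import hashlib

def shingles(text, n=4):
    """All runs of n consecutive words, in order (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "The quick brown fox jumps over the lazy dog"
windows = shingles(sentence)
print(len(windows))  # 9 words - 4 + 1 = 6
for w in windows:
    # each run is hashed on its own; the digests are what gets stored
    print(hashlib.md5(w.encode("utf-8")).hexdigest(), w)
```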

Tinker a little with the window size (number of words) and the collision threshold for flagging a duplicate, and you're probably most of the way there for a large share of the copy-pasta spam.
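One way to make "the threshold" concrete, as an assumption: score a new post by the fraction of its distinct 4-word hashes that also appear in an earlier post, and flag it when that fraction reaches a cutoff. The function names and the default `threshold=0.5` are hypothetical starting values to tinker with, not tested figures.

```python
import hashlib

def shingle_hashes(text, n=4):
    """Set of MD5 digests of every n-word run in text."""
    words = text.split()
    return {
        hashlib.md5(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(len(words) - n + 1)
    }

def overlap(new_text, old_text, n=4):
    """Fraction of new_text's distinct hashes that also occur in old_text."""
    new = shingle_hashes(new_text, n)
    if not new:
        return 0.0  # post shorter than n words: nothing to compare
    return len(new & shingle_hashes(old_text, n)) / len(new)

def flag_copypasta(new_text, corpus, n=4, threshold=0.5):
    """Return (index, score) for every earlier post at or above threshold."""
    flagged = []
    for i, old_text in enumerate(corpus):
        score = overlap(new_text, old_text, n)
        if score >= threshold:
            flagged.append((i, score))
    return flagged

corpus = [
    "The quick brown fox jumps over the lazy dog",
    "Completely unrelated post about mining hardware",
]
new_post = "A quick brown fox jumps over the lazy cat"
print(flag_copypasta(new_post, corpus))       # 4 of 6 runs match the first post
print(flag_copypasta(new_post, corpus, n=3))  # 5 of 7 runs match the first post
```

Both knobs interact: a smaller `n` lets more short common phrases collide, so the threshold probably has to move along with it.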