Re: A wave of bans: 400 yesterday, 300 the day before. What changed?

Quote from: qwk on June 01, 2019, 09:00:22 PM

Quote from: asche on June 01, 2019, 08:00:51 PM

I am mostly interested in the tool - the bot - used to actually report the plagiarists.

Who is doing the reporting is mostly irrelevant, unless that person talks about the tool he/she/it uses.

I'm not sure if my "prime suspect" has been mentioned yet, but feel free to review an older thread on the topic:
https://bitcointalk.org/index.php?topic=5032322

I am not sure who your "prime suspect" is, but I found a "confession" in that thread:

Quote from: suchmoon on September 19, 2018, 04:07:13 AM

I'm experimenting with some NLP techniques for plagiarism detection and the results are promising although scalability is a bit of an issue. Currently working just on comparing Bitcointalk posts (not to outside sources).

Quote from: suchmoon on September 19, 2018, 05:01:24 PM

I experimented with n-grams a little bit and couldn't find a good value of n. Low n yields too many false positives, while high n doesn't detect spinners, etc. So I'm using a mixture of algorithms and basing the decision on the pattern of their results. For example: if the similarity of two texts using algorithm A is 70%, then union, intersect, or otherwise manipulate the texts and run algorithm B; if that scores 90%, run algorithm C to eliminate false positives. The numbers are made up, but you get the idea. It works ok-ish, but as I mentioned it doesn't scale well and I need to do more testing on larger samples.
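To make the n-gram trade-off and the "run A, then B, then C" idea concrete, here is a minimal sketch. It is not suchmoon's actual tool, which is not described in any detail. The function names, the choice of Jaccard similarity on word n-grams for stages A and B, the use of `difflib.SequenceMatcher` for stage C, and all thresholds are assumptions made up for illustration. The "union/intersect/otherwise manipulate the texts" step is left out.

```python
import re
from difflib import SequenceMatcher


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in the lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_similarity(t1, t2, n):
    """Jaccard similarity of the two texts' word n-gram sets (0.0 to 1.0)."""
    a, b = word_ngrams(t1, n), word_ngrams(t2, n)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def looks_plagiarised(t1, t2, a_min=0.7, b_min=0.3, c_min=0.8):
    """Hypothetical three-stage cascade; all thresholds are made up."""
    # Stage A: cheap check on shared vocabulary (n = 1).
    if ngram_similarity(t1, t2, 1) < a_min:
        return False
    # Stage B: stricter check on shared three-word phrases (n = 3).
    if ngram_similarity(t1, t2, 3) < b_min:
        return False
    # Stage C: character-level ratio as a last filter against false positives.
    return SequenceMatcher(None, t1.lower(), t2.lower()).ratio() >= c_min


original = "the quick brown fox jumps over the lazy dog"
spun = "the quick brown fox leaps over the lazy dog"

# One swapped word barely moves the n = 1 score but cuts the n = 3 score a lot.
print(ngram_similarity(original, spun, 1))
print(ngram_similarity(original, spun, 3))
```

With a single word replaced, the unigram score stays high while the trigram score drops sharply, which is the "low n gives false positives, high n misses spinners" problem in miniature. Comparing every post against every other post this way is quadratic in the number of posts, which fits the scalability complaint.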