Re: "Multiple Accounts" / Copy-pasta detection scripts/bots

Quote from: khaled0111 on September 19, 2018, 03:53:58 PM

I don't know how it works, but I think there is a bot on Steemit, "@cheetah", that detects plagiarism, so developing a similar bot won't be a problem (there are many senior developers in this forum).

It will be great if you succeed in writing a script that detects members sending Merits to each other.

I don't think it is going to be hard to code such a script, but you will need access to the Merit database.

There are plenty of paid APIs that support external plagiarism detection, so if I was lazy and rich I'd use those lol. I'm uncertain of their reliability, though.

But realistically, external plagiarism detection isn't super difficult, although it may be more difficult than internal detection. I won't go too far into details (hashing, storage methods, etc.), but essentially you're taking a copy of the text (or portions of it) and matching it against search engine results / meta descriptions.
I'm sure there are plenty of other methods as well.
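
To make the "portions of the text + hashing" idea a bit more concrete, here's a rough Python sketch. It is only one possible approach, not what any existing bot does: it splits the text into overlapping 5-word chunks ("shingles"), hashes each one, and measures how many of a post's chunks also show up in a candidate source. The function names, the chunk size of 5 and the use of SHA-1 are all my own assumptions, and it assumes you've already fetched the source text from somewhere.

```python
import hashlib
import re


def shingles(text, size=5):
    """Return the set of hashed word n-grams ("shingles") found in text."""
    # Lowercase and keep only word-like tokens so punctuation changes don't matter.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return set()
    return {
        hashlib.sha1(" ".join(words[i:i + size]).encode("utf-8")).hexdigest()
        for i in range(len(words) - size + 1)
    }


def containment(post, source, size=5):
    """Fraction of the post's shingles that also appear in the source (0.0 to 1.0)."""
    post_shingles = shingles(post, size)
    if not post_shingles:
        return 0.0  # post too short to judge
    return len(post_shingles & shingles(source, size)) / len(post_shingles)
```

A score near 1.0 means almost the whole post appears in the source, and a score near 0.0 means little or no overlap. Where to set the reporting threshold is a judgment call that would need testing on real posts.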

The difficulty will be finding sources to match against (unsure if scraping Google is permitted, we'll see).

Point is, though: if 3 different developers build it 3 different ways (using different sources), it will be far more difficult for bots/spammers to reverse engineer or abuse.

Quote from: suchmoon on September 19, 2018, 01:29:12 PM

Quote from: Initscri on September 19, 2018, 05:16:25 AM

If you're working on plagiarism detection already, I'll probably work on multiple account detection first. Granted, multiple bots running from different developers with different sets of algorithms probably isn't a bad idea (it will make detection harder for bots to avoid).

I think we can certainly run multiple attacks on plagiarism as long as we coordinate to reduce overlap in which users we've reported etc, e.g. using the thread I mentioned and also https://bpip.org to check for bans.

With the little time I have available I'm still probably weeks away from a reasonably usable product, and even then it would cover only a relatively small set of potential plagiarism. LoyceV mentioned that the forum gets ~50k posts a day; many of them can be ignored or whitelisted, but that's still a lot of garbage to sift through.

Maybe we can create some sort of central location for recording which users have been reported by bots.
If I have time, maybe I'll create something web-based, and just give out API keys to users who can prove they have an operating script.

It would just be a web-based platform for recording which users have been reported by scripts/bots, and it would then track whether those users actually get banned by checking BPIP (if Vod permits).
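
Roughly what that central location could look like, as a Python sketch. Everything here is hypothetical: the class and method names are made up, and it keeps everything in memory instead of using a real database or web API. The ban check is a function the caller passes in, since I don't know what (if anything) BPIP exposes for that.

```python
from dataclasses import dataclass, field


@dataclass
class ReportRegistry:
    """In-memory record of which bots reported which users (hypothetical design)."""
    api_keys: dict = field(default_factory=dict)  # api_key -> bot name
    reports: dict = field(default_factory=dict)   # username -> set of bot names

    def register_bot(self, api_key, bot_name):
        """Hand an API key to a developer who has shown a working script."""
        self.api_keys[api_key] = bot_name

    def report(self, api_key, username):
        """Record a report; return True if nobody had reported this user before."""
        if api_key not in self.api_keys:
            raise PermissionError("unknown API key")
        first_report = username not in self.reports
        self.reports.setdefault(username, set()).add(self.api_keys[api_key])
        return first_report

    def banned_users(self, is_banned):
        """Return reported users for whom the supplied is_banned(username) is true."""
        return sorted(user for user in self.reports if is_banned(user))
```

The return value of `report` lets a bot see that someone else already reported a user, which is the overlap we'd want to cut down on.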

Dumping the info into a thread probably isn't ideal, but worst comes to worst we can rely on that until a more advanced system is produced.

Quote from: Jet Cash on September 19, 2018, 08:52:33 AM

If it helps you guys to know about declared alts, here are mine.

Talk Merit
JetAid

Thanks, Jet Cash. If I do implement an alt detection system, I'd make the reporting of users more manual than automated.
I'm sure there are many users (such as yourself) who have alts for various reasons, aren't being nefarious, and don't deserve a report.

If anyone has any further ideas for methods, keep em comin'