Post
Topic
Board Meta
Re: Plagiarism: Where Do We Draw the Line?
by
amishmanish
on 08/09/2021, 05:18:20 UTC
The problem is that it is really not possible to check every new post for plagiarism because the cost of checking an additional post will grow for every additional post written. For example, if there are 100 posts that exist on the forum, the cost of checking a new post against all existing posts is 100 units. Once there are 1000 posts on the forum, the cost of checking a single new post against all existing posts is 1000 units. For each additional post made, it costs one additional unit to check a single additional post. This is obviously not sustainable.
Thanks for chiming in. Discussing these things is always interesting. You are talking about the time complexity of such a search-and-match algorithm. I read some of this stuff back when I took a course in Python. It was enlightening to read about algorithms and write small enumeration programs. Programming, I guess, is all about practice and actually building on existing complexity. I did write a program to sort some very poorly formatted data that had been fed into Excel as CSV. But being busy with other stuff did not leave room to continue learning.
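
To put rough numbers on the cost described in the quote, here is a small sketch. It assumes the naive approach where a new post is compared once against every existing post and each comparison costs one "unit"; the function names are made up for illustration.

```python
def comparisons_for_new_post(existing_posts: int) -> int:
    """Naive check: one comparison against every existing post."""
    return existing_posts


def total_comparisons(n_posts: int) -> int:
    """Total comparisons if every post is checked when it arrives.

    The k-th post (counting from 0) is compared against the k posts before it.
    """
    return sum(comparisons_for_new_post(k) for k in range(n_posts))


print(comparisons_for_new_post(100))   # 100 units for one new post
print(comparisons_for_new_post(1000))  # 1000 units for one new post
print(total_comparisons(1000))         # 499500 units over the first 1000 posts
```

So the cost per post grows linearly with forum size, and the running total grows roughly with the square of the number of posts.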

You'd first need a set of master data with all possible 6-word snippets of text from all the existing posts (provided someone is copying only from existing Bitcoin posts). This would then have to be compared with the set of snippets formed from every new post. While this could be done, I believe the space and memory requirements would be pretty huge. Though, doesn't Google do it for, like, all of the internet? And AltaVista used to do it at one time. Now, Google has humongous capacity of course, but I don't think the old sites like AltaVista had that.
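
Here is a minimal sketch of that snippet idea in Python, just to make it concrete. It is not how the forum or any search engine actually does it; the function names, the in-memory dict, and the word-splitting rule are all my own assumptions.

```python
import re
from collections import defaultdict

SNIPPET_WORDS = 6  # snippet length from the post above


def snippets(text, n=SNIPPET_WORDS):
    """Return the set of all n-word snippets in text (lowercased)."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(posts):
    """posts maps post_id -> text. Returns snippet -> set of post_ids."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for s in snippets(text):
            index[s].add(post_id)
    return index


def find_matches(index, new_text):
    """Count how many snippets of new_text already appear in each old post."""
    hits = defaultdict(int)
    for s in snippets(new_text):
        for post_id in index.get(s, ()):
            hits[post_id] += 1
    return dict(hits)


# Hypothetical example data
old_posts = {
    1: "the cost of checking a new post against all existing posts grows",
    2: "bitcoin is a peer to peer electronic cash system",
}
index = build_index(old_posts)
print(find_matches(index, "I think the cost of checking a new post is high"))
# {1: 2}
```

With a lookup table like this, checking a new post takes one lookup per snippet in that post, so the time depends on the length of the new post rather than on how many posts exist. The price is exactly the space concern: the index stores roughly one entry per word of every post ever written.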