Merits 42 from 19 users
LoyceV's Topic Details: highlight deleted and edited posts (forum wide)
by LoyceV on 20/06/2020, 07:42:25 UTC
⭐ Merited by dbshck (8), Welsh (6), fillippone (6), OmegaStarScream (3), o_e_l_e_o (2), DdmrDdmr (2), Vod (2), Rikafip (2), cryptoaddictchie (1), hosseinimr93 (1), dragonvslinux (1), Rizzrack (1), SFR10 (1), GazetaBitcoin (1), Daniel91 (1), TheBeardedBaby (1), Poker Player (1), BitMaxz (1), vapourminer (1)
Archiving a thread is now as easy as posting the right link anywhere on the forum (and waiting a bit)!

Short version:
Get a topicID you want to see, for instance 5145594.
Insert the topicID into the following link and post it on any public board on Bitcointalk: http://loyce.club/archive/details/topic_5145594.html
Wait a bit, then click the link!
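
For anyone who wants to generate these links in bulk, here is a minimal Python sketch. The URL pattern is the one shown above; the function name topic_details_link is just an illustrative choice, not part of the service.

```python
# URL pattern taken from the example link in this post.
ARCHIVE_URL = "http://loyce.club/archive/details/topic_{topic_id}.html"


def topic_details_link(topic_id: int) -> str:
    """Return the Topic Details link for a Bitcointalk topicID."""
    if topic_id <= 0:
        raise ValueError("topicID must be a positive integer")
    return ARCHIVE_URL.format(topic_id=topic_id)


if __name__ == "__main__":
    print(topic_details_link(5145594))
```

The link still has to be posted on a public board before anything happens; building it locally doesn't trigger an update.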

Full version
Almost a year ago, I opened the topic "35M posts! View unedited/deleted posts (search per post, per user or per topic)". That archive holds a lot of data (currently around 60 GB), but manually finding exactly which posts in a topic were edited or deleted is painstaking.
I started archiving posts in July 2019. Especially at the beginning I missed some posts due to downtime, and even now I occasionally miss some posts due to connection problems.
Since February 2020, I've been scraping and archiving all older posts too (this will take a couple more months to complete).

What it does
I've created an on-demand service to get details from any topic for:
  • All posts that I didn't archive yet*
  • All posts that have been edited*
  • All posts that have been deleted
  • All posts that received Merit (not implemented yet)
*I create a new archive of every edited or unarchived post.

The Topic Details
If a new update for the same topic is requested, I'll include a list of all previous Topic Details.
You'll have to make a new post to be detected by my scraper. Editing an existing post won't get detected.
Please don't quote the archive link; quoting it triggers another update.
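
To illustrate why a new post is needed and why a quote triggers another update, here is a rough Python sketch of how a request could be picked out of a post. This is an assumption for illustration only: the regex, the function name first_requested_topic and the idea of matching on plain post text are all hypothetical, and the real scraper may work differently. Taking only the first match mirrors the "only one request per post" rule listed under Limitations.

```python
import re
from typing import Optional

# Hypothetical pattern: matches the Topic Details link format from this post.
LINK_PATTERN = re.compile(r"loyce\.club/archive/details/topic_(\d+)\.html")


def first_requested_topic(post_text: str) -> Optional[int]:
    """Return the topicID of the first Topic Details link in a post, if any."""
    match = LINK_PATTERN.search(post_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    post = "Check http://loyce.club/archive/details/topic_5145594.html please"
    print(first_requested_topic(post))
```

A quoted post contains the same link text as the original, so a matcher like this would see it as a fresh request.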

Sample output explained
[Image: sample output]
Post 50988796 links to the post on Bitcointalk (even if the post itself has been deleted).
Post 235 is older than my archive. I don't have an unedited backup, so I created a new backup.
Post 236 is Deleted! I have an unedited backup, no need for a new backup.
Post 242 was Edited! I have an unedited backup, and created a new backup of the current post.
Post 246 is Unedited! No need for a new backup.
Post 251 doesn't have an unedited backup (which means my scraper was offline at that moment), so I created a new backup.
The (link) at the end of each line points at that specific row in my list.

[Image]
If I have no unedited backup, I check if I made a later backup. A later backup can't tell whether the post was edited in its first months (or even years), before that backup was made, so I don't mark the post as Edited! or Unedited!. However, if the post was edited after I created the backup, I make a new backup.
If a post was removed before I tried to archive it, I (obviously) can't list it.
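
The rules above can be summed up in a short Python sketch. This is only my reading of the sample output, not the actual scraper code: the function name classify_post, its parameters and the exact-string comparison are assumptions, and the handling of a deleted post that only has a later backup is a guess.

```python
from typing import Optional, Tuple


def classify_post(
    unedited_backup: Optional[str],
    later_backup: Optional[str],
    current: Optional[str],
) -> Optional[Tuple[str, bool]]:
    """Return (status, needs_new_backup), or None if the post can't be listed.

    unedited_backup: text archived right after posting (None if missed).
    later_backup:    text archived at some later time (None if absent).
    current:         text on the forum now (None if the post is deleted).
    """
    if current is None:
        if unedited_backup is None and later_backup is None:
            return None  # removed before it was ever archived
        return ("Deleted!", False)  # assumption: any backup is enough to list it
    if unedited_backup is not None:
        if current == unedited_backup:
            return ("Unedited!", False)
        return ("Edited!", True)
    # No unedited backup: never marked Edited! or Unedited!.
    needs_backup = later_backup is None or later_backup != current
    return ("No unedited backup", needs_backup)


if __name__ == "__main__":
    print(classify_post("hello", None, "hello world"))
```

For example, a post with an unedited backup "hello" that now reads "hello world" comes out as Edited! with a new backup needed.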

Limitations
  • Only one request per post. If you post more than one request, only the first one is processed.
  • I didn't implement multitasking. If my scraper is busy (see status.txt), you'll have to wait a bit and post a new request (in a new post). It's okay to delete or edit the post afterwards.
  • Topics in Investigation are ignored.
  • Creating an overview takes about 10 seconds per page (to limit load on the forum). It might take a bit longer for topics with many deleted posts.
  • Every quote shows "Today" in my archived post, while the actual post now shows the date. I ignore this when comparing the current post with my archive. For a few seconds around the end of each day, this can lead to a post accidentally being marked as edited.
  • My initial plan was to make this for a user's post history too, but it was much more work than anticipated, so I skipped that.
  • This service is currently limited to scraping the first 100 pages of a topic. If that's not enough, or you want for instance pages 400-500, feel free to post your request.
  • Scrape time is Amsterdam time, but the time mentioned in scraped quotes is forum time.
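
As an illustration of the "Today" limitation, here is a hedged Python sketch of one way to ignore quote dates when comparing two versions of a post. It assumes quote headers look like "Quote from: NAME on Today at 07:42:25 AM" in the archive and "Quote from: NAME on June 20, 2020, 07:42:25 AM" later on; that format, the regex and the function names are assumptions, and the real comparison may be done differently (for instance on raw HTML).

```python
import re

# Assumed quote header formats (not confirmed by the scraper's code):
#   Quote from: NAME on Today at 07:42:25 AM
#   Quote from: NAME on June 20, 2020, 07:42:25 AM
QUOTE_DATE = re.compile(
    r"(Quote from: .+? on )"
    r"(?:Today at|[A-Z][a-z]+ \d{1,2}, \d{4},) "
    r"(\d{2}:\d{2}:\d{2} [AP]M)"
)


def normalize_quote_dates(text: str) -> str:
    """Replace the date part of each quote header with a fixed placeholder."""
    return QUOTE_DATE.sub(r"\1<date> \2", text)


def same_ignoring_quote_dates(archived: str, current: str) -> bool:
    """Compare two post texts while ignoring 'Today' vs. full quote dates."""
    return normalize_quote_dates(archived) == normalize_quote_dates(current)


if __name__ == "__main__":
    old = "Quote from: LoyceV on Today at 07:42:25 AM\nNice topic"
    new = "Quote from: LoyceV on June 20, 2020, 07:42:25 AM\nNice topic"
    print(same_ignoring_quote_dates(old, new))
```

Only the date is dropped and the time of day is kept, so a real change to the quoted post's timestamp would still show up as a difference.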

Test it!
Please try it, and let me know if it works as expected.

Bugs
Please post! This is far from finished.

Intended use
I'm hoping this can be useful to expose certain scammers. Please don't turn this into a(nother) witch hunt.

Be nice
Don't try to abuse this. I don't want to make a blacklist, but I will if I have to.

Todo
  • Fix "Today" for today's posts.
  • Image tags seem to change within the HTML code over time, so unedited posts with images might be marked as edited. I'm not sure yet how to tackle this.
  • Show Merit per post