Viewing unedited posts and deleted posts
by LoyceV on 21/07/2019, 17:08:59 UTC
February 22, 2020: All updates are now live!



Ever wanted to see who's lying when a post has been edited or deleted? I may be able to help!

I archive most posts within seconds after they are created (before any edits). I started this data collection around the time I started this topic. All data I have since then is available online.
I also have older posts: I saved most unedited posts (6.2 million) from September 12, 2018, until the start of this topic. That data has not been added here, and I can't easily add it: I tried to strip the quotes from it, and that process has some bugs. If you need unedited data from that period, ask and I'll dig it up.

Viewing unedited/deleted posts

How to use it
  • Find the msgID, userID or topicID you need. Let's use msgID 51902990.
  • Remove the last 4 digits from the msgID to get the directory name (if no digits remain, use 0): 5190.
  • Append the directory name, a slash, the msgID and ".html" to the base URL http://loyce.club/archive/posts/ to get: http://loyce.club/archive/posts/5190/51902990.html.
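
The steps above can be sketched in a few lines of Python. This is only an illustration, not the code the archive itself runs: the names `BASE_URL` and `archive_post_url` are made up for this example, the base URL is taken from the example link, and only msgIDs are handled (this post doesn't show the paths for userID and topicID files).

```python
BASE_URL = "http://loyce.club/archive/posts/"  # taken from the example link


def archive_post_url(msg_id: int) -> str:
    """Build the archive URL for a msgID (hypothetical helper)."""
    if msg_id < 0:
        raise ValueError("msgID must be non-negative")
    # Removing the last 4 digits is the same as integer division by 10,000.
    # IDs with no digits left end up in directory 0.
    directory = msg_id // 10_000
    return f"{BASE_URL}{directory}/{msg_id}.html"


print(archive_post_url(51902990))
# http://loyce.club/archive/posts/5190/51902990.html
```

Integer division is used instead of string slicing so that short IDs fall into directory 0 without a special case.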

Details
  • Files are stored with their msgID, userID or topicID as the file name. I remove the last 4 digits to create the directory name, so each directory contains up to 10,000 HTML files. Use CTRL-F to find what you're looking for.
  • I don't scrape hidden boards (such as Investigations).
  • I don't keep post titles.
  • I save raw HTML, including quotes.
  • If I run out of disk space, I might create compressed archives per 10,000 posts.
  • Although I plan to preserve all data, I make no guarantees. Feel free to archive posts.
  • My current (sponsored) webhost has enough storage space for years to come.
  • All scrape-times use Amsterdam time (CET).
  • Usually, I capture at least 99.95% of all posts. Server or internet connection problems can severely reduce this.

Examples



Older posts
Sneak preview: http://loyce.club/archive/oldposts/
How to use:
  • Find the msgID you need. Let's use 28228.
  • Remove the last 5 digits from the msgID to get the directory name (if no digits remain, use 0): 0
  • Replace the last 2 digits of the msgID with xx, and add .html (if there are fewer than 5 digits, use 0xx): 282xx.html
  • Add "#msg" and the msgID: #msg28228
  • Put everything together and go to http://loyce.club/archive/oldposts/0/282xx.html#msg28228
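
The same recipe as a small Python sketch. The names `OLD_BASE_URL` and `old_post_url` are made up for this example. It only reproduces the worked example above: it does not special-case the "0xx" rule for short msgIDs, so treat its output for msgIDs with fewer than 5 digits as unverified.

```python
OLD_BASE_URL = "http://loyce.club/archive/oldposts/"


def old_post_url(msg_id: int) -> str:
    """Build the old-posts archive URL for a msgID (hypothetical helper)."""
    if msg_id < 0:
        raise ValueError("msgID must be non-negative")
    # Removing the last 5 digits gives the directory (0 if nothing is left).
    directory = msg_id // 100_000
    # Replacing the last 2 digits by "xx" gives the page that holds the post.
    # Assumption: short msgIDs are handled by plain integer division here.
    page = f"{msg_id // 100}xx.html"
    # The anchor jumps to the post inside that page.
    return f"{OLD_BASE_URL}{directory}/{page}#msg{msg_id}"


print(old_post_url(28228))
# http://loyce.club/archive/oldposts/0/282xx.html#msg28228
```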

Limitations
  • Currently, the first 6.1 million posts are available.
  • I'll scrape the first 5.21 million topics and all posts in them.
  • That means I'll archive 53.36 million posts. This partially overlaps with my scraper for new posts.
  • This is a one-time thing: I won't update it with newer posts (I scrape unedited versions for those).
  • The time "scraped on" is Amsterdam time.

If no username is mentioned, it's either "Anonymous" or "random". I forgot those existed when I started scraping, and it's not important enough to start over.

If anything goes wrong, let me know here.



See [overview] LoyceV's useful data on Bitcointalk for more of my forum-related topics