Viewing unedited posts and deleted posts

February 22, 2020: All updates are now live!

Ever wanted to see who's lying when a post has been edited or deleted? I may be able to help!

I archive most posts within seconds after they are created (before any edits). I started this data collection around the time I started this topic. All data collected since then is available online.
I also have older posts: from September 12, 2018, until the start of this topic, I saved most unedited posts (6.2 million posts). This data has not been added to this topic, and I can't really add it because I tried to remove quotes from it, and that has some bugs. If you need unedited data from that period, you can ask me to dig it up.

Viewing unedited/deleted posts

See http://loyce.club/archive/posts/ for all posts (Working!)
New posts are archived within seconds after being created, and instantly available.
See http://loyce.club/archive/members/ for posts made by a certain user (Working!)
Updated every 5 minutes.
See http://loyce.club/archive/topics/ for posts made in a certain topic (Working!)
Updated every 5 minutes.

How to use it

Find the msgID, userID or topicID you need. Let's use msgID 51902990.
Remove the last 4 digits from the msgID to get the directory name (if the ID has 4 digits or fewer, use 0): 5190.
Append the directory name and the ID to the matching URL above and add ".html": http://loyce.club/archive/posts/5190/51902990.html.
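
The steps above can be sketched in Python. This is only an illustration: the function name archive_url and its parameters are made up for this example, and it assumes that every ID below 10,000 ends up in directory 0.

```python
BASE_URL = "http://loyce.club/archive"


def archive_url(kind: str, item_id: int) -> str:
    """Build the archive URL for a msgID, userID or topicID.

    kind is one of "posts", "members" or "topics" (the three
    directories listed above); item_id is the matching ID.
    """
    if kind not in ("posts", "members", "topics"):
        raise ValueError(f"unknown archive section: {kind!r}")
    if item_id < 0:
        raise ValueError("IDs must not be negative")
    # Removing the last 4 digits is the same as integer division by 10,000.
    # Assumption: IDs with 4 digits or fewer go into directory 0.
    directory = item_id // 10_000
    return f"{BASE_URL}/{kind}/{directory}/{item_id}.html"


if __name__ == "__main__":
    print(archive_url("posts", 51902990))
    print(archive_url("members", 459836))
    print(archive_url("topics", 5167469))
```

Run as a script, it prints the three URLs used in the Examples section of this post.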

Details

Files are stored with their msgID, userID or topicID as the file name. I remove the last 4 digits to create the directory name, so each directory contains up to 10,000 HTML files. Use CTRL-F to find what you're looking for.
I don't scrape hidden boards (such as Investigations).
I don't keep post titles.
I save raw HTML, including quotes.
If I run out of disk space, I might create compressed archives per 10,000 posts.
Although I plan to preserve all data, I make no guarantees. Feel free to archive posts.
My current (sponsored) webhost has enough storage space for years to come.
All scrape times use Amsterdam time (CET).
Usually, I capture at least 99.95% of all posts. Server or internet connection problems can severely reduce this.

Examples

The unedited version of this post: http://loyce.club/archive/posts/5190/51902990.html
(the layout looks better in more recent archived posts)
All posts made by me: http://loyce.club/archive/members/45/459836.html
(obviously only since I started archiving posts)
All posts made in this topic: http://loyce.club/archive/topics/516/5167469.html
(the first posts aren't shown because of the slightly different format used for my archive)

Older posts

Quote from: LoyceV on February 02, 2020, 06:05:42 PM

Sneak preview: http://loyce.club/archive/oldposts/
How to use:

Find the msgID you need. Let's use 28228
Remove the last 5 digits from the msgID to get the directory name (if the msgID has 5 digits or fewer, use 0): 0
Replace the last 2 digits of the msgID with xx, and add .html (if there are fewer than 5 digits, use 0xx): 282xx.html
Add "#msg" and the msgID: #msg28228
Put everything together and go to http://loyce.club/archive/oldposts/0/282xx.html#msg28228
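
The steps above can be sketched in Python as well. Again, this is only an illustration under stated assumptions: the function name old_post_url is made up, and the sketch follows the worked example (msgID 28228). How the "0xx" rule applies to very short msgIDs has not been checked against the archive, so treat results for those as a guess.

```python
OLD_POSTS_URL = "http://loyce.club/archive/oldposts"


def old_post_url(msg_id: int) -> str:
    """Build the old-posts archive URL for a msgID."""
    if msg_id < 0:
        raise ValueError("msgIDs must not be negative")
    # Removing the last 5 digits is integer division by 100,000,
    # so msgIDs with 5 digits or fewer land in directory 0.
    directory = msg_id // 100_000
    # Replacing the last 2 digits by "xx": each file covers 100 msgIDs.
    # Assumption: msgIDs below 100 map to "0xx"; short msgIDs may be
    # handled differently by the real archive.
    file_name = f"{msg_id // 100}xx.html"
    # The anchor jumps to the requested post inside the file.
    return f"{OLD_POSTS_URL}/{directory}/{file_name}#msg{msg_id}"


if __name__ == "__main__":
    print(old_post_url(28228))
```

For msgID 28228 this prints the same URL as the last step above.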

Limitations

Currently, the first 6.1 million posts are available.
I'll scrape the first 5.21 million topics and all posts in there.
That means I'll archive 53.36 million posts. This partially overlaps with my scraper for new posts.
This is a one-time thing: I won't update it with newer posts (I scrape unedited versions for those).
The time "scraped on" is Amsterdam time.

If no username is mentioned, it's either "Anonymous" or "random". I forgot those existed when I started scraping, and it's not important enough to start over.

If anything goes wrong, let me know here.

See [overview] LoyceV's useful data on Bitcointalk for more of my forum-related topics.