Re: UPDATING Viewing unedited/deleted posts (search per post, per user or per topic)
by LoyceV on 22/02/2020, 11:08:03 UTC
If it is not a secret, how much data space is needed for all that millions of posts?
I'm currently using 54 GB for loyce.club, and store 4.2 million files.

Quote
And is there a way to use some compression?
I mainly store HTML files, so it would indeed be great if a web browser could just use index.html.gz, which would greatly reduce disk space consumption. I just tested it, though, and my browser doesn't handle it.



It took longer than I wanted due to lack of time, but I've now added live updates for posts per user and per topic:
Viewing unedited/deleted posts

How to use it
  • Find the msgID, userID or topicID you need. Let's use msgID 51902990.
  • Remove the last 4 digits from the msgID to get the directory name (if there are fewer than 4 digits, use 0): 5190.
  • Append the directory name and the msgID to the (above) URL and add ".html": http://loyce.club/archive/posts/5190/51902990.html.
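
The steps above can be sketched in a few lines of Python. This is only an illustration, not the script running on the site: the function name and the BASE_URL constant are made up for the example, and it covers the posts path only, since that is the one URL shown here. It also assumes that any msgID below 10,000 ends up in directory 0.

```python
# Hypothetical helper; the base URL is taken from the example above.
BASE_URL = "http://loyce.club/archive/posts"


def post_archive_url(msg_id: int) -> str:
    """Build the archive URL for one post from its msgID."""
    if msg_id < 0:
        raise ValueError("msgID must not be negative")
    # Dropping the last 4 digits is integer division by 10,000.
    # Assumption: IDs below 10,000 give 0, matching the "use 0" rule.
    directory = msg_id // 10_000
    return f"{BASE_URL}/{directory}/{msg_id}.html"


print(post_archive_url(51902990))
# http://loyce.club/archive/posts/5190/51902990.html
```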

Details
  • Files are stored with their msgID, userID or topicID as file name. I remove the last 4 digits to create the directory name. Each directory contains up to 10,000 HTML files. Use CTRL-F to find what you're looking for.
  • I don't scrape hidden boards (such as Investigations).
  • I don't keep post titles.
  • I save raw HTML, including quotes.
  • If I run out of disk space, I might create compressed archives per 10,000 posts.
  • Although I plan to preserve all data, I make no guarantees. Feel free to archive posts.
  • My current (sponsored) webhost has enough storage space for years to come.
  • All scrape-times use Amsterdam time (CET).
  • Usually, I capture at least 99.95% of all posts. Server or internet connection problems can severely reduce this.
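
For the compressed archives per 10,000 posts mentioned in the list above, a minimal sketch could look like the following. Nothing like this is in use yet; the function name, the .tar.gz format and the destination directory are all assumptions picked for the example.

```python
import tarfile
from pathlib import Path


def archive_directory(directory: Path, dest_dir: Path) -> Path:
    """Pack the HTML files of one directory into <name>.tar.gz (hypothetical layout)."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / f"{directory.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Keep the directory name inside the archive so extracted
        # files land in the same place, e.g. 5190/51902990.html.
        for html_file in sorted(directory.glob("*.html")):
            tar.add(html_file, arcname=f"{directory.name}/{html_file.name}")
    return archive_path
```

The original files are left untouched, so nothing is lost if the archive step fails halfway. The downside of such archives is that a single post can no longer be opened by URL without extracting it first.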

Examples