Millions more posts added:
I have now archived the first 35.5 million posts, all available online. This currently fills 43 GB.
Example:
my first post!
See this quote on how to use it:
Sneak preview: http://loyce.club/archive/oldposts/
How to use:
- Find the msgID you need. Let's use 28228
- Remove the last 5 digits from the msgID to get the directory name (if no digits are left, use 0): 0
- Replace the last 2 digits of the msgID with xx, and add .html (if there are fewer than 5 digits, use 0xx): 282xx.html
- Add "#msg" and the msgID: #msg28228
- Put everything together and go to http://loyce.club/archive/oldposts/0/282xx.html#msg28228
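The steps above can be sketched in Python. This is only a rough sketch: the function name archive_url is made up for illustration, and the handling of short msgIDs follows the wording of the steps literally (directory 0 when no digits are left, 0xx.html for fewer than 5 digits). Only the 28228 example comes from the steps above; the other cases have not been checked against the archive.

```python
BASE_URL = "http://loyce.club/archive/oldposts/"  # base URL from the post


def archive_url(msg_id: int) -> str:
    """Build the archive URL for a msgID (hypothetical helper, not the archive's own code)."""
    if msg_id < 0:
        raise ValueError("msgID must be non-negative")
    digits = str(msg_id)
    # Directory: drop the last 5 digits; use "0" when nothing is left.
    directory = digits[:-5] or "0"
    # File: replace the last 2 digits with "xx" and add ".html".
    # Assumption: "0xx" for fewer than 5 digits, taken literally from the steps.
    if len(digits) < 5:
        page = "0xx.html"
    else:
        page = digits[:-2] + "xx.html"
    # Anchor: "#msg" plus the full msgID.
    return f"{BASE_URL}{directory}/{page}#msg{digits}"


print(archive_url(28228))
```

For msgID 28228 this prints the same URL as the last step above.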
Limitations:
- Currently, the first 2.1 million posts are available.
- I'll scrape the first 5.21 million topics and all posts in there.
- That means I'll archive 53.36 million posts; this partially overlaps with my scraper for new posts.
- This is a one-time thing: I won't update it with newer posts (I scrape unedited versions for those).
- The time "scraped on" is Amsterdam time.
If no username is mentioned, it's either "Anonymous" or "random". I forgot those existed when I started scraping, and it's not important enough to start over.
This bug is not fixed yet:
I found a bug (which I'm posting here as a reminder to myself): posts on the עברי (Hebrew) board don't show up. Example: this post is missing, while it exists.
I'll see if I can add them later. I think it has something to do with the right-to-left writing; even selecting text on that board doesn't work as expected.
Update:
عربية (Arabic) has the same problem.
I'll re-scrape these boards after finishing scraping all posts.