Board Meta
Re: Patrol: "Great Project". Cleanup request/discussion for Mods.
by jackg on 18/02/2018, 20:19:27 UTC
I'm not sure where you'd put it. I think most people suggest Pastebin for plain text, and I assume it's raw HTML that you have gathered.
Lol no, I can't paste 1.5 GB into pastebin :D

Quote
Is there a script you have used to gather this?
Basically, I wget "https://bitcointalk.org/index.php?action=recent;patrol" every 5 minutes.
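A minimal Python sketch of that kind of polling loop, as a stand-in for the repeated wget call. The output directory, the file-naming scheme and the `poll`/`snapshot_name` helpers are hypothetical. Login cookies are not handled, on the assumption that the page is reachable the same way a plain wget call reaches it. The fetch, sleep and clock functions are passed in so the loop can be tested without touching the network.

```python
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

PATROL_URL = "https://bitcointalk.org/index.php?action=recent;patrol"
INTERVAL_SECONDS = 5 * 60  # one snapshot every 5 minutes


def fetch(url):
    """Download one page and return its raw bytes (like a single wget call)."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def snapshot_name(now):
    """Hypothetical naming scheme: one HTML file per timestamp."""
    return now.strftime("patrol-%Y%m%dT%H%M%SZ.html")


def poll(out_dir, iterations, fetch_page=fetch, sleep=time.sleep,
         clock=lambda: datetime.now(timezone.utc)):
    """Save `iterations` snapshots of the patrol page, INTERVAL_SECONDS apart."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i in range(iterations):
        path = out / snapshot_name(clock())
        path.write_bytes(fetch_page(PATROL_URL))
        saved.append(path)
        if i < iterations - 1:  # no need to wait after the last snapshot
            sleep(INTERVAL_SECONDS)
    return saved
```

For example, `poll("patrol-snapshots", iterations=12)` would save 12 snapshots spread over roughly an hour.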

Quote
I don't really want to kill the forum's server by sending pings every 5 minutes.
You're allowed 1 page per second; 1 per 5 minutes isn't going to kill the server. I do more than that just browsing.
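The comparison is easy to put into numbers. This small sketch just counts requests per day at each interval; the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(seconds_between_requests):
    """How many requests a fixed interval produces in one day."""
    return SECONDS_PER_DAY // seconds_between_requests


allowed_per_day = requests_per_day(1)       # the 1-page-per-second limit
polling_per_day = requests_per_day(5 * 60)  # one wget every 5 minutes
headroom = allowed_per_day // polling_per_day
```

That works out to 86,400 allowed requests per day against 288 polling requests, so the 5-minute poll uses 1/300 of the allowance.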

Quote
I did have a script that attempted to clone every page on this forum directly from the forum's server (starting with 1), but it didn't incorporate the 1-second request limit, so it just kept getting a 502/503 "server too busy" page.
A one-second delay between requests fixes that. But with more than 30 million posts, that will take at least a year. A huge waste of resources, if you ask me.
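A minimal sketch of that one-second delay in Python, assuming a simple "minimum gap between requests" rule. The `MinIntervalLimiter` class, the `crawl` helper and the `fetch` callback are hypothetical; the forum's actual page URLs are not spelled out here, so the caller supplies them.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests (default: 1 second)."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock  # monotonic clock, so wall-clock changes don't matter
        self.sleep = sleep
        self._last = None   # time of the previous request, if any

    def wait(self):
        """Sleep just long enough that this request is >= min_interval after the last."""
        if self._last is not None:
            remaining = self._last + self.min_interval - self.clock()
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()


def crawl(urls, fetch, limiter):
    """Fetch each URL in order, never faster than the limiter allows."""
    pages = []
    for url in urls:
        limiter.wait()
        pages.append(fetch(url))
    return pages
```

Measuring the gap from the previous request, rather than sleeping a flat second after each one, means time already spent downloading counts toward the delay.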

Quote
Did you report their posts? They do nicely fit this thread:
Great project!
Good project :)

There's a site with a 2 GB share limit (WeTransfer, I think it's called); you can probably get a link from that, or send it to jackgbtc@gmail.com whenever you get the time. Or I might just start now.

And actually, you can do 20 posts a second. Still very slow, but...
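A rough estimate of what that changes, assuming one request per second and, as the post suggests, 20 posts returned per request. The 20-per-request figure and the `crawl_days` helper are assumptions for illustration; retries, errors and deleted posts are ignored.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def crawl_days(total_posts, posts_per_request, seconds_per_request=1.0):
    """Days needed to fetch `total_posts` at one request per `seconds_per_request`."""
    requests = -(-total_posts // posts_per_request)  # ceiling division: partial last page
    return requests * seconds_per_request / SECONDS_PER_DAY
```

For 30 million posts, one post per request gives about 347 days (roughly the "year" mentioned above), while 20 posts per request brings it down to about 17.4 days.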
I could always just go back as far as my first post, as that can't be that bad. Maybe I can get it in bulk from archive.org; I know there are options to do that, but it's still a lot.

I will start reporting and patrolling tomorrow to offer help. I'm not great at reporting posts or assessing others' quality, but I'll try nonetheless. I can't get anything from reporting stuff, but report those two if you can. FYI, "interesting project" also returns a lot of results.