If your architecture permits it, it's not a bad idea to dump and save the raw page on every crawl. It'll be insanely useful for debugging and for assigning blame when something breaks.
Yep, that's a good idea. It'll require a bit of change to my code, though, because of the way it's built (I'll spare you the details, but the crawler bot is very modular and uses different HTTP libraries for different websites where required). I'll try to get on to that tomorrow. Thanks.
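
For what it's worth, here's a minimal sketch of how a dump hook like that could stay out of the per-library fetcher code, assuming a Python crawler and a plain filesystem store; the names `dump_raw_page` and `raw_pages/` are made up for illustration, not from the actual bot:

```python
# Hypothetical library-agnostic raw-page dump: each HTTP-library-specific
# fetcher calls dump_raw_page(url, body, status) once per crawl, so the
# dump logic lives in one place regardless of which library fetched it.
import hashlib
import json
import time
from pathlib import Path

DUMP_DIR = Path("raw_pages")  # hypothetical dump location


def dump_raw_page(url: str, body: bytes, status: int) -> Path:
    """Save the raw response body plus a small metadata sidecar."""
    DUMP_DIR.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the filename is filesystem-safe no matter what
    # characters the URL contains.
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    stamp = time.strftime("%Y%m%dT%H%M%S")
    base = DUMP_DIR / f"{stamp}-{key}"
    base.with_suffix(".html").write_bytes(body)
    base.with_suffix(".json").write_text(
        json.dumps({"url": url, "status": status, "fetched_at": stamp})
    )
    return base
```

The timestamp in the filename means repeat crawls of the same URL don't overwrite each other, which is exactly what you want when you're trying to assign blame after the fact.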
