Have you ever considered scraping the BBCode of posts instead of the HTML? Or at least scraping both of them?
I scrape posts as a guest, without logging in. That means I only see HTML, there is no BBCode.
The only way to see BBCode of a post would be by clicking "edit" (for my own posts) or "quote" for posts from other users. And that's only possible if the topic isn't locked.
I guess the forum database stores posts as BBCode, but that's above my pay grade.