Auto detect Scams, Spam and AI in the forum
by seoincorporation on 24/06/2025, 04:10:27 UTC
⭐ Merited by hugeblack (4), Vod (1), Lafu (1)
The forum could use AI to fight against AI, scams, and spam.

With the right tools it could be done, but it isn't an easy task. The fact that I was able to spam with AI for multiple days is a good example of the current problem we have with the detection systems. It was a user who reported the post, while I was expecting to get busted by the staff.

So, the thing is:
1.- We as users can create tools to monitor forum posts, find who is exhibiting bad behavior, and report it to the mods or in the respective threads.
2.- Mods could have better tools for their moderation work.
3.- The forum could integrate tools like GPTZero or other AI models into its code for scam and spam detection (a rough sketch of such a check follows this list).
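
To give an idea of what point 3 could look like in practice, here is a minimal sketch that checks a post's text against an AI-text detector. The GPTZero endpoint, header and payload below are written from memory of their public API and are assumptions that should be verified against the current docs:

Code:
import requests

# Assumed GPTZero endpoint and payload shape -- verify against their docs.
GPTZERO_URL = "https://api.gptzero.me/v2/predict/text"
API_KEY = "your-gptzero-api-key"

def detector_report(post_text):
    # Send the post body to the detector and return its raw JSON verdict.
    # The exact response fields (e.g. an AI probability) depend on the provider.
    resp = requests.post(
        GPTZERO_URL,
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={"document": post_text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()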

What I have right now:

With a Python script I can pass a topic link and get all of its posts as JSON:

Code:
from bs4 import BeautifulSoup
import json
import sys
import requests
import re

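# Return the token that immediately follows a label such as "Activity:" or "Merit:".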
def extract_after(text, key):
    try:
        return text.split(key)[1].split()[0]
    except IndexError:
        return None

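# Parse a "Quote from: <user> on <date>" header into (user, date).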
def parse_quote_header(header_text):
    match = re.search(r"Quote from:\s*(.+?)\s+on\s+(.*)", header_text)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, None

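# Map each poster name on the page to their profile URL.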
def extract_user_profiles(soup):
    profiles = {}
    for td in soup.find_all("td", class_="poster_info"):
        a = td.find("a")
        if a:
            name = a.text.strip()
            href = a.get("href")
            profiles[name] = href
    return profiles

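# Recursively collect nested quotes from a post body and strip them out,
# so the remaining text is only the author's own words.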
def extract_quotes_recursive(container, user_profiles):
    quotes = []
    headers = container.find_all("div", class_="quoteheader", recursive=False)

    for header in headers:
        quote = {}
        link_tag = header.find("a")
        quote["link"] = link_tag["href"] if link_tag else None
        user, date = parse_quote_header(header.get_text(strip=True))

        quote["author"] = user
        quote["profile_url"] = user_profiles.get(user, None)
        quote["date"] = date

        quote_block = header.find_next_sibling("div", class_="quote")
        if quote_block:
            quote["quotes"] = extract_quotes_recursive(quote_block, user_profiles)
            for q in quote_block.find_all("div", class_="quote", recursive=False):
                q.decompose()
            quote["content"] = quote_block.get_text(strip=True)
            quote_block.decompose()
        else:
            quote["quotes"] = []
            quote["content"] = ""

        header.decompose()
        quotes.append(quote)

    return quotes

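# Extract author, activity, merit, title, date, quotes and content for every post on the page.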
def parse_html_posts(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    post_blocks = soup.find_all("td", class_="msgcl1")
    user_profiles = extract_user_profiles(soup)
    posts_data = []

    for block in post_blocks:
        post = {}
        anchor = block.find("a")
        post["message_id"] = anchor.get("name") if anchor else None

        poster_td = block.find("td", class_="poster_info")
        if poster_td:
            user_link = poster_td.find("a")
            post["author"] = user_link.text.strip() if user_link else None
            post["profile_url"] = user_link["href"] if user_link else None

            activity_text = poster_td.get_text()
            post["activity"] = extract_after(activity_text, "Activity:")
            post["merit"] = extract_after(activity_text, "Merit:")

        subject_div = block.find("div", class_="subject")
        post["title"] = subject_div.get_text(strip=True) if subject_div else None

        date_div = subject_div.find_next_sibling("div") if subject_div else None
        post["date"] = date_div.get_text(strip=True) if date_div else None

        post_div = block.find("div", class_="post")
        if post_div:
            post["quotes"] = extract_quotes_recursive(post_div, user_profiles)
            post["content"] = post_div.get_text(strip=True)

        posts_data.append(post)

    return posts_data

def main():
    if len(sys.argv) < 2:
        print("Usage: python3 post_last.py <URL> [output.json]")
        sys.exit(1)

    url = sys.argv[1]
    output_path = sys.argv[2] if len(sys.argv) > 2 else "bitcointalk_parsed.json"

    try:
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        posts_json = parse_html_posts(response.text)
        with open(output_path, "w", encoding="utf-8") as outfile:
            json.dump(posts_json, outfile, indent=2, ensure_ascii=False)

        print(f"Success! Saved to {output_path}")

    except requests.RequestException as e:
        print(f"Error fetching URL: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()


Code:
$ python3 posts.py "https://bitcointalk.org/index.php?topic=5547609.msg65510556#msg65510556" out.json

If we send that JSON to an AI agent to analyze the data, we could get a report for each post. I trained my agent to post its feedback in the following format:


And that shows a basic example of how we could use AI to build a useful filter and keep out the bad actors.
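
For anyone who wants to reproduce that step, below is a minimal sketch of the agent call: it reads the JSON produced by the script above and asks a chat model to rate each post. The openai client, the model name and the prompt are assumptions for illustration, not my exact setup:

Code:
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = (
    "You review Bitcointalk posts. For the post below, give a short verdict "
    "on three points: spam likelihood, scam likelihood, and whether the text "
    "looks AI-generated, each with a one-line reason.\n\nPost:\n"
)

with open("out.json", encoding="utf-8") as f:
    posts = json.load(f)

for post in posts:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, any chat model would do
        messages=[{"role": "user", "content": PROMPT + post["content"]}],
    )
    print(post["author"], "->", reply.choices[0].message.content)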

Some improvements I would make to my code:

-Get only the latest posts instead of the whole thread
--This could be done by collecting the sections' links and sorting the posts by date and time; we don't need AI for this (see the sketch after this list)
-Add AI detector APIs, not only ChatGPT
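
A quick sketch of that date-based filtering, working on the JSON the parser already produces. The date format string is an assumption based on how Bitcointalk usually prints timestamps, and "Today at ..." style dates would need to be normalized separately:

Code:
from datetime import datetime, timedelta

# Assumed Bitcointalk date format, e.g. "June 24, 2025, 04:10:27 AM".
DATE_FMT = "%B %d, %Y, %I:%M:%S %p"

def newer_than(posts, checkpoint):
    # Keep only posts published after the previous run's checkpoint.
    fresh = []
    for post in posts:
        try:
            when = datetime.strptime(post["date"], DATE_FMT)
        except (TypeError, ValueError):
            continue  # skip posts whose date could not be parsed
        if when > checkpoint:
            fresh.append(post)
    return fresh

# Example: only keep posts from the last hour.
# fresh = newer_than(posts, datetime.now() - timedelta(hours=1))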

How I think this should work:

This would work better implemented on the server side rather than on the user's side: it is easier to verify a post at the moment the user presses the post button than to wait for an external automation to find the new post.
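
Purely as a conceptual illustration of that server-side hook (the function name, the detectors and the threshold below are all made up), the check at post time could look something like this:

Code:
def check_post_before_accept(author, text):
    # Conceptual hook that would run when the user presses the post button.
    # spam_score, scam_score and ai_prob are placeholders for whatever
    # detectors the forum decides to plug in (rules, GPTZero, an LLM, ...).
    spam_score = 0.0  # e.g. link density, repetition, copy-paste checks
    scam_score = 0.0  # e.g. known scam phrases, blacklisted URLs
    ai_prob = 0.0     # e.g. output of an AI-text detector

    if max(spam_score, scam_score, ai_prob) > 0.9:
        return "hold_for_moderation"  # let a mod review before it goes live
    return "accept"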

And my question for the community is: Is it really worth trying to automate a process like this, or are we fine with the current situation?