Auto detect Scams, Spam and AI in the forum
by seoincorporation on 24/06/2025, 04:10:27 UTC
⭐ Merited by hugeblack (4), Vod (1), Lafu (1)
The forum could use AI to fight against AI, scams, and spam.

With the right tools it could be done, but it isn't an easy task. The fact that I was able to spam with AI for multiple days is a good example of the current problem we have with the detection systems. It was a user who reported the post, while I was expecting to get busted by the staff.

So, the thing is:
1.- We as users can create tools to monitor forum posts, find who is exhibiting bad behavior, and report it to the mods or in the respective threads.
2.- Mods could have better tools for their moderation work.
3.- The forum could integrate tools like GPTZero or other AI models into its code for scam and spam detection (a rough sketch of such a check follows this list).
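
To give an idea of what point 3 could look like in practice, here is a minimal sketch that checks a post's text against an AI-text detector. The GPTZero endpoint, header and payload below are written from memory of their public API and are assumptions that should be verified against the current docs:

Code:
import requests

# Assumed GPTZero endpoint and payload shape -- verify against their docs.
GPTZERO_URL = "https://api.gptzero.me/v2/predict/text"
API_KEY = "your-gptzero-api-key"

def detector_report(post_text):
    # Send the post body to the detector and return its raw JSON verdict.
    # The exact response fields (e.g. an AI probability) depend on the provider.
    resp = requests.post(
        GPTZERO_URL,
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        json={"document": post_text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()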

What I have right now:

With a Python script I can pass a topic link and get all of its posts as JSON:

Code:
from bs4 import BeautifulSoup
import json
import sys
import requests
import re

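# Return the token that immediately follows a label such as "Activity:" or "Merit:".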
def extract_after(text, key):
    try:
        return text.split(key)[1].split()[0]
    except IndexError:
        return None

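# Parse a "Quote from: <user> on <date>" header into (user, date).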
def parse_quote_header(header_text):
    match = re.search(r"Quote from:\s*(.+?)\s+on\s+(.*)", header_text)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, None

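# Map each poster name on the page to their profile URL.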
def extract_user_profiles(soup):
    profiles = {}
    for td in soup.find_all("td", class_="poster_info"):
        a = td.find("a")
        if a:
            name = a.text.strip()
            href = a.get("href")
            profiles[name] = href
    return profiles

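# Recursively collect nested quotes from a post body and strip them out,
# so the remaining text is only the author's own words.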
def extract_quotes_recursive(container, user_profiles):
    quotes = []
    headers = container.find_all("div", class_="quoteheader", recursive=False)

    for header in headers:
        quote = {}
        link_tag = header.find("a")
        quote["link"] = link_tag["href"] if link_tag else None
        user, date = parse_quote_header(header.get_text(strip=True))

        quote["author"] = user
        quote["profile_url"] = user_profiles.get(user, None)
        quote["date"] = date

        quote_block = header.find_next_sibling("div", class_="quote")
        if quote_block:
            quote["quotes"] = extract_quotes_recursive(quote_block, user_profiles)
            for q in quote_block.find_all("div", class_="quote", recursive=False):
                q.decompose()
            quote["content"] = quote_block.get_text(strip=True)
            quote_block.decompose()
        else:
            quote["quotes"] = []
            quote["content"] = ""

        header.decompose()
        quotes.append(quote)

    return quotes

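# Extract author, activity, merit, title, date, quotes and content for every post on the page.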
def parse_html_posts(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    post_blocks = soup.find_all("td", class_="msgcl1")
    user_profiles = extract_user_profiles(soup)
    posts_data = []

    for block in post_blocks:
        post = {}
        anchor = block.find("a")
        post["message_id"] = anchor.get("name") if anchor else None

        poster_td = block.find("td", class_="poster_info")
        if poster_td:
            user_link = poster_td.find("a")
            post["author"] = user_link.text.strip() if user_link else None
            post["profile_url"] = user_link["href"] if user_link else None

            activity_text = poster_td.get_text()
            post["activity"] = extract_after(activity_text, "Activity:")
            post["merit"] = extract_after(activity_text, "Merit:")

        subject_div = block.find("div", class_="subject")
        post["title"] = subject_div.get_text(strip=True) if subject_div else None

        date_div = subject_div.find_next_sibling("div") if subject_div else None
        post["date"] = date_div.get_text(strip=True) if date_div else None

        post_div = block.find("div", class_="post")
        if post_div:
            post["quotes"] = extract_quotes_recursive(post_div, user_profiles)
            post["content"] = post_div.get_text(strip=True)

        posts_data.append(post)

    return posts_data

def main():
    if len(sys.argv) < 2:
        print("Usage: python3 post_last.py <URL> [output.json]")
        sys.exit(1)

    url = sys.argv[1]
    output_path = sys.argv[2] if len(sys.argv) > 2 else "bitcointalk_parsed.json"

    try:
        headers = {"User-Agent": "Mozilla/5.0"}
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()

        posts_json = parse_html_posts(response.text)
        with open(output_path, "w", encoding="utf-8") as outfile:
            json.dump(posts_json, outfile, indent=2, ensure_ascii=False)

        print(f"Success! Saved to {output_path}")

    except requests.RequestException as e:
        print(f"Error fetching URL: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()


Code:
$ python3 posts.py "https://bitcointalk.org/index.php?topic=5547609.msg65510556#msg65510556" out.json

If we send that JSON to an AI agent to analyze the data, we could get a report for each post. I trained my agent to post its feedback in the following format:


And that shows a basic example of how we could use AI to build a useful filter and keep out the bad actors.
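
For anyone who wants to reproduce that step, below is a minimal sketch of the agent call: it reads the JSON produced by the script above and asks a chat model to rate each post. The openai client, the model name and the prompt are assumptions for illustration, not my exact setup:

Code:
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT = (
    "You review Bitcointalk posts. For the post below, give a short verdict "
    "on three points: spam likelihood, scam likelihood, and whether the text "
    "looks AI-generated, each with a one-line reason.\n\nPost:\n"
)

with open("out.json", encoding="utf-8") as f:
    posts = json.load(f)

for post in posts:
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, any chat model would do
        messages=[{"role": "user", "content": PROMPT + post["content"]}],
    )
    print(post["author"], "->", reply.choices[0].message.content)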

Some improvements I would make to my code:

-Get only the latest posts instead of the whole thread
--This could be done by collecting the sections' links and sorting the posts by date and time; we don't need AI for this (see the sketch after this list)
-Add AI detector APIs, not only ChatGPT
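
A quick sketch of that date-based filtering, working on the JSON the parser already produces. The date format string is an assumption based on how Bitcointalk usually prints timestamps, and "Today at ..." style dates would need to be normalized separately:

Code:
from datetime import datetime, timedelta

# Assumed Bitcointalk date format, e.g. "June 24, 2025, 04:10:27 AM".
DATE_FMT = "%B %d, %Y, %I:%M:%S %p"

def newer_than(posts, checkpoint):
    # Keep only posts published after the previous run's checkpoint.
    fresh = []
    for post in posts:
        try:
            when = datetime.strptime(post["date"], DATE_FMT)
        except (TypeError, ValueError):
            continue  # skip posts whose date could not be parsed
        if when > checkpoint:
            fresh.append(post)
    return fresh

# Example: only keep posts from the last hour.
# fresh = newer_than(posts, datetime.now() - timedelta(hours=1))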

How I think this should work:

This would work better implemented on the server side rather than on the user's side: it is easier to verify a post at the moment the user presses the post button than to wait for an external automation to find the new post.
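
Purely as a conceptual illustration of that server-side hook (the function name, the detectors and the threshold below are all made up), the check at post time could look something like this:

Code:
def check_post_before_accept(author, text):
    # Conceptual hook that would run when the user presses the post button.
    # spam_score, scam_score and ai_prob are placeholders for whatever
    # detectors the forum decides to plug in (rules, GPTZero, an LLM, ...).
    spam_score = 0.0  # e.g. link density, repetition, copy-paste checks
    scam_score = 0.0  # e.g. known scam phrases, blacklisted URLs
    ai_prob = 0.0     # e.g. output of an AI-text detector

    if max(spam_score, scam_score, ai_prob) > 0.9:
        return "hold_for_moderation"  # let a mod review before it goes live
    return "accept"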

And my question for the community is: Is it really worth trying to automate a process like this, or are we fine with the current situation?