Re: Talksearch.io - Advanced Bitcointalk Search Engine

Version 1

Last scraped

Scraped on 18/04/2025, 11:37:47 UTC

An update to the enhanced search feature:

A new dataset is being uploaded to Elasticsearch. It is richer than the current unprocessed posts and includes additional metadata such as the lock type, scrape time and check time. These will be used along with other parameters to determine the order in which topics are checked for updates and how frequently they are checked.
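As a rough illustration of how that metadata could drive the update checks, here is a minimal sketch. The field values, the recheck intervals and the tie-breaking rule are all my own assumptions for the example, not the actual Talksearch scheduling logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Topic:
    topic_id: int
    lock_type: str         # hypothetical values: "open" or "locked"
    scrape_time: datetime  # when the topic was last fully scraped
    check_time: datetime   # when the topic was last checked for updates


def recheck_interval(topic: Topic) -> timedelta:
    """Assumed policy: locked topics rarely change, so check them less often."""
    if topic.lock_type == "locked":
        return timedelta(days=30)
    return timedelta(days=1)


def overdue_by(topic: Topic, now: datetime) -> timedelta:
    """Time elapsed since the topic became due (negative if not yet due)."""
    return now - (topic.check_time + recheck_interval(topic))


def check_order(topics: list[Topic], now: datetime) -> list[Topic]:
    """Return the topics that are due, most overdue first.

    Ties are broken by scrape_time, oldest scrape first.
    """
    due = [t for t in topics if overdue_by(t, now) >= timedelta(0)]
    return sorted(due, key=lambda t: (-overdue_by(t, now).total_seconds(), t.scrape_time))
```

The interval function controls the frequency and the sort key controls the order, so either can be tuned separately once the other parameters are known.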

An experimental quality score is also included with each post, in an attempt to deprioritize low-quality posts and signature spam in the search results.
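One way such a score could be used at query time is to multiply the text relevance by it, using the `function_score` query with `field_value_factor` from the Elasticsearch query DSL. The sketch below only builds the request body. The field names `content` and `quality_score` are placeholders I made up, and the real ranking may work differently.

```python
def build_search_body(text: str) -> dict:
    """Build a search body that scales relevance by a per-post quality score."""
    return {
        "query": {
            "function_score": {
                # Ordinary full-text match on the post content (assumed field name).
                "query": {"match": {"content": text}},
                # Scale the relevance score by the stored quality score.
                # Posts without a score are left unchanged (factor 1.0).
                "field_value_factor": {
                    "field": "quality_score",  # assumed field name
                    "missing": 1.0,
                },
                "boost_mode": "multiply",
            }
        }
    }
```

With a score between 0 and 1, low-quality posts still match but sink in the results instead of being removed outright.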

In an effort to remove irrelevant data such as quotes from the search results, posts are now divided into chunks, delimited by the presence of a quote or a line separator.
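To show what this chunking could look like, here is a small sketch. It assumes posts are stored as BBCode with `[quote]...[/quote]` blocks and `[hr]` as the line separator. The real pipeline may work on HTML or use different delimiters, so treat the markup and the function name as assumptions.

```python
import re

# Assumed markup: BBCode quote tags (with optional attributes) and [hr] separators.
TOKEN_RE = re.compile(r"(\[quote(?:[ =][^\]]*)?\]|\[/quote\]|\[hr\])", re.IGNORECASE)


def split_into_chunks(post: str) -> list[str]:
    """Split a post into chunks of the author's own text, dropping quoted text."""
    chunks: list[str] = []
    current: list[str] = []
    depth = 0  # current quote nesting level

    def flush() -> None:
        text = "".join(current).strip()
        if text:
            chunks.append(text)
        current.clear()

    # With one capturing group, re.split alternates text (even) and tags (odd).
    for i, piece in enumerate(TOKEN_RE.split(post)):
        if i % 2 == 1:
            tag = piece.lower()
            if tag == "[/quote]":
                depth = max(depth - 1, 0)
            elif tag == "[hr]":
                if depth == 0:
                    flush()
            else:  # opening [quote ...] tag
                if depth == 0:
                    flush()
                depth += 1
        elif depth == 0:
            current.append(piece)

    flush()
    return chunks
```

For example, a post with an intro, a quoted block, a reply, an `[hr]` and a footer yields three chunks: the intro, the reply and the footer. Nested quotes are skipped as a whole because of the depth counter.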

The upload started yesterday, and about 45 million records have been indexed so far, out of an estimated total of around 120 million.

The v2 indices contain the data that Talksearch will use for searching in the future. Local-language posts are also categorized to facilitate local search.

I am still working on automatic scraping support. In the meantime, the v2 dataset is already more recent than the original and contains posts up to March 2025.

New translated ANN links will be added shortly.

Original archived Re: Talksearch.io - Advanced Bitcointalk Search Engine

Scraped on 18/04/2025, 11:32:45 UTC

An update to the enhanced search feature:

A new dataset is being uploaded to Elasticsearch. It is richer than the current unprocessed posts and includes additional metadata such as the lock type, scrape time and check time. These will be used along with other parameters to determine the order in which topics are checked for updates and how frequently they are checked.

In an effort to remove irrelevant data such as quotes from the search results, posts are now divided into chunks, delimited by the presence of a quote or a line separator.

The upload started yesterday, and about 4 million records have been indexed so far, out of an estimated total of around 120 million.