How is your data organized: as a tree structure, or as a raw recent-first stack? If it's the latter, I'd probably reorganize it into a tree-like structure myself, since that should make it quicker to filter out data. Maybe start with the Bitcoin discussion board, then scale up.
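To make the tree idea concrete, here is a rough sketch of what I mean. I don't know your actual schema, so the field names (`board`, `thread_id`, `timestamp`, `text`) and the sample posts are all made up for illustration:

```python
from collections import defaultdict

def build_tree(posts):
    """Group a flat list of posts into board -> thread -> [posts]."""
    tree = defaultdict(lambda: defaultdict(list))
    for post in posts:
        # Field names are hypothetical; adapt to the real dump format.
        tree[post["board"]][post["thread_id"]].append(post)
    # Restore chronological order (oldest first) inside each thread.
    # ISO 8601 timestamp strings sort correctly as plain strings.
    for threads in tree.values():
        for thread_posts in threads.values():
            thread_posts.sort(key=lambda p: p["timestamp"])
    return tree

# Made-up sample in recent-first order, as in a raw stack.
posts = [
    {"board": "Bitcoin Discussion", "thread_id": 2,
     "timestamp": "2021-05-01T10:00:00", "text": "c"},
    {"board": "Altcoins", "thread_id": 7,
     "timestamp": "2021-04-30T09:00:00", "text": "b"},
    {"board": "Bitcoin Discussion", "thread_id": 2,
     "timestamp": "2021-04-29T08:00:00", "text": "a"},
]

tree = build_tree(posts)
# Filtering to one board is now a single lookup instead of a full scan.
bitcoin_threads = tree["Bitcoin Discussion"]
```

For 100GB this obviously wouldn't fit in memory as-is, so in practice the same grouping would probably be done on disk (one directory or file per board), but the shape is the same.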
Also, I can always chop those 100GB into separate time periods. Perhaps the 2020-2022 period contains juicier data than the rest (due to the rise and drop of BTC). Everything can be explored.
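The time slicing could look something like this. Again only a sketch: I'm assuming each post carries an ISO 8601 `timestamp` field, which may not match the real data, and the sample records are invented:

```python
from collections import defaultdict
from datetime import datetime

def bucket_by_year(posts):
    """Split posts into per-year buckets based on their timestamp."""
    buckets = defaultdict(list)
    for post in posts:
        # "timestamp" is a hypothetical field name holding an ISO 8601 string.
        year = datetime.fromisoformat(post["timestamp"]).year
        buckets[year].append(post)
    return buckets

def select_years(buckets, start, end):
    """Return all posts from start..end inclusive, in year order."""
    return [p for year in range(start, end + 1) for p in buckets.get(year, [])]

# Made-up sample posts.
sample = [
    {"timestamp": "2019-11-02T12:00:00", "text": "before"},
    {"timestamp": "2020-03-12T08:30:00", "text": "crash"},
    {"timestamp": "2021-11-10T17:45:00", "text": "peak"},
    {"timestamp": "2023-01-05T09:00:00", "text": "after"},
]

buckets = bucket_by_year(sample)
window = select_years(buckets, 2020, 2022)
```

Bucketing by year is just the simplest cut; months or price-regime windows would work the same way with a different key.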