Premise: I'm planning to set up a full Bitcoin node to run advanced analytics and answer some research questions I have on wallet behavior, scarcity, and how price fluctuations affect the ecosystem. I'm looking for feedback from the community on the hardware setup, potential bottlenecks, and issues I might not have considered at this stage.
About Me: I’m a data engineer with a background in computer science, network engineering, and physics. My professional experience includes working with large data sets, complex models, and analytics. Over the past decade, I’ve applied these skills in scientific research, stock market analysis, and behavioral studies. I’m now diving into blockchain data and seeking to develop models that address some unanswered questions.
Questions I’m Exploring:
- Modeling the behaviors of active vs. inactive wallets.
- How price volatility influences ecosystem behavior.
- Tracking scarcity flows between known and unknown entities over time.
- Identifying and analyzing "gatherers" — addresses that continuously accumulate BTC, regardless of price trends, and modeling their impact on scarcity.
- Projecting Bitcoin’s scarcity under various price scenarios up to 2050 and beyond.
I’ve seen opinions on these topics, but I’m struggling to find solid research backed by real-world data models. If anyone knows of existing work in these areas, I’d love to hear about it!
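A minimal sketch of how the "active vs. inactive" and "gatherer" labels from the questions above could be defined, in plain Python. Everything here is an assumption for illustration: the `AddressActivity` record, the one-year inactivity threshold, and the rule that a gatherer is an address whose balance never drops and ends higher than it started. Real definitions would need tuning against the data.

```python
from dataclasses import dataclass

BLOCKS_PER_DAY = 144  # Bitcoin targets one block every ~10 minutes


@dataclass
class AddressActivity:
    """Hypothetical per-address summary extracted from the chain."""
    address: str
    last_spend_height: int  # height of the most recent block where the address spent coins


def classify(activity, tip_height, inactive_days=365):
    """Label each address 'active' or 'inactive' by blocks elapsed since its last spend.

    The 365-day default is an assumed threshold, not an established standard.
    """
    threshold = inactive_days * BLOCKS_PER_DAY
    return {
        a.address: "inactive" if tip_height - a.last_spend_height > threshold else "active"
        for a in activity
    }


def is_gatherer(balances):
    """True if a balance series (in satoshis, oldest first) never decreases
    and ends strictly higher than it started."""
    if len(balances) < 2:
        return False
    never_drops = all(later >= earlier for earlier, later in zip(balances, balances[1:]))
    return never_drops and balances[-1] > balances[0]


# Usage with made-up data
sample = [
    AddressActivity("addr_a", last_spend_height=859_000),
    AddressActivity("addr_b", last_spend_height=700_000),
]
labels = classify(sample, tip_height=860_000)
```

Working in block heights rather than timestamps keeps the labels reproducible across runs, since heights do not depend on miner-reported clock times.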
Planned Build (Hardware):
Processor: AMD Ryzen 7 7700
Motherboard: MSI MAG B650 TOMAHAWK (AM5 socket)
Memory: G.SKILL Trident Z5 RGB DDR5-6000 (32GB)
Storage: Samsung 990 Pro 2TB (NVMe SSD for fast data access)
Power Supply: Corsair RM850x
Case: Corsair 4000D Airflow
CPU Cooler: Corsair iCUE H100i Elite Capellix (AIO)
Optional GPU: MSI GeForce GT 1030 2GB (mainly for potential machine learning features later)
My Questions/Concerns:
Hardware Bottlenecks: Are there any obvious weak points in this build for running full node operations and handling data-intensive tasks like blockchain analytics? I'm especially interested in potential memory or storage issues.
Connectivity: I’m on Starlink Residential (150-200 Mbps download), which should be fine after the initial blockchain sync (~600 GB). Does anyone have experience with how connectivity might impact node reliability, particularly in rural areas?
Software: I plan to use Ubuntu Server. Is this a solid choice for running a full node and developing analytics tools, or are there other distributions better optimized for this kind of work?
Future Expansion: I'm looking to scale this setup to handle machine learning models on the blockchain data in the future. Should I anticipate the need for more advanced GPUs or additional hardware as I expand the complexity of my models?
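On the connectivity point above, a quick back-of-the-envelope estimate of the initial download time, as a Python sketch. It assumes the link sustains the quoted 150-200 Mbps for the whole transfer and counts transfer time only; the function name `transfer_hours` is just illustrative. The real sync will likely take longer, since the node also has to validate every block it downloads.

```python
def transfer_hours(size_gb: float, mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link sustaining mbps megabits/s."""
    bits = size_gb * 1e9 * 8          # gigabytes -> bits (decimal units)
    seconds = bits / (mbps * 1e6)     # megabits/s -> bits/s
    return seconds / 3600


for speed in (150, 200):
    print(f"{speed} Mbps: {transfer_hours(600, speed):.1f} h")
# 150 Mbps: 8.9 h
# 200 Mbps: 6.7 h
```

So even at the low end of the quoted range, the raw download is well under a day if the link holds its speed.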
Any advice on potential pitfalls, better component choices, or tips for managing a full node with advanced analytics would be appreciated!