Building blkindex.dat is what causes all the disk activity.
[...]
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.
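One knob that might help is the environment's cache size. A rough sketch, assuming the cache is set on the shared DbEnv before the environment is opened (the data-dir path and the 64 MB figure are purely illustrative, not bitcoin's actual values):

#include <db_cxx.h>

// Sketch only: enlarge the shared page cache before opening the environment.
// set_cachesize(gbytes, bytes, ncache) must be called before DbEnv::open.
static DbEnv dbenv(0);

bool OpenEnvWithBiggerCache(const char* pszDataDir)
{
    dbenv.set_cachesize(0, 64 * 1024 * 1024, 1);   // 0 GB + 64 MB, one cache region
    int ret = dbenv.open(pszDataDir,
                         DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
                         DB_INIT_MPOOL | DB_INIT_TXN | DB_THREAD | DB_RECOVER,
                         0);
    return ret == 0;
}

The same effect should also be achievable without recompiling, by putting a set_cachesize line in a DB_CONFIG file in the environment's home directory.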
The following code in AddToBlockIndex() in main.cpp is horribly inefficient, and dramatically slows the initial block download:
CTxDB txdb;
txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew));
// New best
if (pindexNew->bnChainWork > bnBestChainWork)
    if (!SetBestChain(txdb, pindexNew))
        return false;
txdb.Close();
This makes it impossible to use a standard technique for loading large numbers of records into a database (db4, SQL, or otherwise): wrapping multiple record insertions into a single database transaction. Ideally, during the initial block download, bitcoin would only issue a TxnCommit() once every 1000 blocks or so. If a crash occurs, the database remains in a consistent state.
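For illustration, a rough sketch of what that batching could look like inside AddToBlockIndex, using the TxnBegin()/TxnCommit() helpers bitcoin already has. The long-lived txdb handle and the nUncommitted counter are assumptions for the sketch, not existing code:

// Sketch only (not existing code): batch many block index writes into one
// db4 transaction during initial block download.  Best-chain handling is
// omitted for brevity.
static CTxDB txdb;                  // opened once, reused across blocks
static int nUncommitted = 0;

if (nUncommitted == 0 && !txdb.TxnBegin())
    return false;                   // start a new batch transaction

if (!txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew)))
    return false;

if (++nUncommitted >= 1000)         // commit once every N = 1000 records
{
    if (!txdb.TxnCommit())
        return false;
    nUncommitted = 0;
}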
Furthermore, a database open + close for each new block is incredibly expensive. For each database open and close operation, db4:
- diagnoses the health of the database, to determine whether recovery is needed (this test may require data copying)
- re-initializes memory pools
- reads database file metadata
- acquires file locks
- reads and initializes B-tree or hash-specific metadata, and builds the hash table / B-tree roots
- forces a sync, even if transactions were opened with DB_TXN_NOSYNC
- fsyncs the memory pool
Additionally, bitcoin forces a database checkpoint, pushing all transactions from the log into the main database.
That's right, that long list of operations is executed per-database (DB), not per-environment (DB_ENV), for each database close+open cycle. For bitcoin, that means we do all of this for every new block. Incredibly inefficient, and not how db4 was designed to be used.
Recommendations:
1) bitcoin should open its databases, not just the environment, at program startup, and close them at program shutdown (see the sketch after this list). db4 is designed to handle crashes, if proper transactional use is maintained -- and bitcoin already uses db4 transactions properly.
2) During the initial block download, txn commit should occur once every N records, not for every record. I suggest N=1000.
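To illustrate recommendation 1, here is a rough sketch of the usage pattern db4 is designed for: open the environment and the database once at startup, reuse the handle for every block, close at shutdown. The names, file/subdatabase strings, and flags are illustrative, not bitcoin's actual code:

#include <db_cxx.h>

// Sketch only: one environment and one database handle for the whole
// process.  With proper transactions, DB_RECOVER at the next startup
// handles any crash.
static DbEnv dbenv(0);
static Db* pdbBlkIndex = NULL;

bool StartupDB(const char* pszDataDir)
{
    dbenv.open(pszDataDir,
               DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG |
               DB_INIT_MPOOL | DB_INIT_TXN | DB_THREAD | DB_RECOVER,
               0);
    pdbBlkIndex = new Db(&dbenv, 0);
    return pdbBlkIndex->open(NULL, "blkindex.dat", "main", DB_BTREE,
                             DB_CREATE | DB_AUTO_COMMIT, 0) == 0;
}

void ShutdownDB()
{
    if (pdbBlkIndex)
    {
        pdbBlkIndex->close(0);
        delete pdbBlkIndex;
        pdbBlkIndex = NULL;
    }
    dbenv.close(0);
}

With a handle like this, the per-block open/close/checkpoint overhead described above disappears, and the batched-commit sketch earlier in the post becomes straightforward.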
EDIT: Updated a couple minor details, and corrected some typos.