Topic: Problem: opening and closing database for each block
Board: Development & Technical Discussion
Posted by jgarzik on 29/11/2010, 19:01:12 UTC
Building blkindex.dat is what causes all the disk activity.
[...]
Maybe Berkeley DB has some tweaks we can make to enable or increase cache memory.

The following code in AddToBlockIndex() (main.cpp) is horribly inefficient and dramatically slows the initial block download:

Code:
    CTxDB txdb;
    txdb.WriteBlockIndex(CDiskBlockIndex(pindexNew));

    // New best
    if (pindexNew->bnChainWork > bnBestChainWork)
        if (!SetBestChain(txdb, pindexNew))
            return false;

    txdb.Close();

This makes it impossible to use a standard technique for loading large numbers of records into a database (db4, SQL, or otherwise):  wrapping multiple record insertions in a single database transaction.  Ideally, bitcoin would issue a TxnCommit() only once every 1000 blocks or so during the initial block download.  If a crash occurs mid-batch, the uncommitted transaction is discarded and the database remains in a consistent state.

Furthermore, database open + close for each new block is incredibly expensive.  Across each database-open and database-close cycle, db4
  • diagnoses the health of the database, to determine whether recovery is needed.  This test may require data copying.
  • re-initializes memory pools
  • reads database file metadata
  • acquires file locks
  • reads and initializes b-tree or hash-specific metadata, and builds hash table / b-tree roots
  • forces a sync, even if transactions were called with DB_TXN_NOSYNC
  • fsyncs the memory pool

In addition, bitcoin forces a database checkpoint, pushing all transactions from the log into the main database.

That's right, that long list of operations is executed per-database (DB), not per-environment (DB_ENV), for a database close+open cycle.  For bitcoin, that means we do this for every new block.  Incredibly inefficient, and not how db4 was designed to be used.
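
To put the per-block cost in perspective, here is a minimal sketch that only counts how often the open and close steps would run.  MockDb, OpenCloseStats, PerBlockHandles and LongLivedHandle are hypothetical names invented for this illustration; none of this is the real CTxDB or the db4 API, and the real cost per open/close is the list of operations above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical counters for the expensive open and close steps.
struct OpenCloseStats {
    std::size_t opens = 0;
    std::size_t closes = 0;
};

// Hypothetical stand-in for a database handle (NOT the real CTxDB):
// constructing it counts as an open, destroying it counts as a close.
class MockDb {
public:
    explicit MockDb(OpenCloseStats& stats) : stats_(stats) { ++stats_.opens; }
    ~MockDb() { ++stats_.closes; }
    MockDb(const MockDb&) = delete;
    MockDb& operator=(const MockDb&) = delete;

    // Pretend to write one block index record.
    void WriteBlockIndex(std::size_t /*height*/) { ++writes_; }
    std::size_t writes() const { return writes_; }

private:
    OpenCloseStats& stats_;
    std::size_t writes_ = 0;
};

// Pattern criticized above: a fresh handle is opened and closed per block.
inline OpenCloseStats PerBlockHandles(std::size_t blocks) {
    OpenCloseStats stats;
    for (std::size_t h = 0; h < blocks; ++h) {
        MockDb db(stats);       // open
        db.WriteBlockIndex(h);
    }                           // close at the end of every iteration
    return stats;
}

// Alternative: one handle stays open for the whole run.
inline OpenCloseStats LongLivedHandle(std::size_t blocks) {
    OpenCloseStats stats;
    {
        MockDb db(stats);       // open once
        for (std::size_t h = 0; h < blocks; ++h)
            db.WriteBlockIndex(h);
    }                           // close once
    assert(stats.opens == stats.closes);  // the handle was released
    return stats;
}
```

With these stand-ins, PerBlockHandles(n) performs n opens and n closes, while LongLivedHandle(n) performs one of each regardless of n.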

Recommendations:

1) bitcoin should open the databases, not just the environment, at program startup, and close them at program shutdown.  db4 is designed to handle crashes, if proper transactional use is maintained -- and bitcoin already uses db4 transactions properly.

2) For the initial block download, txn commit should occur once every N records, not every record.  I suggest N=1000.
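
Recommendation 2 could look roughly like the sketch below.  This is an assumption-laden illustration, not a patch: BatchCommitter and CountCommits are hypothetical names, and the commit callback stands in for whatever actually commits the db4 transaction (for example a TxnCommit() call on a long-lived handle).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper, not part of bitcoin or db4: groups record writes
// so the expensive commit runs once per batch_size records.
class BatchCommitter {
public:
    BatchCommitter(std::size_t batch_size, std::function<void()> commit)
        : batch_size_(batch_size), commit_(std::move(commit)) {
        assert(batch_size_ > 0);
    }

    // Call after each record has been written inside the open transaction.
    void RecordWritten() {
        ++pending_;
        if (pending_ >= batch_size_)
            Flush();
    }

    // Commit whatever is pending; call once when the download finishes
    // so a final partial batch is not left uncommitted.
    void Flush() {
        if (pending_ == 0)
            return;
        commit_();
        ++commits_;
        pending_ = 0;
    }

    std::size_t commits() const { return commits_; }
    std::size_t pending() const { return pending_; }

private:
    std::size_t batch_size_;
    std::function<void()> commit_;
    std::size_t pending_ = 0;
    std::size_t commits_ = 0;
};

// Simulate writing `blocks` records and return how many commits were issued.
inline std::size_t CountCommits(std::size_t blocks, std::size_t batch_size) {
    std::size_t calls = 0;
    BatchCommitter batcher(batch_size, [&calls] { ++calls; });
    for (std::size_t i = 0; i < blocks; ++i)
        batcher.RecordWritten();
    batcher.Flush();
    return calls;
}
```

With N=1000, CountCommits(2500, 1000) issues three commits (two full batches plus the final partial one), versus 2500 commits when every record is committed on its own.  The trade-off is that a crash loses at most the last uncommitted batch, which the initial download can simply fetch again.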



EDIT:  Updated a couple minor details, and corrected some typos.