Post
Topic
Board Development & Technical Discussion
Topic OP
Reading/Writing Blocks and FLATDATA
by
Hepatizon
on 24/07/2010, 01:27:32 UTC
I'm trying to understand the way Bitcoin stores block data - among other things I want to run some statistics on the block chain / transaction history and check just how anonymous Bitcoin really is.  So I went to the source to see how Bitcoin reads/writes block data to file.

(ETA: this is in 0.3.2)

In main.h we have:

Code: (CBlock::WriteToDisk)
bool WriteToDisk(bool fWriteTransactions, unsigned int& nFileRet, unsigned int& nBlockPosRet)
{
    // Open history file to append
    CAutoFile fileout = AppendBlockFile(nFileRet);
    if (!fileout)
        return error("CBlock::WriteToDisk() : AppendBlockFile failed");
    if (!fWriteTransactions)
        fileout.nType |= SER_BLOCKHEADERONLY;

    // Write index header
    unsigned int nSize = fileout.GetSerializeSize(*this);
    fileout << FLATDATA(pchMessageStart) << nSize;

    // Write block
    nBlockPosRet = ftell(fileout);
    if (nBlockPosRet == -1)
        return error("CBlock::WriteToDisk() : ftell failed");
    fileout << *this;

    // Flush stdio buffers and commit to disk before returning
    fflush(fileout);
#ifdef __WXMSW__
    _commit(_fileno(fileout));
#else
    fsync(fileno(fileout));
#endif

    return true;
}

and

Code: (CBlock::ReadFromDisk)
bool ReadFromDisk(unsigned int nFile, unsigned int nBlockPos, bool fReadTransactions=true)
{
    SetNull();

    // Open history file to read
    CAutoFile filein = OpenBlockFile(nFile, nBlockPos, "rb");
    if (!filein)
        return error("CBlock::ReadFromDisk() : OpenBlockFile failed");
    if (!fReadTransactions)
        filein.nType |= SER_BLOCKHEADERONLY;

    // Read block
    filein >> *this;

    // Check the header
    if (CBigNum().SetCompact(nBits) > bnProofOfWorkLimit)
        return error("CBlock::ReadFromDisk() : nBits errors in block header");
    if (GetHash() > CBigNum().SetCompact(nBits).getuint256())
        return error("CBlock::ReadFromDisk() : GetHash() errors in block header");

    return true;
}

FLATDATA is defined in serialize.h like so:
Code: (FLATDATA)
//
// Wrapper for serializing arrays and POD
// There's a clever template way to make arrays serialize normally, but MSVC6 doesn't support it
//
#define FLATDATA(obj)   REF(CFlatData((char*)&(obj), (char*)&(obj) + sizeof(obj)))
class CFlatData
{
protected:
    char* pbegin;
    char* pend;
public:
    CFlatData(void* pbeginIn, void* pendIn) : pbegin((char*)pbeginIn), pend((char*)pendIn) { }
    char* begin() { return pbegin; }
    const char* begin() const { return pbegin; }
    char* end() { return pend; }
    const char* end() const { return pend; }

    unsigned int GetSerializeSize(int, int=0) const
    {
        return pend - pbegin;
    }

    template
    void Serialize(Stream& s, int, int=0) const
    {
        s.write(pbegin, pend - pbegin);
    }

    template
    void Unserialize(Stream& s, int, int=0)
    {
        s.read(pbegin, pend - pbegin);
    }
};

Now - and I apologize if I'm reading this wrong, this is a little more advanced C/C++ code than I'm used to - as I understand it, the FLATDATA call interprets the raw bytes of a CBlock object as an array (stream??) of characters.  The CBlock::WriteToDisk method writes the constant 4-byte message header (0xf9, 0xbe, 0xb4, 0xd9), the size of the CBlock object in bytes, and then the FLATDATA of the CBlock it's writing to disk - which is just the raw bytes of the CBlock object.  So after the header, the data written to file is byte-for-byte the same as the CBlock object represented in memory.  Also, if I'm reading it correctly, CBlock::ReadFromFile copies those bytes directly into the space allocated for a CBlock object in memory to re-create the block.  Is this correct?

Related question - I am under the impression that the exact way an instance of a C++ class is represented internally not guaranteed under standards; compiling a program with different compilers or different optimization flags can change the order in which member variables are stored in memory, and some debug mode compilers even add a few bytes between member variables to make memory inspection easier.  I'm not positive about this, it's just something I've picked up and never seriously questioned.