Databases have been around for an awfully long time and while there have been a few major revolutions – such as the emergence of the relational database and the subsequent NoSQL counter-revolution – from a developer’s standpoint, database operations have stayed pretty much the same.
Typically, an application creates a data record, reads it back, possibly updates it and eventually deletes it. This Create-Read-Update-Delete cycle has been given the convenient acronym of “CRUD”. CRUD illustrates the transitory nature of database storage – data is created, modified and deleted. Updates destroy old versions of data and once deleted, database records are (unless we completely restore an old version of the database) gone forever.
Furthermore, we’ve gotten very used to the fact that databases offer limited guarantees about the integrity of transactions. A privileged developer can almost always overwrite a data record and can even set that record’s timestamp to whatever they choose. There’s no inherent way within database technology to guarantee that a data element has not been overwritten.
The blockchain supports a completely different paradigm. In the original Bitcoin blockchain, the need to prevent double-spending of coins was paramount and it was absolutely essential that the records of every transaction be preserved forever.
Therefore, the blockchain provides an append-only immutable ledger – a relatively simple database in which data elements can be added but can never be deleted or modified. Bitcoin’s Proof of Work algorithm and the cryptographic links between successive blockchain entries make tampering computationally impractical.
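To make the idea of cryptographic links between successive entries concrete, here is a minimal Python sketch of a hash-chained, append-only ledger. It is an illustration only: it leaves out Proof of Work, networking and consensus entirely, and the names `HashChainLedger`, `entry_hash` and `GENESIS_HASH` are invented for this example rather than taken from Bitcoin.

```python
import hashlib
import json

# Assumption: a fixed placeholder stands in for the "previous hash" of the first entry.
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry's payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class HashChainLedger:
    """Append-only list in which every entry commits to its predecessor."""

    def __init__(self) -> None:
        self.entries = []

    def append(self, payload: dict) -> str:
        """Add a new entry linked to the current tail and return its hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        digest = entry_hash(prev_hash, payload)
        self.entries.append({"prev": prev_hash, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every link; an edited entry no longer matches its hash."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True
```

Changing any earlier payload alters its hash, which breaks the link recorded in every later entry, so `verify()` fails. On a real blockchain, Proof of Work is what makes recomputing all of those later links prohibitively expensive.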
So, for the first time in computing history, we have a datastore in which we can have absolute certainty about a data item’s creation date and can be absolutely certain that the item has not been modified. That is a real revolution!
Unfortunately, we can’t use the blockchain as a general-purpose data store – it is simply too slow, expensive and unwieldy when compared to a traditional database. For instance, the Bitcoin blockchain generates a new block only about every 10 minutes, can handle only a handful of transactions per second (roughly 7 is the commonly cited figure) and would cost millions of dollars per gigabyte if used as a traditional data store.
How to integrate blockchain capabilities
If we do want to integrate blockchain capabilities into our existing database applications, we have two routes forward:
- Build new database technologies that integrate blockchain concepts, but which still perform well at a reasonable cost.
- Create an integration layer between databases and existing blockchains.
There have been some early attempts to build new database systems that are based on blockchain foundations. Unfortunately, rather than being “best of both worlds”, they risk being “worst of both worlds”: harder to use and less functional than existing databases, and without the strong integrity guarantees offered by public blockchains such as Ethereum and Bitcoin.
For the time being, we can only get a best-of-both-worlds solution through an integration layer.
Luckily, there are technology patterns that allow us to maintain immutable copies of database records and have them anchored to the blockchain.
Log Structured Merge Tree & Merkle Tree
Firstly, we can structure our data as a Log Structured Merge Tree (LSM). In a Log Structured Merge Tree, all writes – including deletes and updates – are processed as inserts into the tree. A delete inserts a “tombstone” record that notes that a data item has been deleted. An update leaves the old record intact and simply inserts a new version.
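The insert-only behaviour described above can be sketched in a few lines of Python. This is a simplified illustration of the versioning semantics only, not a real LSM tree (there are no memtables, sorted files or compaction), and the names `AppendOnlyStore` and `Version` are made up for the example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Version:
    """One immutable record version; tombstone=True marks a delete."""
    key: str
    value: Any
    version: int
    tombstone: bool = False


class AppendOnlyStore:
    """Every write, update and delete is appended; nothing is overwritten."""

    def __init__(self) -> None:
        self.log: List[Version] = []
        self._next_version = 1

    def _append(self, key: str, value: Any, tombstone: bool = False) -> Version:
        record = Version(key, value, self._next_version, tombstone)
        self._next_version += 1
        self.log.append(record)
        return record

    def put(self, key: str, value: Any) -> Version:
        """Insert or update: both simply add a new version."""
        return self._append(key, value)

    def delete(self, key: str) -> Version:
        """Delete: add a tombstone and leave the older versions intact."""
        return self._append(key, None, tombstone=True)

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the value visible at version `as_of` (latest if None)."""
        for record in reversed(self.log):
            if record.key == key and (as_of is None or record.version <= as_of):
                return None if record.tombstone else record.value
        return None
```

Because old versions are never removed, a reader can ask what a record looked like at any earlier version, which is exactly the property needed before a history can be proven against a blockchain.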
The second data structure we can use is the Merkle Tree. A Merkle tree is a hash tree in which successive pairs of hashes are themselves hashed until a single root hash is obtained. This hash can be used to validate the integrity of thousands of data elements of arbitrary size. If we store this root hash on a blockchain, that root hash can be used to prove the integrity of any number of database elements in a single blockchain transaction.
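As a rough illustration of how pairs of hashes are combined into a single root, and how one element can later be checked against that root, here is a small Python sketch. It assumes SHA-256 and duplicates the last hash when a level has an odd number of entries (the convention Bitcoin uses); production designs usually add safeguards such as distinct prefixes for leaves and interior nodes, which are omitted here. All function names are illustrative.

```python
import hashlib
from typing import List, Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Build every level of the tree, from the leaf hashes up to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha256(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd count: pair the last hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: List[bytes]) -> bytes:
    return merkle_levels(leaves)[-1][0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the left."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        sibling = index ^ 1
        sibling_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sibling_hash, sibling < index))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Rehash the leaf with each sibling in turn and compare with the known root."""
    digest = sha256(leaf)
    for sibling_hash, sibling_is_left in proof:
        if sibling_is_left:
            digest = sha256(sibling_hash + digest)
        else:
            digest = sha256(digest + sibling_hash)
    return digest == root
```

The proof for a single element contains only one hash per tree level, so its size grows with the logarithm of the number of elements. That is why one stored root can vouch for a very large batch of records.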
Therefore, if we implement our database schema as a Log Structured Merge Tree, we can use Merkle trees to anchor database state to the blockchain. We’d then have all the advantages of blockchain immutability and the power of whatever database we choose.
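Putting the two ideas together, the anchoring step might look something like the Python sketch below. It is not a description of how any particular product works: the record layout, the `anchor` and `matches_anchor` functions, and the plain list standing in for a blockchain are all assumptions made for illustration. A real integration would submit the root through a blockchain client instead.

```python
import hashlib
import json
from typing import Dict, List


def merkle_root(hashes: List[bytes]) -> bytes:
    """Pairwise-hash a non-empty list of digests down to a single root."""
    if not hashes:
        raise ValueError("at least one hash is required")
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]


def record_hash(record: Dict) -> bytes:
    """Hash a canonical JSON encoding of one record version."""
    encoded = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).digest()


def anchor(log: List[Dict], up_to_version: int, chain: List[Dict]) -> str:
    """Commit every record version <= up_to_version in one 'transaction'.

    `chain` is a plain list standing in for a real blockchain (assumption).
    """
    hashes = [record_hash(r) for r in log if r["version"] <= up_to_version]
    root = merkle_root(hashes).hex()
    chain.append({"version": up_to_version, "root": root})
    return root


def matches_anchor(log: List[Dict], anchored: Dict) -> bool:
    """Recompute the root from the database and compare it with the anchored one."""
    hashes = [record_hash(r) for r in log if r["version"] <= anchored["version"]]
    return merkle_root(hashes).hex() == anchored["root"]
```

Because the log is insert-only, later writes receive higher version numbers and leave the anchored state untouched, while any change to an already anchored version produces a different root and is detected.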
If this sounds like a lot of work, you are right. However, do not despair. In ProvenDB, we’ve built this integration layer right into MongoDB. Using ProvenDB, you can use MongoDB as usual, but under the hood, we are maintaining the LSM structure and giving you access to blockchain proof of integrity and timestamp. You can sign up to the free early adopters’ version of ProvenDB at