The Four Layers of the Blockchain
The recent DAO hack has called into question the speed and aggressiveness with which the Blockchain (and specifically Ethereum) community has moved forward on Blockchain applications. Amid many calls for reflection, one that I found especially convincing is Joi Ito’s call for a layered approach to Blockchain design (ironically posted just before the hack occurred).
While some have already proposed a layered approach to the Blockchain, many (but not all) treat the Blockchain as the base layer, which is like building a skyscraper starting at the 30th floor. Focusing on deploying complex applications on top of a Blockchain is, in my opinion, premature, as the DAO hack has demonstrated.
Instead, the Blockchain itself should be decomposed into separate layers in order to better understand the security and economics of Blockchain design. Below I propose such a decomposition and argue why it is both natural and useful.
First, a recap: layered system design is best exemplified by the Internet. Each layer is more abstract than the one below it, all the way down to the physical layer. This approach allows for robust system design because each layer can be upgraded, patched, or even completely swapped out without affecting other layers. To give an over-simplified summary, on the Internet you have:
- The physical layer: the actual medium that transports the bits whether it be wireless spectrum, cable fiber, or phone lines.
- The network layer: manages addressing and routing of packets between different physical routers, most commonly IP.
- The transport layer: manages raw connection state, most commonly TCP.
- The session layer: manages higher level connection state, such as HTTP.
- The application layer: where actual applications live, e.g. Google search, Facebook, etc.
This layered design means that TCP/IP functions just as well over wireless LTE as over fiber. (This point is a little trickier at the application end of the stack, but largely still holds: you could run Facebook just as well over HTTP/2 as over HTTP/1.1.)
A Bitcoin layer decomposition
So, what might such a layered decomposition look like for Blockchain applications? Here is one proposal:
- Consensus layer: a protocol that describes the format of a ledger that is publicly visible and a consensus function that anyone can use to determine which of multiple candidate ledgers is the consensus ledger. The protocol must also allow new blocks to be added to the ledger.
- Mining layer: a protocol that incentivizes parties to maintain the consensus and add blocks to the ledger.
- Propagation layer: a protocol that determines how the ledger and blocks are transmitted between nodes in the network.
- Semantic layer: a specification of how new blocks must relate to previous blocks and a protocol for verifying conformity with the specification.
- Application layer: application code that implements some desired functionality.
The first four layers encompass what we think of as the Blockchain, while the application layer allows for overlays, APIs, applications, etc.
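To make the split between the consensus and semantic layers a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the Block fields, the names is_well_formed, consensus and semantically_valid, and above all the choice of "most total work wins" as the consensus function, which is only one possible rule. The point is just that the consensus function can be written without knowing what the blocks mean, while the semantic check is plugged in separately.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Block:
    parent: str   # identifier of the previous block ("" for the first block)
    ident: str    # identifier of this block (hypothetical; a real chain would use a hash)
    work: int     # scarce resource spent on this block (assumed unit)
    payload: str  # opaque data; its meaning is defined only at the semantic layer


Ledger = List[Block]


def is_well_formed(ledger: Sequence[Block]) -> bool:
    """Consensus-layer format check: every block must point at its predecessor."""
    expected_parent = ""
    for block in ledger:
        if block.parent != expected_parent:
            return False
        expected_parent = block.ident
    return True


def consensus(candidates: Sequence[Ledger]) -> Ledger:
    """Pick the consensus ledger among candidates.

    Assumed rule: the well-formed candidate with the most total work wins.
    """
    well_formed = [ledger for ledger in candidates if is_well_formed(ledger)]
    if not well_formed:
        raise ValueError("no well-formed candidate ledger")
    return max(well_formed, key=lambda ledger: sum(b.work for b in ledger))


def semantically_valid(
    ledger: Sequence[Block],
    rule: Callable[[Sequence[Block], Block], bool],
) -> bool:
    """Semantic-layer check: each block must relate correctly to the blocks before it."""
    return all(rule(ledger[:i], block) for i, block in enumerate(ledger))


# Usage: two competing ledgers and one malformed one.
a = [Block("", "a1", 5, "x"), Block("a1", "a2", 5, "y")]
b = [Block("", "b1", 7, "x")]
c = [Block("zz", "c1", 100, "x")]  # does not chain from the start, so it is ignored
chosen = consensus([a, b, c])
```

Note that consensus never inspects payload, and semantically_valid never inspects work. That independence is what lets each layer be studied (and replaced) on its own.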
Before continuing, I want to point out that the consensus layer described above is different from the one used in Eric Lombrozo’s decomposition. His use of “consensus layer” corresponds to our consensus + mining + semantic layers combined, while his “peer-to-peer layer” corresponds to our propagation layer.
Why four layers instead of one?
Treating the Blockchain as a single layer is like lumping everything between the physical and transport layers in the Internet into a single layer. While it’s fine to ignore these layers and treat them as one once they’re mature, at this moment Blockchain technology is too young and we don’t understand these base layers well enough.
By separating out the Blockchain into multiple layers, we can better study the various properties that we want the Blockchain to enjoy and where they need to be implemented. These properties are:
- Security: no party who doesn’t control a majority of some scarce resource (typically computing power) can convince nodes that an alternate version of the ledger is the consensus
- Liveness: nodes can add new blocks to the ledger with acceptable latency
- Stability: nodes in the network should not alter their opinion of the consensus ledger
- Correctness: only blocks that represent valid transactions (i.e. they conform to a specification of how new blocks may relate to previous blocks) may be added to the ledger
These properties are drawn from the survey of Bonneau et al., though we chose to rename [eventual consensus → stability] and [exponential convergence → security] for clarity’s sake.
Our classification into four layers is natural because it’s quite easy to identify that each of these properties is achieved primarily at one layer of our decomposition:
- Security is achieved at the consensus layer, and requires building a consensus function that cannot be fooled into accepting an alternate ledger without using a majority of all existing resources
- Liveness is achieved at the mining layer, and requires there to be enough incentive for participants in the network to continually confirm new blocks
- Stability is achieved at the propagation layer, and requires nodes to be able to quickly disseminate confirmed blocks to other nodes so that they know to build on the most recent blocks instead of older, stale blocks
- Correctness is achieved at the semantic layer, where blocks have a meaning, which could range from sending currency between parties as in Bitcoin to encoding state transitions in a state machine as in Ethereum, and where this meaning is validated by nodes to conform to the specification stating how new blocks must relate to previous blocks
Before moving on, I’d like to acknowledge that this relationship between properties and layers is not strict and that different properties and layers are not totally independent. However, they are distinct enough to warrant study in isolation in order to gain better understanding. (Note that interdependency of layers is true of the Internet as well; HTTP runs on TCP/IP, but not on UDP, because TCP provides a reliable connection.)
I’d also like to point out that the term security is used above in a narrow sense to mean only the security of the underlying protocol. There are attacks that rely on creating unintended transactions (this is what occurred in the DAO hack) that may be called security problems, but these are the result of buggy code and not a security breach in the protocol itself.
Which layer does cryptocurrency live at?
Because existing Blockchains including Bitcoin and Ethereum work at all four layers (consensus+mining+propagation+semantic) simultaneously, it’s not immediately clear at which layer cryptocurrency “lives”. In fact, it lives at two layers and in two different forms. This fact is implicit in Bitcoin and explicit in Ethereum.
- The mining layer: Bitcoins and Ether are created and/or transferred as valid blocks are created and added to the ledger. The currency is either generated from the network itself (“out of thin air”) or taken from the transactions contained in the block (“transaction fees”). The currency is used to maintain an incentive for miners to hash blocks.
- The semantic layer: Bitcoins and Ether can be transferred among nodes at the semantic layer by creating valid transactions signed by the holders of the cryptocurrency or by creating smart contracts that transfer the cryptocurrency between accounts. Here the cryptocurrency is used as a store of value and means of payment.
Since Ethereum is a general-purpose VM, you can also create alternate cryptocurrencies at the semantic layer to be used in the application layer. For example, DAO tokens functioned this way. These alternate cryptocurrencies live only at the semantic and application layers.
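As a rough illustration of what "living only at the semantic and application layers" means, here is a toy token written in Python rather than in any real contract language. The class name Token and its methods are hypothetical. The token is nothing more than a table of balances plus a rule for which state transitions are valid, and it never touches mining rewards.

```python
from typing import Dict


class Token:
    """A hypothetical alternate currency defined purely by semantic-layer rules."""

    def __init__(self, initial_holder: str, supply: int) -> None:
        if supply < 0:
            raise ValueError("supply must be non-negative")
        # The entire state of the currency is this table of balances.
        self.balances: Dict[str, int] = {initial_holder: supply}

    def transfer(self, sender: str, recipient: str, amount: int) -> None:
        """Valid state transition: move `amount` from sender to recipient."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount


# Usage: the token is created and moved around without any mining-layer involvement.
token = Token("alice", 100)
token.transfer("alice", "bob", 30)
```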
While the fact that these cryptocurrencies are used to incentivize mining makes them valuable and hence attractive as assets at the semantic layer, there is no reason that they should be the primary asset at the semantic layer. We’ll argue further down that there is a strong case to be made for strictly limiting the mining cryptocurrency to the mining layer.
Working with the layers
This decomposition allows us to reason about aspects of the Blockchain in manageable chunks, and should help us provide more rigorous arguments for evaluating design choices when building Blockchain technologies.
In a forthcoming paper, I will present a simple theoretical proof of security of the Bitcoin Blockchain protocol, whose simplicity is possible because it only requires looking at the consensus layer. I believe that this (or a similar) decomposition into layers will open the door to simple analyses of the higher layers as well.
A first proposal: restriction on mining coins
Let me end the post with a proposal based on this layering that could help improve the security and flexibility of Blockchains.
As mentioned above, existing cryptocurrencies live both at the mining layer and the semantic layer of the Blockchain stack.
We argue that mining coins (the cryptocurrency used at the mining layer) should be usable only at the mining layer, and even more stringently, that they should support only two operations: creation and consumption. In addition, consuming a mining coin doesn’t mean transferring it from the publisher to the miner. It means attaching the coin to a block and retiring it when that block is confirmed, sending it back into the void where it came from.
Note that we have explicitly prohibited mining coins from being transferred between nodes. This restriction improves security because it means that it’s literally impossible to steal mining coins, eliminating fraud at the mining layer!
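Here is a minimal sketch of what such a restricted coin could look like as a data structure. The class name MiningCoinRegistry and its method names are my own inventions for illustration, and I am assuming for simplicity that coins are credited to whoever gets a block confirmed. The property worth noticing is structural: there is simply no transfer operation to call.

```python
from typing import Dict


class MiningCoinRegistry:
    """Hypothetical mining-coin ledger that supports only creation and consumption."""

    def __init__(self) -> None:
        self._balances: Dict[str, int] = {}

    def create(self, miner: str, amount: int) -> None:
        """Mint coins for a miner (assumed to happen when a block is confirmed)."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._balances[miner] = self._balances.get(miner, 0) + amount

    def consume(self, owner: str, amount: int) -> None:
        """Retire coins attached to a confirmed block; they go to no one."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self._balances.get(owner, 0) < amount:
            raise ValueError("not enough mining coins")
        self._balances[owner] -= amount

    def balance(self, owner: str) -> int:
        return self._balances.get(owner, 0)

    def total_supply(self) -> int:
        return sum(self._balances.values())


# Usage: coins appear on creation and vanish on consumption.
registry = MiningCoinRegistry()
registry.create("miner-1", 3)
registry.consume("miner-1", 2)
```

Because consumption lowers the total supply instead of crediting another account, there is no code path by which one party's mining coins can end up in another party's balance.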
Of course, the question becomes how do people get their hands on enough mining coins to use the Blockchain? The distribution of mining capacity is uneven and will be concentrated in the hands of dedicated miners, so we must have some means of reallocating mining coins.
This is solved by user coins. User coins live at the semantic layer and can be used as a store of value and means of payment, meaning in particular that they are transferable. (Just to clarify, calling them “user coins” does not mean each user gets their own kind of coin. There may be one type of user coin for the entire system, or there could be more than one; it depends on that particular Blockchain’s design. The term “user coin” is meant to emphasize that these are the coins that should be used by end-users to handle non-mining related transactions.)
A node who wishes to publish a transaction t but doesn’t have any mining coins to spend on publishing will attach a related contract t’ that says “transfer n user coins to the node who attaches enough mining coins to allow both t and t’ to be published”. A miner who has mining coins could then take t and t’, contribute enough mining coins to the block, and publish them together in order to obtain the n user coins.
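The exchange just described can be sketched as follows. This is only an illustration under assumptions: the names Transaction, SponsorshipOffer and sponsor are hypothetical, balances are plain dictionaries, and mining_cost stands in for "enough mining coins to allow both t and t’ to be published", which the real protocol would have to define.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Transaction:
    """The transaction t that a node wants published."""
    publisher: str
    payload: str


@dataclass(frozen=True)
class SponsorshipOffer:
    """The contract t': pay `user_coin_fee` to whoever funds publication of t and t'."""
    transaction: Transaction
    user_coin_fee: int


def sponsor(
    offer: SponsorshipOffer,
    miner: str,
    mining_cost: int,             # assumed cost, in mining coins, of publishing t and t'
    mining_coins: Dict[str, int],
    user_coins: Dict[str, int],
) -> None:
    """Miner spends mining coins to publish t and t' and collects the user-coin fee."""
    publisher = offer.transaction.publisher
    if mining_coins.get(miner, 0) < mining_cost:
        raise ValueError("miner lacks mining coins")
    if user_coins.get(publisher, 0) < offer.user_coin_fee:
        raise ValueError("publisher cannot pay the offered fee")
    # Mining coins are consumed (retired), not handed to anyone.
    mining_coins[miner] -= mining_cost
    # User coins are transferable, so the fee moves from publisher to miner.
    user_coins[publisher] -= offer.user_coin_fee
    user_coins[miner] = user_coins.get(miner, 0) + offer.user_coin_fee


# Usage: alice holds only user coins; the miner holds only mining coins.
mining = {"miner": 10}
user = {"alice": 5}
offer = SponsorshipOffer(Transaction("alice", "pay bob"), user_coin_fee=2)
sponsor(offer, "miner", mining_cost=4, mining_coins=mining, user_coins=user)
```

Both checks run before any balance changes, so a failed sponsorship leaves the two ledgers untouched.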
This might seem a roundabout way to achieve something that can already be done with a single type of coin, but in fact a strict separation of user and mining coins has several advantages:
- User coins can be structured using a different economic model than mining coins. For example the creation and transfer of user coins can have rules that reflect a central banking model, a community currency model, etc. Multiple models can exist on the same Blockchain. The user coins can also have fraud prevention measures, e.g. a trusted third party who is allowed to reverse certain transactions.
- Even if one type of user coin is compromised, the system can survive by switching to a different user coin. Thus, the fundamental viability of the network can be decoupled from the economics of the semantic layer (at least to some extent).
- The issuance models of existing cryptocurrencies are extremely ad hoc: Bitcoin has a predefined schedule of Bitcoin creation, Ethereum has a somewhat different schedule, but none of them use a model where the money supply adapts to economic conditions, which is something that is useful for moderating the effects of business cycles. Typical arguments for using these arbitrary models are that prices / interest rates will adjust to take into account the issuance rate, but this is a cop-out. In practice price levels are important psychological signposts when making economic decisions, and ideally a currency system will have some mechanism to promote price stability (not necessarily zero inflation, but hopefully low and predictable inflation).
- This dual mining / user coin system allows for flexible issuance models. First, mining coins are transient by nature and their supply adapts naturally to the rate of block creation: they are created when blocks are confirmed and are also retired when blocks are confirmed, which means that demand and supply of mining coins should move together and should lead to a stable price for mining coins.
Second, user coins can use any issuance model, and so decoupling the mining and user coins enables less risky experimentation with user coin issuance models.
Of course, this is a cursory analysis. Pinning down the exact issuance and consumption model of mining coins is an important topic for further research, and I’ll share some more detailed ideas in future posts.