P.S. Medium is the absolute worst platform for math — If your eyes are as scalded by LaTeXIT as mine are, there is a link to a Dropbox Paper version of this article at the end (but please clap here if you liked this article!). I was just committed to finishing this series on the platform that I started on :)
In the past three parts of this series, we investigated probabilistic threat models for comparing blockchain protocol guarantees and estimating liveness and safety. The last blog post, in particular, illustrated how the choice of probabilistic model can lead a transformation that transfers some guarantees from one blockchain model to another. In light of this, a natural set of questions to ask are:
- Can guarantees from an arbitrary single chain protocol be transferred to any other single chain protocol?
- The reason for excluding multiple chain protocols is that inter-shard communication and hiding schemes add in enough non-determinism to make it hard to evaluate whether certain properties are actually the same
- How do we figure out what guarantees are transferable between seemingly different protocols?
- Can we construct a taxonomy of protocols that connects protocols based on their probabilistic properties?
- Can we use these transferable features to determine our protocol resistant to ‘small’ perturbations or changes?
- These small changes could arise due to an implementation detail when one is writing a client or if there is a sudden change in the demographics within our blockchain system (e.g. if there is a shock and 50% of miners / validators stop mining / validating)
This post will try to answer these questions by generalizing the notion of ideal functionality, which is commonly used in the blockchain literature to prove guarantees on a variety of protocols:
- Arbitrum [Off-Chain Labs]
- Avalanche [Ava]
- Ekiden [Oasis Labs]
- GRANDPA [PolkaDot]
- Hawk
- Ouroboros [Cardano]
- Zexe
We aim to show that ideal functionalities, at least as used in the blockchain literature, are equivalent to the mathematical concept of universality. At a high-level, universality is the idea that seemingly unrelated complex systems that are made up of individual units (transactions, people, atoms, DNA, etc.) with different microscopic or local behaviors can actually have the same macroscopic/collective/global behavior. The simplest notion of a universal law that you might have experience with is the Central Limit Theorem (CLT), which states that the sum of independent, identically distributed random variables divided by $$\sqrt{n}$$ converges in distribution to the familiar Bell curve (or Gaussian distribution) [0]:
Fields Medalist Terence Tao has an excellent presentation for non-experts that explain how universality occurs in a plethora of seemingly unrelated natural contexts and how it shows up in unexpected places in mathematics. Our goal is to take these ideas and show how they manifest themselves within the blockchain literature. Once we do this, it will be a lot easier to answer the above questions and figure out how to we might make a statistical taxonomy of consensus protocols.
P.S. While it was hard to avoid using the words and phrases, Virasoro Algebra, hitting time, and martingale, this is the first and last time that they will appear within this post.
In the second blog post of this series, we spoke about environments that show up within the analysis of blockchain protocols. Recall that the idea of an environment is to encode the total state of the blockchain system (over all participants) in an object with random and deterministic components that closely approximate what the underlying protocol is doing. This allows us to analyze the entire blockchain system in the same way that one analyzes random algorithms (e.g. using classical methods from books like Randomized Algorithms by Motwani and Raghavan). Environments help by centralizing the analysis and making the game theoretic aspects of a protocol more transparent. However, environments are often times too difficult to analyze directly. This difficulty stems from subtle correlations and dependencies between steps in a protocol. Before going further, let’s look at a few examples:
- Bitcoin: The order of arrival of blocks on different forks affects which fork a given miner will choose. To statistically analyze the chain growth, chain quality, and liveness of the chain, we will have to average over all permutations of block arrivals
- PoS: If multiple users are to get slashed for signing on an invalid fork (to prevent nothing-at-stake attacks), then the ordering in which they get slashed is ambiguous. To correctly compute the probability that the system will fail (e.g. will have insufficient chain growth, chain quality, or liveness), we will need to compute this probability over all orderings of slashes.
Such ordering complexities make probabilistic analysis untenable — we have dramatically increased the size of our probability space, decreasing the ease and efficiency of computing moments (such as means and variance), tail bounds, and rates of convergence. How do we get around this?
The intuitive answer goes back to Satoshi and his famously incorrect Poisson analysis of fork probabilities in the original Bitcoin paper. The idea is that the probability of ordering being important decays quickly — hopefully, exponentially — in a system parameter and you can compute statistics as if these ordering complexities don’t matter. If you need to have more precise control about how ordering affects an observable, you can add in a decaying correction term to account for this. Let’s illustrate this via a simple example example. Suppose that $$\lambda$$ is the block production rate of a Bitcoin system and that $$H_i(t)$$ represents the height of blockchain of the $$i$$th miner in a Bitcoin system at time $$t$$, which is a random variable. It can be shown that under synchrony and partial asynchrony, $$H_i(t) = H(t) + o\left(\frac{\lambda}{\sqrt{n}}\right)$$ for a deterministic, linear function $$H(t)$$ (see the analysis here, for instance). Ideal functionalities aim to formalize this intuition in a more rigorous manner than Satoshi. They generically do this in the following manner:
- Construct a simpler model, which is usually referred to as ideal functionality, that doesn’t have the correlation or complexity that is vexing your analysis. In the cryptography literature, this is referred to as the hybrid method.
- Show that the probability that the true model differs from the ideal functionality can be bounded by an error tolerance that is a function of parameters controlled by the protocol designer
An an example, let’s go through how this works for Bitcoin (as represented by Pass, Shelat, and Seeman):
- Originally, we define Bitcoin as close to the original spec as possible:
- We have a multiplayer game where there are a number of rounds. Each round is demarcated by an evaluation of a hash function (in the random oracle model)
- Each user is allow to compute one hash per round but can has an unlimited number of verification queries; this is done to replicate
- The oracle determines if a hash is valid (e.g. $$H(x)<D_p$$, where $$D_p$$ is the current difficulty)
- We reduce this to an ideal functionality that removes the explicit hashing and simply draws a random fork with the right probability (e.g. $$\frac{D_p}{2^{\ell}}$$ where $$\ell$$ is the bit width of the hash):
To use a hybrid argument, we need to show that these two models are ‘close’. The Poisson arrival times of Bitcoin blocks coupled with the random oracle model are used to show that the second model is “close” if the Poisson parameter is large enough (e.g. the block production rate is large enough [1]). All of the aforementioned protocols utilize a statistical reduction of this form to provide protocol security guarantees. This probabilistic transformation from a complex model to a simple one will be our guide as we aim to connect Blockchain analysis to modern probability theory. But first, we will need to take a detour through statistical physics to see an example of a universal model that naturally has blockchain-style ‘voting’ dynamics.
Universal probabilistic laws provide a way to take sequences of random variables and show that their tail bounds and moments are equal. While the first universal law discovered was inspired by gambling and games of chance (the CLT, discovered by Carl Friedrich Gauss and, to some extent, Daniel Bernoulli [2]), the subsequent examples of universal laws came from statistical and high-energy physics. These areas of physics study complex systems — systems of millions of particles and highly coupled particle systems that are recovered from smashing atoms together, respectively — and aim to find laws that hold statistically, regardless of the precise initial conditions of an experiment [3]. Over time, Physicists discovered that for many systems, one finds universal laws at different large parameter scales that correspond to ‘zooming out’ (in some parameter) enough. The following picture is a visual representation of this using the Ising Model, a prototypical model for how iron can be magnetized at low temperatures, but cannot at higher temperatures. The simplest version of the model represents molecules as points on a lattice that either have a +1 or a -1, representing their net spin [4], and each molecule’s probability of being a +1 or -1 is influenced by polling it’s neighbors (treating the lattice as a graph) and deciding whether +1 or -1 is in the majority [5]. This idea that your neighbors’ votes influence your vote is how we will be able to make the connection back to distributed consensus. In the following picture, we see images of the two-dimensional Ising Model at different temperatures (red and blue represent +1 and -1):
- The transition from purely random (the upper left box) at high temperatures to much more structured clusters corresponds to going from a purely random microscopic description (e.g. particles moving about randomly) to undergoing to a phase transition and having well-defined clusters (e.g. magnetized, correlated particles). When these clusters are more aligned to one color than the other, then the material is magnetized in the direction of that color. This color alignment is very similar to consensus — when the system can magnetize to a non-zero number $$m$$, then the system has reached consensus on that the value is $$m + |m|\min(1-m,m+1)$$ (e.g. round to $$\pm 1$$). On the other hand, if the system is at a high temperature and cannot magnetize (on average) then we do not reach consensus. In this setting, temperature serves as a parameter that controls a phase transition between when the system is able to reach ‘consensus’ on average (e.g. has non-trivial magnetization) or not. At this point, you might be wondering if phase transition behavior shows up in existing blockchain analysis, since many protocols seem to have a similar behavior (albeit with different parameters than a physical temperature). It turns out it does — take a look at the following theorem statements:
Let us interpret these:
- In the first two theorems, we see numerical conditions that need to be satisfied in order to achieve exponential probability decay
- In the third theorem, we see that the security of the ideal functionality $$\mathcal{F}_{Ekiden}$$ reduces (with high probability) to that of the underlying blockchain $$\mathcal{F}_{blockchain}$$
In both scenarios, the proofs rely heavily on the use of concentration inequalities, which aim to show that a sequence of random variables deviates from a center (mean, median, etc.) with decreasing probability as we get more samples. Concentration inequalities, just like the Ising Model, have phase transitions that are only satisfied under certain conditions on the moments of the random variables under analysis [6a]. In fact, the seemingly random conditions that are specified in the first two theorems arise directly from:
- Taking the conditions that are needed for a concentration inequality to be satisfied
- Plugging in the blockchain-related variable definitions
- Back-computing the conditions on the blockchain random variables.
In all three papers, we find that the authors either directly use concentration inequalities or reprove simple versions within their analysis [6b]. Interestingly enough, since the formal security of PoS protocols seems to rely on back-computing conditions for convergence from concentration inequalities, it is natural to ask if we can construct a transformation between two protocols that use the same assumptions and the same concentration inequalities in their proofs of security that let us map properties from one to another. Let’s look at an explicit example:
Suppose that we have two PoS protocols $$P_1, P_2$$. If we know a stake distribution that arose after $$E$$ epochs in $$P_1$$ has, in expectation, Gini coefficient $$g$$, then can we construct a function $$f : [0,1] \rightarrow [0,1]$$ such $$f(g)$$ is the expected Gini coefficient in $$P_2$$ after $$E$$ epochs?
The reason an answer to the above question is important is that there exists a deterministic mapping $$f$$ that maps one statistic to another. We can expand this notion of, ‘two protocols are equal if there exists a way to transfer Gini coefficients’ to something more general. Let $$\mathcal{S}_{DCP}$$ denote the space of distributed consensus protocols [7] and define a measurement to be a function $$m : \mathcal{S}_{DCP} \rightarrow [0,1]$$ [8]. Suppose that there exists a set of interesting measurements (such as Gini) $$m_1, \ldots, m_n$$, then and construct an equivalence relation $$\sim$$ on $$\mathcal{S}_{DCP}$$:
$$P \sim Q, P, Q \in \mathcal{S}_{DCP} \Longleftrightarrow \forall i \in [n],\; \exists f : [0,1] \rightarrow [0,1] \text{ such that } f(m_i(P)) = m_i(Q)$$
This definition is exactly analogous to the definition of universality classes in probability and statistical physics [9] — we choose a set of sufficient statistics (e.g. the measurements) and show that there exists a deterministic way to map these from one class element to another. In the next section, we will try to provide some more intuition for this.
The study of universality classes in mathematics is relatively new — it took a variety of surprising discoveries in the 1980s to prompt a generation of algebraists and topologists to brush up on probability and measure theory. We won’t go into any mathematical detail in this article (this is a post for a blockchain audience, after all!), but we will note that roughly 36% of Fields Medals awarded over the last twenty years went to those studying universality:
- Curtis McMullen (1998)
- Andrei Okounkov (2006)
- Terence Tao (2006) [10a]
- Wendelin Werner (2006)
- Stanislav Smirnov (2010) [10b]
- Atur Avila (2014)
- Martin Hairer (2014)
- Alessio Figali (2018)
Instead, we’re going to try to illustrate what universality looks like with pictures and present a picture of how universality for distributed consensus might look. Let’s start off with this excellent image from Ivan Corwin (Columbia):
In this picture, we see three different probabilistic models for dropping blocks on a one-dimensional surface. The visual differences between the different colors corresponds to statistical differences in the correlations between blocks that are dropped. The first figure, on the left, deposits blocks randomly and has a lot of variance in how the strips of blocks are created. The second image allows for blocks to try to go to local height minima, by moving to the ‘lowest’ point that they can once they reach the surface. Finally, the third image is like ‘tetris’ which has a non-trivial probability of generating holes that can never be filled in. If we put in $$n$$ blocks onto a surface of width $$h$$, the expected height of the blocks is the same in all three models; however, the fluctuations vary widely:
- The random deposition has fluctuations of size $$\sqrt{n}$$
- The random deposition with relaxation has fluctuations of size $$o(n^{\nu}) \;\; \forall \nu >0$$
- The ballistic deposition has Tracy-Widom-esque fluctuation of the form $$n^{1/3}$$
It is a conjecture that that in one-dimension, these are basically the only ways for the height of a block deposition system to evolve [11]. Any transformation that takes a block deposition system from one type of fluctuation size to another has caused a phase transition [12].
Let’s try to map these pictures to the time-evolution of blockchains. Note that this type of model looks like the evolution of the height of the longest chain in a single chain:
- Each miner is a point on the x-axis
- The height of their longest chain at time t (time is the y-axis) is reflected by the colored bars
- The $$\mathcal{F}_{tree}$$ model from Bitcoin actually corresponds to a random deposition model — the only difference is that each entry on the y-axis would actually be a tree, as opposed to a line and the ballistic mechanism would first figure out which miner will get a block and then pick a fork. It is not totally clear what universality class this would be in, although it is possible that such a tree growth process corresponds to the Schramm-Loewner Evolution (which parametrizes a family of universality classes)
- If we’re in a PoS system, the height of the color stripe for a given miner corresponds to how long an epoch takes (e.g. for our height to increase by one)
From this analogy, it is clear that the ideal blockchain looks like the middle picture — there is little variation across miners and the epoch times are roughly the same. Moreover, a phase transition from the middle interface picture to either of other two interfaces is not just a change in the rate of block height growth — it reflects a huge increase in the number of forks. The middle model has a constant expected number of forks, whereas the side pictures have an expected number of blocks that grows with $$n$$. Correctly choosing protocol parameters — such as block production rate, slashing rates, and difficulty adjustment periods — is important to ensuring that a protocol:
a. Achieves a level of statistical consistency in the measurements that we care about (e.g. block height, Gini, etc.) that looks like the middle picture
b. Is far from a phase transition boundary that can make the protocol look like one of the other pictures
This last point is very important for practical protocol design, as one does not want to choose a parametrization that works well most of the time but can be adversarially driven into an inconsistent phase. We should expect a similar behavior in sharded chain, except that we likely have a higher dimensional problem [13]. Finally, note that this analysis is generic to PoW and PoS — the only difference is that the PoS parameters are much more sensitive to pushing the system over a phase transition boundary.
It is natural to ask how one can compute what the universality classes for existing blockchain protocols are, given that we have established that:
- Blockchain systems likely map to existing, known probabilistic universality classes
- Characterizing and classifying systems based on which class they are in provides guarantees that a protocol is stable under ‘small’ deformations (e.g. ones that keep it in the same class)
We are not going to do that in this post and will defer to future papers that I am working on. Instead, we’re going to provide some intuition as to how this looks for the Ising Model, via renormalization:
The most intuitive form of renormalization for the two-dimensional Ising Model is known as block spin renormalization. If we think of the spins as $$n²$$ binary voters on a two-dimensional $$n \times n$$ lattice, this corresponds to clustering voters in groups of size $$k²$$, polling their vote via majority vote, and then treating that as the vote of a new lattice of size $$\frac{n}{k} \times \frac{n}{k}$$. This is illustrated in the above diagram for $$k = 2$$. As we perform this renormalization, we implicitly change the ‘effective temperature’ of the model — the dependencies between adjacent cells in the smaller lattice are ‘weaker’. It turns out that we can only iterate this transformation a certain number of times before we hit the phase transition point. Intuitively, this is because each application of the transformation ‘weakens’ the dependence of each new spin and this effect compounds. At some point, we’ve weakened the dependence so much that the spins at each site look ‘independent’. In statistical physics, one classifies two models to be equivalent if there exists a renormalizing transformation (e.g. one that does not cause us to go over a phase transition) [14]. Thus, we can tease out the universality class of a model by starting from a model whose class is known (e.g. Gaussian universality class) and constructing a renormalizing transformation. In a way, this is the statistical analogue of polynomial time transformations that one constructs to show that a problem is in NP [15]. All of the above blockchain protocols that rely on ideal functionality end up constructing such transformations, albeit using sequences of concentration inequalities!
Before we conclude by circling back to the questions that opened this post, let’s recap:
- The probabilistic threat model analysis of blockchain protocols relies on defining environments and ideal functionalities that model how agents can interact with the system
- These analyses often work by showing that an environment/ideal functionality is ‘statistically indifferent’ to a simpler random model
- Such reductions are very similar to reductions in statistical physics and probability that give rise to universal behaviors (e.g. things that are common to any model, if there exists such a reduction) and is the statistical analogue of a polynomial time reduction of two problems in NP
- We can classify protocols (which are like statistical physics models) based on whether a deterministic, statistics preserving transformation exists between a pair of protocols
Now, let’s go through the questions:
- Can guarantees from an arbitrary single chain protocol be transferred to any other single chain protocol?
- In the above framework, guarantees can be transferred if they can be constructed from the measurements used to define the equivalence relation. For instance, guarantees on how fast fork probabilities decay can be transferred from one model to another if we can guarantee that the height and height fluctuations of the chain are in the same universality class.
- How do we figure out what guarantees are transferable between seemingly different protocols?
- This boils down to how we define the equivalence relation. For single-chain systems, it seems quite clear that most universal properties will be a function of the height and height fluctuations, but for multichain systems, things can be more complicated and we will need to preserve most measurements.
- Can we construct a taxonomy of protocols that connects protocols based on their probabilistic properties?
- Given an equivalence relation, we can partition the set of protocols into equivalence classes that represent whether they are equivalent in probabilistic terms. This is exactly what the formal analyses of existing protocols does!
- Can we use these transferable features to determine our protocol resistant to ‘small’ perturbations or changes?
- The beauty of the equivalence class model is that we can use the distance from a ‘critical point’ (see [12]) to characterize stability of a given model.
Now, it might seem like universal properties are all that we need to guarantee that a protocol reaches consensus and can do this in a stable manner. However, these properties only tell us whether two protocols are qualitatively or topologically the same. A concrete analogy to this is if I told telling you that two algorithms run in $$O(n^k)$$. In practice, we also need to optimize the real performance, which is analogous to optimizing the pre-factor or constant in big-Oh notation. In the final blog post, we are going to discuss how one can view consensus from the lens of distributed optimization and how tools from simulation and optimization can help find the ‘best’ protocol within a given equivalence class.
The author would like to thank Guillermo Angeris, Arjun Balaji, Arthur Breitman, Uthsav Chitra, Clément Hongler, Niraj Pant, Illia Polosukhin, Dan Robinson, Kevin Sekniqi, John Sheridan, Yi Sun, and Howard Wu for reviews and comments on this post.
[0] Quantitative extensions such as the Berry-Esseen theorem and Talagrand’s concentration inequality generalize this and show that a) only independence is needed and b) the bounds are tight and can be controlled by higher moments
[1] See this paper of Rafael Pass for more details. Note that the error tolerance depends on the Poisson parameter, which is a function of the block production time (~10m for Bitcoin) and the delay diameter of the miner graph (estimated to be ~100ms). Also, the author would like to thank Dan Robinson and Howard Wu for some helpful comments regarding this section.
[2] See the amazing book, The Life and Times of the Central Limit Theorem by William J. Adams, for a detailed look at the history of this theorem.
[3] The perspective of physicists in these fields is that one should value reproducible experimental observables more than they value precise dynamics. That is, if I measure the density of a glass of water in New York or London and all else is equal (e.g. temperature, pressure, elevation, etc.), then we should get the same answer. Microscopic estimates depend on the precise particle orientations and interactions, whereas macroscopic observables average out these minute differences to describe things that are true over a wide variety of scenarios
[4] Real systems usually have more complicated spins (e.g. sections of a $$\mathsf{SU}(2)$$ fiber bundle), but we’ll start with this simple example, which captures blockchain examples better.
[5] The full Ising Model is described by a sequence of weights $$J_{i,j}$$ that represents how much node i is influenced by node j. There are some beautiful connections between these weights and complex geometry and moduli spaces from String Theory (e.g. see this review that is going to become a book from Scott Sheffield), but this is waaaay out of scope for a blockchain post.
[6] Concentration Inequality Notes:
a. Technically, this also depends on other features of a random variable (e.g. it’s support) and possibly on measure theoretic conditions (e.g. how fine the $\sigma$-algebra is and what cylinder sets are supported) that ensure that we can use things like the Cameron-Martin Theorem
b. For provenance, let’s go over the use of concentration in these three examples:
- Pass, Seeman, Shelat: Use multiplicative Chernoff bounds (see sections 7 and 8)
- Ouroboros Praos: Standard Chernoff
- Ekiden: More or less reprove the Markov Inequality
[7] As per the Bitcoin Backbone Protocol, this space is defined by the set of programs that halt in finite expected time on Probabilistic Polynomial Time Turing Machines.
[8] One might argue that this restriction to a compact range $$[0,1]$$ is not general enough. However, it makes the probabilistic analysis easier and it encodes all mappings to the integers (e.g. take any non-negative integral mapping $$m : \mathcal{S}_{DCP} \rightarrow \mathbb{N}$$ and compose it with the map $$\mathsf{inv} : \mathbb{N} \rightarrow [0,1], \; \mathsf{inv}(i) = \frac{1}{i}$$; we can do the same for $$\mathbb{Z}, \mathbb{Q}$$ via a half-embedding and diagonalization, respectively). Plus, we can assume that our protocols are using a bounded real number model (e.g. an SMT solver wouldn’t use an actual real model) which naturally leads to the unit cube.
[9] Universality is usually described by either explicit concentration bounds or tail bounds; however, for the sake of providing intuition (especially given that the blockchain and probability communities have very small intersection), I’m going down the route of measurements. This is usually what students are taught when they first learn about the equivalence of different exponential family distributions (via sufficient statistics, which are the statistical analogue of measurements) and is more intuitive.
[10] Fields Medal notes:
a. To be fair, he studied a lot more than just universality; I suppose the Green-Tao theorem for arithmetic progressions of primes is the real reason that he won the Fields Medal that year. A lot of his better work on universality came after the Fields Medal (e.g. SVD universality).
b. Smirnov won for proving that the Ising Model, in a certain renormalized limit, converges to a well-defined infinite-dimensional probabilistic process that is distinct from the Gaussian process.
[11] There is another technically another class, with $$t^{1/4}$$ fluctuation scaling, but its construction is quite similar to the KPZ case (e.g. the $$t^{1/3}$$ case)
[12] Formally, a parametrized family of probability measures, $$\mu(\vec{p}), \vec{p} \in \mathbb{R}^k$$ undergoes a phase transition at $$p^*$$ if for all one parameter paths $$\gamma : [0,1] \rightarrow \mathbb{R}^k$$ with $$\gamma(0.5) = p^*$$ we have $$\mu(\gamma([0,\frac{1}{2}))$$ is in a universality class that is disjoint from $$\mu(\gamma((\frac{1}{2}, 1])$$.
[13] The dimension here is the minimum embedding dimension of our miner/validator/user graph. A naïve bound for this dimension is the delay diameter of the user graph. There may also be a dimensional penalty for the embedding cost of the transaction graph (e.g. HashGraph, MimbleWimble on a user graph $$G$$ might have worse embedding dimension than Bitcoin on $$G$$ because we need to embed features of the transaction graph as well)
[14] Confusingly, Physicists refer to the set of such transformations as the renormalization group, but this is a misnomer. There exist a variety of renormalizing transformations that do not form a group or semi-group. The historical reason for this terminology is because the original usage of renormalization came from the study of Wilson Loops, whose renormalizing transformations do form a group (in particular, a holonomy group of a manifold).
[15] It should be noted that computing these transformations can be very hard. For instance, the proof for the form of the exponent of the 3D Ising Model, which has been estimated by simulation to high-precision, for the past fifty years, is still eluding researchers today. The conformal bootstrap program is the closest that we have gotten in the 21st century. This is why simulation is necessary (and we will emphasize this more in the next post)
Dropbox Paper Link
Blog