Why Has The Byzantine System Been Unknown to The Public for Over 30 Years Until The Advent of Blockchain?

Sep 17, 2018

It has been more than 30 years since the Byzantine system was developed, and it was only in 2009when Satoshi Nakamoto re-optimized the solution to the Byzantine Problem by using bitcoin and blockchain models thatthe Byzantine system gradually became known to the public.

Why didn’t the Byzantine system attract much attention before the advent of blockchain technology when the Byzantine Problem could reach agreement on the great human trust decision? What changes has blockchain actually made?

Background:

The Byzantine Problem, first put forward by Professor Lamport in 1982, can be understood simply as a matter of three or more generals deciding whether to attack or retreat.

In this decision-making process, if a general issues an order, the others, as lieutenants, decide whether to attack or retreat. But if one or more generals or lieutenants are traitors, there will be problems in decision-making and no concerted action. If the general is a traitor, he might ask one lieutenant to attack and the other to retreat. Likewise, if one lieutenant is a traitor, he might tell one lieutenant that the general had asked him to attack, while telling another lieutenant that the general had asked him to retreat, hence no concerted action. The Byzantine system was designed to reach a consistent decision in this complex environment.

As our start-up business was mainly to help enterprises quickly identify the causes of system problems such as slow use and crash caused by the increased concurrent pressure after the launch of the app, we have accumulated a large amount of business data and applied various distributed system schemes. This year, our team is mainly engaged in Lamba distributed storage to solve the pain point that there is no distributed storage available to DApp data in the blockchain. Therefore, we have fully experienced the evolution from distributed system to blockchain. The following shows the changes of the Byzantine system.

I. When N>=3f+1 the Byzantine Problem can be solved

First, let’s understand the objective precondition (N>=3f+1) for solving the Byzantine Generals Problem. In other words, if the number of nodes of the Byzantine Generals Problem is N<=3f, then the Byzantine Generals Problem will not be solved. Let’s take three nodes as an example to deduce:

In the left-hand scenario, the lieutenant P3 malfunctions. In the right-hand scenario, the commander P1 malfunctions. We can see the two rounds of message exchanges: the values sent by the commander and the values sent by the two lieutenants to each other. Prefixes of the values show the source of the messages and the different numbers of rounds.

In the left-hand scenario, P2 receives two different values (v, u), but does not know which one was sent by the commander. So he cannot make a decision.

In the right-hand scenario, because the commander sent two different values by mistake, P3 also receives two different values and cannot tell which one is correct.

When the number of nodes solving the Byzantine Generals Problem is N>=3f+1, let’s take N>=4, f=1 as an example to deduce the above problem again:

When the left-hand scenario occurs, the function majority(which returns at most to the final result) can take the value in the majority as the final decision value of lieutenants. Therefore, lieutenants can come to the same decision: P2decision: majority(v,u,v)=v

P4 decision: majority(v,v,w)=v

In the right-hand scenario, the commander malfunctions, but the three correct lieutenant nodes can reach an agreement:

P2,P3,P4 decision: majority(u,v,w)= ⊥ (which represents the fact that there is no value in the majority)

This deduction was done by Professor Lamport in 1982, and it’s easy to see that the logical solution to the Byzantine Problem is very simple: we just need to make the number of decision nodes N>=3f+1.

But why was the Byzantine system not widely used or recognized for so many years?

Although the Byzantine system can reach a final consensus on human trust decision and be applied in many business scenarios, we found that the system was very inefficient when we wanted to use it. We know that the cost of each round of decision-making is the exponential growth in the number of rounds of messaging, so we use this approach to solve the Byzantine Problem. However, the inefficiency prevents the system from being applied in the real business world.

II. The technological evolution of the Byzantine system

PBFT Byzantine system

The improvement of efficiency has long been the main direction of the technological evolution of the Byzantine system. In 1999, Castro and Liskov introduced the Practical Byzantine Fault Tolerance (PBFT) system, which first reduced the complexity of Byzantine protocol from exponential level to polynomial level, making it possible to apply the Byzantine protocol in distributed systems.

As shown in the figure below, the PBFT consistency protocol takes five phases to complete each client request. It executes the client request through two pairwise interactions after the server has reached agreement. Since the client cannot get any information about the status of the server from the server side, only the server can monitor if the primary of the PBFT protocol fails. If the server cannot complete the client request for a period of time, the view-change protocol (i.e. node switching) is triggered.

The PBFT is designed to do all the work on the server side, such as achieving consistency and monitoring Byzantine primaries. The problem with it is that even though the complexity has been reduced relative to the Lamport model, the design of the consistency protocol is still complex, with two phases requiring pairwise communication between servers, a large amount of data processing, and complex calculations.

HQ Byzantine system

Therefore, in 2006 Cowling and others improved the PBET by HQ. The following figure is the schematic diagram of the HQ model:

We can see that in the HQ protocol communication model, the selection of primaries has been removed. The client issues a request to all nodes and checks in the Write 1 phase to see if there are 2f+1 nodes out of 3f+1 nodes (minimum number of decision nodes) with a common return value. If so, it means that the server is in the same state.

In the HQ model, requests are performed in two phases. In the first phase, the client collects the state of the server when sending the request. If more than 2f+1 servers are in the same state and can execute the client’s request, then the client will execute the operation of the second phase, i.e. send orders for the servers to execute the client request. Otherwise, the request encounters competition and the PBFT process will be executed.

Therefore, the HQ protocol communication model does not change the architecture of the PBFT, but is effective only when the client sends the request and there is no competition.

Speculation-based Byzantine system

Dahlin and others proposed the Zyzzyva and Zyzzyva5 protocols in 2007, bringing the Speculation technology into the Byzantine protocol. Because the Zyzzyva protocol client does not have to wait until the interactive confirmation between servers, it greatly improves the Byzantine system’s performance.

The main idea behind it is that the server is in a normal state most of the time, so every request can be executed without waiting until reaching agreement which is only needed when an error occurs.

The Speculation mechanism differs from traditional protocols mainly in the consistency protocol. The figure below shows the execution process of Speculation. When a server receives a request from the client, it does not conduct an expensive pairwise interaction like the traditional protocols, but directly executes the client request.

As long as the client receives the feedback from all servers (execution result, server status, execution history, etc.) and the information is consistent, the client considers the result is correct and returns it to the upper application. Otherwise, after receiving the feedback from 2f+1 servers, the client executes a sequence number confirmation phase, telling at least f+1 non-Byzantine servers that at least f+1 non-Byzantine servers have executed the request.

In general, in a Speculation mechanism, if the execution results of the servers are consistent, the client will adopt the result. If they are not consistent, the client discards the results and feeds back to the server to trigger the view change protocol to switch the primary. This may temporarily cause system inconsistency, but it does not affect the execution of non-Byzantine client requests.

If the client is Byzantine, it doesn’t matter if the system results are inconsistent. If the client is non-Byzantine, it will trigger the view change protocol to adjust the system when the system is found to be inconsistent, so as to ensure the consistent state of the system.

The small number of stages needed for a single request of the Speculation technology is ideally optimized, reducing the system’s response delay. In addition to the server-based Speculation technology, the client-based Speculation technology was also proposed in 2009. Its essence is the improved Zyzzyva algorithm and its core idea is that the client does not need to receive the feedback from all servers before executing an operation, but responds after receiving the first server request. This greatly improves the system’s efficiency when there are no Byzantine nodes in the system, but when Byzantine nodes exist, there are still efficiency problems in large-scale deployments. So despite the broad commercial prospect, the Byzantine system has not been popular among the mainstream media.

III. The optimization of the Byzantine Generals Problem by blockchain

As we all know, the great innovation of blockchain is further optimizing the efficiency of the Byzantine system for large-scale node deployment.

The difficulty of the Byzantine Generals Problem is that at any time, a node has to consider how to make a decision in the face of multiple proposals.

Blockchain optimizes this model by introducing the POW mechanism that allows only one node to make a request at the same time by competing in block generation. Meanwhile, the asymmetric encryption technology ensures that message receivers can confirm the identity of senders, and the system becomes a trusted distributed network.

In addition, blockchain is different from the previous forms of optimizing the Byzantine Problem through pure algorithm evolutions. Blockchain also introduced an economic model that ensures the cost of Byzantine errors is negative. We know that a fundamental argument in the security world is that the benefits of an attack must outweigh the costs.

In the blockchain, POW (Proof of Work) determines who serves as the general of the next node. If you want to be a traitor and attack the entire network, you need to pay the corresponding cost which, under the POW consensus mechanism of bitcoin, means possessing over 50% computational power of the entire network. In other words, more than 50% traitors are required, which is a much higher fault tolerance rate than the PBFT. From the economic prospective, if you really had that much computational power, the benefit of maintaining the network (honest mining) would actually be higher than the cost of destroying the network.

About Lambda

Lambda is a fast, safe, and scalable blockchain infrastructure project, which provides decentralized applications (DAPPs) data storage capabilities with unlimited scalability through logic decoupling and independent implementation of Lambda Chain and Lambda FS. Lambda is unique as it achieves Provable Data Possession and Proofs of Retrievability by verifying the consensus of nodes, thus ensuring the integrity and retrievability of data stored on “un-trusted storage nodes”.

Comments