Hive: How we strive for a clean fork
The DAO soft fork attempt was difficult. Not only did we underestimate its side effects on the consensus protocol (i.e. DoS vulnerabilities), but we also managed to introduce a data race into the rushed implementation that was a ticking time bomb. It wasn't ideal, and even though the soft fork was averted at the last minute, the picture looked eerily bleak, not to mention the rapidly approaching hard fork deadline. We needed a new strategy...
The stepping stone towards this was an idea borrowed from Google (courtesy of Nick Johnson): writing a postmortem of the event. A postmortem aims to assess the root cause of an issue, focusing solely on the technical aspects and on measures to prevent it from recurring.
Technical solutions scale and persist. Blaming people does not. ~ Nick
The postmortem yielded one particularly interesting discovery from the perspective of this blog post. The soft fork code inside go-ethereum (https://github.com/ethereum/go-ethereum) looked solid from every angle: a) it was thoroughly covered by unit tests with a 3:1 test-to-code ratio; b) it was thoroughly reviewed by six foundation developers; and c) it was even manually live-tested on a private network. Yet a critical data race remained, one that could have caused severe network disruption.
It turned out that this flaw could only ever occur in a network consisting of multiple nodes, multiple miners, and multiple blocks being minted simultaneously. Even when all of those conditions held, there was only a slim chance of the bug surfacing. Unit tests cannot catch it, code reviewers may or may not catch it, and manual testing is unlikely to catch it. Our conclusion was that development teams need more tools to run reproducible tests covering the intricate interplay of multiple nodes in concurrent networked scenarios. Without such tools, manually checking the various corner cases is unwieldy; and without running these checks continuously as part of the development workflow, rare errors become impossible to catch in time.
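To make the failure mode concrete, here is a deliberately generic Go sketch (not the actual soft fork code) of the kind of race that slips past single-threaded unit tests: shared state that behaves correctly when a test drives it from one goroutine, but corrupts once several concurrent "miners" touch it at the same time. Running it will typically abort with a "concurrent map writes" fault, and `go test -race` only flags the problem when the test actually exercises the concurrency.

```go
package main

import (
	"fmt"
	"sync"
)

// chainState is a stand-in for shared node state; the map is not safe for
// concurrent writes, mirroring the kind of unguarded shared state a rushed
// implementation can introduce.
type chainState struct {
	seenBlocks map[uint64]bool
}

// markBlock records a block number. Called from a single goroutine (as in a
// typical unit test) it behaves correctly; called from many "miners" at once
// it is a data race and can corrupt the map.
func (c *chainState) markBlock(n uint64) {
	c.seenBlocks[n] = true
}

func main() {
	state := &chainState{seenBlocks: make(map[uint64]bool)}

	var wg sync.WaitGroup
	for miner := 0; miner < 4; miner++ {
		wg.Add(1)
		go func(id uint64) {
			defer wg.Done()
			for i := uint64(0); i < 1000; i++ {
				state.markBlock(id*1000 + i) // racy concurrent map write
			}
		}(uint64(miner))
	}
	wg.Wait()

	fmt.Println("blocks recorded:", len(state.seenBlocks))
}
```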
And thus, hive was born...
What is Hive?
Ethereum has grown to the point where testing implementations is a huge burden. Unit tests are fine for checking various implementation quirks, but validating that a client conforms to some baseline quality, or that clients can play nicely together in a multi-client environment, is anything but simple.
Hive is meant to serve as an easily extensible test harness where anyone can add tests (be they simple validations or network simulations) in any programming language they are comfortable with, and hive should be able to run those tests against all potential clients simultaneously. As such, the harness is meant to do black-box testing: no client-specific internal details or state can be tested or inspected; the emphasis is instead on adherence to official specs and on behavior under a variety of circumstances.
Most importantly, Hive is designed from the ground up to run as part of every client’s CI workflow!
How does Hive work?
The body and soul of hive is docker (https://www.docker.com/). Every client implementation is a docker image; every validation suite is a docker image; every network simulation is a docker image; and hive itself is an all-encompassing docker image. This is a very powerful abstraction...
Since Ethereum clients are docker images in hive, client developers can assemble the best possible environment for their client to run in (dependency-, tooling-, and configuration-wise). Hive will spin up as many instances as needed, all of them running in their own Linux environment.
Similarly, since the test suites that validate Ethereum clients are docker images, test authors can use whatever programming environment they are most familiar with. When hive starts a tester, it gets access to the running clients and can validate whether a particular client conforms to the desired behavior.
Lastly, network simulations are again defined as docker images, but compared to simple tests, a simulator not only executes code against a running client; it can actually start and terminate clients at will. These clients run on the same virtual network and can freely (or as dictated by the simulator container) connect to each other, forming an on-demand private Ethereum network.
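To give a feel for what such a black-box tester can look like, below is a minimal validator sketch in Go. The interface details are assumptions for illustration, not hive's actual API: it presumes the harness hands over the client's address via an environment variable (the name HIVE_CLIENT_IP is hypothetical) and that the client exposes the standard JSON-RPC API on port 8545. The test simply asks the client for its version string and fails if it cannot.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// HIVE_CLIENT_IP is assumed purely for illustration; the real harness
	// may hand over the client endpoint differently.
	client := os.Getenv("HIVE_CLIENT_IP")
	if client == "" {
		fmt.Println("no client address provided")
		os.Exit(1)
	}
	// Standard JSON-RPC request for the client's version string.
	req := []byte(`{"jsonrpc":"2.0","method":"web3_clientVersion","params":[],"id":1}`)
	resp, err := http.Post("http://"+client+":8545", "application/json", bytes.NewReader(req))
	if err != nil {
		fmt.Println("client unreachable:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var reply struct {
		Result string `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil || reply.Result == "" {
		fmt.Println("invalid JSON-RPC reply")
		os.Exit(1)
	}
	fmt.Println("client is up:", reply.Result)
}
```

A real validator would of course go on to drive the client through whatever behavior the spec in question mandates; the point is that everything happens over the client's public interfaces.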
How did Hive help with the fork?
Hive is not a replacement for unit testing or thorough reviews. All currently employed practices are essential to implement any feature cleanly. What hive can provide is validation beyond what is feasible from an average developer's perspective: running extensive tests that may require complex execution environments, and checking networking corner cases that could take hours to set up by hand.
In the case of the DAO hard fork, beyond all the consensus and unit tests, the most important thing was to ensure that nodes partition cleanly into two subsets at the networking level: one supporting the fork and one opposing it. This was essential because it is impossible to predict what adverse effects running two competing chains within one network might have, especially from the minority's perspective.
As such, we implemented three specific network simulations in hive:
The first ensures that miners running the full Ethash DAG generate correct block extra-data fields for both pro-forkers and no-forkers, even when naively trying to spoof them (a simplified version of this check is sketched after the list).
The second verifies that a network of mixed pro-fork and no-fork nodes/miners correctly splits in two when the fork block arrives, and that the split is maintained afterwards.
The third checks that, given an already forked network, newly joining nodes can full sync, fast sync, and light sync to the chain of their choice.
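To illustrate the flavor of the first simulation's checks (as referenced above), here is a small Go sketch, not hive's actual code, that fetches the DAO fork block (number 1,920,000) from a client over standard JSON-RPC and verifies it carries the extra-data marker that pro-fork miners are required to stamp into it: the ASCII string "dao-hard-fork" (0x64616f2d686172642d666f726b). The client endpoint is a placeholder for whatever node the simulator started.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// "dao-hard-fork" in hex: the extra-data value pro-fork clients require in
// the fork block (and the few blocks that follow it).
const proForkExtra = "0x64616f2d686172642d666f726b"

func main() {
	// eth_getBlockByNumber with the fork block number (0x1d4c00 == 1920000).
	req := []byte(`{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x1d4c00",false],"id":1}`)

	resp, err := http.Post("http://127.0.0.1:8545", "application/json", bytes.NewReader(req))
	if err != nil {
		fmt.Println("client unreachable:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var reply struct {
		Result struct {
			ExtraData string `json:"extraData"`
		} `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&reply); err != nil {
		fmt.Println("invalid JSON-RPC reply:", err)
		os.Exit(1)
	}
	if reply.Result.ExtraData != proForkExtra {
		fmt.Println("fork block not stamped pro-fork:", reply.Result.ExtraData)
		os.Exit(1)
	}
	fmt.Println("fork block carries the pro-fork extra-data marker")
}
```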
The interesting question, though, is: did hive actually catch any errors, or did it just act as extra confirmation that everything was all right? The answer is: both. Hive caught three fork-related bugs, and it also greatly aided Geth's hard fork development by continuously providing feedback on how changes in Geth affected network behavior.
The Go Ethereum team also received some criticism for investing time in hive rather than in the hard fork implementation itself. Hopefully people can now see what we were working on alongside the fork. All in all, I believe hive turned out to play a very important role in the cleanness of the transition.
What does the future hold for Hive?
The Ethereum GitHub organization already features four test tools (https://github.com/ethereum?utf8=%E2%9C%93&query=test), with at least one EVM benchmarking tool in some external repository. They are hugely underutilized: they have tons of dependencies, generate tons of junk, and are very complicated to use.
With hive, we aim to unify all the various scattered tests under one universal client validator that has minimal dependencies, can be extended by anyone, and can run as part of client developers' daily CI workflow.
We welcome everyone to contribute to the project, whether by adding new clients to validate, validators to test with, or simulators to find interesting networking issues. In the meantime, we'll keep improving hive itself, adding support for running benchmarks as well as mixed-client simulations.
With a little more effort, it should also become possible to run hive in the cloud, allowing network simulations at a much more interesting scale.