The 1.x Files: January call digest
January 14th tl;dc (too long, didn’t call)
Disclaimer: This is a digest of the topics discussed in the recurring Eth1.x research call, and does not represent finalized plans or commitments to network upgrades.
The main topics of this call were:
- Rough data quantifying the advantages of switching to a binary trie structure
- Transition strategies and potential challenges for a switch to binary tries
- “Merklizing” contract code for witnesses, and the implications for gas scheduling/metering
- Chain pruning and historical chain/state data: network implications and approaches to distribution
Logistics
The weekend after EthCC (March 7-8) there will be a small 1.x research summit, intended to provide a few days of solid discussion and work on the topics at hand. The session is capped at 40 attendees (due to venue constraints), which should be more than enough for the participation expected.
There may be informal, ad-hoc gatherings around Stanford Blockchain Week and ETHDenver, but nothing is explicitly planned.
The next call is tentatively scheduled for the first or second week of February, between now and the Paris summit.
Technical discussion
EIP #2465
Although not directly related to stateless Ethereum, this EIP improves the network protocol for transaction propagation, and is therefore a fairly straightforward improvement that moves things in the right direction for what Eth1.x research is working on. Support!
Binary trie size savings
Switching to a binary trie structure (from the current hexary trie structure) should in theory reduce the size of witnesses by a factor of about 3.75. In practice, depending on how you look at it, the reduction may be only about half that.
Witnesses are about 30% code and 70% hashes. Hashes within the trie are reduced by a factor of 3, but the code portion is not improved by a binary trie, because code must always be included in the witness. Switching to a binary trie format would therefore bring witness sizes down to ~300-1,400 kB, from ~800-3,400 kB in the hexary trie.
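As a rough check on the “about half that” claim, the arithmetic can be sketched as below. This is a back-of-the-envelope model only: the 30/70 split and the 3x hash reduction come from the call, while the function name and the assumption that the two portions simply add up are illustrative. The model gives roughly 1.9x, so it does not exactly reproduce the kB ranges quoted above, which presumably rest on measured data.

```python
def binary_witness_fraction(code_share: float = 0.30, hash_reduction: float = 3.0) -> float:
    """Fraction of the hexary witness size that remains after switching to a binary trie.

    Assumes (for illustration) that only the hash portion shrinks and the code
    portion is carried over unchanged.
    """
    hash_share = 1.0 - code_share
    return code_share + hash_share / hash_reduction


fraction = binary_witness_fraction()   # remaining share of the original size
factor = 1.0 / fraction                # overall reduction factor

print(f"remaining fraction: {fraction:.3f}, reduction factor: {factor:.3f}")
```

With these inputs the overall factor works out to 1.875, which is half of the theoretical 3.75.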
Making the switch
Enacting the actual transition to a binary trie is another matter, with several questions that still need to be fleshed out. There are essentially two different strategies that could be followed:
Progressive transition: This is a ‘Ship of Theseus’ model of transition, in which the entire state trie is migrated to a binary format account-by-account and storage-slot-by-storage-slot, as each part of the state is touched by EVM execution. This implies that Ethereum’s state would forever be a hexary/binary hybrid, and that accounts would need to be “poked” in order to be updated to the new trie format (perhaps with a POKE opcode). The advantage is that this does not interrupt the normal functioning of the chain and does not require large-scale coordination to upgrade. The disadvantage is complexity: clients must account for both the hexary and binary trie formats, and the process would never really “finish”, because some parts of the state cannot be accessed externally and would need to be explicitly poked by their owners, which probably will not happen for the entire state. The progressive strategy would also require clients to modify their databases into a kind of ‘virtualized’ binary trie inside a hexary database layout, to avoid a sudden, dramatic increase in storage requirements for all clients (note: this database improvement could happen independently of the full ‘progressive’ transition, and would still be beneficial on its own).
Compute and clean-cut: This would be an ‘all at once’ transition carried out over one or more hard forks, in which a future date is chosen for the switch and all participants in the network recompute the state as a binary trie, then switch to the new format together. This strategy is in some sense ‘simpler’ to implement, because it is straightforward on the engineering side. From a coordination perspective, however, it is more complex. The new binary trie state must be precomputed before the fork, which could take an hour or so, and it is not clear how transactions and new blocks would be handled during that window (since they would need to be included in the not-yet-computed binary state trie and/or the legacy trie). The process would be made harder by the fact that many miners and exchanges prefer to upgrade their clients at the last minute. Alternatively, one could imagine halting the entire chain for a short time to recompute the new state, which might be an even trickier and more controversial process to coordinate.
Both options are still ‘on the table’ and require further consideration and discussion before any decision is made about next steps, in particular weighing the trade-off between implementation complexity on one hand and coordination challenges on the other.
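To illustrate why the progressive strategy never quite “finishes”, here is a toy model of migrate-on-touch. It is a sketch only: two plain dictionaries stand in for the hexary and binary tries, and the class and method names are hypothetical, not taken from any client.

```python
class HybridState:
    """Toy model: dicts stand in for the legacy hexary trie and the new binary trie."""

    def __init__(self, legacy_accounts: dict[str, int]):
        self.hexary = dict(legacy_accounts)   # not yet migrated
        self.binary: dict[str, int] = {}      # migrated on first touch

    def touch(self, address: str):
        """Read an account, moving it to the binary side the first time it is accessed."""
        if address in self.hexary:
            self.binary[address] = self.hexary.pop(address)
        return self.binary.get(address)

    def migrated_fraction(self) -> float:
        total = len(self.hexary) + len(self.binary)
        return len(self.binary) / total if total else 1.0


state = HybridState({"0xaa": 10, "0xbb": 20, "0xcc": 30, "0xdd": 40})
for address in ["0xaa", "0xcc", "0xaa"]:   # only some accounts are ever touched
    state.touch(address)

print(state.migrated_fraction(), sorted(state.hexary))
```

Accounts that nobody touches stay in the legacy structure indefinitely, so a client in this model has to keep supporting both lookups.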
Code “chunking”
Addressing the code portion of witnesses, some prototyping work has been done on code ‘merklization’, which essentially allows contract code to be split into chunks before being put into a witness. The basic idea is that when a method in a smart contract is called, the witness only needs to include the parts of the contract code that are actually executed, not the entire contract. This is still very early research, but it suggests a further ~50% reduction in the code portion of a witness. More ambitiously, the practice of code chunking could be extended to create a single global ‘code trie’, but this is not a well-developed idea and likely has challenges of its own that warrant further investigation.
There are different methods by which code can be split into chunks and then used to generate (and verify) witnesses. The first is ‘dynamic’: it relies on finding JUMPDEST instructions and splitting near those points, which results in variable chunk sizes depending on the code being split. The second is ‘static’: it divides the code into fixed-size chunks and adds the necessary metadata specifying where the valid jump destinations are within each chunk. Either of these two approaches seems valid, the two may be compatible, and the choice could be left up to users. Either way, chunking enables a further reduction in witness size.
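A minimal sketch of the ‘static’ approach is shown below, under several stated assumptions: a 32-byte chunk size, one metadata byte per chunk giving the offset of the first byte that starts an instruction (rather than sitting inside PUSH data), and SHA-256 as a stand-in hash. None of these choices come from the call; they are placeholders to make the idea concrete. The only EVM facts relied on are that opcodes 0x60 to 0x7f are PUSH1 to PUSH32 and carry 1 to 32 bytes of immediate data.

```python
import hashlib

CHUNK_SIZE = 32  # hypothetical fixed chunk size in bytes


def h(data: bytes) -> bytes:
    # SHA-256 is a stand-in hash chosen for this sketch
    return hashlib.sha256(data).digest()


def static_leaves(code: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split code into fixed-size chunks, each prefixed with one metadata byte."""
    leaves, pc = [], 0
    for start in range(0, len(code), size):
        while pc < start:  # walk instruction boundaries, skipping PUSH data
            op = code[pc]
            pc += 1 + (op - 0x5F if 0x60 <= op <= 0x7F else 0)
        offset = min(pc - start, size)  # first instruction start within this chunk
        leaves.append(bytes([offset]) + code[start:start + size])
    return leaves


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Hash the leaves and build every tree level up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Collect the sibling hashes needed to recompute the root for one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append(level[sibling] if sibling < len(level) else level[index])
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, index: int, proof: list[bytes]) -> bool:
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# 100 bytes of PUSH1 0x05, JUMP, JUMPDEST, STOP repeated
code = bytes.fromhex("6005565b00") * 20
leaves = static_leaves(code)
offsets = [leaf[0] for leaf in leaves]
levels = merkle_levels(leaves)
root = levels[-1][0]
proof_for_chunk_2 = prove(levels, 2)
```

A witness for a call that only executes chunk 2 would then carry that one leaf plus `proof_for_chunk_2`, rather than all four chunks.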
(un)Gas
One open question is what changes to gas scheduling will be necessary or desirable with the introduction of block witnesses. Witness generation needs to be paid for in gas. If code is chunked, there will be some overlap within a block where multiple transactions cover the same code, so parts of the block witness would be paid for more than once by the transactions included in the block. A safe idea (and one that would be good for miners) seems to be to leave it to the poster of a transaction to pay the full cost of their own transaction’s witness, and then let the miner keep the overpayment. This minimizes the need for changes to gas costs and incentivizes miners to produce witnesses, but unfortunately breaks the current security model of only trusting sub-calls (within a transaction) with a portion of the total committed gas. How that change to the security model is handled needs to be considered fully and thoroughly. Ultimately, the goal is to charge each transaction the cost of producing its own witness, proportional to the code it touches.
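The overpayment idea can be sketched with a small accounting example. Everything here is hypothetical: the flat per-chunk cost, the integer chunk identifiers, and the function name are placeholders, since no actual gas figures were proposed on the call.

```python
def witness_charges(tx_chunks: dict[str, set[int]], cost_per_chunk: int):
    """Charge each transaction for all the code chunks it touches.

    Returns (per-transaction charges, cost of the deduplicated block witness,
    overpayment that the miner would keep under the scheme described above).
    """
    charged = {tx: len(chunks) * cost_per_chunk for tx, chunks in tx_chunks.items()}
    unique_chunks = set().union(*tx_chunks.values())  # chunks shared between txs count once
    block_cost = len(unique_chunks) * cost_per_chunk
    overpayment = sum(charged.values()) - block_cost
    return charged, block_cost, overpayment


# Three transactions whose code chunks overlap
block = {"tx1": {0, 1, 2}, "tx2": {1, 2, 3}, "tx3": {2}}
charged, block_cost, overpayment = witness_charges(block, cost_per_chunk=100)

print(charged, block_cost, overpayment)
```

In this example the transactions pay for seven chunks in total while the block witness only contains four, and the difference is what the miner would keep.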
Wei Tang’s UNGAS proposal might make any changes to the EVM easier to accomplish. It is not strictly necessary for stateless Ethereum, but it is an idea for how to make future breaking changes to the gas schedule easier. The question to ask is, “What do the changes look like with and without UNGAS? And, taking that into account, does UNGAS actually make this significantly easier to implement?” Answering this requires experiments that run with merklized code and the new gas rules applied, to see what should change with regard to cost and execution in the EVM.
Pruning and data delivery
In a stateless model, nodes that lack some or all of the state need a way to signal to the rest of the network what data they have and what data they are missing. This has implications for network topology: stateless clients that lack data must be able to reliably and quickly find the data they need somewhere on the network, as well as broadcast up front which data they do not have (and might need). Adding such a feature to one of the chain pruning EIPs is a networking (but not consensus) protocol change, and it is something that can be done now.
The second side of this problem is where to store historical data, and the best solution proposed so far is an Eth-specific distributed storage network that can serve requested data. This could take many forms. The complete state might be amenable to ‘chunking’, similar to contract code. Partial-state nodes could watch over (randomly assigned) chunks of state and serve them on request at the edges of the network. Clients might employ additional data routing mechanisms so that a stateless node can still obtain missing data through an intermediary (one that does not have the required data itself, but is connected to another node that does). However it is implemented, the general goal is that a client should be able to join the network and reliably get all the data it needs without jockeying for position to connect to a full-state node, which is effectively what happens with LES nodes now. Work on these ideas is still in its early stages, but the geth team has had some promising results experimenting with ‘state tiling’ (chunking), and turbo-geth is working on data routing for gossiping parts of the state.
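The intermediary idea can be sketched as a bounded search over peers. This is purely illustrative: the peer graph, the `holdings` map of which node stores which state chunks, and the hop limit are all hypothetical, and no real client protocol is being described.

```python
from collections import deque


def find_route(start: str, chunk: int, peers: dict[str, list[str]],
               holdings: dict[str, set[int]], max_hops: int = 2):
    """Breadth-first search for the nearest node holding a chunk.

    Returns the path from `start` to that node, or None if no holder is
    reachable within `max_hops` hops.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if chunk in holdings.get(node, set()):
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not expand beyond the hop limit
        for peer in peers.get(node, []):
            if peer not in seen:
                seen.add(peer)
                queue.append(path + [peer])
    return None


# A stateless client connected only to node "a", which lacks chunk 7
# but is connected to node "b", which has it.
peers = {"client": ["a"], "a": ["client", "b"], "b": ["a"]}
holdings = {"client": set(), "a": {1}, "b": {7}}

route = find_route("client", 7, peers, holdings)
print(route)
```

Here the client never connects to “b” directly; “a” acts as the intermediary described above.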
As always, if you have questions about Eth1x efforts, requests for topics, or want to contribute, come introduce yourself on ethresear.ch or reach out to @gichiba and/or @JHancock on Twitter.