A zone-structured unified binary state tree for Ethereum with uniform 256-bit keys, structural per-account subtree boundaries, and future-proof collision resistance.
The Partitioned Binary Tree (PBT) is a unified binary state tree for Ethereum that stores all state — account headers, contract storage, and contract code — in a single non-sparse binary trie while preserving structural boundaries between semantic categories of data.
Unlike a flat unified tree where all key-value pairs are undifferentiated, PBT uses a deterministic key derivation scheme that embeds zone and account identity information directly into the key structure. This creates topological boundaries in the tree that protocols can reason about: a node at a known depth is always the root of a specific account's storage, or the root of all code, etc.
PBT uses uniform 256-bit keys across all zones, with zone prefixes encoding data category. In a non-sparse trie, unused key space costs nothing — actual tree depth adapts to content density, not key width. Storage achieves collision resistance within 256 bits by combining a 60-bit address prefix for account-local grouping with a 187-bit stem suffix derived from both the address and storage index.
PBT partitions the key space into three zones, each storing a different category of Ethereum state. The zone is encoded in the most significant bits of every key using variable-width prefixes: 1 bit for storage, 3 bits for accounts and code, and 2 bits for a reserved zone.
Storage uses a 1-bit prefix ("1"), giving it 50% of the top-level tree address space. At depth 1, the right child of the root IS the root of all storage. Accounts ("000") and code ("001") use 3-bit prefixes, each occupying 12.5%. Prefix "01" (25%) is reserved for future state categories.
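A minimal sketch of the prefix scheme (Python; the constant and function names are ours, not part of any spec):

```python
# Zone prefixes as (bits, width) pairs, following the layout described above.
ZONE_ACCOUNTS = (0b000, 3)   # 12.5% of key space
ZONE_CODE     = (0b001, 3)   # 12.5%
ZONE_RESERVED = (0b01,  2)   # 25%, reserved for future state categories
ZONE_STORAGE  = (0b1,   1)   # 50%; the right child of the root at depth 1

def zone_of(key: int) -> str:
    """Classify a 256-bit key by its most significant bits."""
    if key >> 255 == 0b1:
        return "storage"
    if key >> 254 == 0b01:
        return "reserved"
    return "code" if key >> 253 == 0b001 else "accounts"
```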
Ethereum's state is approximately 75% contract storage, 20% account headers, and 5% code. Allocating equal tree space to all data types would create an imbalanced tree where the storage subtree is significantly denser than the others.
By giving storage a 1-bit prefix (50% of tree space), the tree is more balanced at the top levels. This reduces average proof depth for storage lookups — the most common operation — and creates a natural split at depth 1: left child = accounts and code, right child = all storage.
Every key in PBT is 256 bits and encodes semantic segments: the zone prefix (which data category), address-derived hash bits (identifying the Ethereum account), a path segment (which item within the account), and a sub-index (position within a stem group of 256 leaves). Storage keys additionally include an address prefix for account locality.
Each Ethereum account occupies a single header stem in zone 000, following the EIP-7864 layout. The stem holds up to 194 populated leaves: account metadata, the first 64 storage slots, and the first 128 code chunks — all co-located under one 248-bit prefix. The 245-bit address hash provides 2¹²²·⁵ birthday resistance.
BASIC_DATA packing (32 bytes):
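A sketch of the packing, following the version/reserved/code_size/nonce/balance layout used throughout this document (big-endian field encoding assumed; the function name is illustrative):

```python
def pack_basic_data(version: int, code_size: int, nonce: int, balance: int) -> bytes:
    """version(1) | reserved(4) | code_size(3) | nonce(8) | balance(16) = 32 bytes.
    Big-endian encoding assumed; reserved bytes are zero for now."""
    assert code_size < 2**24 and nonce < 2**64 and balance < 2**128
    return (version.to_bytes(1, "big")
            + b"\x00" * 4                      # reserved: future protocol fields
            + code_size.to_bytes(3, "big")
            + nonce.to_bytes(8, "big")
            + balance.to_bytes(16, "big"))
```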
The 3-byte code_size supports contract sizes up to 16 MB — orders of magnitude beyond any foreseeable contract size limit (current mainnet: 24 KB, Glamsterdam: 40 KB). The 4 bytes of reserved space are available for future protocol fields (epoch counters, flags, etc.).
The header stem packs account metadata, hot storage, and initial code into a single stem of up to 194 leaves. All share the 248-bit prefix 000 || H(addr)[:245] and differ only in the last 8 bits (sub_index). Proving an account's nonce, balance, and first few storage slots requires opening only one stem — maximizing proof sharing for the most common access patterns.
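A hedged sketch of the header-stem key derivation (the top-bits truncation convention and byte serialization are assumptions; the sub-index constants follow the EIP-7864 layout referenced above):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()  # stand-in; PBT is hash-agnostic (see Open Questions)

def header_key(addr: bytes, sub_index: int) -> int:
    """Zone 000 key: 000 || H(addr)[:245] || sub_index.
    Sub-index layout: 0x00 BASIC_DATA, 0x01 CODE_HASH,
    0x40 + i for storage slot i < 64, 0x80 + i for code chunk i < 128."""
    assert 0 <= sub_index < 256
    stem = int.from_bytes(H(addr), "big") >> 11   # keep the top 245 bits
    return (0b000 << 253) | (stem << 8) | sub_index
```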
128 code chunks × 31 bytes = ~4 KB of per-account code. This covers the majority of deployed contracts by count. Contracts larger than ~4 KB store their remaining chunks in the content-addressed code zone (zone 001).
64 storage slots × 32 bytes = 2 KB of hot storage. Slots 0–63 are the most accessed in typical contracts (ERC-20 balances, ownership, reentrancy guards). Slots ≥64 are stored in the storage zone (zone 1).
Code chunks beyond the first 128 (which live in the account header stem) are stored in zone 001 using content-addressed keys. The stem path is derived from code_hash (keccak256 of the full bytecode) and tree_index, so contracts with identical bytecode share code zone leaves — deduplicating storage for thousands of identical ERC-20 contracts from the same factory.
tree_index = (chunk_id − 128) // 256. The header stem covers chunks 0–127. With Glamsterdam's 40 KB limit: ~1,290 total chunks. Header covers 0–127, code zone covers 128–1,289: tree_index ∈ {0, 1, 2, 3, 4}. Sequential chunks share stems: chunks 128–383 share one stem, 384–639 another.
Collision security: H(code_hash || tree_index)[:245] provides 2¹²²·⁵ birthday resistance between any two distinct (code_hash, tree_index) pairs. Since code_hash is keccak256(bytecode), a 256-bit uniform output, truncation to 245 bits preserves uniformity. A collision would require finding two distinct bytecodes whose hashed (code_hash, tree_index) pairs match in 245 bits — computationally infeasible.
Why 8 bits for sub_index? The sub_index = (chunk_id − 128) % 256 groups up to 256 consecutive chunks under the same stem node. Each stem covers ~8 KB of bytecode (256 × 31-byte chunks). This is the same 8-bit stem width used in all zones for consistency. Sequential code execution naturally reads consecutive chunks within a stem.
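Putting the pieces together, a sketch of code-zone key derivation (the 32-byte big-endian serialization of tree_index and the top-bits truncation are assumptions):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()  # stand-in; PBT is hash-agnostic

def code_zone_key(code_hash: bytes, chunk_id: int) -> int:
    """Zone 001 key for code chunk chunk_id >= 128.
    Chunks 0-127 live in the account header stem instead."""
    assert chunk_id >= 128
    tree_index = (chunk_id - 128) // 256
    sub_index = (chunk_id - 128) % 256
    stem_input = code_hash + tree_index.to_bytes(32, "big")
    stem = int.from_bytes(H(stem_input), "big") >> 11   # top 245 bits
    # 3-bit zone prefix 001 || 245-bit stem || 8-bit sub_index = 256 bits
    return (0b001 << 253) | (stem << 8) | sub_index
```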
Storage slots ≥64 are stored in zone 1 (slots 0–63 live in the account header stem in zone 000). Storage keys are 256 bits — the same width as all other zones. The key uses a 60-bit address prefix for account-local grouping, followed by a 187-bit stem suffix derived from both the address and tree_index. Storage slots are adversarially addressable (any contract can write to any slot key), so the stem suffix must resist birthday attacks — 187 bits provides 2⁹³·⁵ birthday resistance.
tree_index = slot_key // 256 (32-byte big-endian integer division). Slots 64–255 share the first zone 1 stem (tree_index = 0, sub_idx 64–255), slots 256–511 share a stem, etc. Stem co-location is deterministic — adjacent slots always group together. Slots 0–63 are accessed from the account header stem (zone 000, sub_idx 0x40–0x7F) without ever touching zone 1.
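A sketch of the zone 1 derivation under the same assumptions (stand-in hash, top-bits truncation, 32-byte big-endian tree_index):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()  # stand-in; PBT is hash-agnostic

def storage_key(addr: bytes, slot_key: int) -> int:
    """Zone 1 key for storage slot slot_key >= 64.
    Layout: 1 || H(addr)[:60] || H(addr || tree_index)[:187] || sub_index."""
    assert slot_key >= 64                 # slots 0-63 live in the header stem
    tree_index = slot_key // 256
    sub_index = slot_key % 256
    prefix = int.from_bytes(H(addr), "big") >> (256 - 60)    # 60-bit bucket
    suffix_input = addr + tree_index.to_bytes(32, "big")
    suffix = int.from_bytes(H(suffix_input), "big") >> (256 - 187)
    return (0b1 << 255) | (prefix << 195) | (suffix << 8) | sub_index
```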
In practice, most Ethereum contracts use at most a few thousand storage slots, so the useful information in the stem suffix is on the order of 10–15 bits of entropy. Yet we allocate 187 bits. Most of this key space will never contain leaves.
This is not a problem — and it is exactly how Ethereum's current Merkle Patricia Trie already works. In the MPT, storage keys are keccak256(slot_key) — a full 256-bit hash. A typical contract uses fewer than 1,000 of the 2²⁵⁶ possible slots. The hexary Patricia trie adapts: it only creates nodes where keys actually diverge, resulting in ~3–4 levels of branching (log₁₆(1000) ≈ 2.5), not the theoretical maximum of 64 levels.
PBT's non-sparse binary trie behaves identically. A contract with 1,000 storage slots produces ~10 levels of branching below the per-account bucket boundary (log₂(1000) ≈ 10). Empty regions of the key space create no nodes, no storage overhead, and no proof overhead. The wide key space exists solely to provide cryptographic collision resistance — the same function it serves in today's MPT, but made explicit.
The 60-bit address prefix creates a soft per-account bucket boundary at depth 61 in the storage zone (1-bit zone prefix + 60-bit prefix). For the vast majority of accounts, this bucket contains exactly one account's storage — providing de facto per-account grouping.
Prefix collisions are not consensus-fatal. At 10 billion accounts, the expected number of address pairs sharing a 60-bit prefix is ~43. These accounts would share a bucket in the tree, meaning their storage leaves are interleaved below depth 61. This is benign — the accounts' data remains distinguishable via the stem suffix H(addr || tree_index), which binds each stem to both the address and the slot index. Two accounts sharing a bucket simply have slightly reduced parallelism at that bucket boundary.
Why H(addr || tree_index)? The stem suffix hashes the address together with the tree_index, not just the tree_index alone. This ensures that two different accounts sharing the same 60-bit prefix produce independent stem suffixes — there is no structural correlation between their storage layouts. An attacker who finds a prefix collision gains no ability to predict or influence the victim's storage key positions within the shared bucket.
Each zone uses a different amount of address-derived bits, tailored to its threat model and structural needs:
Account zone (000): H(addr)[:245] — the full hash (minus zone prefix and sub_index) provides maximum collision resistance at 2¹²²·⁵ birthday bound. The header stem holds up to 194 leaves (BASIC_DATA, CODE_HASH, 64 storage slots, 128 code chunks), all sharing the same 248-bit prefix.
Code zone (001): H(code_hash || tree_index)[:245] — content-addressed stems with 2¹²²·⁵ birthday resistance. No per-account boundary in the code zone; per-account code locality is provided by the first 128 chunks in the header stem (zone 000). Contracts with identical bytecode share code zone leaves.
Storage zone (1): H(addr)[:60] prefix + H(addr || tree_index)[:187] suffix — a 60-bit prefix creates a soft per-account bucket at depth 61. The 187-bit suffix is bound to both address and slot index, providing 2⁹³·⁵ birthday resistance within each bucket. This design fits storage's unique requirements: many adversarially-chosen slots per account, with account-local grouping that doesn't require extended key width.
All zones use 256-bit keys. There is no variable-width complexity. Each zone achieves its security and structural goals within the same 256-bit envelope:
Storage achieves collision resistance via a 187-bit stem suffix (birthday 2⁹³·⁵) within 256 bits. The key structure uses H(addr || tree_index) to bind each stem to both the address and slot index, preventing cross-account structural correlation.
Account locality in the storage zone is achieved via a 60-bit address prefix, providing soft per-account grouping without extending key width. At 10B accounts, ~43 address pairs share a prefix — benign because prefix collisions affect only grouping, not data integrity.
Code chunks use content-addressed keys: H(code_hash || tree_index)[:245] provides 2¹²²·⁵ birthday resistance and enables bytecode deduplication. Account headers use 245 bits of address hash for maximum per-account collision resistance (birthday 2¹²²·⁵).
The EVM never sees tree keys: contracts read and write state through SSTORE/SLOAD exactly as they do today. The entire key derivation happens inside the Ethereum client, below the EVM layer. No contract, Solidity code, or Yul assembly needs to change. This is the same abstraction boundary that exists today: the MPT internally hashes every storage slot through keccak256(slot_key) to produce a 256-bit trie path, and every account address through keccak256(address) — yet no contract is aware of this transformation. PBT simply replaces what happens behind that boundary.

The key derivation creates deterministic structural boundaries at known depths. Any implementation or protocol can compute the exact depth of a per-account subtree root, a stem node, or a zone root without inspecting the tree.
Per-account boundaries differ by zone, reflecting each zone's structure:
Code zone (001): No per-account structural boundary. The code zone uses content-addressed keys — H(code_hash || tree_index)[:245] — so stems are organized by bytecode content, not by account. Per-account code locality is provided by the first 128 code chunks in the account header stem (zone 000, sub_idx 0x80–0xFF). Code zone leaves for a given bytecode are shared across all contracts with that bytecode.
Storage zone (1): Probabilistic per-account grouping at depth 61 (1-bit zone prefix + 60-bit address prefix). For the vast majority of accounts, the node at path 1 || H(addr)[:60] roots exactly one account's storage. For the ~43 collision pairs expected at 10B accounts, two accounts share a bucket — their data is interleaved below depth 61 but remains distinguishable via the stem suffix.
Account zone (000): Each account's header stem holds up to 194 leaves (BASIC_DATA, CODE_HASH, 64 storage slots, 128 code chunks). All share the 248-bit prefix 000 || H(addr)[:245] and form a single stem. This is the primary per-account structural boundary in PBT.
In Ethereum's current Merkle Patricia Trie (MPT), an account's leaf value is:
account_trie[hash(address)] = RLP(nonce, balance, storage_root, code_hash)
The storage_root field is the root hash of a separate trie containing that account's storage. This creates a sequential dependency: you cannot finalize the account trie leaf (and thus the state root) until you have computed the storage trie root. The storage root is an input to the account leaf value.
Step 1: Update A's storage trie → compute storage_root_A
↓ MUST FINISH FIRST
Step 2: Update account trie with new storage_root_A → compute state_root
↓ DEPENDS ON STEP 1

In PBT, the account header leaf in zone 000 contains:
basic_data = [version(1) | reserved(4) | code_size(3) | nonce(8) | balance(16)]
code_hash = keccak256(bytecode)
Notice what is not there: storage_root. No leaf value anywhere in the tree contains a hash computed from another part of the tree. Every leaf value is self-contained data — a nonce is just a nonce, a slot value is just a slot value.
When a block modifies account A's nonce (zone 000) and storage slot 5 (zone 1), these are two independent leaf updates in the same tree. Neither leaf's value depends on the other. Root recomputation is a single bottom-up pass:
1. Apply all leaf modifications (any order, or in parallel)
2. Recompute hashes bottom-up from modified leaves to root
3. Done — one pass, one root

The zone 000 branch and zone 1 branch propagate upward INDEPENDENTLY and meet at the root.
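A toy sketch of the single-pass recompute. For brevity this models the tree as a fixed-depth sparse Merkle tree over 256-bit paths; PBT's non-sparse trie collapses empty runs, but the one-pass, order-independent property is the same. All names are ours:

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()  # stand-in tree hash

EMPTY = b"\x00" * 32

def recompute_root(leaves: dict[int, bytes], depth: int = 256) -> bytes:
    """One bottom-up pass over a set of (path -> value) leaves.
    Missing siblings are treated as empty; a real client would merge
    cached subtree hashes for the unchanged parts of the tree."""
    level = {path: H(value) for path, value in leaves.items()}
    for _ in range(depth):
        parents: dict[int, bytes] = {}
        for path in level:
            parent = path >> 1
            if parent not in parents:
                left = level.get(parent << 1, EMPTY)
                right = level.get((parent << 1) | 1, EMPTY)
                parents[parent] = H(left + right)
        level = parents
    return level.get(0, EMPTY)
```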
Zone-level parallelism: The zone 000 (accounts) and zone 1 (storage) subtrees can be updated by separate threads. They share no data dependencies until their hashes merge near the root.
Account-level parallelism within zones: Different accounts' subtrees within the same zone are independent. Updating account A's storage and account B's storage can proceed in parallel — they occupy disjoint subtrees below the bucket boundary.
Stem-level parallelism: Within a single account's subtree, different stems are independent sub-computations that can be hashed in parallel.
A stem is a group of up to 256 leaves that share every bit of their key except the last 8 (the sub_index). In the tree, a stem corresponds to a subtree of depth 8 rooted at the node where the common prefix ends.
The stem concept enables proof compression: when proving multiple values that share a stem, the Merkle branch from root to stem node is shared. Only the individual 8-level openings within the stem differ. For k values in the same stem at effective branch depth d, a shared proof needs roughly d + 8k sibling hashes, versus k(d + 8) for k independent proofs.
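A back-of-the-envelope comparison (ignoring any sibling sharing among the k in-stem openings, so `shared` is a slight overcount):

```python
def proof_hashes(k: int, d: int) -> tuple[int, int]:
    """Sibling-hash counts for k values in one stem at effective branch depth d."""
    shared = d + 8 * k          # one branch to the stem node + k in-stem openings
    independent = k * (d + 8)   # k full branches
    return shared, independent

print(proof_hashes(4, 40))  # (72, 192): ~2.7x fewer hashes for 4 values
```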
In zone 1 (storage), stems are formed by the tree_index = slot_key // 256 derivation. This means slots 0–255 always share a stem, slots 256–511 share a stem, etc. This is deterministic, not hash-dependent.
This co-location is valuable because common contract patterns (arrays, packed structs, sequentially allocated variables) access nearby slots together. Array elements stored in consecutive slots naturally share a stem.
PBT's structural boundaries enable per-account subtree pruning as a natural state expiry mechanism, with different granularity per zone:
Code zone: Content-addressed code zone leaves are potentially shared across contracts with identical bytecode. Expiry requires reference counting (how many live accounts share this code_hash?) or permanent storage. The first 128 code chunks in the header stem (zone 000) expire naturally with the account — no reference counting needed.
Storage zone: Bucket-level expiry at depth 61 (1-bit zone prefix + 60-bit address prefix). The node at path 1 || H(addr)[:60] is the bucket boundary. For the vast majority of accounts, this bucket contains exactly one account's storage. Record the subtree hash and prune below depth 61.
Account zone: Per-account expiry at the stem level. The header stem holds up to 194 leaves (BASIC_DATA, CODE_HASH, 64 storage slots, 128 code chunks). Expiry of the entire header stem removes the account's core data, hot storage, and initial code in one operation.
To resurrect expired data, a client provides a proof consistent with the recorded commitment stub. The subtree is re-attached at its original position.
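A toy model of the expiry/resurrection flow (the bucket map, stub table, and function names are invented for illustration):

```python
# Storage zone modeled as: 61-bit bucket path (1 || H(addr)[:60]) -> subtree hash.
live_buckets: dict[int, bytes] = {}   # live bucket subtree root hashes
stubs: dict[int, bytes] = {}          # commitment stubs for expired buckets

def expire(bucket: int) -> None:
    """Record the bucket's subtree hash as a stub, then prune the subtree."""
    stubs[bucket] = live_buckets.pop(bucket)

def resurrect(bucket: int, provided_subtree_hash: bytes) -> None:
    """Re-attach a subtree only if it is consistent with the recorded stub."""
    if stubs.get(bucket) != provided_subtree_hash:
        raise ValueError("resurrection proof inconsistent with recorded stub")
    live_buckets[bucket] = stubs.pop(bucket)
```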
PBT's zone structure directly enables partial statefulness — a mode where a node stores the complete account trie (zone 000) but only syncs and maintains storage and code for a configured subset of contracts. This is the architecture behind EIP-7928 (Block Access Lists), where a partial state node reduces disk usage from ~640 GB to ~59 GB by skipping untracked contracts' storage entirely.
In PBT, this maps cleanly onto the zone topology. Zone 000 (all account headers) is the subtree at prefix "000" — a self-contained subtree that can be synced independently. A partial node downloads zone 000 in full, giving it complete account data (balances, nonces, code hashes, hot storage, and initial code) for every address. For tracked contracts, it additionally syncs their code zone stems (by code_hash) and storage subtrees in zone 1. For untracked contracts, zones 001 and 1 are simply never fetched.
Within the storage zone, a partial node can restrict scope to specific accounts by syncing only the subtrees below their prefix at depth 61. This gives fine-grained control: a node tracking 100 "hot" contracts downloads only those 100 storage subtrees, not the millions of inactive ones. For code, syncing is by code_hash — the node fetches the relevant content-addressed stems for each tracked contract's bytecode.
Block-by-block state updates for partial nodes come via Block Access Lists (BALs) — per-block diffs specifying which storage slots changed and to what values. A partial node applies BAL diffs directly to the trie for tracked contracts without re-executing transactions, then verifies the resulting state root against the block header. PBT's single-pass root computation (no sequential dependency) means this verification is straightforward: apply leaf updates, recompute bottom-up, compare root.
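A sketch of the partial-node update loop (the BAL field names are invented; `storage_key` is the zone 1 derivation sketched earlier, and `recompute_root` the single-pass sketch above):

```python
from dataclasses import dataclass

@dataclass
class SlotDiff:
    """One BAL entry (field names illustrative, not the EIP-7928 wire format)."""
    address: bytes    # contract whose storage changed
    slot_key: int     # which slot
    new_value: bytes  # 32-byte post-state value

def apply_bal(leaves: dict[int, bytes], diffs: list[SlotDiff],
              tracked: set[bytes], header_state_root: bytes) -> None:
    """Apply per-block diffs for tracked contracts, then verify the root."""
    for d in diffs:
        if d.address in tracked:          # untracked storage is never fetched
            leaves[storage_key(d.address, d.slot_key)] = d.new_value
    if recompute_root(leaves) != header_state_root:
        raise ValueError("state root mismatch after applying BAL diffs")
```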
This architecture also serves as the foundation for Valid-only Partial Statelessness (VOPS): validators that store only a portion of the state can still validate blocks by verifying witness proofs for the portions they don't hold. PBT's structural boundaries make it possible to define precisely which subtrees a validator holds (per-account, per-zone, or any combination) and to verify proofs against the subtree roots at known depths.
Archival nodes can selectively prune cold storage on a per-account basis. Identify accounts whose storage has not been accessed for N epochs. For each, record the bucket root at depth 61 in the storage zone and prune the subtree. The archival node retains zone 000 (header stems, including initial code and hot storage) plus the commitment stubs. Code zone stems are retained only for bytecodes referenced by live accounts.
This allows an archival node to serve balance/nonce/code queries for all accounts while offloading cold storage to dedicated archive providers. The commitment stubs ensure that the archival node can verify storage proofs provided by archive providers without trusting them.
EIP-7503 (zero-knowledge wormholes) proposes plausibly deniable privacy for Ethereum: users "burn" ETH by sending it to a cryptographically unspendable address, then present a zero-knowledge proof to re-mint the funds in a fresh account. The burn transaction looks like a normal transfer — no interaction with a privacy contract is visible on-chain.
The fundamental problem is Ethereum's 160-bit address space. A burn address is derived as trunc_160(H("worm" || secret)). The birthday paradox means an attacker can find two secrets s₁, s₂ producing the same 160-bit address in approximately 2⁸⁰ operations — within reach of nation-states. With such a collision, the attacker deposits once and mints twice with different nullifiers: an infinite inflation bug.
Worse, even "fixing" the nullifier scheme doesn't help. An attacker can find s₁, s₂ such that trunc_160(H("worm" || s₁)) == trunc_160(H(skToPk(s₂))), producing a burn address that is also spendable via a regular private key. This is the same 2⁸⁰ collision search. As the original analysis concludes: plausibly deniable wormholes require collision resistance that 160-bit addresses cannot provide.
PBT's 245-bit account key (in zone 000) raises the birthday bound from 2⁸⁰ to 2¹²²·⁵. The difference is categorical: 2⁸⁰ is attackable by nation-states today; 2¹²²·⁵ is computationally infeasible by a very wide margin, even accounting for foreseeable advances in hardware.
The critical question is where the wormhole identity lives. In EIP-7503's current design, the burn identity is a 160-bit Ethereum address — constrained by the transaction format, not by the tree. PBT does not change this surface constraint. However, PBT creates infrastructure that a protocol-level wormhole could leverage:
Enshrined burn at the tree level. If the protocol natively understands PBT's key structure, a burn operation could target a 245-bit account key directly — bypassing the 160-bit address bottleneck entirely. The burn identity would be H(H("worm" || secret))[:245], a 245-bit value with birthday resistance of 2¹²²·⁵. A BASIC_DATA leaf (sub_idx 0x00) would exist at this key in zone 000, holding the burned balance. The minting proof would demonstrate knowledge of a secret whose derived key maps to an existing balance in the tree.
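A hypothetical sketch of that derivation (nothing here is part of EIP-7503; the double-hash and truncation conventions follow the text above):

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()  # stand-in; PBT is hash-agnostic

def burn_account_key(secret: bytes) -> int:
    """Tree-level burn identity: zone 000 key from H(H("worm" || secret))[:245],
    landing on the BASIC_DATA leaf (sub_index 0x00) holding the burned balance."""
    identity = int.from_bytes(H(H(b"worm" + secret)), "big") >> 11  # top 245 bits
    return (0b000 << 253) | (identity << 8) | 0x00
```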
This sidesteps the fundamental tension identified in the wormhole analysis: the collision resistance needed for security is provided by the wider identifier, while plausible deniability is preserved because the burn still creates an ordinary-looking account entry in the tree — indistinguishable from any other account key derived from a regular address.
What PBT enables vs. what it solves. PBT does not implement wormholes. It provides the structural precondition — a wide enough identifier space with collision resistance above the security threshold. Several open problems remain for a complete wormhole design:
Anonymity set. EIP-7503 currently exposes a beacon_block_root as public input, shrinking the anonymity set to transactions in a single block. A tree-level design could prove against a state root (committing to all accounts), potentially expanding the anonymity set to all accounts satisfying a predicate at a given block — but this requires efficient proof generation over PBT's Merkle structure.
Hash function alignment. If PBT uses a SNARK-friendly hash (e.g., Poseidon2), the wormhole proof — which must demonstrate a valid path through the tree — becomes dramatically cheaper to generate. The hash function choice in §10 (Open Questions) has direct implications for wormhole feasibility.
Beacon deposit integration. A complementary approach uses private beacon chain deposits as the wormhole mechanism. With ~33% of ETH staked and tens of millions in daily withdrawals, the anonymity set is large. PBT's structural boundaries could help here too: the deposit's tree footprint is a well-defined subtree that proofs can reference.
| Field | Bits | Birthday Bound | Threat Model | Assessment |
|---|---|---|---|---|
| account key (zone 000) | 245 | 2¹²²·⁵ | Two addresses → same key prefix | Excellent |
| addr_prefix (storage) | 60 | 2³⁰ (~43 pairs at 10B) | Two addresses → same bucket | Not consensus-fatal |
| stem_suffix (storage) | 187 | 2⁹³·⁵ | Two tree_indexes → same stem | Future-proof for decades |
| storage effective | 195 | 2⁹⁷·⁵ | Full key collision within bucket | Excellent |
| code stem (code zone) | 245 | 2¹²²·⁵ | Two (code_hash, tree_index) → same stem | Excellent |
| sub_index | 8 | n/a | Direct mapping, not hashed | Collision impossible* |
* sub_index = slot_key % 256. Two distinct slot keys map to the same stem leaf only if they share the same tree_index AND the same remainder mod 256 — meaning they are the same slot. Collision is logically impossible between distinct keys.
Code zone keys are derived as H(code_hash || tree_index)[:245], where code_hash = keccak256(bytecode). The 245-bit truncation provides 2¹²²·⁵ birthday resistance between any two distinct (code_hash, tree_index) pairs. Since code_hash is a keccak256 output (256 uniformly distributed bits), truncation preserves uniformity. Two distinct bytecodes mapping to the same stem key is computationally infeasible. Contracts with identical bytecode intentionally share stems — this is the deduplication mechanism, not a collision.
Code zone: Content-addressed — contracts with identical bytecode intentionally share code zone leaves. Contracts with different bytecodes are isolated by 245-bit H(code_hash || tree_index) with 2¹²²·⁵ birthday resistance. Per-account code isolation for the first 128 chunks is provided by the header stem in zone 000.
Storage zone: Isolation is probabilistic via the 60-bit address prefix. The ~43 expected collision pairs at 10B accounts share a bucket, meaning their storage leaves are interleaved below depth 61. This is not consensus-fatal — the stem suffix H(addr || tree_index) ensures each account's stems are independent. Two accounts sharing a bucket can coexist, with slightly reduced parallelism at the bucket boundary. An attacker cannot exploit a prefix collision to corrupt another account's storage.
P=60 provides ~43 collision pairs at 10B accounts — is this the right tradeoff? Increasing P strengthens per-account isolation but reduces stem suffix bits. P=64 would give ~2.7 collision pairs at 10B accounts but reduce the stem suffix to 183 bits (birthday 2⁹¹·⁵). P=48 would give ~178K collision pairs — likely too many. The choice depends on how much bucket sharing is tolerable.
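The arithmetic behind these figures (birthday approximation, n = 10¹⁰ accounts):

```python
def expected_collision_pairs(n: int, p: int) -> float:
    """Expected address pairs sharing a p-bit prefix: n(n-1)/2 / 2^p."""
    return n * (n - 1) / 2 / 2**p

for p in (48, 60, 64):
    print(f"P={p}: ~{expected_collision_pairs(10**10, p):,.1f} pairs")
# P=48: ~177,635.7   P=60: ~43.4   P=64: ~2.7
```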
When a storage bucket is expired to a commitment stub at depth 61, how does resurrection work? Options include: (A) full subtree re-provision with a Merkle proof against the stub, (B) incremental resurrection where individual slots are re-proved on access, (C) epoch-based resurrection with bulk re-attachment. Each has different bandwidth and latency tradeoffs.
PBT is hash-function agnostic. The choice affects proving performance (SNARK/STARK friendliness), hashing speed (native execution), and proof compactness. Candidates: BLAKE3 (fast native), Poseidon2 (SNARK-friendly), Keccak (ecosystem compatibility). The tree structure is independent of this choice.
Proving both an account header (zone 000) and a storage slot (zone 1) for the same account requires two Merkle branches. These share the first ~1 level (from root to the depth-1 split). Defining a compact multi-proof format that exploits this sharing needs specification.
When a contract is destroyed, its header stem (including first 128 code chunks) is deleted. But code zone stems (zone 001) may be shared with other contracts via content-addressing. Options: (A) reference counting per code_hash — track how many live accounts use each bytecode, (B) never delete code zone stems — since SELFDESTRUCT is deprecated, dead code stays until a state expiry sweep, (C) garbage collection during state expiry — periodically scan for unreferenced code_hash entries. Option B is simplest and may be sufficient given that most contracts fit within the 128-chunk header.
The EIP-7864 header layout embeds specific sub-index boundaries into the protocol: 0x40 for first storage slot, 0x80 for first code chunk. These constants determine how many hot storage slots (64) and initial code chunks (128) are per-account vs zone-separated. Should these be configurable per-fork, or fixed permanently? Changing them would require migrating all account header stems.
When two accounts share a 60-bit storage prefix (~43 pairs at 10B accounts), bucket-level expiry at depth 61 would expire both accounts' storage together. Should the protocol handle this explicitly (e.g., check for sharing before expiring), accept it as a rare edge case, or use finer-grained expiry within shared buckets?