Merkle Trees

Merkle trees are fundamental to blockchain efficiency. They enable verification of large datasets with minimal data transfer—you can prove a transaction exists without downloading the entire blockchain.

What Is a Merkle Tree?

A Merkle tree (or hash tree) is a binary tree where:

  • Leaf nodes contain hashes of data blocks
  • Internal nodes contain hashes of their children
  • The root summarizes the entire dataset
Text
                    ┌─────────────┐
                    │   Root Hash │
                    │   (H_ABCD)  │
                    └──────┬──────┘
                           │
             ┌─────────────┴─────────────┐
             │                           │
       ┌─────┴─────┐               ┌─────┴─────┐
       │   H_AB    │               │   H_CD    │
       └─────┬─────┘               └─────┬─────┘
             │                           │
       ┌─────┴─────┐               ┌─────┴─────┐
       │           │               │           │
    ┌──┴──┐     ┌──┴──┐         ┌──┴──┐     ┌──┴──┐
    │ H_A │     │ H_B │         │ H_C │     │ H_D │
    └──┬──┘     └──┬──┘         └──┬──┘     └──┬──┘
       │           │               │           │
    ┌──┴──┐     ┌──┴──┐         ┌──┴──┐     ┌──┴──┐
    │  A  │     │  B  │         │  C  │     │  D  │
    └─────┘     └─────┘         └─────┘     └─────┘

    H_A = hash(A)
    H_B = hash(B)
    H_AB = hash(H_A + H_B)
    H_ABCD = hash(H_AB + H_CD)

Why Merkle Trees Matter

Efficiency Gains

Without Merkle trees, proving a transaction exists requires the entire block:

  • Download all transactions
  • Check if target transaction is present
  • Data: O(n) where n = number of transactions

With Merkle trees:

  • Download only the Merkle proof
  • Verify by hashing up to root
  • Data: O(log n)
Text
For 1 million transactions:
Without Merkle: ~1,000,000 hashes to verify
With Merkle:    ~20 hashes to verify

That's 50,000x more efficient!

Use Cases

  1. SPV (Simple Payment Verification): Light wallets verify transactions without full blockchain
  2. State synchronization: Quickly identify which parts of state differ
  3. Data integrity: Detect tampering in any part of dataset
  4. Efficient updates: Change one leaf, update only O(log n) nodes

Building a Merkle Tree

Step-by-Step Construction

TypeScript
import { createHash } from "crypto";

function sha256(data: Buffer): Buffer {
  return createHash("sha256").update(data).digest();
}

// Combine two hashes
function hashPair(left: Buffer, right: Buffer): Buffer {
  return sha256(Buffer.concat([left, right]));
}

function buildMerkleTree(leaves: Buffer[]): Buffer[][] {
  if (leaves.length === 0) {
    return [[Buffer.alloc(32)]]; // Empty tree
  }

  // Start with leaf hashes
  let currentLevel = leaves.map((leaf) => sha256(leaf));
  const tree: Buffer[][] = [currentLevel];

  // Build tree bottom-up
  while (currentLevel.length > 1) {
    const nextLevel: Buffer[] = [];

    for (let i = 0; i < currentLevel.length; i += 2) {
      const left = currentLevel[i];
      // If odd number of nodes, duplicate the last one
      const right = currentLevel[i + 1] ?? currentLevel[i];
      nextLevel.push(hashPair(left, right));
    }

    tree.push(nextLevel);
    currentLevel = nextLevel;
  }

  return tree;
}

// Get the root hash
function getMerkleRoot(leaves: Buffer[]): Buffer {
  const tree = buildMerkleTree(leaves);
  return tree[tree.length - 1][0];
}

Example with Real Data

TypeScript
// Transactions as buffers
const transactions = [
  Buffer.from("Alice pays Bob 5 BTC"),
  Buffer.from("Bob pays Carol 2 BTC"),
  Buffer.from("Carol pays Dave 1 BTC"),
  Buffer.from("Dave pays Eve 0.5 BTC"),
];

const tree = buildMerkleTree(transactions);

console.log(
  "Leaf level:",
  tree[0].map((h) => h.toString("hex").slice(0, 8)),
);
console.log(
  "Level 1:",
  tree[1].map((h) => h.toString("hex").slice(0, 8)),
);
console.log("Root:", tree[2][0].toString("hex").slice(0, 8));

// Output:
// Leaf level: ['a1b2c3d4', 'e5f6a7b8', 'c9d0e1f2', 'a3b4c5d6']
// Level 1: ['f7e8d9c0', 'b1a2f3e4']
// Root: ['d5c6b7a8']

Merkle Proofs

A Merkle proof proves that a leaf is part of the tree without revealing other leaves.

Proof Structure

Text
To prove Transaction B is in the tree:

                    ┌─────────────┐
                    │    Root     │  ← Known (in block header)
                    └──────┬──────┘
                           │
             ┌─────────────┴─────────────┐
             │                           │
       ┌─────┴─────┐               ┌─────┴─────┐
       │   H_AB    │               │ * H_CD *  │  ← Proof element
       └─────┬─────┘               └───────────┘
             │
       ┌─────┴─────┐
       │           │
    ┌──┴──┐     ┌──┴──┐
    │*H_A*│     │ H_B │  ← Proof element    ← Target (we have B)
    └─────┘     └──┬──┘
                   │
                ┌──┴──┐
                │  B  │  ← Data we're proving
                └─────┘

Proof for B: [H_A, H_CD] + positions
Verifier computes:
1. H_B = hash(B)
2. H_AB = hash(H_A + H_B)
3. Root' = hash(H_AB + H_CD)
4. Check: Root' == Root

Proof Generation

TypeScript
interface MerkleProof {
  leaf: Buffer;
  leafIndex: number;
  proof: Array<{
    hash: Buffer;
    position: "left" | "right";
  }>;
  root: Buffer;
}

function generateProof(leaves: Buffer[], targetIndex: number): MerkleProof {
  const tree = buildMerkleTree(leaves);
  const proof: MerkleProof["proof"] = [];

  let index = targetIndex;

  // Walk up the tree, collecting siblings
  for (let level = 0; level < tree.length - 1; level++) {
    const isRightNode = index % 2 === 1;
    const siblingIndex = isRightNode ? index - 1 : index + 1;

    if (siblingIndex < tree[level].length) {
      proof.push({
        hash: tree[level][siblingIndex],
        position: isRightNode ? "left" : "right",
      });
    } else {
      // Odd number of nodes, sibling is self (duplicated)
      proof.push({
        hash: tree[level][index],
        position: "right",
      });
    }

    // Move to parent index
    index = Math.floor(index / 2);
  }

  return {
    leaf: leaves[targetIndex],
    leafIndex: targetIndex,
    proof,
    root: tree[tree.length - 1][0],
  };
}

Proof Verification

TypeScript
function verifyProof(proof: MerkleProof): boolean {
  let hash = sha256(proof.leaf);

  for (const step of proof.proof) {
    if (step.position === "left") {
      hash = hashPair(step.hash, hash);
    } else {
      hash = hashPair(hash, step.hash);
    }
  }

  return hash.equals(proof.root);
}

// Usage
const proof = generateProof(transactions, 1); // Prove transaction at index 1
const isValid = verifyProof(proof);
console.log("Proof valid:", isValid); // true

Merkle Trees in Bitcoin

Block Header Structure

Text
Bitcoin Block Header (80 bytes):
┌────────────────────────────────────────┐
│ Version (4 bytes)                      │
│ Previous Block Hash (32 bytes)         │
│ Merkle Root (32 bytes)  ←──────────────┤ Hash of all transactions
│ Timestamp (4 bytes)                    │
│ Difficulty Target (4 bytes)            │
│ Nonce (4 bytes)                        │
└────────────────────────────────────────┘

SPV Verification

Light clients can verify transactions with minimal data:

TypeScript
async function verifySPV(
  txHash: string,
  blockHeader: BlockHeader,
  merkleProof: MerkleProof,
): Promise<boolean> {
  // 1. Verify block header is valid (enough PoW)
  if (!verifyBlockHeader(blockHeader)) {
    return false;
  }

  // 2. Verify merkle proof leads to block's merkle root
  if (!merkleProof.root.equals(blockHeader.merkleRoot)) {
    return false;
  }

  // 3. Verify proof is valid
  if (!verifyProof(merkleProof)) {
    return false;
  }

  // 4. Verify the proven data matches our transaction
  const proofTxHash = sha256(merkleProof.leaf);
  if (proofTxHash.toString("hex") !== txHash) {
    return false;
  }

  return true;
}

Merkle Trees in Solana

Transaction Merkle

Solana blocks contain a Merkle root of transaction signatures:

Text
Entry in Solana Block:
┌─────────────────────────────────────────┐
│ PoH Hash                                │
│ Transaction Merkle Root  ←──────────────┤
│ Transactions [...]                      │
└─────────────────────────────────────────┘

Account State Merkle

Solana's bank uses Merkle-like structures for state:

Text
Account State:
┌─────────────────────────────────────────┐
│ Accounts Hash (Merkle-ish)              │
│                                         │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐    │
│ │Account A│ │Account B│ │Account C│    │
│ └─────────┘ └─────────┘ └─────────┘    │
└─────────────────────────────────────────┘

Advanced: Merkle Patricia Tries

Ethereum uses a more sophisticated structure—the Merkle Patricia Trie:

Text
Combines:
1. Merkle tree (cryptographic proofs)
2. Patricia trie (prefix compression)
3. Radix tree (path optimization)

Enables:
- Proof of account existence
- Proof of account non-existence
- Efficient storage updates

Patricia Trie Structure

Text
Storing keys: "dog", "doge", "cat"

Traditional trie:
       root
      /    \
     c      d
     |      |
     a      o
     |      |
     t      g
     *     / \
          *   e
              *

Patricia trie (compressed):
       root
      /    \
    cat    dog
     *     / \
          *  e
              *

Fewer nodes = more efficient

Performance Analysis

Merkle Tree Complexity

OperationTime ComplexitySpace Complexity
Build treeO(n)O(n)
Generate proofO(log n)O(log n)
Verify proofO(log n)O(log n)
Update leafO(log n)O(1)
Find rootO(1)O(1)

Practical Performance

TypeScript
// Benchmark merkle operations
function benchmark() {
  const sizes = [100, 1000, 10000, 100000];

  for (const size of sizes) {
    const leaves = Array.from({ length: size }, (_, i) =>
      Buffer.from(`Transaction ${i}`),
    );

    // Build tree
    const buildStart = performance.now();
    const root = getMerkleRoot(leaves);
    const buildTime = performance.now() - buildStart;

    // Generate proof
    const proofStart = performance.now();
    const proof = generateProof(leaves, Math.floor(size / 2));
    const proofTime = performance.now() - proofStart;

    // Verify proof
    const verifyStart = performance.now();
    const valid = verifyProof(proof);
    const verifyTime = performance.now() - verifyStart;

    console.log(`Size: ${size}`);
    console.log(`  Build: ${buildTime.toFixed(2)}ms`);
    console.log(`  Proof: ${proofTime.toFixed(2)}ms`);
    console.log(`  Verify: ${verifyTime.toFixed(2)}ms`);
    console.log(`  Proof size: ${proof.proof.length} hashes`);
  }
}

Common Merkle Tree Variants

Binary Merkle Tree (Standard)

  • Each node has exactly 2 children
  • Most common in blockchains

Sparse Merkle Tree

  • Fixed depth, most leaves empty
  • Enables proofs of non-membership
  • Used for account states

Merkle Mountain Range

  • Append-only, multiple peaks
  • Good for growing datasets
  • Used in some blockchains for transaction history

Key Takeaways

  1. Merkle trees summarize large datasets in a single hash
  2. Merkle proofs enable O(log n) verification
  3. Blockchain headers store Merkle roots, not full data
  4. SPV clients verify transactions without full blockchain
  5. Space efficiency enables light clients and sharding

Common Mistakes

  1. Forgetting odd-node handling: Trees with odd leaves need duplication or other handling
  2. Order dependency: Merkle roots are order-dependent; same data in different order = different root
  3. Not salting leaves: Without per-leaf salt, identical leaves produce identical hashes

Try It Yourself

  1. Build a Merkle tree: Implement the algorithms above and verify a proof manually

  2. Calculate proof size: For a block with 2,048 transactions, how many bytes is a Merkle proof? (Assuming SHA-256)

  3. Explore real proofs: Use a block explorer API to fetch a Merkle proof for a real Bitcoin transaction


Next: Cryptographic Hashing - Deep dive into the hash functions that power Merkle trees and blockchain security.