14 minute read

“Find the point where two paths in a tree first meet.”

TL;DR

Recursively search left and right subtrees. If both return non-null, the current node is the LCA. If only one returns non-null, propagate it up. Runs in O(N) time and O(H) space. For BSTs, compare values to navigate in O(H) time. For batch queries, use Tarjan’s offline algorithm with Union-Find or reduce to RMQ for O(1) queries after preprocessing. This connects to graph traversal patterns and dependency resolution in AI agent systems.

1. Problem Statement

Given a binary tree, find the lowest common ancestor (LCA) of two given nodes in the tree.

The lowest common ancestor is defined between two nodes p and q as the lowest node in T that has both p and q as descendants (where we allow a node to be a descendant of itself).

Example:

 3
 / \
 5 1
 / \ / \
 6 2 0 8
 / \
 7 4
  • LCA(5, 1) = 3
  • LCA(5, 4) = 5 (a node can be its own ancestor)
  • LCA(6, 4) = 5

2. Recursive Solution (Most Intuitive)

Intuition:

  • If the current node is NULL, return NULL.
  • If the current node is p or q, return the current node.
  • Recursively search left and right subtrees.
  • If both subtrees return non-NULL, current node is the LCA.
  • If only one subtree returns non-NULL, propagate it upward.
class Solution:
    def lowestCommonAncestor(self, root: 'TreeNode', p: 'TreeNode', q: 'TreeNode') -> 'TreeNode':
        # Base case: reached NULL or found one of the targets
        if not root or root == p or root == q:
            return root

            # Search in left and right subtrees
            left = self.lowestCommonAncestor(root.left, p, q)
            right = self.lowestCommonAncestor(root.right, p, q)

            # If both sides found something, current node is LCA
            if left and right:
                return root

                # Otherwise, return whichever side found something
                return left if left else right

Time Complexity: \(O(N)\) where N is the number of nodes (we visit each node once). Space Complexity: \(O(H)\) for recursion stack, where H is the height (\(O(\log N)\) for balanced, \(O(N)\) for skewed).

3. Path Storage Solution

Idea: Find the path from root to p and root to q, then find the last common node.

class Solution:
    def lowestCommonAncestor(self, root: 'TreeNode', p: 'TreeNode', q: 'TreeNode') -> 'TreeNode':
    def find_path(root, target, path):
        if not root:
            return False

            path.append(root)

            if root == target:
                return True

                if find_path(root.left, target, path) or find_path(root.right, target, path):
                    return True

                    path.pop()
                    return False

                    path_p, path_q = [], []
                    find_path(root, p, path_p)
                    find_path(root, q, path_q)

                    lca = None
                    for i in range(min(len(path_p), len(path_q))):
                        if path_p[i] == path_q[i]:
                            lca = path_p[i]
                        else:
                            break

                            return lca

Time Complexity: \(O(N)\) (two DFS traversals). Space Complexity: \(O(N)\) (storing paths).

4. Iterative Solution with Parent Pointers

Idea: Use a parent map to track each node’s parent, then trace back from p and q to find the first common ancestor.

from collections import deque

class Solution:
    def lowestCommonAncestor(self, root: 'TreeNode', p: 'TreeNode', q: 'TreeNode') -> 'TreeNode':
        # Build parent pointers using BFS
        parent = {root: None}
        queue = deque([root])

        while p not in parent or q not in parent:
            node = queue.popleft()
            if node.left:
                parent[node.left] = node
                queue.append(node.left)
                if node.right:
                    parent[node.right] = node
                    queue.append(node.right)

                    # Collect all ancestors of p
                    ancestors = set()
                    while p:
                        ancestors.add(p)
                        p = parent[p]

                        # Find first ancestor of q that's also an ancestor of p
                        while q not in ancestors:
                            q = parent[q]

                            return q

Time Complexity: \(O(N)\). Space Complexity: \(O(N)\) (parent map and ancestor set).

5. Edge Cases

# Test cases
def test_lca():
    # Case 1: One node is ancestor of the other
    # LCA(5, 4) = 5

    # Case 2: Nodes in different subtrees
    # LCA(6, 0) = 3

    # Case 3: One node is root
    # LCA(3, 4) = 3

    # Case 4: Both nodes are same
    # LCA(5, 5) = 5

Deep Dive: Why the Recursive Solution Works

The key insight is the bottom-up propagation:

Case 1: p and q are in different subtrees

 LCA
 / \
 p q
  • Left subtree returns p.
  • Right subtree returns q.
  • Since both are non-NULL, LCA is the current node.

Case 2: p is ancestor of q

 p
 \
 q
  • When we hit p, we return p immediately.
  • The subtree containing q also eventually returns p (propagated up).
  • Since only one side returns non-NULL, we return p.

Mathematical Proof: Let \( T(n) \) be a binary tree rooted at \( n \). Define \( \text{LCA}(p, q) \) as the deepest node \( n \) such that \( p \in T(n) \) and \( q \in T(n) \).

Claim: The recursive algorithm correctly finds \( \text{LCA}(p, q) \).

Proof by Induction:

  • Base: If \( n = p \) or \( n = q \) or \( n = \text{NULL} \), return \( n \). Correct.
  • Inductive Step:
  • If \( \text{left} \neq \text{NULL} \) and \( \text{right} \neq \text{NULL} \), then \( p \) and \( q \) are in different subtrees. Thus, \( n \) is the LCA.
  • If only \( \text{left} \neq \text{NULL} \), both \( p \) and \( q \) are in the left subtree, so we propagate the LCA from the left subtree.

Deep Dive: LCA in a Binary Search Tree (BST)

If the tree is a BST, we can optimize using the BST property.

def lowestCommonAncestor_BST(root, p, q):
    while root:
        if p.val < root.val and q.val < root.val:
            root = root.left
        elif p.val > root.val and q.val > root.val:
            root = root.right
        else:
            return root

Time Complexity: \(O(H)\) where H is height. Space Complexity: \(O(1)\) (iterative, no recursion).

Why BST is Special:

  • If both p and q are smaller than root, LCA must be in the left subtree.
  • If both are larger, LCA must be in the right subtree.
  • Otherwise, root is the LCA (one is on each side, or root is one of them).

Deep Dive: LCA in a Directed Acyclic Graph (DAG)

In a DAG, a node can have multiple parents. LCA becomes more complex.

Approach: Topological Sort + DFS

  1. Find all ancestors of p using DFS.
  2. Find all ancestors of q using DFS.
  3. The LCA is the common ancestor with the maximum topological order (deepest).

Time Complexity: \(O(V + E)\) where V is vertices and E is edges.

Deep Dive: Range Minimum Query (RMQ) and LCA

There’s a deep connection: LCA can be reduced to RMQ.

Euler Tour Technique:

  1. Perform a DFS and record the sequence of nodes (visiting each node when entering and leaving).
  2. For each node, record its first occurrence in the Euler tour.
  3. LCA(p, q) = Node with minimum depth in the Euler tour between first[p] and first[q].

Example:

Tree: 1
 / \
 2 3
 / \
 4 5

Euler Tour: [1, 2, 4, 2, 5, 2, 1, 3, 1]
Depths: [0, 1, 2, 1, 2, 1, 0, 1, 0]
First occurrence: 1->0, 2->1, 3->7, 4->2, 5->4

LCA(4, 5):
 first[4] = 2, first[5] = 4
 Min depth in range [2, 4] is at index 3 (depth 1, node 2)
 LCA = 2

With RMQ Preprocessing:

  • Preprocess the depth array with Sparse Table or Segment Tree.
  • Answer LCA queries in \(O(1)\) after \(O(N \log N)\) preprocessing.

Deep Dive: Tarjan’s Offline LCA Algorithm

If we have many LCA queries offline (all queries known in advance), Tarjan’s algorithm uses Disjoint Set Union (DSU).

Algorithm:

def tarjan_lca(root, queries):
    parent = {}
    ancestor = {}
    color = {} # 0: white, 1: gray, 2: black
    result = {}

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])
        return parent[x]

    def union(x, y):
        parent[find(x)] = find(y)

    def dfs(node):
        parent[node] = node
        ancestor[node] = node
        color[node] = 1 # gray

        for child in [node.left, node.right]:
            if child:
                dfs(child)
                union(child, node)
                ancestor[find(node)] = node

                color[node] = 2 # black

                for (u, v) in queries:
                    if u == node and color.get(v) == 2:
                        result[(u, v)] = ancestor[find(v)]
                        if v == node and color.get(u) == 2:
                            result[(u, v)] = ancestor[find(u)]

                            dfs(root)
                            return result

Time Complexity: \(O((N + Q) \cdot \alpha(N))\) where Q is number of queries and \(\alpha\) is the inverse Ackermann function (nearly constant).

Deep Dive: Lowest Common Ancestor of K Nodes

Problem: Find the LCA of K nodes \( {p_1, p_2, \ldots, p_k} \).

Approach 1: Iterative LCA

def lca_of_k_nodes(root, nodes):
    lca = nodes[0]
    for i in range(1, len(nodes)):
        lca = lowestCommonAncestor(root, lca, nodes[i])
        return lca

Time Complexity: \(O(K \cdot N)\) in worst case.

Approach 2: DFS with Counter

def lca_of_k_nodes_optimized(root, nodes):
    node_set = set(nodes)

def dfs(node):
    if not node:
        return 0, None

        count = 1 if node in node_set else 0
        left_count, left_lca = dfs(node.left)
        right_count, right_lca = dfs(node.right)

        total_count = count + left_count + right_count

        if total_count == len(node_set) and not hasattr(dfs, 'lca'):
            dfs.lca = node

            return total_count, dfs.lca if hasattr(dfs, 'lca') else None

            _, lca = dfs(root)
            return lca

Time Complexity: \(O(N)\) single pass.

Deep Dive: LCA with Node Values (Not References)

Problem: Given a tree and two integer values, find their LCA.

Challenge: We need to search for the nodes first.

def lca_by_values(root, val1, val2):
def find_lca(node):
    if not node or node.val == val1 or node.val == val2:
        return node

        left = find_lca(node.left)
        right = find_lca(node.right)

        if left and right:
            return node
            return left if left else right

            return find_lca(root)

Caveat: What if one value doesn’t exist?

  • The above code would return the other node as LCA (incorrect).
  • Fix: Verify both values exist first with a separate traversal.

Deep Dive: LCA in a Binary Tree with Parent Pointers

If each node has a parent pointer, the problem becomes finding the intersection of two linked lists.

def lca_with_parent_pointers(p, q):
def get_depth(node):
    depth = 0
    while node:
        depth += 1
        node = node.parent
        return depth

        depth_p = get_depth(p)
        depth_q = get_depth(q)

        # Move the deeper node up to the same level
        while depth_p > depth_q:
            p = p.parent
            depth_p -= 1

            while depth_q > depth_p:
                q = q.parent
                depth_q -= 1

                # Move both up until they meet
                while p != q:
                    p = p.parent
                    q = q.parent

                    return p

Time Complexity: \(O(H)\). Space Complexity: \(O(1)\).

Comparison Table

Approach Time Space Pros Cons
Recursive \(O(N)\) \(O(H)\) Elegant, simple Recursion overhead
Path Storage \(O(N)\) \(O(N)\) Easy to understand Extra space for paths
Parent Pointers (BFS) \(O(N)\) \(O(N)\) Iterative Requires building parent map
BST Optimized \(O(H)\) \(O(1)\) Fast for BST Only works for BST
Tarjan (Offline) \(O((N+Q)\alpha(N))\) \(O(N)\) Multiple queries Requires all queries upfront
RMQ Reduction \(O(1)\) query \(O(N \log N)\) Very fast queries Complex preprocessing

Implementation in Other Languages

C++:

class Solution {
public:
 TreeNode* lowestCommonAncestor(TreeNode* root, TreeNode* p, TreeNode* q) {
 if (!root || root == p || root == q) return root;
 
 TreeNode* left = lowestCommonAncestor(root->left, p, q);
 TreeNode* right = lowestCommonAncestor(root->right, p, q);
 
 if (left && right) return root;
 return left ? left : right;
 }
};

Java:

class Solution {
 public TreeNode lowestCommonAncestor(TreeNode root, TreeNode p, TreeNode q) {
 if (root == null || root == p || root == q) return root;
 
 TreeNode left = lowestCommonAncestor(root.left, p, q);
 TreeNode right = lowestCommonAncestor(root.right, p, q);
 
 if (left != null && right != null) return root;
 return left != null ? left : right;
 }
}

Top Interview Questions

Q1: What if the tree is very deep and we hit stack overflow? Answer: Use the iterative solution with parent pointers or convert the recursive solution to iterative using an explicit stack.

Q2: Can LCA be \(O(1)\) query time? Answer: Yes, with \(O(N \log N)\) preprocessing using the RMQ reduction (Euler tour + Sparse Table).

Q3: What if we’re given a forest (multiple trees) instead of a single tree? Answer: If p and q are not in the same tree, return NULL. Otherwise, find the root of their tree and apply LCA.

Q4: How do you handle the case where one of the nodes doesn’t exist? Answer: Add a validation step to ensure both nodes exist in the tree before running LCA.

Key Takeaways

  1. Recursive Solution is Elegant: The post-order traversal naturally solves LCA.
  2. BST Optimization: Leverage BST properties for \(O(H)\) time.
  3. RMQ Connection: LCA and Range Minimum Query are equivalent problems.
  4. Offline Queries: Tarjan’s algorithm with DSU is optimal for batch queries.
  5. Parent Pointers: Reduce to “intersection of two linked lists” problem.

Summary

Aspect Insight
Core Idea Find the deepest node that is an ancestor of both targets
Key Trick Post-order DFS with bottom-up propagation
BST Optimization Navigate by comparing values
Advanced RMQ reduction for \(O(1)\) queries

FAQ

How do you find the lowest common ancestor in a binary tree?

Use recursive DFS. If the current node is null, or matches p or q, return it. Recursively search the left and right subtrees. If both calls return non-null, the current node is the LCA because p and q are in different subtrees. If only one returns non-null, propagate that result upward since both nodes must be in that subtree. This runs in O(N) time and O(H) space.

How is LCA different in a Binary Search Tree?

In a BST, compare the values of p and q with the current node. If both are smaller, the LCA must be in the left subtree. If both are larger, it must be in the right subtree. Otherwise, the current node is the LCA. This iterative approach runs in O(H) time with O(1) space, exploiting the sorted structure to avoid visiting every node.

Can LCA queries be answered in O(1) time?

Yes, with O(N log N) preprocessing. Perform an Euler tour of the tree, recording nodes and their depths. The LCA of two nodes corresponds to the node with minimum depth between their first occurrences in the Euler tour. Preprocessing the depth array with a sparse table enables O(1) Range Minimum Queries, yielding constant-time LCA answers.

What if one of the target nodes does not exist in the tree?

The standard recursive algorithm may incorrectly return the other node as the LCA, since it stops searching once it finds one target. To handle missing nodes correctly, first verify both nodes exist with a separate O(N) traversal, or modify the algorithm to track two boolean flags indicating whether each target was actually found during the search.


Originally published at: arunbaby.com/dsa/0029-lowest-common-ancestor