Word Search II

Q: How does incremental pruning optimize Word Search II?

After finding a word, you remove it from the Trie and delete any leaf nodes with no remaining children. This progressively shrinks the Trie, reducing future DFS branches and making the algorithm faster as it finds more words.

Q: What is the time complexity of Word Search II with a Trie?

The worst-case time is O(M x N x 4^L) where M,N are grid dimensions and L is max word length. However, Trie pruning makes the average case much faster by eliminating invalid paths early.

7 minute read

“Don’t look for one needle in a haystack. Magnetize the hay to find all needles at once.”

TL;DR

Word Search II finds all dictionary words on a character grid by flipping the search: instead of searching for each word, build a Trie from the dictionary and let the grid DFS be guided by the Trie. At each cell, check if the current character is a valid Trie child before continuing. Incrementally prune found words from the Trie to shrink the search space over time. This Trie-driven grid search pattern is foundational for typeahead and autocomplete systems and reuses the backtracking template from earlier problems.

1. Problem Statement

This is the “Boss Level” of grid-based search problems. Given an m x n board of characters and a list of strings words, return all words on the board.

Rules:

Each word must be constructed from letters of sequentially adjacent cells (horizontally or vertically neighboring).
The same letter cell may not be used more than once in a word.
The output should contain all unique words found.

Example:

Board:
[
 ['o','a','a','n'],
 ['e','t','a','e'],
 ['i','h','k','r'],
 ['i','f','l','v']
]
Words: ["oath","pea","eat","rain"]
Output: ["eat","oath"]

Note: “pea” is not present. “rain” is not present.

2. Understanding the Problem

2.1 The Native vs. Optimized Mindset

Word Search I: Find one word. We iterate the grid and run DFS for that word.
Word Search II: Find K words (K can be 30,000).

If we define M, N as grid dimensions and L as max word length:

Naive Approach: For each word, scan grid. Total Time: K \times (M \times N \times 4^L).
This is horribly inefficient. We are visiting the same cells (0,0) ‘o’ -> (0,1) ‘a’ thousands of times, once for “oath”, once for “oats”, once for “oatmeal”…

2.2 The Inversion of Control

Instead of “For each word, search the grid”, we flip it: “For each path in the grid, does it form any word?”

To support this query (“Does this prefix exist in my list?”), we need a Trie (Prefix Tree).

3. Approach 1: Naive (Repeated DFS) – Don’t do this

Run the solution for “Word Search I” inside a loop. Most interviewers will fail you immediately for this because it ignores the shared structure of the dictionary words.

4. Approach 2: Backtracking with Trie

We build a Trie from the dictionary words. Then we start DFS from every cell in the grid. As we traverse the grid (e.g., o -> a -> t), we move a pointer down the Trie.

Key Logic

Build Trie: Insert all words into Trie. Mark leaf nodes with the actual word.
Grid DFS:
- Start at (r, c).
- Check if board[r][c] is a child of Current_Trie_Node.
- If Yes: Move to child. Mark visited. Recurse.
- If No: Stop (Pruning).
Optimization (Leaf Pruning):
- Once we find a word like “oath”, we add it to results.
- Crucial: We should remove “oath” from the Trie (or mark it found) so we don’t find it again via a different path.
- Even better: If a node has no children after finding a word, delete the node. This progressively shrinks the search space!

5. Implementation

class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None # Stores the word at the leaf node

    class Solution:
    def findWords(self, board, words):
        # 1. Build the Trie
        root = TrieNode()
        for w in words:
            node = root
            for char in w:
                if char not in node.children:
                    node.children[char] = TrieNode()
                    node = node.children[char]
                    node.word = w # End of word

                    self.res = []
                    rows, cols = len(board), len(board[0])

    def dfs(r, c, parent_node):
        letter = board[r][c]
        curr_node = parent_node.children[letter]

        # Check for match
        if curr_node.word:
            self.res.append(curr_node.word)
            curr_node.word = None # De-duplicate: Ensure we don't add it again

            # DFS Traversal logic
            board[r][c] = '#' # Mark visited

            for dr, dc in [(0,1), (0,-1), (1,0), (-1,0)]:
                nr, nc = r+dr, c+dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    if board[nr][nc] in curr_node.children:
                        dfs(nr, nc, curr_node)

                        board[r][c] = letter # Backtrack

                        # 3. Optimization: Incremental Pruning
                        # If leaf node and no word, prune it to optimize future searches
                        if not curr_node.children:
                            del parent_node.children[letter]

                            # Trigger DFS
                            for r in range(rows):
                                for c in range(cols):
                                    if board[r][c] in root.children:
                                        dfs(r, c, root)

                                        return self.res

6. Testing Strategy

Test Case 1: Prefix shared

Words: ["oath", "oat"]. Board: Contains o -> a -> t -> h.

DFS reaches ‘t’. Finds “oat”. Adds to result. Removes word marker.
DFS continues to ‘h’. Finds “oath”. Adds to result.
Backtrack.

Test Case 2: No match

Words: ["abcd"]. Board: ['a', 'b', 'x', 'd'].

DFS matches a -> b.
Next neighbor x is not in Trie child of b.
Stop immediately. (This shows the power of Trie pruning vs Naive).

7. Complexity Analysis

Time: O(M \times N \times 4^L).
In worst case, we might traverse depth L for every cell.
The Trie makes this much faster on average because branches are pruned early.
Space: O(K \times L) for Trie storage (K words, length L).

8. Production Considerations

Thread Safety: In a real “Boggle Solver” server, building the Trie is a one-time startup cost. The DFS is per-request. The Trie must be read-only during requests.
Memory: If dictionary is huge (Oxford Dictionary), Trie can take GBs. We might use a DAWG (Directed Acyclic Word Graph) to compress shared suffixes (e.g., “-ing”).

9. Connections to ML Systems

This is the exact algorithm used in Typeahead Systems (ML System Design Track).

Problem: User typed “Am”.
Grid: The keyboard/input.
Trie: The database of all Amazon Products.
Task: Find the most likely completion. This DSA problem is the Search component of that system.

Also relates to Phonetic Search (Speech Tech) where we search Tries by sound.

10. Interview Strategy

Start with Naive: Briefly mention “I could just run DFS for each word”, then immediately pivot. “But that’s inefficient because…”
Draw the Trie: Show how “ant” and “and” share the “an” node.
Explain Pruning: This is the differentiator. “If I remove the leaf node after finding a word, the Trie gets smaller.” This shows senior-level optimization thinking.

11. Key Takeaways

State Machines: A Trie acts as a State Machine for the DFS. “Am I allowed to step on ‘X’?” depends on my current Trie node.
Preprocessing wins: Spend O(K) time building a Trie to save exponential time during search.
Backtracking Template: Mark -> Explore -> Unmark is the universal pattern.

FAQ

Why is a Trie essential for Word Search II?

A Trie lets you check whether the current grid path forms a valid prefix of any dictionary word in O(1) per step. Without it, you would repeat DFS for each word independently, wasting enormous time on shared prefixes like “oat” and “oath” that could be checked once.

How does incremental pruning optimize Word Search II?

After finding a word, remove it from the Trie and delete any leaf nodes with no remaining children. This progressively shrinks the Trie, reducing future DFS branches and making the algorithm faster as it discovers more words.

What is the time complexity of Word Search II with a Trie?

The worst-case time is O(M x N x 4^L) where M and N are grid dimensions and L is the max word length. However, Trie pruning makes the average case dramatically faster by cutting off invalid path explorations early.

How does Word Search II relate to typeahead search systems?

Word Search II uses the same Trie-driven search pattern found in typeahead and autocomplete systems. The Trie stores the product catalog or dictionary, user input or grid traversal drives the exploration, and prefix matching efficiently finds all valid completions.

Originally published at: arunbaby.com/dsa/0048-word-search-ii

Word Search II

TL;DR

1. Problem Statement

2. Understanding the Problem

2.1 The Native vs. Optimized Mindset

2.2 The Inversion of Control

3. Approach 1: Naive (Repeated DFS) – Don’t do this

4. Approach 2: Backtracking with Trie

Key Logic

5. Implementation

6. Testing Strategy

Test Case 1: Prefix shared

Test Case 2: No match

7. Complexity Analysis

8. Production Considerations

9. Connections to ML Systems

10. Interview Strategy

11. Key Takeaways

FAQ

Related across topics

Share on

TL;DR

1. Problem Statement

2. Understanding the Problem

2.1 The Native vs. Optimized Mindset

2.2 The Inversion of Control

3. Approach 1: Naive (Repeated DFS) – Don’t do this

4. Approach 2: Backtracking with Trie

Key Logic

5. Implementation

6. Testing Strategy

Test Case 1: Prefix shared

Test Case 2: No match

7. Complexity Analysis

8. Production Considerations

9. Connections to ML Systems

10. Interview Strategy

11. Key Takeaways

FAQ

Related across topics

Trie-Based Search Systems (Typeahead)

Phonetic Trie

Streaming Real-Time Agents

Share on