Word Ladder (BFS)
“Transforming ‘cold’ to ‘warm’ one letter at a time.”
1. Problem Statement
A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:
- Every adjacent pair of words differs by a single letter.
- Every `si` for `1 <= i <= k` is in `wordList`. Note that `beginWord` does not need to be in `wordList`.
- `sk == endWord`.
Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: hit -> hot -> dot -> dog -> cog
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore no valid transformation sequence exists.
2. BFS Solution (Shortest Path in Unweighted Graph)
Intuition:
- Treat each word as a node in a graph.
- An edge exists between two nodes if they differ by exactly one letter.
- The problem becomes finding the shortest path from `beginWord` to `endWord`.
- BFS (Breadth-First Search) is the standard algorithm for finding the shortest path in an unweighted graph because it explores nodes layer by layer.
Graph Construction Strategy: There are two main ways to find neighbors of a word:
- Compare with all other words: For a word $W$, iterate through the entire `wordList` and check whether the difference is exactly 1 character. This takes $O(N \cdot M)$, where $N$ is the number of words and $M$ is the word length. Doing this for every word in BFS gives $O(N^2 \cdot M)$, which is too slow if $N$ is large (e.g., 5000 words).
- Generate all possible neighbors: For a word $W$, change each of its characters to 'a' through 'z'. This generates $26 \cdot M$ potential words. Check whether each potential word exists in `wordSet`. This takes $O(26 \cdot M \cdot M)$ per word (building and hashing each candidate string takes $O(M)$). This is much faster when $N$ is large but $M$ is small (e.g., $M = 5$).
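For completeness, here is a minimal sketch of the strategy-1 comparison, assuming equal-length lowercase words; the helper name `differs_by_one` is illustrative:

```python
def differs_by_one(w1: str, w2: str) -> bool:
    """Return True if two equal-length words differ in exactly one position."""
    return sum(c1 != c2 for c1, c2 in zip(w1, w2)) == 1

# Strategy 1 would call this against every word in the list: O(N * M) per word.
assert differs_by_one("hot", "dot")
assert not differs_by_one("hot", "dog")  # two positions differ
```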
Algorithm:
- Convert `wordList` to a set `wordSet` for $O(1)$ lookups.
- Initialize a queue with `(beginWord, 1)`. The level starts at 1 because the sequence length includes the start word.
- Initialize a `visited` set to avoid cycles and redundant processing.
- While the queue is not empty:
  - Dequeue the current word and its level.
  - If the word is `endWord`, return the level.
  - Generate all possible neighbors by changing one character at a time.
  - If a neighbor is in `wordSet` and not visited, add it to the queue and mark it as visited.
```python
from collections import deque
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return 0

        queue = deque([(beginWord, 1)])  # (current_word, level)
        visited = set([beginWord])

        while queue:
            word, level = queue.popleft()
            if word == endWord:
                return level

            # Generate all possible neighbors
            for i in range(len(word)):
                original_char = word[i]
                for char_code in range(ord('a'), ord('z') + 1):
                    char = chr(char_code)
                    if char == original_char:
                        continue
                    # Create new word: hit -> ait, bit, ... hat, hbt ...
                    new_word = word[:i] + char + word[i+1:]
                    if new_word in wordSet and new_word not in visited:
                        visited.add(new_word)
                        queue.append((new_word, level + 1))
        return 0
```
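A quick check against Example 1, assuming the class above is in scope:

```python
words = ["hot", "dot", "dog", "lot", "log", "cog"]
print(Solution().ladderLength("hit", "cog", words))  # Expected output: 5
```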
Time Complexity Analysis:
- Preprocessing: Converting list to set takes $O(N \cdot M)$.
- BFS Traversal: In the worst case, we visit every word in `wordList`.
- Neighbor Generation: For each word, we iterate over its $M$ positions. For each position, we try 26 characters. Creating the new string takes $O(M)$, and checking existence in the set takes $O(M)$ on average.
- Total Complexity: $O(N \cdot M^2)$.
- If $N = 5000$ and $M = 5$, $N \cdot M^2 = 5000 \cdot 25 = 125,000$ operations.
- Comparing all pairs would be $N^2 \cdot M = 25,000,000 \cdot 5 = 125,000,000$.
- The generation approach is significantly faster for typical constraints.
Space Complexity:
- $O(N \cdot M)$ to store `wordSet`, the `visited` set, and the queue.
3. Bi-directional BFS (Optimization)
Intuition: Standard BFS searches a tree that grows exponentially. If the branching factor is $b$ and the distance to the target is $d$, BFS visits roughly $b^d$ nodes. Bi-directional BFS runs two simultaneous searches: one from the start and one from the end. They meet in the middle.
- Search 1: Start -> Middle (distance $d/2$)
- Search 2: End -> Middle (distance $d/2$)
- Total nodes visited: $b^{d/2} + b^{d/2} = 2 \cdot b^{d/2}$.
- This is exponentially smaller than $b^d$.
Algorithm:
- Maintain two sets: `beginSet` (initially `{beginWord}`) and `endSet` (initially `{endWord}`).
- Maintain a `visited` set containing all words visited by either search.
- In each step, always expand the smaller set. This keeps the search balanced and minimizes the number of generated neighbors.
- For each word in the current set, generate neighbors.
- If a neighbor is found in the opposite set, the two searches have met! Return `level + 1`.
- Otherwise, if the neighbor is valid (in `wordSet` and not visited), add it to the next layer.
```python
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return 0

        beginSet = {beginWord}
        endSet = {endWord}
        visited = {beginWord, endWord}
        length = 1

        while beginSet and endSet:
            # Always expand the smaller set to minimize work
            if len(beginSet) > len(endSet):
                beginSet, endSet = endSet, beginSet

            newSet = set()
            for word in beginSet:
                for i in range(len(word)):
                    original_char = word[i]
                    for char_code in range(ord('a'), ord('z') + 1):
                        char = chr(char_code)
                        if char == original_char:
                            continue
                        new_word = word[:i] + char + word[i+1:]
                        # If the neighbor is in the opposite set, we connected the paths
                        if new_word in endSet:
                            return length + 1
                        if new_word in wordSet and new_word not in visited:
                            visited.add(new_word)
                            newSet.add(new_word)
            beginSet = newSet
            length += 1
        return 0
```
Performance Comparison: Imagine a graph where each node has 10 neighbors ($b=10$) and the shortest path is 6 steps ($d=6$).
- Standard BFS: Visits $\approx 10^6 = 1,000,000$ nodes.
- Bi-directional BFS: Visits $\approx 2 \times 10^3 = 2,000$ nodes.
- Speedup: 500x faster!
4. Word Ladder II (Return All Shortest Paths)
Problem: Instead of just the length, return all shortest transformation sequences.
Example: hit -> hot -> dot -> dog -> cog AND hit -> hot -> lot -> log -> cog.
Challenge:
- We need to store the path structure.
- Standard BFS only stores the distance.
- Storing full paths in the queue consumes exponential memory.
Optimized Approach:
- BFS for Graph Building: Run BFS from `beginWord` to find the shortest distance to `endWord`. While doing this, build a DAG (Directed Acyclic Graph) as a `parents` map, where `parents[child]` contains all parents that lead to `child` along a shortest path.
  - Crucial: Only add edges from level $L$ to level $L+1$. Do not add edges within the same level or back to previous levels.
- DFS for Path Reconstruction: Run DFS (backtracking) from `endWord` back to `beginWord` using the `parents` map to reconstruct all paths.
```python
from collections import defaultdict
from typing import List

class Solution:
    def findLadders(self, beginWord: str, endWord: str, wordList: List[str]) -> List[List[str]]:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return []

        # BFS initialization
        layer = {beginWord}
        parents = defaultdict(set)  # word -> set of parents
        wordSet.discard(beginWord)  # Remove the start word to avoid cycles
        found = False

        while layer and not found:
            next_layer = set()
            current_layer_visited = set()
            for word in layer:
                for i in range(len(word)):
                    for char_code in range(ord('a'), ord('z') + 1):
                        new_word = word[:i] + chr(char_code) + word[i+1:]
                        if new_word == endWord:
                            parents[endWord].add(word)
                            found = True
                        elif new_word in wordSet:
                            next_layer.add(new_word)
                            parents[new_word].add(word)
                            current_layer_visited.add(new_word)
            # Remove this layer's discoveries from wordSet only *after* the whole
            # layer is processed, so that multiple same-level parents can link to
            # the same child (required to collect ALL shortest paths).
            wordSet -= current_layer_visited
            layer = next_layer

        if not found:
            return []

        # DFS backtracking to reconstruct all paths
        res = []
        def backtrack(current_word, path):
            if current_word == beginWord:
                res.append(path[::-1])  # Reverse to get begin -> end order
                return
            for parent in parents[current_word]:
                path.append(parent)
                backtrack(parent, path)
                path.pop()

        backtrack(endWord, [endWord])
        return res
```
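Running this on Example 1's inputs should produce both shortest sequences:

```python
words = ["hot", "dot", "dog", "lot", "log", "cog"]
print(Solution().findLadders("hit", "cog", words))
# Expected (in some order):
# [['hit', 'hot', 'dot', 'dog', 'cog'], ['hit', 'hot', 'lot', 'log', 'cog']]
```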
Complexity of Word Ladder II:
- Time: The number of shortest paths can be exponential. In the worst case, we might traverse a huge number of paths. However, the BFS part is still polynomial. The DFS part depends on the output size.
- Space: Storing the `parents` map is proportional to the number of edges in the shortest-path DAG.
5. Deep Dive: Pre-processing for Faster Neighbor Generation
If wordList is very sparse (e.g., words are 10 chars long, but only 1000 words exist), the $26 \cdot M$ generation strategy might generate many invalid words.
We can pre-process the dictionary using a wildcard map.
Concept:
- Word: `hot`
- Intermediate states: `*ot`, `h*t`, `ho*`
- Map: `*ot -> [hot, dot, lot]`, `h*t -> [hot, hit, hat]`, `ho* -> [hot, how]`
Algorithm:
- Build the `all_combo_dict`.
- In BFS, for a word `hot`, look up `*ot`, `h*t`, and `ho*` in the dictionary.
- The values are the direct neighbors.
```python
from collections import defaultdict, deque
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        if endWord not in wordList:
            return 0

        # Pre-processing: map each wildcard pattern to the words that match it
        L = len(beginWord)
        all_combo_dict = defaultdict(list)
        for word in wordList:
            for i in range(L):
                all_combo_dict[word[:i] + "*" + word[i+1:]].append(word)

        # BFS
        queue = deque([(beginWord, 1)])
        visited = {beginWord}
        while queue:
            current_word, level = queue.popleft()
            for i in range(L):
                intermediate_word = current_word[:i] + "*" + current_word[i+1:]
                for neighbor in all_combo_dict[intermediate_word]:
                    if neighbor == endWord:
                        return level + 1
                    if neighbor not in visited:
                        visited.add(neighbor)
                        queue.append((neighbor, level + 1))
                # Optimization: clear the list to prevent reprocessing
                # all_combo_dict[intermediate_word] = []
        return 0
```
Trade-offs:
- Pros: Fast neighbor lookup: one wildcard lookup per position ($M$ lookups per word) instead of generating $26 \cdot M$ candidate strings.
- Cons: High memory usage to store the dictionary. If words are dense (many words match `*ot`), the adjacency lists become long.
6. Deep Dive: A* Search (Heuristic Search)
BFS is “blind”—it explores in all directions equally. A* Search uses a heuristic to prioritize paths that seem closer to the target.
Heuristic Function $h(n)$: We need an admissible heuristic (never overestimates the cost).
- Hamming Distance: The number of positions where characters differ.
  - `hit` vs `cog`: 3 differences; `hot` vs `cog`: 2 differences. So `hot` is "closer" to `cog` than `hit`.
Algorithm: Use a Priority Queue instead of a standard Queue. Priority = $g(n) + h(n)$, where:
- $g(n)$: Cost from start to current node (current level).
- $h(n)$: Estimated cost from current node to end.
```python
import heapq
from typing import List

class Solution:
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        wordSet = set(wordList)
        if endWord not in wordSet:
            return 0

        def heuristic(word):
            # Hamming distance to endWord (admissible: each edit fixes at most one char)
            return sum(c1 != c2 for c1, c2 in zip(word, endWord))

        # Priority queue entries: (estimated_total_cost, current_cost, word)
        pq = [(heuristic(beginWord) + 1, 1, beginWord)]
        visited = {beginWord: 1}  # word -> best known cost

        while pq:
            _, cost, word = heapq.heappop(pq)
            if word == endWord:
                return cost
            if cost > visited.get(word, float('inf')):
                continue  # Stale queue entry; a cheaper path was already found
            for i in range(len(word)):
                original_char = word[i]
                for char_code in range(ord('a'), ord('z') + 1):
                    char = chr(char_code)
                    if char == original_char:
                        continue
                    new_word = word[:i] + char + word[i+1:]
                    if new_word in wordSet:
                        new_cost = cost + 1
                        if new_cost < visited.get(new_word, float('inf')):
                            visited[new_word] = new_cost
                            priority = new_cost + heuristic(new_word)
                            heapq.heappush(pq, (priority, new_cost, new_word))
        return 0
```
Why A* isn’t always better here: In high-branching spaces like this (each word has up to $25 \cdot M$ neighbors), the Hamming-distance heuristic is often not strong enough to prune the search space significantly. Bi-directional BFS is usually the winner for this specific problem.
7. Deep Dive: Real-world Applications
While transforming “hit” to “cog” is a puzzle, the underlying concepts have serious applications:
- Genetic Sequencing (Edit Distance):
- DNA sequences can be modeled as strings.
- Mutations are single-character changes.
- Finding the shortest path between two DNA sequences helps trace evolutionary history.
- Spell Checking:
- If a user types “wrd”, we want to find the closest valid word.
- This is a BFS of depth 1 or 2 from the typo.
- Error Correction Codes:
- Hamming codes allow detecting and correcting single-bit errors.
- This is essentially finding the nearest valid “codeword” in the graph of all possible binary strings.
- Semantic Word Embeddings:
- In NLP, we often want to move from one concept to another.
- “King” - “Man” + “Woman” = “Queen”.
- This is a continuous version of a word ladder in vector space.
8. Deep Dive: Variations and Constraints
Variation 1: Variable Word Lengths
- Rule: You can insert, delete, or replace a character.
- Graph: Neighbors include `hot -> ho` (delete), `hot -> shot` (insert), and `hot -> hat` (replace); a generator sketch follows below.
- Complexity: The branching factor increases significantly ($25 \cdot M$ replacements + $M$ deletions + $26 \cdot (M+1)$ insertions).
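A minimal sketch of a neighbor generator under these rules; the function name `edit1_neighbors` is illustrative:

```python
import string

def edit1_neighbors(word: str) -> set:
    """All strings at edit distance exactly 1 (replace, delete, insert)."""
    neighbors = set()
    # Replacements: 25 * M candidates
    for i in range(len(word)):
        for c in string.ascii_lowercase:
            if c != word[i]:
                neighbors.add(word[:i] + c + word[i+1:])
    # Deletions: M candidates
    for i in range(len(word)):
        neighbors.add(word[:i] + word[i+1:])
    # Insertions: 26 * (M + 1) candidates
    for i in range(len(word) + 1):
        for c in string.ascii_lowercase:
            neighbors.add(word[:i] + c + word[i:])
    return neighbors

print(sorted(edit1_neighbors("hot") & {"ho", "shot", "hat", "dot"}))
# ['dot', 'hat', 'ho', 'shot']
```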
Variation 2: Weighted Edges
- Rule: Changing a vowel costs 2, consonant costs 1.
- Algorithm: Use Dijkstra's Algorithm instead of BFS (a sketch follows below).
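A sketch of the Dijkstra variant under the vowel-costs-2 rule; the cost model and names are assumptions for illustration, and the function returns the total edit cost rather than a word count:

```python
import heapq

def weighted_ladder(beginWord: str, endWord: str, wordList: list) -> int:
    """Dijkstra: minimize total edit cost, where changing to a vowel costs 2."""
    wordSet = set(wordList)
    if endWord not in wordSet:
        return -1

    def step_cost(new_char: str) -> int:
        return 2 if new_char in "aeiou" else 1  # assumed cost model

    dist = {beginWord: 0}
    pq = [(0, beginWord)]
    while pq:
        d, word = heapq.heappop(pq)
        if word == endWord:
            return d
        if d > dist.get(word, float("inf")):
            continue  # stale entry
        for i in range(len(word)):
            for c in "abcdefghijklmnopqrstuvwxyz":
                if c == word[i]:
                    continue
                nxt = word[:i] + c + word[i+1:]
                if nxt in wordSet:
                    nd = d + step_cost(c)
                    if nd < dist.get(nxt, float("inf")):
                        dist[nxt] = nd
                        heapq.heappush(pq, (nd, nxt))
    return -1
```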
Variation 3: Constraint Satisfaction
- Rule: The sequence must pass through a specific word (e.g., `hit -> ... -> dot -> ... -> cog`).
- Algorithm: Run BFS twice: `hit -> dot` and `dot -> cog`. Sum the two lengths and subtract 1, since the waypoint `dot` is counted in both runs (see the sketch below).
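Composing the two BFS runs, assuming the `ladderLength` class from section 2 is in scope (note the waypoint must be in `wordList` for the first run to reach it):

```python
def ladder_via(beginWord: str, waypoint: str, endWord: str, wordList: list) -> int:
    solver = Solution()
    first = solver.ladderLength(beginWord, waypoint, wordList)
    second = solver.ladderLength(waypoint, endWord, wordList)
    if first == 0 or second == 0:
        return 0  # one of the segments is unreachable
    return first + second - 1  # waypoint is counted in both segments
```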
Comparison Table
| Approach | Time Complexity | Space Complexity | Pros | Cons |
|---|---|---|---|---|
| Simple BFS | $O(M^2 N)$ | $O(MN)$ | Simple, guarantees shortest path | Slow for large graphs |
| Bi-directional BFS | $O(M^2 N)$ worst case; $\approx 2 \cdot b^{d/2}$ nodes in practice | $O(MN)$ | Fastest for 2-point search | More code, needs start/end known |
| Word Ladder II | Exponential | Exponential | Finds ALL paths | Very memory intensive |
| A* Search | Depends on heuristic | $O(MN)$ | Good for single path | Heuristic overhead, PQ overhead |
| Pre-processed Map | $O(M^2 N)$ | $O(M^2 N)$ | Fast neighbor lookup | High memory usage |
Implementation in Other Languages
C++:
```cpp
class Solution {
public:
    int ladderLength(string beginWord, string endWord, vector<string>& wordList) {
        unordered_set<string> wordSet(wordList.begin(), wordList.end());
        if (wordSet.find(endWord) == wordSet.end()) return 0;

        queue<pair<string, int>> q;
        q.push({beginWord, 1});

        while (!q.empty()) {
            auto [word, level] = q.front();
            q.pop();
            if (word == endWord) return level;

            for (int i = 0; i < word.size(); ++i) {
                char original = word[i];
                for (char c = 'a'; c <= 'z'; ++c) {
                    if (c == original) continue;
                    word[i] = c;
                    if (wordSet.count(word)) {
                        q.push({word, level + 1});
                        wordSet.erase(word); // Mark as visited
                    }
                }
                word[i] = original;
            }
        }
        return 0;
    }
};
```
Java:
```java
class Solution {
    public int ladderLength(String beginWord, String endWord, List<String> wordList) {
        Set<String> wordSet = new HashSet<>(wordList);
        if (!wordSet.contains(endWord)) return 0;

        Queue<String> queue = new LinkedList<>();
        queue.offer(beginWord);
        int level = 1;

        while (!queue.isEmpty()) {
            int size = queue.size();
            for (int i = 0; i < size; i++) {
                String currentWord = queue.poll();
                char[] wordChars = currentWord.toCharArray();
                for (int j = 0; j < wordChars.length; j++) {
                    char originalChar = wordChars[j];
                    for (char c = 'a'; c <= 'z'; c++) {
                        if (c == originalChar) continue;
                        wordChars[j] = c;
                        String newWord = new String(wordChars);
                        if (newWord.equals(endWord)) return level + 1;
                        if (wordSet.contains(newWord)) {
                            queue.offer(newWord);
                            wordSet.remove(newWord);
                        }
                    }
                    wordChars[j] = originalChar;
                }
            }
            level++;
        }
        return 0;
    }
}
```
Top Interview Questions
Q1: What if the word list is too large to fit in memory? Answer: If the list is on disk, we can’t use a hash set for $O(1)$ lookups.
- Bloom Filter: Load a Bloom Filter into memory. It can tell us if a word is definitely not in the set or probably in the set. If probably, check disk.
- Trie (Prefix Tree): Store words in a Trie to save space (shared prefixes).
- Distributed Search: Partition the words across multiple machines (sharding).
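A minimal sketch of the Bloom-filter idea, assuming a fixed-size in-memory bit array; the class and its parameters are illustrative, not a specific library API:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _indices(self, word: str):
        # Double hashing: derive k probe positions from one digest
        digest = hashlib.sha256(word.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so probes differ
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, word: str):
        for idx in self._indices(word):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, word: str) -> bool:
        # False -> definitely not in the set; True -> probably in (verify on disk)
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(word))
```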
Q2: How would you handle words of different lengths? Answer: The problem definition implies equal lengths. If we allow insertions/deletions (Edit Distance = 1), we generate neighbors by:
- Substitution: `hit -> hat`
- Insertion: `hit -> hits`, `chit`
- Deletion: `hit -> hi`, `it`

We must check whether these new words exist in the dictionary.
Q3: Can we use DFS? Answer: DFS is not suitable for finding the shortest path in an unweighted graph.
- DFS dives deep. It might find a path of length 100 before finding the optimal path of length 5.
- To find the shortest path with DFS, you’d have to explore all paths and compare them, which is $O(N!)$. BFS finds the shortest path as soon as it reaches the target.
Q4: What is the maximum number of edges from a word? Answer: For a word of length $M$ and an alphabet size of 26:
- Each of the $M$ positions can be changed to 25 other characters.
- Total potential neighbors = $M \times 25$.
- However, the number of valid edges depends on how many of these potential neighbors actually exist in `wordList`.
Q5: How does Bi-directional BFS handle the meeting point? Answer: The meeting point is not necessarily a single node. It’s an edge.
- Search A reaches node $U$.
- Search B reaches node $V$.
- If $V$ is a neighbor of $U$, the paths connect.
- The total length is `level(U) + level(V)`. In my implementation, the check `if new_word in endSet` detects exactly this connection.
Key Takeaways
- Graph Modeling: The core skill is recognizing that words are nodes and edits are edges.
- BFS for Shortest Path: Always the first choice for unweighted shortest path problems.
- Bi-directional BFS: A critical optimization for search problems where start and end are known. It reduces the search space from $b^d$ to $2 \cdot b^{d/2}$.
- Neighbor Generation: Iterating ‘a’-‘z’ ($26M$) is usually faster than iterating the word list ($N$) when $N$ is large.
- Word Ladder II: Requires a two-phase approach: BFS to build the shortest-path graph, then DFS to extract paths.
Summary
| Aspect | Insight |
|---|---|
| Core Problem | Shortest path in a graph of words |
| Best Algorithm | Bi-directional BFS |
| Key Optimization | Generate neighbors by swapping chars, not iterating list |
| Applications | Spell checking, genetic sequencing, puzzle solving |
Originally published at: arunbaby.com/dsa/0032-word-ladder
If you found this helpful, consider sharing it with others who might benefit.