Course Schedule (Topological Sort)
“Can you finish all courses given their prerequisites?”
TL;DR
The Course Schedule problem asks whether all courses can be completed given prerequisite constraints. It reduces to cycle detection in a directed graph. Use DFS with three-color marking (white/gray/black) or Kahn’s BFS algorithm with in-degree tracking, both running in O(V+E) time. If a topological ordering exists, all courses can be completed; a back-edge to a gray node means a cycle exists and completion is impossible. Kahn’s algorithm is often preferred for its intuitive level-by-level processing. For related graph traversal problems, see the Word Ladder BFS approach or explore how topological ordering connects to dependency resolution in ML pipelines.
1. Problem Statement
There are a total of numCourses courses you have to take, labeled from 0 to numCourses - 1. You are given an array prerequisites where prerequisites[i] = [ai, bi] indicates that you must take course bi first if you want to take course ai.
Return true if you can finish all courses. Otherwise, return false.
Example 1:
Input: numCourses = 2, prerequisites = [[1,0]]
Output: true
Explanation: Take course 0, then course 1.
Example 2:
Input: numCourses = 2, prerequisites = [[1,0],[0,1]]
Output: false
Explanation: Circular dependency (cycle).
This is a cycle detection problem in a directed graph!
2. DFS Solution (Cycle Detection)
Intuition:
- Build a directed graph:
course_a -> course_bmeans “a depends on b”. - Use DFS to detect cycles.
- If there’s a cycle, courses can’t be completed.
States during DFS:
- 0 (White): Unvisited.
- 1 (Gray): Visiting (currently in DFS stack).
- 2 (Black): Visited (DFS completed).
Cycle Detection: If we encounter a Gray node during DFS, there’s a cycle.
class Solution:
def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
# Build adjacency list
graph = defaultdict(list)
for course, prereq in prerequisites:
graph[course].append(prereq)
# States: 0 = unvisited, 1 = visiting, 2 = visited
state = [0] * numCourses
def has_cycle(course):
if state[course] == 1: # Currently visiting → cycle!
return True
if state[course] == 2: # Already processed
return False
# Mark as visiting
state[course] = 1
# Visit all prerequisites
for prereq in graph[course]:
if has_cycle(prereq):
return True
# Mark as visited
state[course] = 2
return False
# Check all courses
for course in range(numCourses):
if has_cycle(course):
return False
return True
Time Complexity: \(O(V + E)\) where V = courses, E = prerequisites. Space Complexity: \(O(V + E)\) for graph + \(O(V)\) for recursion stack.
3. BFS Solution (Kahn’s Algorithm - Topological Sort)
Intuition:
- Count in-degree (number of prerequisites) for each course.
- Process courses with in-degree 0 (no prerequisites).
- Remove processed courses and update in-degrees.
- If all courses are processed, return true.
from collections import deque, defaultdict
class Solution:
def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
# Build graph and in-degree count
graph = defaultdict(list)
in_degree = [0] * numCourses
for course, prereq in prerequisites:
graph[prereq].append(course) # prereq -> course
in_degree[course] += 1
# Queue of courses with no prerequisites
queue = deque([i for i in range(numCourses) if in_degree[i] == 0])
processed = 0
while queue:
course = queue.popleft()
processed += 1
# Remove this course and update dependent courses
for dependent in graph[course]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append(dependent)
return processed == numCourses
Time Complexity: \(O(V + E)\). Space Complexity: \(O(V + E)\).
4. Course Schedule II (Return the Order)
Problem: Return a valid course order. If impossible, return [].
class Solution:
def findOrder(self, numCourses: int, prerequisites: List[List[int]]) -> List[int]:
graph = defaultdict(list)
in_degree = [0] * numCourses
for course, prereq in prerequisites:
graph[prereq].append(course)
in_degree[course] += 1
queue = deque([i for i in range(numCourses) if in_degree[i] == 0])
order = []
while queue:
course = queue.popleft()
order.append(course)
for dependent in graph[course]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append(dependent)
return order if len(order) == numCourses else []
Deep Dive: Topological Sort - Why It Works
Topological Ordering: A linear ordering of vertices such that for every directed edge \(u \to v\), \(u\) comes before \(v\).
Theorem: A topological ordering exists if and only if the graph is a DAG (Directed Acyclic Graph).
Proof:
- ⇒ (If topological ordering exists, then no cycles):
- Suppose there’s a cycle \(v_1 \to v_2 \to \ldots \to v_k \to v_1\).
- In a topological ordering, \(v_1\) must come before \(v_2\), \(v_2\) before \(v_3\), …, \(v_k\) before \(v_1\).
-
This implies \(v_1\) comes before \(v_1\) → Contradiction!
- ⇐ (If no cycles, then topological ordering exists):
- In a DAG, there exists at least one vertex with in-degree 0 (no incoming edges).
- Remove this vertex and repeat. This gives a topological ordering.
Deep Dive: Kahn’s Algorithm Correctness
Algorithm:
- Find all vertices with in-degree 0.
- Remove them (add to result) and update neighbors’ in-degrees.
- Repeat until no more vertices with in-degree 0.
Why it works:
- Vertices with in-degree 0 have no dependencies → safe to process.
- Removing a vertex is equivalent to marking it as “done”.
- If the graph has a cycle, there will always be vertices with in-degree > 0 (can’t find a starting point).
Invariant: At each step, processed vertices form a valid prefix of a topological ordering.
Deep Dive: DFS-based Topological Sort
Idea: Use DFS and add vertices to the result in post-order (after visiting all descendants).
def topological_sort_dfs(graph, num_vertices):
visited = [False] * num_vertices
stack = []
def dfs(v):
visited[v] = True
for neighbor in graph[v]:
if not visited[neighbor]:
dfs(neighbor)
stack.append(v) # Add after visiting all descendants
for v in range(num_vertices):
if not visited[v]:
dfs(v)
return stack[::-1] # Reverse to get topological order
Why reverse?
- DFS post-order gives vertices in decreasing finish time.
- In a topological ordering, vertices with higher finish time should come first.
Deep Dive: Minimum Number of Semesters
Problem: What’s the minimum number of semesters needed to complete all courses?
Solution: This is finding the longest path in the DAG.
def minimumSemesters(numCourses, prerequisites):
graph = defaultdict(list)
in_degree = [0] * numCourses
for course, prereq in prerequisites:
graph[prereq].append(course)
in_degree[course] += 1
queue = deque([(i, 1) for i in range(numCourses) if in_degree[i] == 0]) # (course, semester)
processed = 0
max_semester = 0
while queue:
course, semester = queue.popleft()
processed += 1
max_semester = max(max_semester, semester)
for dependent in graph[course]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append((dependent, semester + 1))
return max_semester if processed == numCourses else -1
Time Complexity: \(O(V + E)\).
Deep Dive: Parallel Course Scheduling (Load Balancing)
Problem: You have \(K\) workers. What’s the minimum time to complete all courses?
Approach: DP on DAG
def minTimeToFinish(numCourses, prerequisites, time, K):
graph = defaultdict(list)
in_degree = [0] * numCourses
for course, prereq in prerequisites:
graph[prereq].append(course)
in_degree[course] += 1
# dp[course] = earliest time this course can start
dp = [0] * numCourses
# Topological sort with time tracking
queue = deque([i for i in range(numCourses) if in_degree[i] == 0])
while queue:
# Process K courses in parallel
current_batch = []
for _ in range(min(K, len(queue))):
if queue:
current_batch.append(queue.popleft())
for course in current_batch:
for dependent in graph[course]:
dp[dependent] = max(dp[dependent], dp[course] + time[course])
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append(dependent)
return max(dp[i] + time[i] for i in range(numCourses))
Deep Dive: All Possible Topological Orderings
Problem: Print all valid course orderings.
Approach: Backtracking
def allTopologicalSorts(graph, num_vertices):
in_degree = [0] * num_vertices
for u in graph:
for v in graph[u]:
in_degree[v] += 1
result = []
def backtrack(current_order, remaining_in_degree):
# Find all vertices with in-degree 0
available = [v for v in range(num_vertices) if remaining_in_degree[v] == 0 and v not in current_order]
if not available:
if len(current_order) == num_vertices:
result.append(current_order[:])
return
for v in available:
# Choose v
current_order.append(v)
# Update in-degrees
new_in_degree = remaining_in_degree[:]
new_in_degree[v] = -1 # Mark as used
for neighbor in graph[v]:
new_in_degree[neighbor] -= 1
# Recurse
backtrack(current_order, new_in_degree)
# Unchoose
current_order.pop()
backtrack([], in_degree)
return result
Time Complexity: \(O(V! \cdot E)\) in worst case (exponential).
Deep Dive: Lexicographically Smallest Topological Sort
Problem: Among all valid orderings, return the lexicographically smallest.
Solution: Use a min-heap instead of queue in Kahn’s algorithm.
import heapq
def lexicographicallySmallestOrder(numCourses, prerequisites):
graph = defaultdict(list)
in_degree = [0] * numCourses
for course, prereq in prerequisites:
graph[prereq].append(course)
in_degree[course] += 1
# Min-heap instead of queue
heap = [i for i in range(numCourses) if in_degree[i] == 0]
heapq.heapify(heap)
order = []
while heap:
course = heapq.heappop(heap)
order.append(course)
for dependent in graph[course]:
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
heapq.heappush(heap, dependent)
return order if len(order) == numCourses else []
Time Complexity: \(O((V + E) \log V)\) due to heap operations.
Deep Dive: Detecting Strongly Connected Components (Kosaraju’s Algorithm)
Strongly Connected Component (SCC): A maximal subgraph where every vertex is reachable from every other vertex.
In the context of courses: Courses in the same SCC form a circular dependency (impossible to complete).
Kosaraju’s Algorithm:
- Perform DFS on original graph, record finish times.
- Reverse the graph.
- Perform DFS on reversed graph in decreasing finish time order.
- Each DFS tree in step 3 is an SCC.
def findSCC(graph, num_vertices):
# Step 1: DFS to get finish times
visited = [False] * num_vertices
stack = []
def dfs1(v):
visited[v] = True
for neighbor in graph[v]:
if not visited[neighbor]:
dfs1(neighbor)
stack.append(v)
for v in range(num_vertices):
if not visited[v]:
dfs1(v)
# Step 2: Reverse graph
reversed_graph = defaultdict(list)
for u in graph:
for v in graph[u]:
reversed_graph[v].append(u)
# Step 3: DFS on reversed graph
visited = [False] * num_vertices
sccs = []
def dfs2(v, component):
visited[v] = True
component.append(v)
for neighbor in reversed_graph[v]:
if not visited[neighbor]:
dfs2(neighbor, component)
while stack:
v = stack.pop()
if not visited[v]:
component = []
dfs2(v, component)
sccs.append(component)
return sccs
Application: If any SCC has size > 1, there’s a cycle.
Deep Dive: Critical Path Method (CPM)
Problem: In project management, find the longest path (critical path) which determines project duration.
Example:
- Task A takes 3 days.
- Task B takes 5 days and depends on A.
- Task C takes 2 days and depends on A.
- Task D takes 4 days and depends on B and C.
Critical Path: A → B → D (3 + 5 + 4 = 12 days).
def criticalPath(tasks, dependencies):
graph = defaultdict(list)
in_degree = [0] * len(tasks)
for task, dependency in dependencies:
graph[dependency].append(task)
in_degree[task] += 1
# Earliest start time
earliest = [0] * len(tasks)
queue = deque([i for i in range(len(tasks)) if in_degree[i] == 0])
while queue:
task = queue.popleft()
for dependent in graph[task]:
earliest[dependent] = max(earliest[dependent], earliest[task] + tasks[task])
in_degree[dependent] -= 1
if in_degree[dependent] == 0:
queue.append(dependent)
# Latest start time (backward pass)
latest = [max(earliest)] * len(tasks)
in_degree = [len(graph[i]) for i in range(len(tasks))]
queue = deque([i for i in range(len(tasks)) if in_degree[i] == 0]) # Tasks with no dependents
while queue:
task = queue.popleft()
for predecessor in reversed_graph[task]:
latest[predecessor] = min(latest[predecessor], latest[task] - tasks[predecessor])
in_degree[predecessor] -= 1
if in_degree[predecessor] == 0:
queue.append(predecessor)
# Critical tasks have earliest == latest (no slack)
critical_tasks = [i for i in range(len(tasks)) if earliest[i] == latest[i]]
return max(earliest) + max(tasks), critical_tasks
Comparison Table
| Approach | Time | Space | Best Use Case |
|---|---|---|---|
| DFS (Cycle Detection) | \(O(V + E)\) | \(O(V)\) | Simple cycle detection |
| BFS (Kahn’s) | \(O(V + E)\) | \(O(V)\) | Topological ordering |
| DFS (Post-order) | \(O(V + E)\) | \(O(V)\) | Topological ordering |
| Min-Heap Kahn’s | \(O((V+E)\log V)\) | \(O(V)\) | Lexicographically smallest order |
| All Orderings | \(O(V! \cdot E)\) | \(O(V^2)\) | Enumerate all valid orderings |
Implementation in Other Languages
C++:
class Solution {
public:
bool canFinish(int numCourses, vector<vector<int>>& prerequisites) {
vector<vector<int>> graph(numCourses);
vector<int> indegree(numCourses, 0);
for (auto& pre : prerequisites) {
graph[pre[1]].push_back(pre[0]);
indegree[pre[0]]++;
}
queue<int> q;
for (int i = 0; i < numCourses; i++) {
if (indegree[i] == 0) q.push(i);
}
int count = 0;
while (!q.empty()) {
int course = q.front(); q.pop();
count++;
for (int next : graph[course]) {
if (--indegree[next] == 0) {
q.push(next);
}
}
}
return count == numCourses;
}
};
Java:
class Solution {
public boolean canFinish(int numCourses, int[][] prerequisites) {
List<List<Integer>> graph = new ArrayList<>();
for (int i = 0; i < numCourses; i++) {
graph.add(new ArrayList<>());
}
int[] indegree = new int[numCourses];
for (int[] pre : prerequisites) {
graph.get(pre[1]).add(pre[0]);
indegree[pre[0]]++;
}
Queue<Integer> queue = new LinkedList<>();
for (int i = 0; i < numCourses; i++) {
if (indegree[i] == 0) queue.offer(i);
}
int count = 0;
while (!queue.isEmpty()) {
int course = queue.poll();
count++;
for (int next : graph.get(course)) {
if (--indegree[next] == 0) {
queue.offer(next);
}
}
}
return count == numCourses;
}
}
Top Interview Questions
Q1: What’s the difference between DFS and BFS for topological sort? Answer: Both have \(O(V + E)\) time complexity. DFS uses post-order traversal and requires reversing the result. BFS (Kahn’s algorithm) is more intuitive and naturally produces the ordering. Choose Kahn’s for simplicity.
Q2: Can there be multiple valid topological orderings? Answer: Yes! For example, given courses [0, 1, 2] with prerequisites [[2, 0], [2, 1]], both [0, 1, 2] and [1, 0, 2] are valid orderings (0 and 1 can be taken in any order before 2).
Q3: How do you handle multiple disconnected components in the graph? Answer: Both DFS and BFS approaches naturally handle this. In DFS, we iterate through all vertices. In BFS, we start with all vertices having in-degree 0, which handles all components.
Q4: What if prerequisites have duplicates?
Answer:
Use a set to avoid duplicate edges: graph[prereq].append(course) only if course not already in graph[prereq]. Or, accept duplicates as they don’t affect correctness, just efficiency slightly.
Key Takeaways
- Cycle Detection = Impossibility: If the dependency graph has a cycle, courses cannot be completed.
- Two Approaches: DFS (3-color marking) and BFS (Kahn’s algorithm) both work in \(O(V + E)\) time.
- Topological Sort: Linear ordering of vertices respecting edge directions (only exists for DAGs).
- Applications: Build systems (Make, Gradle), dependency resolution (npm, pip), job scheduling, spreadsheet calculations.
- Critical Path: Finding the longest path in a DAG determines project completion time.
Summary
| Aspect | Insight |
|---|---|
| Core Problem | Detect cycles in a directed graph |
| Best Solution | BFS with Kahn’s algorithm (intuitive) |
| Key Insight | Process nodes with in-degree 0 first |
| Applications | Build systems, package managers, project planning |
FAQ
What is topological sort and when is it used?
Topological sort produces a linear ordering of vertices in a directed acyclic graph (DAG) such that for every edge u to v, u comes before v. It is widely used in build systems (Make, Gradle), package managers (npm, pip), job scheduling, spreadsheet dependency resolution, and course prerequisite planning. Any scenario where tasks have ordering constraints benefits from topological sort.
How do you detect a cycle in a directed graph?
Use DFS with three states: unvisited (white), visiting (gray), and visited (black). If you encounter a gray node during DFS, a back-edge exists and therefore a cycle is present. Alternatively, Kahn’s algorithm detects cycles when not all nodes are processed after the BFS completes, since nodes in a cycle will always have a non-zero in-degree.
What is the difference between DFS and BFS approaches for topological sort?
DFS uses post-order traversal and reverses the result to produce the topological ordering. BFS via Kahn’s algorithm processes nodes with in-degree zero iteratively, naturally producing the ordering without reversal. Both run in O(V+E) time and space, but Kahn’s algorithm is often considered more intuitive and easier to reason about for interview settings.
Can there be multiple valid topological orderings?
Yes. Whenever multiple nodes have in-degree zero simultaneously during Kahn’s algorithm, they can be processed in any order, leading to different valid topological orderings. To obtain the lexicographically smallest ordering, replace the queue with a min-heap. The number of valid orderings can be exponential in the worst case.
Originally published at: arunbaby.com/dsa/0031-course-schedule