6 minute read

“You can’t build the roof before you pour the foundation.”

TL;DR

Course Schedule is a cycle detection problem on a directed graph. Model courses as nodes and prerequisites as edges, then check if the graph is a DAG using Kahn’s Algorithm. Start with zero-indegree nodes, process them, decrement neighbors’ indegrees, and repeat. If all nodes are processed, no cycle exists and all courses can be completed. This is the same algorithm that powers build systems and DAG pipeline orchestrators like Airflow, and extends naturally to the Alien Dictionary problem.

1. Problem Statement

This is the canonical “Dependency Resolution” problem. There are numCourses courses you have to take, labeled from 0 to numCourses - 1. You are given an array prerequisites where prerequisites[i] = [a, b] indicates that you must take course b first if you want to take course a.

Return true if you can finish all courses. Otherwise, return false.

Example 1:

  • Input: numCourses = 2, prerequisites = [[1,0]]
  • Output: true
  • Explanation: To take course 1 you should have finished course 0. So it is possible.

Example 2:

  • Input: numCourses = 2, prerequisites = [[1,0],[0,1]]
  • Output: false
  • Explanation: To take 1 you need 0. To take 0 you need 1. Impossible (Cycle).

2. Understanding the Problem

2.1 Directed Acyclic Graphs (DAGs)

This problem models a Directed Graph.

  • Nodes = Courses.
  • Edges = Dependencies (b -> a implies b creates a).

The question “Can I finish?” is synonymous with: “Does this graph contain a Cycle?” If there is a cycle (A -> B -> A), you can never start A or B. If there is no cycle (a DAG), there always exists a valid linear ordering, called a Topological Sort.

2.2 The “Indegree” Intuition

How do you know which course to take first? Simple: Take the one with zero prerequisites.

  • If Course 0 has no prerequisites (Indegree = 0), take it.
  • Once taken, Course 0 is “done”. Now, any course that only needed Course 0 might now have its prerequisites satisfied.
  • We effectively “remove” Course 0 and its outgoing edges from the graph.
  • Repeat until empty.

3. Approach 1: DFS (Cycle Detection)

We can process each node with DFS.

  • States: Unvisited (0), Visiting (1), Visited (2).
  • If we encounter a node marked Visiting, we found a back-edge -> Cycle.

Pros: Easy to write. Cons: Harder to generate the actual sort order (requires reversing the post-order). Large recursion depth.


4. Approach 2: Kahn’s Algorithm (BFS) – The Standard

This approach directly simulates the “peeling onion” strategy.

Algorithm

  1. Build Graph: Adjacency list + Indegree array.
    • Adj[u] = [v1, v2]
    • Indegree[v1]++, Indegree[v2]++
  2. Initialize Queue: Add all nodes with Indegree == 0 to Queue.
  3. Process:
    • While Queue is not empty:
    • Pop u. Add to list of taken_courses.
    • For each child v of u:
    • Indegree[v]-- (We satisfied one prerequisite).
    • If Indegree[v] == 0, push v to Queue.
  4. Check: If count(taken_courses) == numCourses, return True. Else, False (Cycle detected, some nodes never reached Indegree 0).

5. Implementation: Kahn’s Algorithm

from collections import deque

class Solution:
    def canFinish(self, numCourses: int, prerequisites: list[list[int]]) -> bool:
        # 1. Initialize Graph
        adj = [[] for _ in range(numCourses)]
        indegree = [0] * numCourses

        # 2. Build Graph (Time: O(E), Space: O(E))
        for dest, src in prerequisites:
            # src -> dest (take src first)
            adj[src].append(dest)
            indegree[dest] += 1

            # 3. Add seeds (Time: O(V))
            queue = deque()
            for i in range(numCourses):
                if indegree[i] == 0:
                    queue.append(i)

                    processed_count = 0

                    # 4. BFS Traversal (Time: O(V + E))
                    while queue:
                        node = queue.popleft()
                        processed_count += 1

                        for neighbor in adj[node]:
                            indegree[neighbor] -= 1
                            if indegree[neighbor] == 0:
                                queue.append(neighbor)

                                # 5. Final check
                                return processed_count == numCourses

                                # Example Usage
                                sol = Solution()
                                print(sol.canFinish(2, [[1,0]])) # True
                                print(sol.canFinish(2, [[1,0], [0,1]])) # False

6. Testing Strategy

Test Case 1: Disconnected Components

0 -> 1 and 2 -> 3.

  • Queue starts with {0, 2}.
  • Pop 0 -> Add 1 to Queue.
  • Pop 2 -> Add 3 to Queue.
  • All processed. Valid.

Test Case 2: Deadlock (Cycle)

0 -> 1 -> 0.

  • Indegree: [1, 1].
  • Queue starts Empty.
  • Processed Count = 0.
  • Returns False.

7. Complexity Analysis

  • Time: O(V + E).
  • V: Initializing lists.
  • E: Building graph edges.
  • BFS loop visits every node (V) and every edge (E) exactly once.
  • Space: O(V + E).
  • To store the Adjacency List.

This is asymptotically optimal.


8. Production Considerations

This algorithm is the backbone of Build Systems (Make, Maven, Bazel).

  • Parallel Dependencies: The Queue length represents the level of parallelism. If Queue has 5 items, it means 5 tasks are ready to run simultaneously on 5 worker threads. This is how make -j8 works!

9. Connections to ML Systems

This directly maps to DAG Pipeline Orchestration (ML System Design).

  • Airflow / Kubeflow: These tools literally execute Kahn’s Algorithm.
  • Nodes = ETL Tasks (Load Data, Train Model).
  • Edges = Data dependencies.
  • The Scheduler puts tasks with Indegree 0 into the worker pool.

10. Interview Strategy

  1. Identify Graph: Start by saying “This is a dependency problem, which can be modeled as a Graph.”
  2. Define Edge Direction: Be careful! “Prerequisite [a, b] means b -> a”. Getting this backward ruins the code.
  3. Mention Cycle: Explicitly mention “If there’s a cycle, we can’t finish.”
  4. Compare DFS vs BFS: “I prefer Kahn’s (BFS) because it’s iterative (no stack overflow) and easy to extend to parallel execution logic.”

11. Key Takeaways

  1. Indegree is Key: It represents “Unsatisfied Constraints”.
  2. Topological Sort order is not unique: If queue has [A, B], AB and BA are both valid sorts.
  3. Cycle Detection: If processed < total, the remaining nodes are locked in a cycle.

FAQ

How do you detect if courses can all be completed?

Model courses as a directed graph where edges represent prerequisites. If the graph contains a cycle, completion is impossible. Use topological sort via Kahn’s Algorithm to process nodes with zero indegree. If the processed count equals total courses, all can be completed.

What is Kahn’s Algorithm and how does it work?

Kahn’s Algorithm is a BFS-based topological sort. Initialize a queue with all zero-indegree nodes. Pop each node, decrement the indegree of its neighbors, and add newly zero-indegree neighbors to the queue. If all nodes are processed, the graph has no cycles.

Why prefer BFS over DFS for topological sort?

BFS (Kahn’s) is iterative so it avoids stack overflow risks, directly produces the topological order without reversal, and naturally reveals parallelism since all items in the queue at each level can execute simultaneously.

Where is topological sort used in production systems?

Build systems like Make and Bazel, DAG pipeline orchestrators like Airflow and Kubeflow, package managers, and CI/CD dependency resolution all use topological sort to determine valid execution order from dependency graphs.


Originally published at: arunbaby.com/dsa/0049-course-schedule