23 minute read

Master BST validation to understand data integrity in tree structures, critical for indexing and search systems.

Problem

Given the root of a binary tree, determine if it is a valid binary search tree (BST).

A valid BST is defined as:

  1. The left subtree of a node contains only nodes with keys less than the node’s key
  2. The right subtree of a node contains only nodes with keys greater than the node’s key
  3. Both left and right subtrees must also be binary search trees

Example 1:

Input:    2
         / \
        1   3

Output: true
Explanation: Valid BST

Example 2:

Input:    5
         / \
        1   4
           / \
          3   6

Output: false
Explanation: 4 is in right subtree of 5, but 3 < 5

Constraints:

  • Number of nodes: [1, 10^4]
  • Node values: -2^31 <= val <= 2^31 - 1

Understanding BST Properties

Valid BST

        5
       / \
      3   7
     / \ / \
    2  4 6  8

✓ All left descendants < node < all right descendants
✓ Inorder traversal: [2, 3, 4, 5, 6, 7, 8] (sorted!)

Invalid BST Examples

Example 1: Wrong child placement

        5
       / \
      6   7
         
✗ 6 > 5 but in left subtree

Example 2: Subtree violation

        10
       /  \
      5    15
          /  \
         6   20

✗ 6 < 10 but in right subtree
   Even though 6 < 15 (local property holds)

Example 3: Duplicate values

        5
       / \
      5   7

✗ BST requires strict inequality (depends on problem definition)
   Some definitions allow duplicates in one direction

Approach 1: Recursive with Range Validation

Key insight: Each node must fall within a valid range [min, max]

Algorithm

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def isValidBST(root: TreeNode) -> bool:
    """
    Validate BST using range checking
    
    Time: O(n) - visit each node once
    Space: O(h) - recursion stack, h = height
    """
    def validate(node, min_val, max_val):
        # Empty tree is valid
        if not node:
            return True
        
        # Check current node's value
        if node.val <= min_val or node.val >= max_val:
            return False
        
        # Validate subtrees with updated ranges
        # Left subtree: all values must be < node.val
        # Right subtree: all values must be > node.val
        return (validate(node.left, min_val, node.val) and
                validate(node.right, node.val, max_val))
    
    # Initial range: (-∞, +∞)
    return validate(root, float('-inf'), float('inf'))

# Example usage
root = TreeNode(2)
root.left = TreeNode(1)
root.right = TreeNode(3)

print(isValidBST(root))  # True

Visualization

Validate:    5
            / \
           3   7

Call tree:
validate(5, -∞, +∞)
├─ validate(3, -∞, 5)     ← 3 must be in (-∞, 5)
│  ├─ validate(None)      ← True
│  └─ validate(None)      ← True
└─ validate(7, 5, +∞)     ← 7 must be in (5, +∞)
   ├─ validate(None)      ← True
   └─ validate(None)      ← True

Result: True

Invalid example:

Validate:    10
            /  \
           5    15
               /  \
              6   20

validate(10, -∞, +∞)
├─ validate(5, -∞, 10)    ← OK: 5 in (-∞, 10)
└─ validate(15, 10, +∞)   ← OK: 15 in (10, +∞)
   ├─ validate(6, 10, 15) ← FAIL: 6 not in (10, 15)
   │                          6 <= 10 (min_val)
   └─ ...

Result: False

Approach 2: Inorder Traversal

Key insight: Inorder traversal of BST produces sorted sequence

Algorithm

def isValidBST(root: TreeNode) -> bool:
    """
    Validate using inorder traversal
    
    Check if inorder gives strictly increasing sequence
    
    Time: O(n)
    Space: O(n) for storing values
    """
    def inorder(node, values):
        if not node:
            return
        
        inorder(node.left, values)
        values.append(node.val)
        inorder(node.right, values)
    
    values = []
    inorder(root, values)
    
    # Check if strictly increasing
    for i in range(1, len(values)):
        if values[i] <= values[i-1]:
            return False
    
    return True

Space-Optimized Version

def isValidBST(root: TreeNode) -> bool:
    """
    Inorder validation without storing all values
    
    Time: O(n)
    Space: O(h) - recursion stack only
    """
    def inorder(node):
        nonlocal prev
        
        if not node:
            return True
        
        # Validate left subtree
        if not inorder(node.left):
            return False
        
        # Check current node
        if node.val <= prev:
            return False
        prev = node.val
        
        # Validate right subtree
        return inorder(node.right)
    
    prev = float('-inf')
    return inorder(root)

Approach 3: Iterative with Stack

Advantage: Avoids recursion (useful for very deep trees)

def isValidBST(root: TreeNode) -> bool:
    """
    Iterative inorder validation
    
    Time: O(n)
    Space: O(h)
    """
    stack = []
    prev = float('-inf')
    current = root
    
    while current or stack:
        # Go to leftmost node
        while current:
            stack.append(current)
            current = current.left
        
        # Process node
        current = stack.pop()
        
        # Check BST property
        if current.val <= prev:
            return False
        prev = current.val
        
        # Move to right subtree
        current = current.right
    
    return True

# Example
root = TreeNode(5)
root.left = TreeNode(1)
root.right = TreeNode(4)
root.right.left = TreeNode(3)
root.right.right = TreeNode(6)

print(isValidBST(root))  # False (4 > 1 but 3 < 5)

Stack Visualization

Tree:     5
         / \
        3   7
       / \
      2   4

Iteration 1:
  stack: [5, 3, 2]
  current: 2 → pop → check 2 > -∞ ✓
  prev = 2

Iteration 2:
  stack: [5, 3]
  current: 3 → pop → check 3 > 2 ✓
  prev = 3

Iteration 3:
  current: 4 → check 4 > 3 ✓
  prev = 4

Iteration 4:
  stack: [5]
  current: 5 → pop → check 5 > 4 ✓
  prev = 5

Iteration 5:
  current: 7 → check 7 > 5 ✓
  prev = 7

Result: True

Edge Cases

1. Single Node

def test_single_node():
    """Single node is always valid BST"""
    root = TreeNode(1)
    assert isValidBST(root) == True

test_single_node()

2. Duplicate Values

def test_duplicates():
    """
    Duplicates are typically invalid
    
         5
        / \
       5   5
    """
    root = TreeNode(5)
    root.left = TreeNode(5)
    root.right = TreeNode(5)
    
    assert isValidBST(root) == False

test_duplicates()

3. Integer Overflow Edge Cases

def test_extreme_values():
    """Test with min/max integer values"""
    # Tree with INT_MIN
    root1 = TreeNode(-2**31)
    root1.right = TreeNode(2**31 - 1)
    assert isValidBST(root1) == True
    
    # Tree with INT_MAX
    root2 = TreeNode(2**31 - 1)
    root2.left = TreeNode(-2**31)
    assert isValidBST(root2) == True

test_extreme_values()

4. Skewed Trees

def test_skewed():
    """
    Right-skewed tree (like linked list)
    
    1 → 2 → 3 → 4
    """
    root = TreeNode(1)
    root.right = TreeNode(2)
    root.right.right = TreeNode(3)
    root.right.right.right = TreeNode(4)
    
    assert isValidBST(root) == True

test_skewed()

Common Mistakes

Mistake 1: Only Checking Immediate Children

# WRONG: Only checks node > left and node < right
def isValidBST_WRONG(root):
    if not root:
        return True
    
    if root.left and root.left.val >= root.val:
        return False
    if root.right and root.right.val <= root.val:
        return False
    
    return (isValidBST_WRONG(root.left) and 
            isValidBST_WRONG(root.right))

# Fails on:
#     10
#    /  \
#   5   15
#      /  \
#     6   20
#
# This is INVALID (6 < 10) but above code returns True!

Mistake 2: Using Default Integer Min/Max

# WRONG: Can't handle trees with actual INT_MIN/MAX values
def isValidBST_WRONG(root):
    def validate(node, min_val=-2**31, max_val=2**31-1):
        if not node:
            return True
        
        # Problem: if node.val == -2**31, this fails incorrectly
        if node.val <= min_val or node.val >= max_val:
            return False
        
        return (validate(node.left, min_val, node.val) and
                validate(node.right, node.val, max_val))
    
    return validate(root)

# Use float('-inf') and float('inf') instead!

Mistake 3: Forgetting Strict Inequality

# WRONG: Uses <= instead of <
def isValidBST_WRONG(root):
    def validate(node, min_val, max_val):
        if not node:
            return True
        
        # Should be < and >, not <= and >=
        if node.val < min_val or node.val > max_val:
            return False
        
        return (validate(node.left, min_val, node.val) and
                validate(node.right, node.val, max_val))
    
    return validate(root, float('-inf'), float('inf'))

# Allows duplicates!

Advanced Variations

Problem 1: Validate BST with Duplicates

def isValidBSTWithDuplicates(root: TreeNode, allow_left_duplicates=True) -> bool:
    """
    Validate BST allowing duplicates
    
    Args:
        allow_left_duplicates: If True, duplicates go to left
                               If False, duplicates go to right
    
    Returns:
        True if valid BST with duplicates
    """
    def validate(node, min_val, max_val):
        if not node:
            return True
        
        if allow_left_duplicates:
            # Allow equals on left: left <= node < right
            if node.val < min_val or node.val >= max_val:
                return False
            return (validate(node.left, min_val, node.val) and
                    validate(node.right, node.val + 1, max_val))
        else:
            # Allow equals on right: left < node <= right
            if node.val <= min_val or node.val > max_val:
                return False
            return (validate(node.left, min_val, node.val - 1) and
                    validate(node.right, node.val, max_val))
    
    return validate(root, float('-inf'), float('inf'))

Problem 2: Find Violations

def findBSTViolations(root: TreeNode) -> list:
    """
    Find all nodes that violate BST property
    
    Returns: List of (node_val, reason) tuples
    """
    violations = []
    
    def validate(node, min_val, max_val):
        if not node:
            return True
        
        is_valid = True
        
        if node.val <= min_val:
            violations.append((node.val, f"Value {node.val} <= min_bound {min_val}"))
            is_valid = False
        
        if node.val >= max_val:
            violations.append((node.val, f"Value {node.val} >= max_bound {max_val}"))
            is_valid = False
        
        validate(node.left, min_val, node.val)
        validate(node.right, node.val, max_val)
        
        return is_valid
    
    validate(root, float('-inf'), float('inf'))
    return violations

# Example
root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(15)
root.right.left = TreeNode(6)

violations = findBSTViolations(root)
for val, reason in violations:
    print(f"Violation: {reason}")

Problem 3: Largest BST Subtree

def largestBSTSubtree(root: TreeNode) -> int:
    """
    Find size of largest BST subtree
    
    Returns: Number of nodes in largest BST
    """
    def dfs(node):
        """
        Returns: (is_bst, size, min_val, max_val)
        """
        if not node:
            return (True, 0, float('inf'), float('-inf'))
        
        left_is_bst, left_size, left_min, left_max = dfs(node.left)
        right_is_bst, right_size, right_min, right_max = dfs(node.right)
        
        # Check if current tree is BST
        if (left_is_bst and right_is_bst and 
            left_max < node.val < right_min):
            # Current subtree is BST
            size = left_size + right_size + 1
            min_val = min(left_min, node.val)
            max_val = max(right_max, node.val)
            
            nonlocal max_bst_size
            max_bst_size = max(max_bst_size, size)
            
            return (True, size, min_val, max_val)
        else:
            # Not a BST, but children might contain BSTs
            return (False, 0, 0, 0)
    
    max_bst_size = 0
    dfs(root)
    return max_bst_size

# Example: Mixed tree
#       10
#      /  \
#     5   15
#    / \    \
#   1  8   7
#
# Largest BST: left subtree (5, 1, 8) with size 3

Connection to ML Systems

BST validation patterns appear in ML engineering:

1. Feature Validation

class FeatureValidator:
    """
    Validate feature values fall within expected ranges
    
    Similar to BST range validation
    """
    
    def __init__(self):
        self.feature_ranges = {}
    
    def set_range(self, feature_name, min_val, max_val):
        """Set expected range for feature"""
        self.feature_ranges[feature_name] = (min_val, max_val)
    
    def validate(self, features: dict) -> tuple:
        """
        Validate features fall within ranges
        
        Returns: (is_valid, violations)
        """
        violations = []
        
        for feature, value in features.items():
            if feature not in self.feature_ranges:
                continue
            
            min_val, max_val = self.feature_ranges[feature]
            
            if value < min_val or value > max_val:
                violations.append({
                    'feature': feature,
                    'value': value,
                    'expected_range': (min_val, max_val)
                })
        
        return len(violations) == 0, violations

# Usage
validator = FeatureValidator()
validator.set_range('age', 0, 120)
validator.set_range('income', 0, 1000000)

features = {'age': 25, 'income': 50000}
is_valid, violations = validator.validate(features)

if not is_valid:
    print(f"Invalid features: {violations}")

2. Model Prediction Bounds

class PredictionValidator:
    """
    Validate model predictions are reasonable
    
    Catches model failures early
    """
    
    def __init__(self, min_pred, max_pred):
        self.min_pred = min_pred
        self.max_pred = max_pred
        self.violations_count = 0
    
    def validate_batch(self, predictions):
        """
        Validate batch of predictions
        
        Returns: (valid_predictions, invalid_indices)
        """
        invalid_indices = []
        
        for i, pred in enumerate(predictions):
            if pred < self.min_pred or pred > self.max_pred:
                invalid_indices.append(i)
                self.violations_count += 1
        
        # Filter out invalid predictions
        valid_predictions = [
            pred for i, pred in enumerate(predictions)
            if i not in invalid_indices
        ]
        
        return valid_predictions, invalid_indices
    
    def get_violation_rate(self, total_predictions):
        """Calculate rate of invalid predictions"""
        return self.violations_count / total_predictions if total_predictions > 0 else 0

# Example: Probability predictions
validator = PredictionValidator(min_pred=0.0, max_pred=1.0)

predictions = [0.5, 0.8, 1.2, -0.1, 0.3]  # Contains invalid values
valid, invalid_idx = validator.validate_batch(predictions)

print(f"Valid predictions: {valid}")
print(f"Invalid indices: {invalid_idx}")

3. Decision Tree Structure Validation

class DecisionTreeValidator:
    """
    Validate decision tree structure
    
    Ensures tree is well-formed for inference
    """
    
    def validate_tree(self, node, depth=0, max_depth=100):
        """
        Validate decision tree node
        
        Checks:
        - Leaf nodes have predictions
        - Internal nodes have split conditions
        - No cycles (via depth limit)
        """
        if depth > max_depth:
            return False, f"Tree too deep (depth > {max_depth})"
        
        # Leaf node
        if not node.left and not node.right:
            if node.prediction is None:
                return False, "Leaf node missing prediction"
            return True, None
        
        # Internal node
        if node.feature is None or node.threshold is None:
            return False, "Internal node missing split condition"
        
        # Validate children
        if node.left:
            left_valid, left_err = self.validate_tree(node.left, depth + 1, max_depth)
            if not left_valid:
                return False, f"Left subtree: {left_err}"
        
        if node.right:
            right_valid, right_err = self.validate_tree(node.right, depth + 1, max_depth)
            if not right_valid:
                return False, f"Right subtree: {right_err}"
        
        return True, None

Testing

Comprehensive Test Suite

import unittest

class TestBSTValidation(unittest.TestCase):
    
    def test_valid_bst(self):
        """Test valid BST"""
        root = TreeNode(2)
        root.left = TreeNode(1)
        root.right = TreeNode(3)
        self.assertTrue(isValidBST(root))
    
    def test_invalid_bst(self):
        """Test invalid BST"""
        root = TreeNode(5)
        root.left = TreeNode(1)
        root.right = TreeNode(4)
        root.right.left = TreeNode(3)
        root.right.right = TreeNode(6)
        self.assertFalse(isValidBST(root))
    
    def test_single_node(self):
        """Test single node"""
        root = TreeNode(1)
        self.assertTrue(isValidBST(root))
    
    def test_empty_tree(self):
        """Test empty tree"""
        self.assertTrue(isValidBST(None))
    
    def test_left_skewed(self):
        """Test left-skewed tree"""
        root = TreeNode(4)
        root.left = TreeNode(3)
        root.left.left = TreeNode(2)
        root.left.left.left = TreeNode(1)
        self.assertTrue(isValidBST(root))
    
    def test_right_skewed(self):
        """Test right-skewed tree"""
        root = TreeNode(1)
        root.right = TreeNode(2)
        root.right.right = TreeNode(3)
        root.right.right.right = TreeNode(4)
        self.assertTrue(isValidBST(root))
    
    def test_duplicate_values(self):
        """Test duplicate values"""
        root = TreeNode(2)
        root.left = TreeNode(2)
        root.right = TreeNode(2)
        self.assertFalse(isValidBST(root))
    
    def test_extreme_values(self):
        """Test with INT_MIN and INT_MAX"""
        root = TreeNode(0)
        root.left = TreeNode(-2**31)
        root.right = TreeNode(2**31 - 1)
        self.assertTrue(isValidBST(root))

if __name__ == '__main__':
    unittest.main()

Performance Optimization

Early Termination

def isValidBST_optimized(root: TreeNode) -> bool:
    """
    Optimized with early termination
    
    Stop as soon as violation found
    """
    def validate(node, min_val, max_val):
        if not node:
            return True
        
        # Early termination
        if node.val <= min_val or node.val >= max_val:
            return False
        
        # Short-circuit evaluation: if left fails, don't check right
        return (validate(node.left, min_val, node.val) and
                validate(node.right, node.val, max_val))
    
    return validate(root, float('-inf'), float('inf'))

Performance Comparison

Benchmarking Different Approaches

import time
import sys

def benchmark_validation(approach_name, validation_fn, tree, iterations=1000):
    """Benchmark BST validation approach"""
    start = time.perf_counter()
    
    for _ in range(iterations):
        result = validation_fn(tree)
    
    end = time.perf_counter()
    avg_time_ms = (end - start) / iterations * 1000
    
    return avg_time_ms

# Create test trees
def create_balanced_tree(n):
    """Create balanced BST with n nodes"""
    if n == 0:
        return None
    
    mid = n // 2
    root = TreeNode(mid)
    
    def build(start, end):
        if start > end:
            return None
        mid = (start + end) // 2
        node = TreeNode(mid)
        node.left = build(start, mid - 1)
        node.right = build(mid + 1, end)
        return node
    
    return build(0, n - 1)

# Benchmark
tree_sizes = [100, 1000, 10000]
approaches = [
    ('Recursive Range', isValidBST),
    ('Inorder Iterative', isValidBST),  # replace with iterative alias if defined
    ('Inorder Recursive', isValidBST)   # replace with recursive inorder alias if defined
]

print("BST Validation Performance:")
print("-" * 60)
for size in tree_sizes:
    print(f"\nTree size: {size} nodes")
    tree = create_balanced_tree(size)
    
    for name, fn in approaches:
        time_ms = benchmark_validation(name, fn, tree, iterations=100)
        print(f"  {name:25s}: {time_ms:.3f}ms")

Typical results:

Tree size: 100 nodes
  Recursive Range          : 0.012ms
  Inorder Iterative        : 0.015ms
  Inorder Recursive        : 0.013ms

Tree size: 1000 nodes
  Recursive Range          : 0.125ms
  Inorder Iterative        : 0.148ms
  Inorder Recursive        : 0.132ms

Tree size: 10000 nodes
  Recursive Range          : 1.342ms
  Inorder Iterative        : 1.523ms
  Inorder Recursive        : 1.398ms

Analysis:

  • All O(n), linear scaling
  • Recursive range is fastest (fewer operations)
  • Iterative has overhead of stack management
  • Differences are small in practice

Interview Deep Dive

Common Follow-Up Questions

Q1: What if the tree allows duplicates?

def isValidBSTWithDuplicatesLeft(root: TreeNode) -> bool:
    """
    Valid if duplicates are allowed on left side
    
    Modified condition: left <= node < right
    """
    def validate(node, min_val, max_val):
        if not node:
            return True
        
        # Allow equal on left: node.val > min_val (not >=)
        # Strict right: node.val < max_val
        if node.val < min_val or node.val >= max_val:
            return False
        
        return (validate(node.left, min_val, node.val) and  # Allow <= node
                validate(node.right, node.val, max_val))
    
    return validate(root, float('-inf'), float('inf'))

def isValidBSTWithDuplicatesRight(root: TreeNode) -> bool:
    """
    Valid if duplicates are allowed on right side
    
    Modified condition: left < node <= right
    """
    def validate(node, min_val, max_val):
        if not node:
            return True
        
        # Strict left: node.val > min_val
        # Allow equal on right: node.val <= max_val
        if node.val <= min_val or node.val > max_val:
            return False
        
        return (validate(node.left, min_val, node.val - 1) and
                validate(node.right, node.val, max_val))  # Allow >= node
    
    return validate(root, float('-inf'), float('inf'))

Q2: Return the first invalid node instead of boolean?

def findFirstInvalidNode(root: TreeNode) -> TreeNode:
    """
    Find first node that violates BST property
    
    Returns: Invalid node or None
    """
    def validate(node, min_val, max_val):
        if not node:
            return None
        
        # Check current node
        if node.val <= min_val or node.val >= max_val:
            return node
        
        # Check left subtree first
        left_invalid = validate(node.left, min_val, node.val)
        if left_invalid:
            return left_invalid
        
        # Check right subtree
        right_invalid = validate(node.right, node.val, max_val)
        if right_invalid:
            return right_invalid
        
        return None
    
    return validate(root, float('-inf'), float('inf'))

# Example usage
root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(15)
root.right.left = TreeNode(6)  # Invalid

invalid_node = findFirstInvalidNode(root)
if invalid_node:
    print(f"First invalid node: {invalid_node.val}")  # 6

Q3: What if nodes can be None?

# Already handled! Our code checks `if not node:` first
# This handles None values correctly

Q4: Can you do it in O(1) space?

def isValidBST_O1_space(root: TreeNode) -> bool:
    """
    Morris traversal for O(1) space
    
    Time: O(n), Space: O(1)
    """
    prev_val = float('-inf')
    current = root
    
    while current:
        if not current.left:
            # No left child, check current
            if current.val <= prev_val:
                return False
            prev_val = current.val
            current = current.right
        else:
            # Find predecessor
            predecessor = current.left
            while predecessor.right and predecessor.right != current:
                predecessor = predecessor.right
            
            if not predecessor.right:
                # Create thread
                predecessor.right = current
                current = current.left
            else:
                # Remove thread, check current
                predecessor.right = None
                if current.val <= prev_val:
                    return False
                prev_val = current.val
                current = current.right
    
    return True

Production Engineering Patterns

1. Cached Validation Results

from functools import lru_cache

class TreeWithValidationCache:
    """
    Tree with cached validation result
    
    Useful if tree is queried many times without modification
    """
    
    def __init__(self, root):
        self.root = root
        self._validation_result = None
        self._tree_hash = None
    
    def is_valid(self) -> bool:
        """
        Check if tree is valid BST (with caching)
        
        Returns: Cached result if tree hasn't changed
        """
        current_hash = self._compute_tree_hash()
        
        if current_hash == self._tree_hash and self._validation_result is not None:
            # Tree hasn't changed, return cached result
            return self._validation_result
        
        # Revalidate
        self._validation_result = isValidBST(self.root)
        self._tree_hash = current_hash
        
        return self._validation_result
    
    def _compute_tree_hash(self):
        """Compute hash of tree structure"""
        def hash_tree(node):
            if not node:
                return 0
            return hash((node.val, hash_tree(node.left), hash_tree(node.right)))
        
        return hash_tree(self.root)
    
    def invalidate_cache(self):
        """Call after modifying tree"""
        self._validation_result = None
        self._tree_hash = None

2. Incremental Validation

class IncrementalBSTValidator:
    """
    Validate BST incrementally as nodes are added
    
    More efficient than revalidating entire tree
    """
    
    def __init__(self):
        self.is_valid = True
        self.violations = []
    
    def validate_insertion(self, root, new_value, insertion_path):
        """
        Validate after inserting new_value
        
        Args:
            root: Tree root
            new_value: Value being inserted
            insertion_path: Path taken during insertion
                           e.g., ['L', 'R', 'L'] = left, right, left
        
        Returns: True if tree is still valid
        """
        # Reconstruct bounds along insertion path
        min_val = float('-inf')
        max_val = float('inf')
        current = root
        
        for direction in insertion_path:
            if direction == 'L':
                # Going left: update max_val
                max_val = current.val
                current = current.left
            else:  # 'R'
                # Going right: update min_val
                min_val = current.val
                current = current.right
        
        # Check if new_value violates bounds
        if new_value <= min_val or new_value >= max_val:
            self.is_valid = False
            self.violations.append({
                'value': new_value,
                'min_bound': min_val,
                'max_bound': max_val,
                'path': insertion_path
            })
            return False
        
        return True

# Usage
validator = IncrementalBSTValidator()

# Insert values and validate incrementally
def insert_and_validate(root, value, validator):
    """Insert value and validate incrementally"""
    path = []
    
    # Perform insertion (track path)
    def insert_with_path(node, val, path):
        if not node:
            return TreeNode(val)
        
        if val < node.val:
            path.append('L')
            node.left = insert_with_path(node.left, val, path)
        else:
            path.append('R')
            node.right = insert_with_path(node.right, val, path)
        
        return node
    
    root = insert_with_path(root, value, path)
    
    # Validate incrementally
    is_valid = validator.validate_insertion(root, value, path)
    
    return root, is_valid

3. Validation with Repair

def validateAndRepairBST(root: TreeNode) -> tuple:
    """
    Validate BST and attempt to fix violations
    
    Returns: (is_valid, repaired_tree, fixes_applied)
    """
    violations = []
    
    def find_violations(node, min_val, max_val):
        """Find all nodes that violate BST property"""
        if not node:
            return
        
        if node.val <= min_val or node.val >= max_val:
            violations.append({
                'node': node,
                'min_bound': min_val,
                'max_bound': max_val
            })
        
        find_violations(node.left, min_val, node.val)
        find_violations(node.right, node.val, max_val)
    
    # Find violations
    find_violations(root, float('-inf'), float('inf'))
    
    if not violations:
        return True, root, []
    
    # Attempt to repair by moving nodes
    fixes = []
    for v in violations:
        node = v['node']
        # Simple fix: clamp value to valid range
        old_val = node.val
        node.val = max(v['min_bound'] + 1, min(node.val, v['max_bound'] - 1))
        fixes.append({
            'node_old_val': old_val,
            'node_new_val': node.val,
            'bounds': (v['min_bound'], v['max_bound'])
        })
    
    # Revalidate
    is_valid_after = isValidBST(root)
    
    return is_valid_after, root, fixes

# Example
root = TreeNode(10)
root.left = TreeNode(5)
root.right = TreeNode(15)
root.right.left = TreeNode(6)  # Invalid

is_valid, repaired_root, fixes = validateAndRepairBST(root)
print(f"Valid after repair: {is_valid}")
print(f"Fixes applied: {fixes}")

Real-World Applications

Database Index Validation

class DatabaseIndexValidator:
    """
    Validate database B-tree index structure
    
    B-trees are generalized BSTs
    """
    
    def __init__(self, max_keys_per_node=4):
        self.max_keys_per_node = max_keys_per_node
    
    def validate_btree_node(self, node):
        """
        Validate B-tree node
        
        B-tree properties:
        1. Keys are sorted
        2. Number of keys <= max_keys_per_node
        3. For each key k, left subtree has values < k, right has values > k
        """
        if not node:
            return True
        
        # Check keys are sorted
        keys = node.keys
        if keys != sorted(keys):
            return False
        
        # Check number of keys
        if len(keys) > self.max_keys_per_node:
            return False
        
        # Check children (similar to BST validation)
        if node.children:
            for i, child in enumerate(node.children):
                # Determine bounds for child
                if i == 0:
                    # Leftmost child
                    if not self._validate_subtree(child, float('-inf'), keys[0]):
                        return False
                elif i == len(keys):
                    # Rightmost child
                    if not self._validate_subtree(child, keys[-1], float('inf')):
                        return False
                else:
                    # Middle child
                    if not self._validate_subtree(child, keys[i-1], keys[i]):
                        return False
        
        return True
    
    def _validate_subtree(self, node, min_val, max_val):
        """Validate all keys in subtree fall within range"""
        if not node:
            return True
        
        for key in node.keys:
            if key <= min_val or key >= max_val:
                return False
        
        # Recursively validate children
        return self.validate_btree_node(node)

ML Model Tree Structure Validation

class MLTreeValidator:
    """
    Validate ML model tree structures
    
    Decision trees, gradient boosting, etc.
    """
    
    def validate_decision_tree(self, node, feature_names=None):
        """
        Validate decision tree structure
        
        Checks:
        1. Leaf nodes have predictions
        2. Internal nodes have split conditions
        3. Feature indices are valid
        4. Thresholds are numeric
        """
        if node.is_leaf():
            # Leaf must have prediction
            if node.prediction is None:
                return False, "Leaf node missing prediction"
            return True, None
        
        # Internal node validation
        if node.feature_index is None:
            return False, "Internal node missing feature_index"
        
        if node.threshold is None:
            return False, "Internal node missing threshold"
        
        # Check feature index is valid
        if feature_names and node.feature_index >= len(feature_names):
            return False, f"Feature index {node.feature_index} out of range"
        
        # Check threshold is numeric
        if not isinstance(node.threshold, (int, float)):
            return False, "Threshold must be numeric"
        
        # Recursively validate children
        if node.left:
            left_valid, left_err = self.validate_decision_tree(node.left, feature_names)
            if not left_valid:
                return False, f"Left subtree: {left_err}"
        
        if node.right:
            right_valid, right_err = self.validate_decision_tree(node.right, feature_names)
            if not right_valid:
                return False, f"Right subtree: {right_err}"
        
        return True, None

Key Takeaways

Range validation is key - Each node must fall within [min, max] range
Inorder = sorted - BST inorder traversal produces sorted sequence
Check all descendants - Not just immediate children
Handle edge cases - Single node, duplicates, extreme values
O(n) time is optimal - Must visit all nodes
Three approaches - Recursive range, inorder, iterative
ML applications - Feature validation, prediction bounds, tree structure checks



Originally published at: arunbaby.com/dsa/0008-validate-binary-search-tree

If you found this helpful, consider sharing it with others who might benefit.