Binary Search
Master binary search to understand logarithmic algorithms and efficient searching, foundational for optimization and search systems.
Problem
Given an array of integers nums
which is sorted in ascending order, and an integer target
, write a function to search target
in nums
. If target
exists, then return its index. Otherwise, return -1
.
You must write an algorithm with O(log n) runtime complexity.
Example 1:
Input: nums = [-1,0,3,5,9,12], target = 9
Output: 4
Explanation: 9 exists in nums and its index is 4
Example 2:
Input: nums = [-1,0,3,5,9,12], target = 2
Output: -1
Explanation: 2 does not exist in nums so return -1
Constraints:
- 1 <= nums.length <= 10^4
- -10^4 < nums[i], target < 10^4
- All integers in
nums
are unique nums
is sorted in ascending order
Understanding Binary Search
The Core Idea
Binary search repeatedly divides the search space in half:
Array: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Target: 11
Step 1: Check middle (9)
[1, 3, 5, 7, 9] | 11 | [11, 13, 15, 17, 19]
11 > 9 → search right half
Step 2: Check middle of right half (15)
[11, 13] | 15 | [17, 19]
11 < 15 → search left half
Step 3: Check middle of left half (11)
Found! Index = 5
Key insight: Each comparison eliminates half of the remaining elements.
Why O(log n)?
With each step, we cut the search space in half:
- n elements → n/2 → n/4 → n/8 → … → 1
- Number of steps = log₂(n)
Array size Steps needed
10 4
100 7
1,000 10
1,000,000 20
1,000,000,000 30
Approach 1: Iterative Binary Search
Most common and recommended approach
def binarySearch(nums: list[int], target: int) -> int:
"""
Iterative binary search
Time: O(log n)
Space: O(1)
Args:
nums: Sorted array
target: Value to find
Returns:
Index of target or -1 if not found
"""
left = 0
right = len(nums) - 1
while left <= right:
# Calculate middle (avoids overflow)
mid = left + (right - left) // 2
if nums[mid] == target:
return mid
elif nums[mid] < target:
# Target is in right half
left = mid + 1
else:
# Target is in left half
right = mid - 1
# Target not found
return -1
# Test cases
print(binarySearch([-1, 0, 3, 5, 9, 12], 9)) # 4
print(binarySearch([-1, 0, 3, 5, 9, 12], 2)) # -1
print(binarySearch([5], 5)) # 0
Why mid = left + (right - left) // 2
?
# Simple but can overflow with large indices
mid = (left + right) // 2 # BAD: left + right might overflow
# Safe version
mid = left + (right - left) // 2 # GOOD: no overflow
# Example where overflow matters (in languages like Java):
# left = 2^30, right = 2^30
# left + right = 2^31 (overflow in 32-bit int!)
# left + (right - left) // 2 = safe
Loop Invariant
The target, if it exists, is always in the range [left, right]:
Initial: left=0, right=n-1
Target ∈ [0, n-1] ✓
After each iteration:
- If nums[mid] < target: left = mid + 1
Target > nums[mid], so Target ∈ [mid+1, right] ✓
- If nums[mid] > target: right = mid - 1
Target < nums[mid], so Target ∈ [left, mid-1] ✓
Termination: left > right
Search space is empty, target not found ✓
Approach 2: Recursive Binary Search
More elegant, uses call stack
def binarySearchRecursive(nums: list[int], target: int) -> int:
"""
Recursive binary search
Time: O(log n)
Space: O(log n) - recursion stack
"""
def search(left: int, right: int) -> int:
# Base case: search space is empty
if left > right:
return -1
# Calculate middle
mid = left + (right - left) // 2
# Found target
if nums[mid] == target:
return mid
# Recursively search appropriate half
if nums[mid] < target:
return search(mid + 1, right) # Search right
else:
return search(left, mid - 1) # Search left
return search(0, len(nums) - 1)
# Test
print(binarySearchRecursive([1, 2, 3, 4, 5], 3)) # 2
Recursion Tree
search([1,2,3,4,5,6,7,8,9], target=7)
├─ mid=5 (value=5), 7>5
└─ search([6,7,8,9])
├─ mid=7 (value=7)
└─ Found! Return 6
Approach 3: Python’s bisect Module
Production-ready implementation
import bisect
def binarySearchBuiltin(nums: list[int], target: int) -> int:
"""
Using Python's bisect module
bisect.bisect_left returns the insertion point
"""
idx = bisect.bisect_left(nums, target)
if idx < len(nums) and nums[idx] == target:
return idx
return -1
# Alternative: find insertion point
def findInsertPosition(nums: list[int], target: int) -> int:
"""Find position where target should be inserted"""
return bisect.bisect_left(nums, target)
# Test
nums = [1, 3, 5, 7, 9]
print(binarySearchBuiltin(nums, 5)) # 2
print(findInsertPosition(nums, 6)) # 3 (insert between 5 and 7)
Binary Search Variants
Variant 1: Find First Occurrence
def findFirst(nums: list[int], target: int) -> int:
"""
Find first occurrence of target
Example: [1, 2, 2, 2, 3], target=2 → return 1
"""
left, right = 0, len(nums) - 1
result = -1
while left <= right:
mid = left + (right - left) // 2
if nums[mid] == target:
result = mid
right = mid - 1 # Continue searching left
elif nums[mid] < target:
left = mid + 1
else:
right = mid - 1
return result
# Test
print(findFirst([1, 2, 2, 2, 3], 2)) # 1
Variant 2: Find Last Occurrence
def findLast(nums: list[int], target: int) -> int:
"""
Find last occurrence of target
Example: [1, 2, 2, 2, 3], target=2 → return 3
"""
left, right = 0, len(nums) - 1
result = -1
while left <= right:
mid = left + (right - left) // 2
if nums[mid] == target:
result = mid
left = mid + 1 # Continue searching right
elif nums[mid] < target:
left = mid + 1
else:
right = mid - 1
return result
# Test
print(findLast([1, 2, 2, 2, 3], 2)) # 3
Variant 3: Search in Rotated Sorted Array
def searchRotated(nums: list[int], target: int) -> int:
"""
Search in rotated sorted array
Example: [4,5,6,7,0,1,2], target=0 → return 4
The array was originally [0,1,2,4,5,6,7] and rotated
"""
left, right = 0, len(nums) - 1
while left <= right:
mid = left + (right - left) // 2
if nums[mid] == target:
return mid
# Determine which half is sorted
if nums[left] <= nums[mid]:
# Left half is sorted
if nums[left] <= target < nums[mid]:
right = mid - 1 # Target in left half
else:
left = mid + 1 # Target in right half
else:
# Right half is sorted
if nums[mid] < target <= nums[right]:
left = mid + 1 # Target in right half
else:
right = mid - 1 # Target in left half
return -1
# Test
print(searchRotated([4,5,6,7,0,1,2], 0)) # 4
Variant 4: Find Peak Element
def findPeakElement(nums: list[int]) -> int:
"""
Find peak element (greater than neighbors)
Example: [1,2,3,1] → return 2 (index of 3)
Time: O(log n)
"""
left, right = 0, len(nums) - 1
while left < right:
mid = left + (right - left) // 2
if nums[mid] > nums[mid + 1]:
# Peak is in left half (or mid is peak)
right = mid
else:
# Peak is in right half
left = mid + 1
return left
# Test
print(findPeakElement([1, 2, 3, 1])) # 2
print(findPeakElement([1, 2, 1, 3, 5])) # 4 (index of 5)
Variant 5: Square Root (Binary Search on Answer)
def mySqrt(x: int) -> int:
"""
Find square root (floor value)
Example: x=8 → return 2 (since 2² = 4 < 8 < 9 = 3²)
Binary search on the answer!
"""
if x < 2:
return x
left, right = 1, x // 2
while left <= right:
mid = left + (right - left) // 2
square = mid * mid
if square == x:
return mid
elif square < x:
left = mid + 1
else:
right = mid - 1
return right # Floor value
# Test
print(mySqrt(4)) # 2
print(mySqrt(8)) # 2
print(mySqrt(16)) # 4
Common Mistakes
Mistake 1: Wrong Loop Condition
# WRONG: Misses single element case
while left < right: # Should be left <= right
mid = left + (right - left) // 2
# ...
# Example where it fails:
nums = [5], target = 5
# left=0, right=0, loop doesn't execute!
Mistake 2: Infinite Loop
# WRONG: Can cause infinite loop
while left < right:
mid = (left + right) // 2
if nums[mid] < target:
left = mid # Should be mid + 1!
else:
right = mid - 1
Mistake 3: Integer Overflow (Other Languages)
# In Python, no overflow, but in Java/C++:
mid = (left + right) / 2 # Can overflow
# Better:
mid = left + (right - left) / 2
Mistake 4: Off-by-One Errors
# WRONG: Initial right value
right = len(nums) # Should be len(nums) - 1
# Or need to adjust loop condition:
while left < right: # Not left <= right
Edge Cases
Test Suite
import unittest
class TestBinarySearch(unittest.TestCase):
def test_empty_array(self):
"""Test with empty array"""
self.assertEqual(binarySearch([], 5), -1)
def test_single_element_found(self):
"""Test single element - found"""
self.assertEqual(binarySearch([5], 5), 0)
def test_single_element_not_found(self):
"""Test single element - not found"""
self.assertEqual(binarySearch([5], 3), -1)
def test_first_element(self):
"""Test target is first element"""
self.assertEqual(binarySearch([1, 2, 3, 4, 5], 1), 0)
def test_last_element(self):
"""Test target is last element"""
self.assertEqual(binarySearch([1, 2, 3, 4, 5], 5), 4)
def test_middle_element(self):
"""Test target is middle element"""
self.assertEqual(binarySearch([1, 2, 3, 4, 5], 3), 2)
def test_not_found_smaller(self):
"""Test target smaller than all elements"""
self.assertEqual(binarySearch([5, 6, 7, 8], 2), -1)
def test_not_found_larger(self):
"""Test target larger than all elements"""
self.assertEqual(binarySearch([1, 2, 3, 4], 10), -1)
def test_not_found_middle(self):
"""Test target in middle but not present"""
self.assertEqual(binarySearch([1, 3, 5, 7, 9], 6), -1)
def test_negative_numbers(self):
"""Test with negative numbers"""
self.assertEqual(binarySearch([-5, -3, -1, 0, 2], -3), 1)
def test_duplicates(self):
"""Test with duplicates (finds any occurrence)"""
result = binarySearch([1, 2, 2, 2, 3], 2)
self.assertIn(result, [1, 2, 3])
def test_large_array(self):
"""Test with large array"""
nums = list(range(1000000))
self.assertEqual(binarySearch(nums, 999999), 999999)
if __name__ == '__main__':
unittest.main()
Performance Analysis
Time Complexity Proof
T(n) = T(n/2) + O(1)
By Master Theorem:
a = 1, b = 2, f(n) = O(1)
log_b(a) = log_2(1) = 0
f(n) = O(n^0) = O(1)
Therefore: T(n) = O(log n)
Comparison with Linear Search
import time
import random
def benchmark_search():
"""Compare binary vs linear search"""
sizes = [100, 1000, 10000, 100000, 1000000]
print("Size Binary Linear")
print("-" * 40)
for size in sizes:
nums = list(range(size))
target = size - 1 # Worst case for linear
# Binary search
start = time.perf_counter()
for _ in range(1000):
binarySearch(nums, target)
binary_time = time.perf_counter() - start
# Linear search
start = time.perf_counter()
for _ in range(1000):
nums.index(target)
linear_time = time.perf_counter() - start
print(f"{size:7d} {binary_time:8.4f}s {linear_time:8.4f}s")
# Example output:
# Size Binary Linear
# ----------------------------------------
# 100 0.0003s 0.0010s
# 1000 0.0004s 0.0095s
# 10000 0.0005s 0.0950s
# 100000 0.0006s 0.9500s
# 1000000 0.0007s 9.5000s
Connection to ML Systems
Binary search patterns appear throughout ML engineering:
1. Hyperparameter Tuning
import copy
def find_optimal_learning_rate(model, train_fn, validate_fn,
min_lr=1e-6, max_lr=1.0):
"""
Binary search for optimal learning rate
Similar to binary search on answer space
"""
best_lr = min_lr
best_loss = float('inf')
left, right = min_lr, max_lr
while right - left > 1e-7:
mid = (left + right) / 2
# Train with this learning rate
model_copy = copy.deepcopy(model)
train_fn(model_copy, learning_rate=mid)
loss = validate_fn(model_copy)
if loss < best_loss:
best_loss = loss
best_lr = mid
# Adjust search space based on loss gradient
# (simplified - real implementation would be more sophisticated)
if loss > best_loss * 1.1:
right = mid
else:
left = mid
return best_lr
2. Threshold Optimization
def find_optimal_threshold(y_true, y_pred_proba, metric='f1'):
"""
Ternary search for optimal classification threshold (smooth metric)
Finds threshold that maximizes given metric
"""
from sklearn.metrics import f1_score, precision_score, recall_score
def evaluate_threshold(threshold):
y_pred = (y_pred_proba >= threshold).astype(int)
if metric == 'f1':
return f1_score(y_true, y_pred)
elif metric == 'precision':
return precision_score(y_true, y_pred)
elif metric == 'recall':
return recall_score(y_true, y_pred)
# Ternary search across [0, 1]
left, right = 0.0, 1.0
best_threshold = 0.5
best_score = evaluate_threshold(0.5)
# Sample points and use binary search logic
for _ in range(20): # 20 iterations for precision
mid1 = left + (right - left) / 3
mid2 = right - (right - left) / 3
score1 = evaluate_threshold(mid1)
score2 = evaluate_threshold(mid2)
if score1 > best_score:
best_score = score1
best_threshold = mid1
if score2 > best_score:
best_score = score2
best_threshold = mid2
# Ternary search logic
if score1 > score2:
right = mid2
else:
left = mid1
return best_threshold, best_score
# Usage
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred_proba = np.array([0.2, 0.8, 0.6, 0.3, 0.9, 0.1, 0.7, 0.55])
threshold, score = find_optimal_threshold(y_true, y_pred_proba, metric='f1')
print(f"Optimal threshold: {threshold:.3f}, F1 score: {score:.3f}")
3. Model Version Search
import bisect
class ModelVersionSelector:
"""
Binary search through model versions to find regression point
Similar to git bisect
"""
def __init__(self, versions):
"""
Args:
versions: List of model versions sorted by timestamp
"""
self.versions = versions
def find_regression(self, test_fn):
"""
Find first version where test fails
Args:
test_fn: Function that tests model, returns True if passes
Returns:
First failing version or None
"""
left, right = 0, len(self.versions) - 1
first_bad = None
while left <= right:
mid = left + (right - left) // 2
version = self.versions[mid]
print(f"Testing version {version}...")
if test_fn(version):
# This version is good, search right
left = mid + 1
else:
# This version is bad, could be first bad
first_bad = version
right = mid - 1
return first_bad
# Usage
versions = ['v1.0', 'v1.1', 'v1.2', 'v1.3', 'v1.4', 'v1.5']
def test_model(version):
"""Test if model version performs well"""
# Load model and run tests
accuracy = evaluate_model(version)
return accuracy >= 0.95
selector = ModelVersionSelector(versions)
bad_version = selector.find_regression(test_model)
print(f"Regression introduced in: {bad_version}")
Production Patterns
1. Bisect for Bucketing
class FeatureBucketer:
"""
Bucket continuous features using binary search
Faster than linear scan for many buckets
"""
def __init__(self, bucket_boundaries):
"""
Args:
bucket_boundaries: Sorted list of bucket boundaries
e.g., [0, 10, 50, 100, 500, 1000]
"""
self.boundaries = sorted(bucket_boundaries)
def get_bucket(self, value):
"""
Find bucket for value using binary search
Time: O(log b) where b = number of buckets
Returns:
Bucket index
"""
import bisect
return bisect.bisect_left(self.boundaries, value)
def bucket_features(self, values):
"""Bucket array of values"""
return [self.get_bucket(v) for v in values]
# Usage
bucketer = FeatureBucketer([0, 1000, 5000, 10000, 50000, 100000])
# Bucket user income
incomes = [500, 2000, 7500, 15000, 75000, 200000]
buckets = bucketer.bucket_features(incomes)
print(buckets) # [0, 1, 2, 3, 4, 5]
2. Cache with Binary Search
import bisect
class SortedCache:
"""
Cache with binary search for fast lookups
Useful when keys are numeric and can be sorted
"""
def __init__(self, max_size=1000):
self.keys = []
self.values = []
self.max_size = max_size
def get(self, key):
"""
Get value with binary search
Time: O(log n)
"""
idx = bisect.bisect_left(self.keys, key)
if idx < len(self.keys) and self.keys[idx] == key:
return self.values[idx]
return None
def put(self, key, value):
"""
Insert key-value pair maintaining sorted order
Time: O(n) for insertion, but O(log n) lookups
"""
idx = bisect.bisect_left(self.keys, key)
if idx < len(self.keys) and self.keys[idx] == key:
# Key exists, update value
self.values[idx] = value
else:
# Insert new key-value pair
self.keys.insert(idx, key)
self.values.insert(idx, value)
# Evict if over capacity (remove oldest)
if len(self.keys) > self.max_size:
self.keys.pop(0)
self.values.pop(0)
# Usage for caching predictions
cache = SortedCache()
def get_prediction_cached(user_id, model):
"""Get prediction with caching"""
# Check cache
cached = cache.get(user_id)
if cached is not None:
return cached
# Compute prediction
prediction = model.predict([user_id])
# Cache result
cache.put(user_id, prediction)
return prediction
Advanced Applications
1. Exponential Search
When search space is unbounded
def exponential_search(arr, target):
"""
Exponential search for unbounded/infinite arrays
Step 1: Find range where target might exist
Step 2: Binary search in that range
Time: O(log n) where n is position of target
"""
if not arr:
return -1
# Check first element
if arr[0] == target:
return 0
# Find range by repeatedly doubling index
i = 1
while i < len(arr) and arr[i] <= target:
i *= 2
# Binary search in range [i//2, min(i, len(arr)-1)]
left = i // 2
right = min(i, len(arr) - 1)
while left <= right:
mid = left + (right - left) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
# Test
print(exponential_search([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 7)) # 6
2. Interpolation Search
Faster than binary search for uniformly distributed data
def interpolation_search(arr, target):
"""
Interpolation search
Instead of middle, guess position based on value
Time: O(log log n) for uniformly distributed data
O(n) worst case
"""
left, right = 0, len(arr) - 1
while left <= right and arr[left] <= target <= arr[right]:
# Calculate interpolated position
if arr[left] == arr[right]:
if arr[left] == target:
return left
return -1
# Interpolation formula
pos = left + int(
(target - arr[left]) / (arr[right] - arr[left]) * (right - left)
)
if arr[pos] == target:
return pos
elif arr[pos] < target:
left = pos + 1
else:
right = pos - 1
return -1
# Test - works best on uniformly distributed data
arr = [1, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(interpolation_search(arr, 70)) # 7
3. Ternary Search
For unimodal functions (single peak/valley)
def ternary_search(f, left, right, epsilon=1e-9):
"""
Ternary search for finding maximum of unimodal function
Divides search space into 3 parts
Time: O(log3 n) ≈ O(log n)
Args:
f: Unimodal function
left: Left bound
right: Right bound
epsilon: Precision
Returns:
x that maximizes f(x)
"""
while right - left > epsilon:
# Two midpoints
mid1 = left + (right - left) / 3
mid2 = right - (right - left) / 3
if f(mid1) < f(mid2):
# Maximum is in [mid1, right]
left = mid1
else:
# Maximum is in [left, mid2]
right = mid2
return (left + right) / 2
# Example: Find maximum of parabola
def f(x):
return -(x - 5)**2 + 10 # Maximum at x=5
max_x = ternary_search(f, 0, 10)
print(f"Maximum at x = {max_x:.6f}, f(x) = {f(max_x):.6f}")
Binary Search in 2D
Search in 2D Matrix
def searchMatrix(matrix, target):
"""
Search in row-wise and column-wise sorted matrix
Example:
[
[1, 4, 7, 11],
[2, 5, 8, 12],
[3, 6, 9, 16],
[10, 13, 14, 17]
]
Approach: Start from top-right corner
- If target < current: go left
- If target > current: go down
Time: O(m + n)
"""
if not matrix or not matrix[0]:
return False
m, n = len(matrix), len(matrix[0])
row, col = 0, n - 1
while row < m and col >= 0:
if matrix[row][col] == target:
return True
elif matrix[row][col] > target:
col -= 1 # Go left
else:
row += 1 # Go down
return False
# Test
matrix = [
[1, 4, 7, 11],
[2, 5, 8, 12],
[3, 6, 9, 16],
[10, 13, 14, 17]
]
print(searchMatrix(matrix, 5)) # True
print(searchMatrix(matrix, 20)) # False
K-th Smallest in Sorted Matrix
def kthSmallest(matrix, k):
"""
Find k-th smallest element in sorted matrix
Binary search on value range, not index!
Time: O(n * log(max - min))
"""
n = len(matrix)
left, right = matrix[0][0], matrix[n-1][n-1]
def count_less_equal(mid):
"""Count elements <= mid"""
count = 0
row, col = n - 1, 0
while row >= 0 and col < n:
if matrix[row][col] <= mid:
count += row + 1
col += 1
else:
row -= 1
return count
# Binary search on value
while left < right:
mid = left + (right - left) // 2
if count_less_equal(mid) < k:
left = mid + 1
else:
right = mid
return left
# Test
matrix = [
[1, 5, 9],
[10, 11, 13],
[12, 13, 15]
]
print(kthSmallest(matrix, 8)) # 13
Interview Strategies
Problem Recognition
When to use binary search:
- Sorted array - Classic binary search
- Monotonic function - Search on answer space
- Find boundary - First/last occurrence
- Minimize/maximize - Optimization problems
- Search space reducible - Can eliminate half each time
Common Patterns
Pattern 1: Find exact value
while left <= right:
mid = left + (right - left) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
Pattern 2: Find lower bound (first >= target)
while left < right:
mid = left + (right - left) // 2
if arr[mid] < target:
left = mid + 1
else:
right = mid
return left
Pattern 3: Find upper bound (first > target)
while left < right:
mid = left + (right - left) // 2
if arr[mid] <= target:
left = mid + 1
else:
right = mid
return left
Debug Checklist
# Before submitting, check:
✓ Loop condition: left <= right or left < right?
✓ Mid calculation: avoiding overflow?
✓ Update: left = mid + 1 or left = mid?
✓ Update: right = mid - 1 or right = mid?
✓ Return value: left, right, or -1?
✓ Edge cases: empty array, single element?
✓ Infinite loop: does search space always shrink?
Production Code Examples
Binary Search with Logging
import logging
from typing import List, Optional
class ProductionBinarySearch:
"""
Production-ready binary search with logging and error handling
"""
def __init__(self):
self.logger = logging.getLogger(__name__)
def search(self, nums: List[int], target: int) -> Optional[int]:
"""
Search for target in sorted array
Returns:
Index of target or None if not found
Raises:
ValueError: If array is not sorted
"""
# Validate input
if not nums:
self.logger.warning("Empty array provided")
return None
# Check if sorted (optional, expensive for large arrays)
if not self._is_sorted(nums):
self.logger.error("Array is not sorted")
raise ValueError("Array must be sorted")
# Binary search
left, right = 0, len(nums) - 1
iterations = 0
while left <= right:
iterations += 1
mid = left + (right - left) // 2
self.logger.debug(
f"Iteration {iterations}: left={left}, right={right}, "
f"mid={mid}, value={nums[mid]}"
)
if nums[mid] == target:
self.logger.info(
f"Found target {target} at index {mid} "
f"after {iterations} iterations"
)
return mid
elif nums[mid] < target:
left = mid + 1
else:
right = mid - 1
self.logger.info(
f"Target {target} not found after {iterations} iterations"
)
return None
def _is_sorted(self, nums: List[int]) -> bool:
"""Check if array is sorted"""
for i in range(1, len(nums)):
if nums[i] < nums[i-1]:
return False
return True
# Usage
searcher = ProductionBinarySearch()
result = searcher.search([1, 3, 5, 7, 9], 7)
Thread-Safe Binary Search Cache
import threading
from bisect import bisect_left
class ThreadSafeSortedCache:
"""
Thread-safe cache with binary search lookups
Useful for high-concurrency environments
"""
def __init__(self):
self.keys = []
self.values = []
self.lock = threading.RLock()
def get(self, key):
"""
Thread-safe get with binary search
Time: O(log n)
"""
with self.lock:
if not self.keys:
return None
idx = bisect_left(self.keys, key)
if idx < len(self.keys) and self.keys[idx] == key:
return self.values[idx]
return None
def put(self, key, value):
"""
Thread-safe put maintaining sorted order
Time: O(n) for insertion
"""
with self.lock:
idx = bisect_left(self.keys, key)
if idx < len(self.keys) and self.keys[idx] == key:
# Update existing key
self.values[idx] = value
else:
# Insert new key-value
self.keys.insert(idx, key)
self.values.insert(idx, value)
def range_query(self, start_key, end_key):
"""
Get all values in key range [start_key, end_key]
Time: O(log n + k) where k is number of results
"""
with self.lock:
start_idx = bisect_left(self.keys, start_key)
end_idx = bisect_right(self.keys, end_key)
return list(zip(
self.keys[start_idx:end_idx],
self.values[start_idx:end_idx]
))
# Usage
cache = ThreadSafeSortedCache()
# Thread-safe operations
cache.put(10, "value10")
cache.put(5, "value5")
cache.put(15, "value15")
print(cache.get(10)) # "value10"
print(cache.range_query(5, 15)) # All values in range
Real-World Case Studies
Case Study 1: Netflix Content Search
class NetflixContentSearch:
"""
Simplified Netflix content search using binary search
Real system uses more sophisticated indexing
"""
def __init__(self, content_library):
"""
Args:
content_library: List of (timestamp, content_id) tuples
Sorted by timestamp
"""
self.timestamps = [item[0] for item in content_library]
self.content_ids = [item[1] for item in content_library]
def find_content_at_time(self, target_time):
"""
Find content released at or just before target_time
Example: User wants to see content from 2020
Returns most recent content <= 2020
"""
idx = bisect.bisect_right(self.timestamps, target_time) - 1
if idx >= 0:
return self.content_ids[idx]
return None
def find_content_in_range(self, start_time, end_time):
"""
Find all content released in time range
Example: All shows from 2019-2021
"""
start_idx = bisect.bisect_left(self.timestamps, start_time)
end_idx = bisect.bisect_right(self.timestamps, end_time)
return self.content_ids[start_idx:end_idx]
# Usage
library = [
(2018, "content1"),
(2019, "content2"),
(2020, "content3"),
(2021, "content4"),
(2022, "content5")
]
search = NetflixContentSearch(library)
print(search.find_content_at_time(2020)) # content3
print(search.find_content_in_range(2019, 2021)) # [content2, content3, content4]
Case Study 2: Database Query Optimization
class DatabaseIndexSearch:
"""
Binary search on database index
Similar to B-tree index lookups
"""
def __init__(self, index_pages):
"""
Args:
index_pages: List of (key, page_id) tuples
"""
self.index = sorted(index_pages, key=lambda x: x[0])
def find_page(self, search_key):
"""
Find page containing search_key
Returns page_id for further disk I/O
"""
left, right = 0, len(self.index) - 1
result_page = None
while left <= right:
mid = left + (right - left) // 2
key, page_id = self.index[mid]
if key <= search_key:
result_page = page_id
left = mid + 1
else:
right = mid - 1
return result_page
def estimate_io_cost(self, search_key):
"""
Estimate I/O cost of query
Binary search reduces disk reads from O(n) to O(log n)
"""
page_id = self.find_page(search_key)
# In real database, would calculate actual I/O cost
index_ios = int(np.log2(len(self.index))) + 1
data_ios = 1 # One page read for data
return {
'index_ios': index_ios,
'data_ios': data_ios,
'total_ios': index_ios + data_ios,
'page_id': page_id
}
# Usage
index = [(10, 'page1'), (20, 'page2'), (30, 'page3'), (40, 'page4')]
db = DatabaseIndexSearch(index)
cost = db.estimate_io_cost(25)
print(f"Query cost: {cost['total_ios']} I/O operations")
Performance Profiling
Benchmark Suite
import time
import random
import numpy as np
import matplotlib.pyplot as plt
def benchmark_search_algorithms():
"""
Comprehensive benchmark of search algorithms
"""
sizes = [100, 1000, 10000, 100000, 1000000]
results = {
'binary': [],
'linear': [],
'exponential': [],
'interpolation': []
}
for size in sizes:
# Create sorted array
arr = list(range(size))
target = size - 1 # Worst case
# Benchmark binary search
start = time.perf_counter()
for _ in range(1000):
binarySearch(arr, target)
results['binary'].append(time.perf_counter() - start)
# Benchmark linear search
start = time.perf_counter()
for _ in range(1000):
arr.index(target)
results['linear'].append(time.perf_counter() - start)
# Benchmark exponential search
start = time.perf_counter()
for _ in range(1000):
exponential_search(arr, target)
results['exponential'].append(time.perf_counter() - start)
# Benchmark interpolation search
start = time.perf_counter()
for _ in range(1000):
interpolation_search(arr, target)
results['interpolation'].append(time.perf_counter() - start)
# Plot results
plt.figure(figsize=(10, 6))
for algo, times in results.items():
plt.plot(sizes, times, marker='o', label=algo)
plt.xlabel('Array Size')
plt.ylabel('Time (seconds for 1000 iterations)')
plt.title('Search Algorithm Performance Comparison')
plt.legend()
plt.grid(True)
plt.xscale('log')
plt.yscale('log')
plt.savefig('search_benchmark.png')
return results
Key Takeaways
✅ O(log n) is powerful - Scales to billions of elements
✅ Requires sorted data - Worth the sorting cost for multiple searches
✅ Two pointers pattern - Left and right converge to answer
✅ Watch for edge cases - Empty arrays, single elements, boundaries
✅ Many variants - First/last occurrence, rotated arrays, search on answer
✅ ML applications - Hyperparameter tuning, threshold optimization, bucketing
✅ Production use - Feature bucketing, caching, version selection
Related Problems
- Search Insert Position - Find insertion index
- Find First and Last Position - Range search
- Search in Rotated Sorted Array - Modified binary search
- Find Minimum in Rotated Sorted Array - Find pivot
- Find Peak Element - Local maximum
- Sqrt(x) - Binary search on answer
- Koko Eating Bananas - Binary search on speed
Originally published at: arunbaby.com/dsa/0009-binary-search
If you found this helpful, consider sharing it with others who might benefit.