Ranking Systems at Scale
How does Google search 50 billion pages in 0.1 seconds? The answer is the “Ranking Funnel”.
1. Problem Definition
Goal: Given a user query (or context) and a massive corpus of items (1B+), return the top $K$ most relevant items, sorted by relevance.
Constraints:
- Latency: < 200ms (P99).
- Throughput: 100k QPS.
- Freshness: New items should be searchable within minutes.
2. High-Level Architecture: The Funnel
We cannot score 1 billion items with a heavy BERT model. We use a Multi-Stage Cascade.
[Corpus: 1 Billion Items]
|
v
[Stage 1: Retrieval / Candidate Generation] -> Selects Top 10,000
| (Fast, Simple, High Recall)
v
[Stage 2: L1 Ranking (Lightweight)] --------> Selects Top 500
| (Two-Tower, Logistic Regression)
v
[Stage 3: L2 Ranking (Heavyweight)] --------> Selects Top 10
| (BERT, LambdaMART, DLRM)
v
[Stage 4: Re-Ranking / Blending] -----------> Final Output
(Diversity, Business Logic)
3. Stage 1: Retrieval (Candidate Generation)
Objective: High Recall. Don’t miss the relevant items. Methods:
- Inverted Index (BM25): Standard keyword search.
- Vector Search (ANN): Embed query and items. Use Faiss/ScaNN to find nearest neighbors.
- Collaborative Filtering: “Users who bought X also bought Y”.
4. Stage 2: L1 Ranking (The Two-Tower Model)
Objective: Filter 10k -> 500. Model: Two-Tower Neural Network (Bi-Encoder).
[User Features] [Item Features]
| |
[Deep Net] [Deep Net]
| |
[User Vector U] [Item Vector V]
\ /
\ /
\ /
Dot Product (U . V)
|
Score
Why Two-Tower?
- Cacheable: Item vectors (V) can be precomputed and stored in a Vector DB.
- Fast: Inference is just a Dot Product.
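To make the cacheability point concrete, here is a minimal NumPy sketch of L1 scoring against precomputed item vectors; the `l1_score` helper, array shapes, and random data are illustrative, not the article's implementation.

```python
import numpy as np

def l1_score(user_vector: np.ndarray, item_matrix: np.ndarray, top_k: int = 500):
    """Score candidates with a single matrix-vector product.

    user_vector: (d,) embedding produced online by the user tower.
    item_matrix: (n_candidates, d) embeddings precomputed offline by the
                 item tower and fetched from a vector DB / cache.
    """
    scores = item_matrix @ user_vector        # one dot product per candidate
    top_idx = np.argsort(-scores)[:top_k]     # keep the best `top_k` candidates
    return top_idx, scores[top_idx]

# Usage: 10,000 candidates with 128-dim embeddings -> top 500 almost instantly.
user_vec = np.random.randn(128).astype(np.float32)
candidates = np.random.randn(10_000, 128).astype(np.float32)
top_idx, top_scores = l1_score(user_vec, candidates)
```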
5. Stage 3: L2 Ranking (The Heavy Lifter)
Objective: Precision. Order the top 500 perfectly. Model: Cross-Encoder (BERT) or LambdaMART (GBDT).
Feature Interaction: Unlike Two-Tower, Cross-Encoders feed User and Item features together into the network.
Input: [User_Age, Item_Price, User_History, Item_Title]. The network learns interactions such as "young users like cheap items".
Loss Functions:
- Pointwise: RMSE / LogLoss. (Predict “Will click?”).
- Pairwise: RankNet / LambdaRank. (Predict “Is A better than B?”).
- Listwise: LambdaMART / Softmax. (Optimize NDCG directly).
6. Stage 4: Re-Ranking & Blending
The raw scores aren’t enough. We need:
- Diversity: Don’t show 10 shoes. Show shoes, socks, and laces. (MMR Algorithm).
- Freshness: Boost new items.
- Business Logic: Boost sponsored items (Ads).
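As an illustration of the diversity step, here is a generic sketch of MMR (Maximal Marginal Relevance); the `mmr_rerank` helper, the embedding-based similarity, and the `lambda_` trade-off value are assumptions for the example, not the article's implementation.

```python
import numpy as np

def mmr_rerank(scores, item_embs, k=10, lambda_=0.7):
    """Greedy MMR: balance relevance against similarity to already-picked items.

    scores:    (n,) relevance scores from the L2 ranker.
    item_embs: (n, d) item embeddings, used to measure redundancy.
    lambda_:   1.0 = pure relevance, 0.0 = pure diversity.
    """
    item_embs = item_embs / np.linalg.norm(item_embs, axis=1, keepdims=True)
    selected, remaining = [], list(range(len(scores)))
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max((item_embs[i] @ item_embs[j] for j in selected), default=0.0)
            return lambda_ * scores[i] - (1 - lambda_) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_` near 1 the output is just the L2 ordering; lowering it pushes near-duplicate items (the "10 shoes") further down the list.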
Deep Dive: Approximate Nearest Neighbors (ANN)
In Stage 1 (Retrieval), we need to find the top 1000 items closest to the query vector $Q$ among 1 billion item vectors. A brute-force scan is $O(N)$: too slow. We use ANN algorithms.
1. HNSW (Hierarchical Navigable Small World)
- Structure: A multi-layered graph.
- Top Layer: Sparse long-range links (Highways).
- Bottom Layer: Dense short-range links.
- Search: Start at the top, greedily move closer to the target, drop down a layer, repeat.
- Pros: Extremely fast, high recall.
- Cons: High memory usage (stores the graph).
2. IVF (Inverted File Index)
- Training: Cluster the vector space into $C$ centroids (using K-Means).
- Indexing: Assign each item to its closest centroid.
- Search: Find the closest centroids to $Q$, then scan only the items in those clusters.
- Pros: Low memory (can be compressed with Product Quantization).
3. ScaNN (Google)
- Optimizes the Anisotropic Quantization loss.
- State-of-the-art performance for inner-product search (which is what we need for Dot Product).
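A small sketch of Stage-1 vector retrieval with a Faiss IVF index; the dimensions, `nlist`, and `nprobe` values are illustrative, and a real deployment shards the index across machines and compresses vectors with Product Quantization.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d, n_items, nlist = 128, 1_000_000, 4096        # dims, corpus size, #centroids (illustrative)
item_vecs = np.random.randn(n_items, d).astype(np.float32)

quantizer = faiss.IndexFlatIP(d)                # coarse quantizer holding the centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(item_vecs)                          # k-means clustering of the vector space
index.add(item_vecs)                            # assign each item to its closest centroid

index.nprobe = 16                               # how many clusters to scan per query
query = np.random.randn(1, d).astype(np.float32)
scores, ids = index.search(query, k=1000)       # approximate top-1000 by inner product
```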
Deep Dive: Training the Two-Tower Model
How do we train the L1 Ranker? We don’t have explicit “negative” labels (items the user didn’t like). We only have “clicks” (positives).
Negative Sampling: For every positive pair $(U, I^+)$, we sample $K$ negative items $I^-$.
- Random Negatives: Pick random items from the catalog. (Easy, but too easy for the model).
- Hard Negatives: Pick items that the model thought were good but the user didn’t click. (Harder, learns better boundaries).
- In-Batch Negatives: Use the positives from other users in the same batch as negatives for the current user. (Efficient, GPU friendly).
Loss Function (Softmax Cross-Entropy): $$ L = -\log \frac{\exp(U \cdot I^+)}{\exp(U \cdot I^+) + \sum_{I^-} \exp(U \cdot I^-)} $$
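A minimal PyTorch sketch of this loss in its in-batch-negatives form: every other row in the batch serves as a negative for the current user, so the diagonal of the similarity matrix holds the positives. The temperature scaling is a common addition, not part of the formula above.

```python
import torch
import torch.nn.functional as F

def in_batch_softmax_loss(user_vecs: torch.Tensor, item_vecs: torch.Tensor,
                          temperature: float = 0.05) -> torch.Tensor:
    """user_vecs, item_vecs: (B, d); row i is a (user, clicked item) positive pair."""
    logits = user_vecs @ item_vecs.T / temperature   # (B, B) all user-item dot products
    labels = torch.arange(logits.size(0))            # positive for user i is item i
    return F.cross_entropy(logits, labels)           # softmax over in-batch negatives

# Usage with random embeddings standing in for the tower outputs.
u = F.normalize(torch.randn(256, 64), dim=-1)
v = F.normalize(torch.randn(256, 64), dim=-1)
loss = in_batch_softmax_loss(u, v)
```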
Deep Dive: Learning to Rank (LTR) Loss Functions
In Stage 3 (L2 Ranking), we care about the order.
1. Pointwise (RMSE / Sigmoid)
- Treats each item independently.
- “Predict the probability of click for Item A”.
- Problem: Doesn’t care if Item A > Item B, only about the absolute score.
2. Pairwise (RankNet / LambdaRank)
- Takes pairs $(A, B)$ where A is clicked and B is not.
- Incurs a loss whenever $Score(A) < Score(B)$; minimizing it pushes clicked items above unclicked ones.
- Problem: Optimizes the number of inversions, not their position (NDCG). An error at rank 100 is treated the same as an error at rank 1.
3. Listwise (LambdaMART)
- LambdaMART is a Gradient Boosted Decision Tree (GBDT) method.
- It modifies the gradients of the pairwise loss by weighting them with the change in NDCG ($\Delta NDCG$).
- Intuition: If swapping A and B causes a huge drop in NDCG, the gradient should be huge.
- Status: Still the SOTA for tabular features in L2 ranking.
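If you want to try LambdaMART without writing the gradients yourself, LightGBM exposes it as the `lambdarank` objective; the synthetic features, relevance grades, and hyperparameters below are placeholders.

```python
import numpy as np
import lightgbm as lgb  # pip install lightgbm

# Synthetic data: 100 queries x 50 candidates, 20 features, graded relevance 0-3.
n_queries, n_cands, n_feats = 100, 50, 20
X = np.random.randn(n_queries * n_cands, n_feats)
y = np.random.randint(0, 4, size=n_queries * n_cands)
group_sizes = [n_cands] * n_queries             # how many rows belong to each query

ranker = lgb.LGBMRanker(
    objective="lambdarank",                     # pairwise gradients weighted by delta-NDCG
    n_estimators=200,
    num_leaves=63,
)
ranker.fit(X, y, group=group_sizes)
scores = ranker.predict(X[:n_cands])            # score the first query's 50 candidates
```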
Deep Dive: DLRM (Deep Learning Recommendation Model)
Facebook open-sourced DLRM. It’s the standard for L2 Ranking.
Architecture:
- Sparse Features (Categorical): Mapped to dense embeddings.
- Dense Features (Numerical): Processed by an MLP (Bottom MLP).
- Interaction: Dot Product between embeddings and MLP output.
- Top MLP: Takes the interaction output and predicts CTR.
Why DLRM? It explicitly models the interaction between sparse features (User ID, Item ID) and dense features (Age, Price). It is optimized for training on massive clusters (Model Parallelism for embeddings, Data Parallelism for MLPs).
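A toy PyTorch sketch of the DLRM-style interaction step, pairwise dot products between the sparse embeddings and the bottom-MLP output; the layer sizes and feature cardinalities are made up, and the production model additionally shards embeddings across machines.

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    """Toy DLRM: embeddings + bottom MLP, pairwise dot-product interactions, top MLP."""
    def __init__(self, cardinalities, n_dense, emb_dim=16):
        super().__init__()
        self.embs = nn.ModuleList(nn.Embedding(c, emb_dim) for c in cardinalities)
        self.bottom = nn.Sequential(nn.Linear(n_dense, 64), nn.ReLU(), nn.Linear(64, emb_dim))
        n_vec = len(cardinalities) + 1                     # sparse embeddings + dense vector
        n_pairs = n_vec * (n_vec - 1) // 2
        self.top = nn.Sequential(nn.Linear(n_pairs + emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, sparse_ids, dense_feats):
        dense_vec = self.bottom(dense_feats)                        # (B, emb_dim)
        vecs = [emb(sparse_ids[:, i]) for i, emb in enumerate(self.embs)] + [dense_vec]
        stacked = torch.stack(vecs, dim=1)                          # (B, n_vec, emb_dim)
        inter = torch.bmm(stacked, stacked.transpose(1, 2))         # all pairwise dot products
        iu, ju = torch.triu_indices(stacked.size(1), stacked.size(1), offset=1)
        inter = inter[:, iu, ju]                                    # keep each pair once
        return torch.sigmoid(self.top(torch.cat([inter, dense_vec], dim=1)))  # CTR estimate

model = TinyDLRM(cardinalities=[1000, 500], n_dense=13)
ctr = model(torch.randint(0, 500, (4, 2)), torch.randn(4, 13))
```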
Deep Dive: The Feature Store (Feast)
For feature fetching (just before L2 Ranking), we need low latency. We cannot run SQL queries on a Data Warehouse (Snowflake/BigQuery) in real time.
The Feature Store Solution:
- Offline Store: (S3/Parquet). Used for training. Contains months of history.
- Online Store: (Redis/DynamoDB). Used for inference. Contains only the latest values.
- Consistency: The Feature Store ensures that the Offline and Online stores are in sync (Point-in-time correctness).
Workflow:
- A data engineering pipeline computes user_last_50_clicks.
- Pushes it to the Offline Store (for training tomorrow's model).
- Pushes it to the Online Store (for serving today's traffic).
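A hedged sketch of what the online read path can look like with Feast; the repo path, the `user_stats` feature view, and the field names are hypothetical.

```python
from feast import FeatureStore  # pip install feast

# Hypothetical feature repo; "user_stats" and the field names are illustrative.
store = FeatureStore(repo_path="feature_repo")

feature_vector = store.get_online_features(
    features=[
        "user_stats:user_last_50_clicks",
        "user_stats:user_ctr_7d",
    ],
    entity_rows=[{"user_id": 1234}],   # served from the online store (e.g., Redis)
).to_dict()

# At training time the same features come from the offline store via
# point-in-time joins, so train and serve values stay consistent.
```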
Deep Dive: A/B Testing for Ranking
How do we know the new model is better? We run an Interleaving Experiment.
Standard A/B Test:
- Group A sees Model A results.
- Group B sees Model B results.
- Compare CTR.
- Problem: High variance. Users in Group A might just be happier people.
Interleaving:
- Show every user a mix of Model A and Model B results: Result = [A1, B1, A2, B2, ...].
- If the user clicks A1, Model A gets a point.
- If the user clicks B1, Model B gets a point.
- Pros: Removes user variance. 100x more sensitive than standard A/B tests.
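A minimal sketch of team-draft interleaving, one common way to build the mixed list; the helper function and example lists are illustrative.

```python
import random

def team_draft_interleave(list_a, list_b, k=10):
    """Interleave two ranked lists; returns the mixed list plus item -> team attribution."""
    result, team = [], {}
    ia = ib = 0
    while len(result) < k and (ia < len(list_a) or ib < len(list_b)):
        for label in random.sample(["A", "B"], 2):       # random draft order each round
            lst = list_a if label == "A" else list_b
            idx = ia if label == "A" else ib
            while idx < len(lst) and lst[idx] in team:   # skip items already placed
                idx += 1
            if idx < len(lst) and len(result) < k:
                result.append(lst[idx])
                team[lst[idx]] = label
            if label == "A":
                ia = idx + 1
            else:
                ib = idx + 1
    return result, team

mixed, attribution = team_draft_interleave(["a1", "a2", "a3"], ["b1", "a2", "b2"], k=4)
# Clicks on items attributed to "A" are points for Model A, and likewise for "B".
```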
Deep Dive: Real-time Inference (Triton)
Serving a BERT model in 50ms is hard. NVIDIA Triton Inference Server:
- Dynamic Batching: Groups incoming requests into a batch to maximize GPU utilization.
- Model Ensembling: Runs Preprocessing (Python) -> Model (TensorRT) -> Postprocessing (Python) in a single pipeline.
- Concurrent Execution: Runs multiple models on the same GPU.
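For context, a client call might look like the sketch below (using the `tritonclient` HTTP client); the model name `l2_ranker` and the tensor names are assumptions about how the model was exported, not Triton defaults.

```python
import numpy as np
import tritonclient.http as httpclient  # pip install tritonclient[http]

# Hypothetical deployment: an L2 ranker exported to Triton as "l2_ranker"
# with an input tensor named "features" and an output named "scores".
client = httpclient.InferenceServerClient(url="localhost:8000")

features = np.random.randn(500, 64).astype(np.float32)   # 500 candidates x 64 features
inp = httpclient.InferInput("features", list(features.shape), "FP32")
inp.set_data_from_numpy(features)

result = client.infer(model_name="l2_ranker", inputs=[inp])
scores = result.as_numpy("scores")
# With dynamic batching enabled in the model's config.pbtxt, Triton merges
# concurrent requests like this one into larger GPU batches behind the scenes.
```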
Deep Dive: Graph Neural Networks (GNNs) for Ranking
Pinterest uses PinSage (GraphSAGE).
- Graph: Users and Items are nodes. Interactions (Clicks/Pins) are edges.
- Idea: An item is defined by the users who pinned it. A user is defined by the items they pinned.
- Convolution: Aggregate features from neighbors (and neighbors of neighbors).
- Benefit: Captures higher-order structure. “Users who bought X also bought Y” is 1-hop. GNNs capture 2-hop and 3-hop signals.
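A toy NumPy sketch of one GraphSAGE-style convolution, combining a node's own features with the mean of its neighbors' features; the tiny graph and weight matrices are synthetic.

```python
import numpy as np

def sage_layer(node_feats, adjacency, weight_self, weight_neigh):
    """One mean-aggregator GraphSAGE layer on a dense adjacency matrix (toy scale)."""
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neigh_mean = adjacency @ node_feats / deg                  # average neighbor features
    h = node_feats @ weight_self + neigh_mean @ weight_neigh   # combine self + neighborhood
    return np.maximum(h, 0.0)                                   # ReLU

# Tiny user/item graph: 5 nodes, 8-dim features, 4-dim output.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
adj = np.array([[0, 1, 1, 0, 0],
                [1, 0, 0, 1, 0],
                [1, 0, 0, 1, 1],
                [0, 1, 1, 0, 0],
                [0, 0, 1, 0, 0]], dtype=float)
h1 = sage_layer(feats, adj, rng.normal(size=(8, 4)), rng.normal(size=(8, 4)))
h2 = sage_layer(h1, adj, rng.normal(size=(4, 4)), rng.normal(size=(4, 4)))  # 2-hop signal
```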
Deep Dive: Session-Based Recommendation
What if the user is anonymous (incognito)? We have no history.
We only have the current session: [Item A, Item B, Item C, ...].
Model: GRU4Rec (Gated Recurrent Unit).
- Input: Sequence of item embeddings.
- Output: Predicted next item.
- Mechanism: The hidden state evolves with each click, capturing the “Current Intent” (e.g., “Shopping for shoes”).
- Loss: Pairwise Ranking Loss (BPR).
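A minimal PyTorch sketch of the GRU4Rec idea, scoring the next item from the session so far; sizes are illustrative, and the real model adds session-parallel batching and a ranking loss such as BPR instead of plain softmax scores.

```python
import torch
import torch.nn as nn

class TinyGRU4Rec(nn.Module):
    """Session encoder: item-ID sequence -> hidden state -> scores over the catalog."""
    def __init__(self, n_items, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_items, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_items)

    def forward(self, session_items):                # (B, T) item IDs clicked so far
        x = self.emb(session_items)
        _, h = self.gru(x)                            # h: (1, B, hidden) = "current intent"
        return self.out(h.squeeze(0))                 # (B, n_items) next-item scores

model = TinyGRU4Rec(n_items=10_000)
session = torch.tensor([[12, 87, 403]])               # anonymous session: three clicks
next_item_scores = model(session)
top5 = next_item_scores.topk(5).indices               # 5 most likely next items
```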
Deep Dive: Calibration (Why Probabilities Matter)
In Ranking, we often just care about the order. But in Ads Ranking, the absolute probability matters. Why?
- Bid: Advertiser pays $1.00 per click.
- Expected Revenue: $\text{Bid} \times P(\text{Click})$.
- If the model predicts 0.2 but the real probability is 0.1, we overestimate revenue and show the wrong ad.
Calibration Techniques:
- Platt Scaling: Fit a Logistic Regression on the output logits.
- Isotonic Regression: Fit a non-decreasing, piecewise-constant mapping. (More flexible, requires more data).
- Reliability Diagram: Plot “Predicted Probability” vs “Actual Frequency”. Ideally, it’s a diagonal line (y=x).
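A sketch of checking and fixing calibration with scikit-learn; the synthetic scores below stand in for model outputs.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.1, size=50_000)                       # true CTR ~ 10%
p_raw = np.clip(y_true * 0.1 + rng.beta(2, 8, 50_000), 0, 1)     # over-confident raw scores

# Reliability diagram data: ideally prob_pred ~= prob_true along y = x.
prob_true, prob_pred = calibration_curve(y_true, p_raw, n_bins=10)

# Platt scaling: logistic regression on the logit of the raw score.
logits = np.log(p_raw / (1 - p_raw + 1e-9) + 1e-9).reshape(-1, 1)
platt = LogisticRegression().fit(logits, y_true)
p_platt = platt.predict_proba(logits)[:, 1]

# Isotonic regression: non-decreasing, piecewise-constant mapping (needs more data).
iso = IsotonicRegression(out_of_bounds="clip").fit(p_raw, y_true)
p_iso = iso.predict(p_raw)
```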
Deep Dive: Real-time Bidding (RTB)
In Ad Tech, ranking happens in an auction.
- User visits page.
- Request sent to Ad Exchange.
- Exchange asks DSPs (Demand Side Platforms): “How much for this user?”
- DSP runs Ranking Model: Predicts CTR and CVR (Conversion Rate).
- Bid Calculation: $\text{Bid} = \text{CVR} \times \text{Value} \times \text{Pacing\_Factor}$.
- Auction: Highest bid wins.
- Latency: All this must happen in < 100ms.
Deep Dive: Multi-Task Learning (MTL)
We often want to optimize multiple objectives:
- CTR: Click-Through Rate.
- CVR: Conversion Rate (Purchase).
- Dwell Time: Time spent.
Shared-Bottom Architecture:
- Shared Layers: Learn generic features (User embedding, Item embedding).
- Tower Layers: Specific to each task.
- CTR Tower -> Sigmoid.
- CVR Tower -> Sigmoid.
- Loss: $L = w_1 L_{CTR} + w_2 L_{CVR}$.
- Benefit: Transfer learning. Learning to predict clicks helps predict conversions (and vice versa).
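A compact PyTorch sketch of the shared-bottom layout; the layer widths and the loss weights are arbitrary.

```python
import torch
import torch.nn as nn

class SharedBottomMTL(nn.Module):
    def __init__(self, n_features, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.ctr_tower = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))
        self.cvr_tower = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        h = self.shared(x)                               # generic shared representation
        return self.ctr_tower(h).squeeze(-1), self.cvr_tower(h).squeeze(-1)

model = SharedBottomMTL(n_features=32)
bce = nn.BCEWithLogitsLoss()
x = torch.randn(256, 32)
click = torch.randint(0, 2, (256,)).float()
buy = torch.randint(0, 2, (256,)).float()
ctr_logit, cvr_logit = model(x)
loss = 1.0 * bce(ctr_logit, click) + 0.5 * bce(cvr_logit, buy)   # L = w1*L_CTR + w2*L_CVR
loss.backward()
```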
Deep Dive: Exploration and the Cold-Start Problem (Multi-Armed Bandits)
The “Cold Start” problem is real. If we never show a new item, we never get data on it. We treat this as a Multi-Armed Bandit problem.
1. Epsilon-Greedy
- With probability $1-\epsilon$, show the best item (Exploit).
- With probability $\epsilon$, show a random item (Explore).
- Pros: Simple.
- Cons: “Random” items might be terrible.
2. Upper Confidence Bound (UCB)
- Calculate the mean CTR $\mu$ and its uncertainty $\sigma$.
- Score = $\mu + \alpha \cdot \sigma$.
- Intuition: Boost items we are uncertain about (high variance).
- As we get more data, $\sigma$ decreases, and we rely more on $\mu$.
3. Thompson Sampling
- Model the CTR as a Beta distribution $\text{Beta}(\alpha, \beta)$.
- Sample a value from this distribution for each item. Rank by sampled value.
- Pros: Mathematically optimal for minimizing regret.
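A small simulation of Thompson sampling over four items; the true CTRs are synthetic and exist only to generate feedback for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_ctr = np.array([0.02, 0.05, 0.03, 0.08])          # unknown to the system
alpha = np.ones(4)                                      # Beta prior: successes + 1
beta = np.ones(4)                                       # Beta prior: failures + 1

for _ in range(10_000):
    sampled = rng.beta(alpha, beta)                     # one posterior draw per item
    item = int(np.argmax(sampled))                      # show the item with the best draw
    clicked = rng.random() < true_ctr[item]             # simulate user feedback
    alpha[item] += clicked
    beta[item] += 1 - clicked

posterior_mean = alpha / (alpha + beta)                 # converges toward true_ctr,
impressions = alpha + beta - 2                          # with most traffic on the best item
```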
Deep Dive: Bias and Fairness in Ranking
Ranking systems can amplify bias.
- Popularity Bias: The rich get richer. Popular items get shown more, get more clicks, and become even more popular.
- Position Bias: Top items get clicked because they are top.
Mitigation:
- Inverse Propensity Weighting (IPW):
- Weight clicks by the inverse probability of the user seeing the item.
- If item X was at rank 10 (low probability of being seen) but got clicked, it’s a very strong signal.
- Fairness Constraints:
- Ensure that the top 10 results contain at least 2 items from “Small Creators”.
- This is a constrained optimization problem.
Deep Dive: Online Learning (FTRL)
In fast-moving domains (News, Ads), a model trained yesterday is stale. We need Online Learning.
FTRL (Follow The Regularized Leader):
- An optimization algorithm designed for sparse data and online updates.
- It updates the weights after every batch of clicks.
- Used heavily in Ad Click Prediction (Logistic Regression).
- Architecture: Wide & Deep Learning (Google).
  - Wide Part: A linear model trained with FTRL (Memorization).
  - Deep Part: DNN (Generalization).
Deep Dive: Evaluation Metrics Math (NDCG)
Why do we use NDCG instead of Accuracy? Because order matters.
CG (Cumulative Gain): $$ CG_p = \sum_{i=1}^{p} rel_i $$ The sum of relevance scores (does not care about order).
DCG (Discounted Cumulative Gain): $$ DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)} $$ Penalizes relevant items appearing lower in the list.
IDCG (Ideal DCG): The DCG of the perfect ordering.
NDCG (Normalized DCG): $$ NDCG_p = \frac{DCG_p}{IDCG_p} $$ Values are between 0 and 1 and comparable across queries.
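A direct translation of these formulas into Python; the relevance lists in the example are made up.

```python
import numpy as np

def dcg(relevances, p=None):
    rel = np.asarray(relevances, dtype=float)[:p]
    discounts = np.log2(np.arange(2, rel.size + 2))     # log2(i+1) for i = 1..p
    return float(np.sum(rel / discounts))

def ndcg(relevances, p=None):
    ideal = sorted(relevances, reverse=True)            # best possible ordering
    idcg = dcg(ideal, p)
    return dcg(relevances, p) / idcg if idcg > 0 else 0.0

# The model ranked a highly relevant item (rel=3) at position 2 instead of 1.
print(ndcg([1, 3, 0, 2], p=4))   # ~0.79 -> penalized relative to the ideal [3, 2, 1, 0]
```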
Deep Dive: Feature Engineering for Ranking
Features are the lifeblood of ranking.
1. Counting Features
- “How many times has this user clicked this category?”
- “How many times has this item been bought in the last hour?”
- Implementation: Count-Min Sketch or Redis counters.
2. Crossing Features
- User_Country x Item_Country (match vs. mismatch).
- User_Gender x Item_Category.
3. Sequence Features
- “Last 50 items viewed”.
- Processed by a Transformer or GRU to generate a dynamic user embedding.
System Design: The Latency Budget
We have 200ms total. How do we spend it?
| Component | Budget | Notes |
|---|---|---|
| Network Overhead | 30ms | Round trip to client. |
| Query Understanding | 20ms | BERT is slow; use DistilBERT or caching. |
| Retrieval (ANN) | 20ms | Parallelize across shards. |
| L1 Ranking | 30ms | Dot products are fast. |
| Feature Fetching | 40ms | The bottleneck! Fetching features for 500 items from Feature Store (Cassandra/Redis). |
| L2 Ranking | 50ms | Heavy model inference. |
| Re-Ranking | 10ms | Logic heavy, compute light. |
Optimization:
- Parallelism: Run Retrieval and L1 for different sources (Ads, Organic) in parallel.
- Feature Caching: Cache hot item features in local memory.
Top Interview Questions
Q1: How do you handle "Position Bias" in training data?
Answer: Users click the top result partly because it is at the top, which contaminates the click labels. Two common fixes:
- Randomization: Shuffle the top 3 results for 1% of traffic.
- Counterfactual Model: Add Position as a feature during training. During inference, set Position = 0 (or Position = 1) for all items. This asks the model: "How relevant would this be if it were at the top?"
Q2: Why use Dot Product for L1 and Trees/BERT for L2? Answer:
- L1: Needs to scan millions of items. Dot product allows using ANN indices (MIPS). Trees/BERT cannot be indexed easily.
- L2: Needs precision on a small set (500). Trees/BERT capture non-linear feature interactions (e.g., “User likes Sci-Fi AND Item is Star Wars”) better than a simple Dot Product.
Q3: How do you evaluate a ranking system offline?
Answer: We use Replay Evaluation.
- Take a historical log: the user saw [A, B, C] and clicked B.
- Run the new model on the same context.
- If the new model ranks B at position 1, it wins.
- Metric: NDCG (Normalized Discounted Cumulative Gain) or MRR (Mean Reciprocal Rank).
7. Key Takeaways & Features
| Feature Type | Examples |
|---|---|
| User | Age, Gender, Past Clicks, Search History |
| Item | Title, Price, Category, CTR (Click-Through Rate) |
| Context | Time of Day, Device, Location |
| Interaction | User-Category Affinity, Last-Viewed-Item Similarity |
Positional Bias:
Users click the top result simply because it’s at the top.
Fix: Train with “Position” as a feature, but set Position=0 during inference (Counterfactual Inference).
8. Evaluation Metrics
- Offline:
- NDCG@K: Normalized Discounted Cumulative Gain. (Gold Standard).
- MRR: Mean Reciprocal Rank.
- Online:
- CTR: Click-Through Rate.
- Conversion Rate: Purchases / Clicks.
- Dwell Time: Time spent on the page.
9. Failure Modes
- The Cold Start Problem: New items have no interaction data.
- Fix: Use Content-based retrieval (Embeddings) + Exploration (Bandits).
- Feedback Loops: The model only learns from what it shows. It never learns about items it didn’t show.
- Fix: Epsilon-Greedy Exploration (Show random items 1% of the time).
10. Summary
| Stage | Model | Input Size | Latency |
|---|---|---|---|
| Retrieval | ANN / Inverted Index | 1 Billion | 10ms |
| L1 Ranker | Two-Tower / LR | 10,000 | 20ms |
| L2 Ranker | BERT / GBDT | 500 | 50ms |
| Re-Ranker | Heuristics | 50 | 5ms |
Originally published at: arunbaby.com/ml-system-design/0028-ranking-systems-at-scale