Slide 0

Slide 1

Machine Learning @ Netflix
(and some lessons learned)
Yves Raimond (@moustaki)
Research/Engineering Manager
Search & Recommendations
Algorithm Engineering

Slide 2

Netflix evolution

Slide 3

Netflix scale
● > 69M members
● > 50 countries
● > 1000 device types
● > 3B hours/month
● 36% of peak US downstream traffic

Slide 4

Recommendations @ Netflix
● Goal: Help members find content to watch and enjoy to maximize satisfaction and retention
● Over 80% of what people watch comes from our recommendations
● Top Picks, Because you Watched, Trending Now, Row Ordering, Evidence, Search, Search Recommendations, Personalized Genre Rows, ...

Slide 5

Models & Algorithms
▪ Regression (Linear, logistic, elastic net)
▪ SVD and other Matrix Factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees/Random Forests
▪ Gaussian Processes
▪ …

Slide 6

Some lessons learned

Slide 7

Build the offline experimentation
framework first

Slide 8

When tackling a new problem
● What offline metrics can we compute that capture the online improvements we’re actually trying to achieve?
● How should the input data to that evaluation be constructed (train, validation, test)?
● How fast and easy is it to run a full cycle of offline experimentation?
  ○ Minimize time to first metric
● How replicable is the evaluation? How shareable are the results?
  ○ Provenance (see Dagobah)
  ○ Notebooks (see Jupyter, Zeppelin, Spark Notebook)
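
A minimal sketch of the questions above, not taken from the talk: a time-based train/validation/test split and a single offline metric (recall@k) computed for a trivial baseline. The event format, the function names, and the popularity baseline are all assumptions made for illustration.

```python
from collections import defaultdict


def temporal_split(events, train_end, valid_end):
    """Split (user, item, timestamp) events by time so training never sees the future."""
    train = [e for e in events if e[2] < train_end]
    valid = [e for e in events if train_end <= e[2] < valid_end]
    test = [e for e in events if e[2] >= valid_end]
    return train, valid, test


def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top k of a ranking."""
    if not relevant_items:
        return 0.0
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / len(relevant_items)


def popularity_ranker(train):
    """Hypothetical baseline: rank items by play count in the training window."""
    counts = defaultdict(int)
    for _, item, _ in train:
        counts[item] += 1
    ranking = sorted(counts, key=lambda item: (-counts[item], item))
    return lambda user: ranking


def evaluate(ranker, held_out, k):
    """Mean recall@k over the users that appear in the held-out events."""
    relevant = defaultdict(set)
    for user, item, _ in held_out:
        relevant[user].add(item)
    scores = [recall_at_k(ranker(user), items, k) for user, items in relevant.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data, invented for this example.
events = [
    ("u1", "a", 1), ("u2", "a", 2), ("u1", "b", 3),
    ("u2", "c", 5), ("u1", "a", 8), ("u2", "b", 9),
]
train, valid, test = temporal_split(events, train_end=4, valid_end=7)
ranker = popularity_ranker(train)
test_recall = evaluate(ranker, test, k=1)
```

Having a baseline like this wired end to end is one way to keep "time to first metric" short: any later model only has to replace `popularity_ranker`.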

Slide 9

When tackling an old problem
● Same…
  ○ Are the metrics designed when experimentation in that space first started still appropriate now?

Slide 10

Think about distribution from the
outermost layers

Slide 11

1. For each combination of hyper-parameters (e.g. grid search, random search, Gaussian processes…)
2. For each subset of the training data
   a. Multi-core learning (e.g. HogWild)
   b. Distributed learning (e.g. ADMM, distributed L-BFGS, …)
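
As an illustration of distributing at the outermost layer first, the sketch below runs each hyper-parameter combination as an independent task. `train_and_score` and its toy objective are hypothetical stand-ins for a real training run, and a thread pool is used only to keep the example runnable anywhere; a process pool or a cluster scheduler would be the likely substitute for real workloads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_score(params):
    """Hypothetical stand-in for 'train one model and return its offline metric'."""
    learning_rate, regularization = params
    # Toy objective whose best value is at learning_rate=0.1, regularization=0.01.
    return -((learning_rate - 0.1) ** 2 + (regularization - 0.01) ** 2)


def grid_search(grid, max_workers=4):
    """Score every combination in the grid; combinations share no state."""
    combos = list(product(*grid.values()))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(train_and_score, combos))
    best_score, best_combo = max(zip(scores, combos))
    return dict(zip(grid.keys(), best_combo)), best_score


grid = {
    "learning_rate": [0.01, 0.1, 1.0],
    "regularization": [0.0, 0.01, 0.1],
}
best_params, best_score = grid_search(grid)
```

Because the tasks never communicate, this level parallelizes with essentially no coordination cost, which is the argument for starting here before reaching for multi-core or distributed learning inside a single training run.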

Slide 12

When to use distributed learning?
● The impact of communication overhead when building distributed ML algorithms is non-trivial
● Is your data big enough that the distribution offsets the communication overhead?
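
The trade-off can be made concrete with a toy cost model. This is an assumption for illustration, not a model from the talk: compute time is split evenly across workers, and communication cost grows linearly with the number of workers.

```python
def distributed_time(compute_seconds, workers, comm_seconds_per_worker):
    """Toy estimate: evenly divided compute plus communication that scales with workers."""
    return compute_seconds / workers + comm_seconds_per_worker * workers


def worth_distributing(compute_seconds, workers, comm_seconds_per_worker):
    """True when the estimated distributed time beats the single-machine time."""
    estimate = distributed_time(compute_seconds, workers, comm_seconds_per_worker)
    return estimate < compute_seconds


# Invented numbers: a 10-second job and a 10,000-second job on 8 workers.
small_job = worth_distributing(10, workers=8, comm_seconds_per_worker=2)
large_job = worth_distributing(10_000, workers=8, comm_seconds_per_worker=2)
```

Under these assumed numbers the small job is slower when distributed and the large job is faster, which is the "is your data big enough" question in miniature. Real communication costs depend on the algorithm and the network and would need to be measured.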

Slide 13

Example: Uncollapsed Gibbs sampler for LDA
(more details here)
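
For readers without the linked details, here is a minimal single-machine sketch of an uncollapsed Gibbs sampler for LDA. It is not the implementation the slide refers to; the function names, hyper-parameter values, and toy corpus are assumptions. The point it illustrates is that, once the document-topic (`theta`) and topic-word (`phi`) distributions are sampled explicitly, each token's topic can be resampled independently of every other token, which is what makes this variant a candidate for distribution.

```python
import random


def sample_dirichlet(rng, concentration):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def uncollapsed_gibbs_lda(docs, num_topics, vocab_size,
                          alpha=0.5, beta=0.5, iterations=20, seed=0):
    """docs is a list of documents, each a list of integer word ids."""
    rng = random.Random(seed)
    # z[d][n] is the topic currently assigned to token n of document d.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    theta, phi = [], []
    for _ in range(iterations):
        # Count the current assignments.
        doc_topic = [[0] * num_topics for _ in docs]
        topic_word = [[0] * vocab_size for _ in range(num_topics)]
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                doc_topic[d][z[d][n]] += 1
                topic_word[z[d][n]][w] += 1
        # Step 1: sample theta and phi explicitly, given the assignments.
        theta = [sample_dirichlet(rng, [alpha + c for c in row]) for row in doc_topic]
        phi = [sample_dirichlet(rng, [beta + c for c in row]) for row in topic_word]
        # Step 2: given theta and phi, every token is conditionally independent,
        # so this loop could be partitioned across workers by document.
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                weights = [theta[d][k] * phi[k][w] for k in range(num_topics)]
                z[d][n] = rng.choices(range(num_topics), weights=weights)[0]
    return theta, phi, z


# Toy corpus over a 4-word vocabulary, invented for this example.
docs = [[0, 1, 0, 1, 1], [2, 3, 3, 2, 2], [0, 1, 2, 3]]
theta, phi, z = uncollapsed_gibbs_lda(docs, num_topics=2, vocab_size=4)
```

In a distributed setting, step 2 is the embarrassingly parallel part, while step 1 requires aggregating counts across workers, which is where the communication overhead discussed on the previous slide comes in.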

Slide 14

Design production code to be
experimentation-friendly

Slide 15

Example development process
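[Diagram: Idea and Data feed Offline Modeling (R, Python, MATLAB, …), which is iterated until a Final model is reached. The model is then implemented in a production system (Java, C++, …) and run in the Production environment (A/B test) to produce the Actual output. Issues marked along the way: missing postprocessing logic, data discrepancies, code discrepancies, performance issues.]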

Slide 16

Avoid dual implementations
[Diagram: separate Experiment code and Production code, contrasted with Experiment and Production both built on a Shared Engine.]
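
One way to picture a shared engine is sketched below. Every class, function, and field name here is hypothetical: feature extraction and scoring live in a single class, and the experiment and production paths are thin wrappers around it, so there is no second implementation to drift out of sync.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ScoringEngine:
    """Single implementation of feature extraction and scoring."""
    weights: Dict[str, float]

    def features(self, user: dict, item: dict) -> Dict[str, float]:
        return {
            "genre_match": 1.0 if item["genre"] in user["liked_genres"] else 0.0,
            "popularity": item["popularity"],
        }

    def score(self, user: dict, item: dict) -> float:
        feats = self.features(user, item)
        return sum(self.weights.get(name, 0.0) * value for name, value in feats.items())

    def rank(self, user: dict, items: List[dict]) -> List[str]:
        ordered = sorted(items, key=lambda item: -self.score(user, item))
        return [item["id"] for item in ordered]


def offline_experiment(engine: ScoringEngine, logged_sessions: List[dict]) -> float:
    """Experiment path: share of sessions where the played item is ranked first."""
    hits = sum(
        engine.rank(s["user"], s["candidates"])[0] == s["played"]
        for s in logged_sessions
    )
    return hits / len(logged_sessions)


def production_handler(engine: ScoringEngine, request: dict) -> dict:
    """Production path: same engine, wrapped as a request handler."""
    return {"ranking": engine.rank(request["user"], request["candidates"])}


# Toy data, invented for this example.
engine = ScoringEngine(weights={"genre_match": 2.0, "popularity": 1.0})
user = {"liked_genres": {"drama"}}
items = [
    {"id": "x", "genre": "comedy", "popularity": 0.9},
    {"id": "y", "genre": "drama", "popularity": 0.2},
]
offline_metric = offline_experiment(
    engine, [{"user": user, "candidates": items, "played": "y"}]
)
response = production_handler(engine, {"user": user, "candidates": items})
```

With this layout, a change to the features or weights is exercised by the offline metric and served in production through the same code path, which avoids the data and code discrepancies listed on the previous slide.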

Slide 17

To be continued...

Slide 18

We’re hiring!
Yves Raimond (@moustaki)

Slide 19