Netflix System Design Deep-dive

Apr 12, 2024
Explore our Netflix system design guide. Discover architecture, key algorithms, and scalability tactics for building effective streaming systems.
In this comprehensive guide, we'll embark on the journey of designing a Netflix-like video streaming service, tailored specifically for system design interview preparation. However, it's important to recognize that the intricacies of such a project extend far beyond what can be addressed in a single interview session.
Instead, this resource serves as a robust toolkit to equip you with the knowledge and insights necessary to excel in system design interviews.

What is unique about Netflix?

Netflix is a streaming service that requires a subscription, enabling users to watch TV shows and movies on any internet-connected device. It's accessible on various platforms, including the web, iOS, Android, and TV.

System Analysis

Before diving into system design to meet the use case, I recommend briefly analyzing the system's fundamental nature. Understanding this allows for prioritizing system attributes accordingly, as the system's nature directly affects the architecture. For instance, it influences the need to scale databases, run multiple service instances, and optimize caching strategies.

1. Read- or write-heavy?

Streaming applications are typically read-heavy systems: millions of users consume content that only a fraction of users uploads. Netflix is no exception.

2. Monolithic vs. Distributed Architecture

Like all high-level design interview questions on famous systems, this is a distributed architecture. You won't scale the system to millions of users by relying on a monolithic architecture running on a single server.

3. Availability or Global Consistency?

Assuming Netflix operates as a distributed system, network partitions are inevitable. In such cases, it's preferable for Netflix to maintain availability, even if it means an inconsistent state that can be resolved once the network issue is fixed. Prioritizing availability ensures that users can continue enjoying their content while inconsistencies are resolved later.
These initial thoughts on the system's nature will guide the entire design process. Whether it should be monolithic or distributed directly influences architectural style, database distribution, and data consistency models. For instance, a distributed architecture for Netflix prioritizes availability over strict consistency, ensuring seamless streaming during network partitions. It also affects infrastructure choices like cloud-native vs. on-premises, load balancing, and caching strategies.


Our system should meet the following requirements:

Functional Requirements

  • Users should have the capability to stream videos.
  • The content team should have the ability to upload new videos, including movies, TV show episodes, and other types of content.
  • Users should be able to find videos by searching for titles or tags.

Alternative Features

When scoping your Netflix system design, consider including no more than 2-3 features besides the core. Commonly suggested features include:
  • Certain content should be geo-blocked.
  • Resume video playback from the point the user left off.
  • Record metrics and analytics of videos.
  • Support recommendations based on viewing history.
  • Provide multi-language subtitles and audio tracks.
  • Enable multiple user profiles under one account.
  • Introduce offline viewing by allowing users to download content.
  • Include social features like sharing, rating, or commenting on content.
  • Provide push notifications for new episodes or recommendations.
  • Support integration with smart home devices (e.g., voice assistants).
In an interview, it's best to suggest the features you're most comfortable discussing in depth. Here, I'll explain only the three features that have already been selected in the functional requirements.

Non-Functional Requirements

  • Achieve high reliability, preventing any uploads from being lost.
  • Ensure high availability with minimal latency.
  • The system must be scalable and efficient.

Capacity Estimation

Let's start with the estimation and constraints.
Before diving into detailed capacity estimations during an interview, clarify the interviewer's expectations. Recently, interviewers tend to focus on estimates that directly influence design decisions, rather than requiring comprehensive estimations.


This will be a read-heavy system. Let us assume we have 1 billion total users with 30 million daily active users (DAU), where each user watches on average 1 video per day, and each video is 60 minutes long.
Pick the arbitrary number of 1 request per minute of streaming. That means an average video triggers 60 requests until it is completely streamed.
Now we can multiply this number by the total amount of daily active users.
The slightly rounded result is that our system needs to handle 2 billion read requests per day on average. Next, you should convert this number to requests per second. Therefore, you simply divide the 2 billion by 10^5, the rounded amount of seconds a day has. You end up with 20 thousand read requests per second.
Assuming a read-write ratio of 1,000 to 1, we can simply divide the reads by 1,000. This results in 20 write requests per second.
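The throughput arithmetic above can be checked with a quick script; all constants are the assumptions stated in the text:

```python
# Back-of-the-envelope throughput estimate for the streaming system.
DAU = 30_000_000             # daily active users (assumption)
REQUESTS_PER_VIDEO = 60      # 60-minute video, 1 request per minute
SECONDS_PER_DAY = 10**5      # cheat-sheet shorthand for 86,400

reads_per_day = DAU * REQUESTS_PER_VIDEO        # 1.8 billion, rounded up to 2 billion
reads_per_second = 2 * 10**9 // SECONDS_PER_DAY
writes_per_second = reads_per_second // 1_000   # 1,000:1 read-write ratio
```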
Unlock your potential in system design interviews with my essential Capacity Estimation Cheat Sheet! Download now to master calculation shorthands and get a comprehensive overview of all the necessary formulas.


To estimate storage, we continue with the estimate of 20 writes per second we got while calculating throughput.
Next, we need to consider the size of each write. Assume a size of 50 MB per 1 minute of uploaded or streamed video content, which is a widely accepted assumption. Based on 20 write requests per second, with each request being 50 MB, we can estimate that the system ingests an additional 1 Gigabyte per second.
To quickly estimate the additional storage required over the next 5 years, multiply your storage per second by 1.5 × 10^8, which is the rounded number of seconds in 5 years (5 × 3 × 10^7).
You can find these shorthands on the cheat sheet, I linked above.
In Petabytes: 1 GB/s × 1.5 × 10^8 s ≈ 1.5 × 10^8 GB ≈ 150 PB.


To calculate the bandwidth per second, take the 20k read requests per second that we also estimated before and multiply it by the request size of 50 MB.
Considering the very high number of read requests, you can simplify your calculations by ignoring the write requests due to their relatively insignificant size.
It's crucial to explain this reasoning to the interviewer.
Once multiplied, it makes sense to convert the Megabytes to Terabytes; there is a shorthand for this on the cheat sheet. 20,000 × 50 MB = 10^6 MB/s, so the final result is a required maximum bandwidth of 1 Terabyte per second.
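The storage and bandwidth figures follow directly from the earlier assumptions; a small script makes the unit conversions explicit:

```python
# Storage and bandwidth estimates, using decimal units for simplicity.
WRITE_RPS = 20                        # from the throughput estimate
READ_RPS = 20_000                     # from the throughput estimate
CHUNK_MB = 50                         # 50 MB per minute of video
SECONDS_IN_5_YEARS = 150_000_000      # rounded 5 * 3*10^7

ingest_mb_per_s = WRITE_RPS * CHUNK_MB                          # 1,000 MB/s = 1 GB/s
storage_5y_pb = ingest_mb_per_s * SECONDS_IN_5_YEARS / 10**9    # MB -> PB
bandwidth_tb_per_s = READ_RPS * CHUNK_MB / 10**6                # MB -> TB
```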

Data Model Design

Entities & Attributes

The first task is to define the entities and their associated attributes. You can use the list of requirements as a starting point, but not all necessary entities and attributes may be explicitly mentioned. Be careful, though, not to add entities or attributes that aren't necessary for the functional requirements you defined previously.


The four necessary entities are
  • Videos
  • Thumbnails
  • Metadata
  • Users


For metadata it would make sense to add attributes like
  • videoID
  • title
  • summary
  • length
  • rating
  • uploadedBy, an attribute that links to the admin who uploaded the video
  • thumbnail URL, a link to the location where the thumbnail is stored.
The user has attributes like
  • a unique ID
  • a name or other personal details
  • role - admin: yes/no
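The entities and attributes above can be sketched as simple data classes (field names and types are illustrative choices, not part of the original design):

```python
from dataclasses import dataclass

@dataclass
class User:
    user_id: str
    name: str
    is_admin: bool = False   # role - admin: yes/no

@dataclass
class VideoMetadata:
    video_id: str
    title: str
    summary: str
    length_seconds: int
    rating: float
    uploaded_by: str         # references User.user_id
    thumbnail_url: str       # location of the thumbnail in object storage
```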


You can identify four major relationships.
  • Users upload video metadata
  • Video metadata links to video files
  • Video metadata links to image metadata
  • Image metadata links to image files
Users can also search for videos, but we'll get to that in a minute.

Databases & Cardinality

Select and justify the most suitable databases for the different portions of data.

Relational Database

Let's start simple: user data goes into a relational database. Why? The data is well suited to a tabular format, and the limiting factors of relational databases don't hit hard for a user database; for example, we won't require complex join operations. We just need a simple but reliable solution to keep track of user data, and the ACID guarantees for transactions help tremendously with that.

Object Storage

Both video and thumbnail files can be stored in an object store. Especially because files are not likely to be updated frequently, an object store is a good option, as it allows keeping metadata close to the data it belongs to. The video metadata would include an additional attribute that links to the associated thumbnails.

Search Engine Database

To implement the search feature, you should introduce a search engine database. It makes sense to visualize an additional table that stores the metadata you want to index, in order to make the metadata searchable. The value might simply be a link to the address in the object storage where the full metadata is stored.

API Design

Let us do a basic API design for our services:

Upload a video

Given a byte stream, this API enables video to be uploaded to our service.
Title (string): Title of the new video.
Description (string): Description of the new video.
Data (byte[]): Byte stream of the video data.
Tags (string[]): Tags for the video (optional).
Result (boolean): Represents whether the operation was successful or not.
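A minimal sketch of this endpoint's request shape and validation, assuming the parameter names above (the hand-off to the processing pipeline is omitted):

```python
from dataclasses import dataclass, field

@dataclass
class UploadRequest:
    title: str
    description: str
    data: bytes
    tags: list[str] = field(default_factory=list)   # optional

def upload_video(req: UploadRequest) -> bool:
    """Validate the upload request; returns whether the operation succeeded.
    A real implementation would enqueue req.data for processing."""
    if not req.title or not req.data:
        return False
    return True
```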

Streaming a video

This API allows our users to stream a video with the preferred codec and resolution.
Video ID (UUID): ID of the video that needs to be streamed.
Codec (Enum<string>): Required codec of the requested video, such as H.265, H.264, VP9, etc.
Resolution (Tuple<int>): Resolution of the requested video.
Offset (int): Offset of the video stream in seconds to stream data from any point in the video (optional).
Stream (VideoStream): Data stream of the requested video.
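These parameters can be sketched as follows; the list of byte chunks stands in for the object store / CDN, and the chunking scheme is an illustrative assumption:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterator

class Codec(Enum):
    H264 = "h.264"
    H265 = "h.265"
    VP9 = "vp9"

@dataclass
class StreamRequest:
    video_id: str
    codec: Codec
    resolution: tuple[int, int]
    offset_seconds: int = 0            # optional: resume mid-video

def stream_video(req: StreamRequest, chunks: list[bytes]) -> Iterator[bytes]:
    """Yield per-second chunks starting at the requested offset."""
    yield from chunks[req.offset_seconds:]
```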

Search for a video

This API will enable our users to search for a video based on its title or tags.
Query (string): Search query from the user.
Next Page (string): Token for the next page, this can be used for pagination (optional).
Videos (Video[]): All the videos available for a particular search query.

System Design - Core Features

The first task is to design the core services required for the video upload feature of Netflix.
As we analyzed early on, Netflix needs to be a distributed system in order to scale to its required size. We will be using a microservices architecture since this will make it easier to horizontally scale and decouple our services. Each service will have ownership of its own data model.
Let's divide our system into core and supporting services. We start by designing the services for the core value proposition first, as this is the centerpiece of your interview performance. If some supporting features are not as well defined due to lack of time, this is not as critical.

Upload Service

This service will handle the video uploads and processing. It will be discussed in detail separately.
You start off with a client-facing application for the admins who are supposed to upload new video content. Once that's in place you would add a load balancer to distribute the incoming traffic evenly across multiple instances of the upload service.
The upload service would expose the uploadVideo endpoint as defined in the API design step. Besides the raw video file, the admins also upload all relevant metadata and their userID.
What happens next is an important design decision: does the upload service start processing each uploaded chunk directly, or does it store the chunks in a dedicated object store until the whole file is safely uploaded before processing?
Depending on your justification, both are feasible options. I am deciding to go without the extra storage, because I assume that anything that starts to be uploaded is in fact intended to become available sooner or later, so spending resources directly on each uploaded chunk is likely to be worth it.
That means we push each incoming 50 MB chunk directly into a message queue in order to decouple the upload service from the processing pipeline. Chunks wait in the queue until resources are available to process them. If there are enough resources, multiple chunks can be processed in parallel thanks to the multi-instance approach of the worker service, which does the heavy computational lifting and then terminates.
The processed version of each chunk is stored in an object store, waiting for its peers to arrive before becoming available to be streamed out again.
Note: We can add additional steps such as subtitles and thumbnails generation as part of our pipeline.
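The queue-and-workers decoupling can be sketched in a few lines; `queue.Queue` stands in for the message queue, threads for worker instances, a list for the object store, and `transcode` is a hypothetical stand-in for the heavy processing step:

```python
import queue
import threading

chunk_queue: queue.Queue = queue.Queue()
processed = []                      # stand-in for the object store
lock = threading.Lock()

def transcode(chunk: bytes) -> int:
    """Stand-in for the real processing (encoding, packaging)."""
    return len(chunk)

def worker() -> None:
    while True:
        chunk = chunk_queue.get()
        if chunk is None:           # poison pill shuts the worker down
            chunk_queue.task_done()
            return
        result = transcode(chunk)
        with lock:
            processed.append(result)
        chunk_queue.task_done()

# Multiple worker instances pull chunks in parallel.
workers = [threading.Thread(target=worker) for _ in range(3)]
for t in workers:
    t.start()
for data in (b"a" * 5, b"b" * 7, b"c" * 3):   # upload service pushes chunks
    chunk_queue.put(data)
for _ in workers:
    chunk_queue.put(None)
chunk_queue.join()
```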

Stream Service

The stream service will handle video streaming-related functionality.
Now we start to work our way into the system, from the perspective of our users.
First, we need another client application. That can be anything: a web app, a native iOS app, or an app running on a smart TV. Then there is a load balancer to distribute the incoming traffic across multiple instances of the video service.
The video service exposes the user-facing API with the streamVideo(...) endpoint. Once a user requests to stream a video, the service accesses the video in the object store. To improve the performance of the system, we should add a CDN that serves the video content from closer geographical proximity to avoid unnecessary latency.
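The CDN routing decision can be sketched as follows; the region names and the origin fallback are hypothetical:

```python
def pick_stream_host(user_region: str, edge_health: dict[str, bool],
                     origin: str = "origin-object-store") -> str:
    """Serve from the user's regional CDN edge when it is healthy,
    otherwise fall back to the origin object store."""
    if edge_health.get(user_region):
        return f"cdn-{user_region}"
    return origin
```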

System Design - Supporting Features


User Service

A user service is a fundamental component in many system architectures, responsible for managing user-related data and operations.
This service typically handles tasks such as user registration, authentication, profile management, and authorization. A robust relational database, such as PostgreSQL or MySQL, is a common choice to store user information, while caching layers like Redis can optimize read performance. Additionally, implementing API gateways can streamline interactions between the user service and other system components, ensuring a seamless and secure user experience.
For more details, read my in-depth article on designing a user service.

Search Service

The last task is to add all system components required to enable the search feature we still have to implement. This service is responsible for handling search-related functionality.
In the API design step we defined a third endpoint called searchVideo(...). So far we don't have any architecture in place to provide the related functionality to the end user.
Now is the time to change that: we need another service, the search service, and a search engine database.
The search service queries the database and returns the most relevant results.
We also need to add an additional step to our processing pipeline so that the metadata that's supposed to be searchable is pushed into the search engine database so it can be indexed there.
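The extra pipeline step and the query path can be sketched with a toy inverted index; the whitespace tokenizer and sample data are illustrative, and a real deployment would use a search engine such as Elasticsearch instead:

```python
from collections import defaultdict

index: dict[str, set[str]] = defaultdict(set)

def index_metadata(video_id: str, title: str, tags: list[str]) -> None:
    """Pipeline step: tokenize title and tags into the inverted index."""
    for token in title.lower().split() + [t.lower() for t in tags]:
        index[token].add(video_id)

def search(term: str) -> set[str]:
    """Search service query: return the video IDs matching a term."""
    return index.get(term.lower(), set())

index_metadata("v1", "Stranger Things", ["thriller", "sci-fi"])
index_metadata("v2", "The Thriller Hour", ["mystery"])
```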
For more details, read my in-depth article on designing a search service.
Design Discussion

Let us identify and resolve bottlenecks, such as single points of failure, in our design:

What if one of our services crashes?

To handle service crashes and ensure high availability:
  • Service Redundancy: Deploy multiple instances of each service across different availability zones or regions. This ensures that if one instance or even an entire data center goes down, other instances can continue to serve traffic.
  • Health Checks and Auto-Recovery: Implement health checks to monitor service health and auto-recovery mechanisms to restart or replace failed instances automatically.
  • Circuit Breaker Pattern: Use the circuit breaker pattern to detect service failures and prevent cascading failures by rerouting traffic or falling back to a degraded mode.
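The circuit breaker pattern from the last bullet can be sketched like this; thresholds, timeouts, and the fallback value are illustrative choices:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: fail fast once a downstream service
    keeps failing, and retry after a cool-down period."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None           # set when the circuit trips open

    def call(self, fn, fallback=None):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback         # open: serve the degraded response
            self.opened_at = None       # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback
        self.failures = 0               # success closes the circuit again
        return result
```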

How will we distribute our traffic between our components?

To distribute traffic efficiently and evenly:
  • Load Balancers: Deploy load balancers between clients and services to distribute incoming traffic evenly across multiple instances. This ensures no single instance is overwhelmed and improves fault tolerance.
  • Service Discovery: Implement a service discovery mechanism to dynamically route traffic to available instances. Tools like Netflix Eureka can help manage service instances and their locations.
  • Content Delivery Network (CDN): Utilize a CDN to cache and distribute static content globally, reducing the load on origin servers and improving user experience.

How can we reduce the load on our database?

To reduce the load on the database and improve performance:
  • Database Sharding: Split the database into smaller, more manageable pieces, or shards, to distribute the load. Each shard handles a subset of the data, reducing the load on any single database instance.
  • Caching Layer: Implement a caching layer using distributed caches like Redis or Memcached to store frequently accessed data in memory, reducing the number of read operations hitting the database.
  • Read Replicas: Use multiple read replicas to distribute read traffic. Write operations go to the primary database, while read operations are spread across replicas.

How to improve the availability of our cache?

To ensure high availability of the caching layer:
  • Distributed Caching: Use distributed caching solutions like Redis Cluster or AWS ElastiCache to distribute the cache across multiple nodes. This ensures that even if one node fails, the cache can still serve requests from other nodes.
  • Replication: Enable cache replication to maintain copies of cache data across multiple nodes. This ensures data availability even in case of node failures.
  • Eviction Policies: Implement efficient cache eviction policies to manage memory usage and ensure that the cache remains performant under heavy load.

How to make a system more resilient?

To make our system more resilient we can do the following:
  • Running multiple instances of each of our services.
  • Introducing load balancers between clients, servers, databases, and cache servers.
  • Using multiple read replicas for our databases.
  • Multiple instances and replicas for our distributed cache.

What about inter-service communication and service discovery?

To manage dynamic service discovery and maintain efficient communication:
  • Service Registry:
    • Purpose: Maintains a registry of service instances and their network locations.
    • Implementation: Use tools like Netflix Eureka, Consul, or etcd to handle dynamic registration and discovery of services.
  • Service Mesh:
    • Purpose: Provides an additional layer for managing service-to-service communication.
    • Implementation: Use service mesh solutions like Istio or Linkerd to handle load balancing, traffic routing, retries, and observability.
    • Advantages: Offers built-in service discovery, traffic management, resilience patterns (circuit breaking, retries), and observability (metrics, logs, traces).
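The registry idea can be sketched with a minimal in-memory implementation in the spirit of Eureka or Consul; the TTL-based liveness check and the method names are illustrative assumptions:

```python
import time

class ServiceRegistry:
    """Toy service registry: instances register, send heartbeats,
    and are dropped from lookups once their TTL expires."""

    def __init__(self, ttl: float = 30.0):
        self.ttl = ttl
        self._instances: dict[str, dict[str, float]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, {})[address] = time.monotonic()

    def heartbeat(self, service: str, address: str) -> None:
        self.register(service, address)   # refresh the liveness timestamp

    def lookup(self, service: str) -> list[str]:
        now = time.monotonic()
        live = {a: t for a, t in self._instances.get(service, {}).items()
                if now - t < self.ttl}
        self._instances[service] = live
        return sorted(live)
```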