TinyURL System Design Deep-dive

date: Apr 11, 2024
slug: tiny-url
status: Published
summary: Explore our guide on TinyURL for system design interviews. Learn the architecture, key algorithms, and scalability tactics essential for mastering system design interview questions.
type: SystemDesign
systemType: url-shorteners

In this comprehensive guide, we'll explore TinyURL, a URL shortening service built for efficient link management and tracking. The value of working through such a system extends far beyond shortening URLs: this resource is meant as a toolkit that equips you with the knowledge and insights needed to design link management and tracking at scale.

What is unique about TinyURL?

TinyURL stands out by providing a straightforward and reliable way to shorten long URLs into more manageable, concise links. This makes it especially useful in contexts where space for URLs is limited, such as social media posts, SMS, and email marketing. TinyURL also offers features like link customization and detailed tracking, which help users maintain control over their links and gain valuable insights into their performance.

System Analysis

Before diving into the system design to meet the use case, I recommend briefly analyzing the system's fundamental nature. Understanding this allows for prioritizing system attributes accordingly, as the system's nature directly impacts its architecture. This analysis influences how databases are scaled, how user requests are managed, and how caching and response times are optimized.

1. Read- or Write-Heavy?

It might seem counterintuitive, but despite the technical challenges associated with creating links, TinyURL is predominantly a read-heavy system. The creation of links, while crucial, is less frequent compared to the sheer volume of clicks each link receives. Most users will never create a link but will click on one, thus making the system predominantly read-heavy.

2. Monolithic vs. Distributed Architecture

TinyURL certainly benefits from a distributed system architecture. This type of architecture allows for linear scalability which is crucial for handling the vast number of users and link redirects that the service must manage daily. Opting for a distributed system over a monolithic setup ensures TinyURL can scale effectively and efficiently.

3. Availability or Global Consistency?

As a read-heavy system, the priority for TinyURL is to ensure that links work reliably and consistently over time: users expect that links they click will function correctly, potentially over spans as long as five years. Therefore, availability is the key focus. It's an acceptable trade-off if a newly created link takes a moment to propagate across all systems, as long as the service achieves high availability. Once a link is live, it must remain reliably accessible, which is crucial for user satisfaction and trust in the service.
 
These considerations about the nature of the system influence every aspect of the design process for TinyURL. Deciding on a distributed architecture directly affects how the service handles user requests and manages data consistency.
Prioritizing availability over strict consistency ensures that the service remains user-friendly and robust, even during network partitions. This focus on high availability also drives decisions related to infrastructure, such as adopting cloud-native solutions, implementing effective load balancing strategies, and optimizing caching methods.

Requirements

Our system should meet the following requirements:

Functional Requirements

For TinyURL, our system design should focus on the core functionality of URL shortening and redirection, extended by a few supporting features that ensure a streamlined and efficient user experience:
  1. Unique and Short Links: The service should generate a unique and short link for any given regular URL, optimizing ease of sharing and memorability.
  1. Redirect: When users access a short link, the system should instantly redirect them to the original URL, ensuring seamless user navigation.
  1. Link History - Registered users should see all links they ever created.
  1. Link Expiration - Links should expire after a default time span of 5 years.

Alternative Features

When outlining the scope of your TinyURL system design, consider including no more than 2-3 additional features alongside the core functionalities. There are various supplementary features that could enhance the service's utility and user experience:
  • Customizable Link Lifespan: Allow users to set custom expiration times for links, providing flexibility for temporary promotions or time-sensitive content.
  • Advanced Analytics: Provide detailed metrics for each link, such as the number of clicks over time, user demographics, and device type, enabling users to optimize their marketing strategies.
  • Batch Link Creation: Enable users to create multiple short links simultaneously, which is useful for businesses managing multiple campaigns or promotions.
  • API Access for Developers: Offer a robust API that allows developers to integrate TinyURL functionality into their own applications, automating link creation and retrieval.
  • Privacy Options: Allow users to create private links that are not indexed by search engines or visible to anyone without direct access.
  • Enhanced Security Features: Introduce security measures such as CAPTCHA verification or password protection for links to prevent abuse and ensure that only intended audiences can access certain URLs.
  • Custom Domains: Let users use their own domain names for branding purposes, making the short links appear as part of their own website.
  • Social Sharing Options: Provide built-in features for sharing links directly to social media platforms, enhancing the ease of content distribution.
  • QR Code Generation: Automatically generate a QR code for each short link, facilitating offline to online interactions.
When preparing for an interview, it's advantageous to discuss the features you are most familiar with in detail. For this scenario, we will focus on the features already chosen as part of the functional requirements.

Non-Functional Requirements

The non-functional requirements are crucial in ensuring that the TinyURL service is reliable, scalable, and efficient:
  1. High Reliability: The system should be robust, ensuring that no link data is lost and that every short link reliably redirects to its corresponding original URL.
  1. High Availability: TinyURL should be designed to offer high availability, minimizing downtimes and ensuring that the service is accessible at all times.
  1. Scalability: The system must efficiently handle a large number of requests at any given time without performance degradation, particularly during peak usage periods.
  1. Minimal Latency: The redirection process should be near-instantaneous, providing a swift response time that ensures a seamless user experience.

Capacity Estimation

Let's start with the estimation and constraints.
🙋
Before diving into detailed capacity estimations during an interview, clarify the interviewer's expectations. Recently, interviewers tend to focus on estimates that directly influence design decisions, rather than requiring comprehensive estimations.

Throughput

Start off by estimating requests per second based on daily active users. Take 100 million active users a day and multiply them by the number of interactions each has, which is one click on a shortened link per day.

Assumption

  • 100 million reads a day
  • 1 click per user a day
 
The result is roughly 1,000 reads per second. Now the read/write ratio becomes relevant: the ratio is 10 reads for every write.
That means we have a total of 1,100 requests per second: 1,000 reads and 100 writes. There are no peak loads to consider, so we are done with the request estimation.

Bandwidth

To estimate the bandwidth we need the size of each request, which we assume to be 200 bytes, and the request rates we just estimated: 1,000 reads and 100 writes per second.

Assumption

  • Avg request size: 200 bytes
  • Requests: read 1000 RPS, write 100 RPS

Reads Bandwidth

1,000 requests/sec × 200 bytes = 200 kB/sec

Writes Bandwidth

100 requests/sec × 200 bytes = 20 kB/sec

To estimate the required bandwidth for reads and writes, multiply the number of requests per second by the size of each request. For reads we end up with 200 kB per second. For writes you can apply the read/write ratio again to get 20 kB per second.

Storage

Start off by reusing the write bandwidth of 20 kB per second and expressing it as 2 × 10^1 kB/s. That's the amount of data we need to store per second. Then apply a shorthand to scale up to 5 years, which saves one step in estimating the required storage.

Assumption

  • 100 writes per second
  • 200 bytes per write
  • time horizon: 5 years
  • No replication factor

Storage per Second

100 writes/sec × 200 bytes = 20 kB/sec = 2 × 10^1 kB/sec

In 5 years

2 × 10^1 kB/sec × 2 × 10^8 sec ≈ 4 × 10^9 kB = 4 TB

Next we take the 20 kB/s and apply a shorthand from the cheat sheet (5 years ≈ 2 × 10^8 seconds) to get to the amount we need within the next 5 years.
In fact, TinyURL will need about 4,000 GB, or 4 terabytes, of storage. Considering the current size of our user base, TinyURL is not a very storage-hungry system at all. We will get to much higher storage requirements once we estimate systems that deal with lots of images and video.
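As a quick sanity check, the whole estimate can be reproduced with a few lines of arithmetic; the numbers below are the rounded assumptions from above, not measured values.

```python
# Back-of-the-envelope check of the estimates above (rounded, not measured).
AVG_REQUEST_SIZE = 200             # bytes per request (assumption)
READS_PER_SECOND = 1_000           # 100M clicks/day / 86,400 s ≈ 1,157, rounded
WRITES_PER_SECOND = READS_PER_SECOND // 10    # 10:1 read/write ratio
FIVE_YEARS_IN_SECONDS = 2 * 10**8  # shorthand for ~1.6 * 10^8 seconds

read_bandwidth = READS_PER_SECOND * AVG_REQUEST_SIZE    # 200,000 B/s = 200 kB/s
write_bandwidth = WRITES_PER_SECOND * AVG_REQUEST_SIZE  # 20,000 B/s  = 20 kB/s
storage_5y = write_bandwidth * FIVE_YEARS_IN_SECONDS    # 4 * 10^12 B ≈ 4 TB

print(f"reads/s={READS_PER_SECOND}, writes/s={WRITES_PER_SECOND}")
print(f"read bw={read_bandwidth} B/s, write bw={write_bandwidth} B/s")
print(f"storage over 5 years ≈ {storage_5y / 10**12:.0f} TB")
```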
That's it for capacity estimation; let's head on to the next step.

Data Model

Start off by taking a look at the requirements and identifying the entities, their attributes, and the relationships between them.

Entities & Attributes

You need three entities - one for links, one for users, and one for the pre-generated key ranges.

Entities

  • Links
  • User
  • Key Ranges
 

Properties

Links have...
  • a due date, as they expire after 5 years.
  • a key that makes the short link unique.
  • the original URL.
Users have...
  • an id and a links property that stores the keys of all short links they have ever created.
Key Ranges have...
  • a key range, which is the five-figure key we pre-generate and store until we need it.
  • an attribute to track whether the key range is already in use.

Relationships

The relationships between the entities are straightforward now.
  • Users own links and links belong to a certain key range.
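As a rough illustration of these entities and relationships (not a final schema), a sketch in Python could look like the following; the exact field names, such as range_start and used_at, are assumptions based on the attributes listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class Link:
    key: str                  # unique short-link key, e.g. "aZ3k9"
    original_url: str         # the long URL we redirect to
    due_date: datetime        # expiration date, by default creation time + 5 years

@dataclass
class User:
    id: str
    links: list[str] = field(default_factory=list)   # keys of all short links the user created

@dataclass
class KeyRange:
    range_start: str                       # first key of the pre-generated range
    range_end: str                         # last key of the pre-generated range
    used_at: Optional[datetime] = None     # set once the range has been handed out
```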

Database Selection

Think about the most suitable databases for our data.
User data is a textbook example of leveraging the advantages of a relational database. In any system you want to query user data, and you rarely need all of it at once. User data is also straightforward to normalize, which avoids keeping duplicate data that would blow up the system's storage footprint. Moreover, relational databases favour data consistency.
Next, look at the Links table. As previously analysed, the availability of the service has higher priority than its data consistency.
We also have the non-functional requirement of low latency, so redirecting users to the original URL should be very fast. And we don't have terribly complex data or relationships.
In fact, we only need to query by key. A key-value store would be a great fit here! Let's also look at the key ranges table.
 
It would also be wise to choose a database that favours data consistency, as we don't want to accidentally hand out an already taken key range twice.
From a functional perspective, the ability to filter keys by status would also help to extract only available key ranges. Both point towards a relational database.

API Design

This is going to be a quick one: TinyURL only needs to expose three API endpoints to provide its core functionality.

Create a Link

Our first endpoint is called createLink(originalURL) which only needs the originalURL as a parameter.

Parameters

  • originalURL (string): The original long URL.

Response

When successful, it returns the response code 200 and the newly created short URL.

Link History

Next we have the endpoint that returns all links a user has ever created. I called it getLinkHistory, with the userId as the main parameter and another one called sorting that defines the order in which the links are returned, with either the latest or the oldest first.

Parameters

  • userId (string): The unique user ID.
  • sorting (string): ascending/descending

Response

As a response you will receive an array of links and a property that indicates the sorting order of the links.

Look up original URL

The last endpoint is called lookUpURL and is probably the most heavily frequented of our endpoints. It takes a short URL's key as a parameter and returns the original long URL.

Parameters

  • shortURL (string): The unique short URL key.

Response

The response is the original long URL.
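To make the API shape more concrete, here is a minimal sketch of the three endpoints using FastAPI as an assumed framework. The in-memory dictionaries and the extra userId parameter on link creation are simplifications for illustration only.

```python
from fastapi import FastAPI, HTTPException
from fastapi.responses import RedirectResponse

app = FastAPI()

# In-memory stand-ins for the key-value store and the user database (illustration only).
URLS: dict[str, str] = {}
USER_LINKS: dict[str, list[str]] = {}

@app.post("/links")
def create_link(originalURL: str, userId: str = "anonymous") -> dict:
    short_key = f"k{len(URLS):06d}"          # placeholder for the key generator service
    URLS[short_key] = originalURL
    USER_LINKS.setdefault(userId, []).append(short_key)
    return {"shortURL": f"https://tiny.url/{short_key}"}

@app.get("/users/{userId}/links")
def get_link_history(userId: str, sorting: str = "descending") -> dict:
    links = sorted(USER_LINKS.get(userId, []), reverse=(sorting == "descending"))
    return {"links": links, "sorting": sorting}

@app.get("/{shortKey}")
def look_up_url(shortKey: str):
    original = URLS.get(shortKey)
    if original is None:
        raise HTTPException(status_code=404, detail="Unknown or expired link")
    return RedirectResponse(url=original, status_code=302)
```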
 

Design Core Features

First, draw the main user flow. Start off with a web application, which will be the interface between the user and the system. Next, add the URL service that allows users to create new short URLs and to look up existing ones.
Now, add the key-value store that stores all original URLs and allows them to be queried by the short URL key. Add multiple instances of the URL service and a load balancer that distributes the incoming traffic evenly between the instances.
Add the missing arrows pointing in the direction of the user flow. Next, also add the key generator service and a relational database which stores our pre-generated keys.
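To make the key generator's role more concrete, here is a rough sketch of how a URL service instance might claim a pre-generated key range from the relational database; the table and column names (key_ranges, used_at, range_start, range_end) are assumptions, and a production setup would also need proper row locking (for example SELECT ... FOR UPDATE).

```python
import sqlite3

def claim_key_range(db: sqlite3.Connection) -> tuple[str, str]:
    """Claim one unused, pre-generated key range and mark it as used."""
    with db:  # one transaction; real deployments would additionally lock the selected row
        row = db.execute(
            "SELECT id, range_start, range_end FROM key_ranges "
            "WHERE used_at IS NULL LIMIT 1"
        ).fetchone()
        if row is None:
            raise RuntimeError("no pre-generated key ranges left")
        db.execute(
            "UPDATE key_ranges SET used_at = CURRENT_TIMESTAMP WHERE id = ?",
            (row[0],),
        )
    return row[1], row[2]   # the service hands out keys from this range one by one
```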

Design Supporting Features

Link History

Let's start with the link history. To retrieve all links a user has ever created, we first need user profiles. Therefore, add multiple instances of a user service that handles all requests related to user management and a relational database that stores all this user information, including the short URLs each registered user has created. It's out of scope to visualize any features related to user management, so there's no need to think about how to register new users, manage profiles, etc.
With these new components in place it's easy to reason about how a link history view would work.
First, the web application would fetch the URL keys from a user's profile and then retrieve the long URLs, expiration dates and so on from the key-value store one by one, as sketched below. Thanks to the fast key-value store lookups it's not an issue to look up one link after the other.
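A simplified version of that flow, using plain dictionaries as stand-ins for the user database and the key-value store, might look like this:

```python
def get_link_history(user_id: str, user_links: dict[str, list[str]],
                     kv_store: dict[str, dict], sorting: str = "descending") -> list[dict]:
    """Fetch the user's short-link keys, then look each one up in the key-value store."""
    keys = user_links.get(user_id, [])                            # keys stored on the user profile
    links = [kv_store[key] for key in keys if key in kv_store]    # one fast point lookup per key
    return sorted(links, key=lambda link: link["created_at"],
                  reverse=(sorting == "descending"))

# Example usage with toy data:
user_links = {"u1": ["aZ3k9", "Bq72x"]}
kv_store = {
    "aZ3k9": {"original_url": "https://example.com/a", "created_at": "2024-01-01"},
    "Bq72x": {"original_url": "https://example.com/b", "created_at": "2024-03-15"},
}
print(get_link_history("u1", user_links, kv_store))
```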

Expiring URLs

This feature requires some initial thought: how do we determine whether a link should be expired? Unfortunately, you can't efficiently query the key-value store for links that might be due for deletion.
Cron jobs are a way to run services periodically.
Use one to query the key ranges database on a daily basis and filter for the ranges that were used exactly 5 years ago. Use each of those key ranges to look up all of its links in the key-value store and delete them, and then clear the range's usedAt property, along the lines of the sketch below.
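A rough sketch of such a cleanup job, with plain dictionaries standing in for the key ranges table and the key-value store, could look like this:

```python
from datetime import date, timedelta

def expire_links(key_ranges: list[dict], kv_store: dict[str, str], today: date) -> None:
    """Daily cron job: delete links whose key range was handed out exactly 5 years ago."""
    cutoff = today - timedelta(days=5 * 365)
    for key_range in key_ranges:
        if key_range["used_at"] == cutoff:
            for key in key_range["keys"]:      # every short-link key in the range
                kv_store.pop(key, None)        # remove the expired link from the key-value store
            key_range["used_at"] = None        # clear usedAt so the range is marked unused again

# Example usage with toy data:
store = {"aZ3k9": "https://example.com/a"}
ranges = [{"used_at": date.today() - timedelta(days=5 * 365), "keys": ["aZ3k9"]}]
expire_links(ranges, store, today=date.today())
print(store)   # {} -> the expired link is gone
```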
 

Design Discussion

The design discussion evaluates a candidate's ability to architect and scale complex systems. In this section you find a detailed list of typical questions along with solution drafts and links to in-depth articles that further elaborate on more advanced concepts.

Basic Functionality

Questions on "Basic Functionality" probe a candidate's understanding of the essential operations and core features of the system. They aim to ascertain how well the candidate grasps the fundamental processes, data flows, and user interactions that form the backbone of the system's design.

How does the URL shortening mechanism work?

  • Hashing Function: Convert the original URL into a shorter, unique hash code using algorithms like MD5 or SHA-256.
  • Base Conversion: Alternatively, increment a counter and convert the number to a base-62 [a-zA-Z0-9] format to generate a unique code.
  • Key-Value Store: Store the short code and its corresponding original URL in a key-value database for efficient retrieval.
  • Redirection Logic: Implement server logic to redirect requests from the short URL to the stored original URL based on the unique short code.
  • Caching: Use caching mechanisms to improve the redirection performance for frequently accessed URLs.
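For the base-conversion approach above, encoding a counter value to base 62 is a few lines of code; this is a minimal sketch, and the counter itself would come from the key generator service:

```python
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 characters

def encode_base62(counter: int) -> str:
    """Convert an incrementing integer ID into a short base-62 [0-9a-zA-Z] code."""
    if counter == 0:
        return BASE62[0]
    chars = []
    while counter > 0:
        counter, rem = divmod(counter, 62)
        chars.append(BASE62[rem])
    return "".join(reversed(chars))

print(encode_base62(125))                  # "21"
print(encode_base62(3_521_614_606_207))    # largest 7-character value: "ZZZZZZZ"
```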

How are the short URLs stored in the database?

  • Key-Value Schema: Use a key-value pair where the key is the short URL code and the value is the original URL.
  • Database Choice: Employ a high-performance NoSQL database like Redis or MongoDB for rapid access and scalability.
  • Normalization: Normalize data by storing URLs with a unique ID and mapping these IDs in a separate table if using a relational database.
  • Expiration Mechanism: Optionally include an expiration field to automate the removal of old or unused URLs.
  • Indexing: Implement indexing on the short URL codes to speed up query performance and enhance retrieval times.
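Assuming Redis as the key-value store (and a locally running instance), storing a mapping with an optional expiration could look roughly like this; the five-year TTL mirrors the default link lifespan:

```python
import redis

FIVE_YEARS = 5 * 365 * 24 * 60 * 60   # default link lifespan in seconds

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def store_link(short_key: str, original_url: str, ttl: int = FIVE_YEARS) -> None:
    """Persist the short-key -> original-URL mapping with an expiration."""
    r.set(short_key, original_url, ex=ttl)

def resolve_link(short_key: str) -> str | None:
    """Return the original URL, or None if the key is unknown or expired."""
    return r.get(short_key)

store_link("aZ3k9", "https://example.com/very/long/path?utm_source=newsletter")
print(resolve_link("aZ3k9"))
```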

What happens when a user clicks a shortened URL?

  1. Request Handling: The web server receives a request containing the short URL code.
  1. Database Lookup: The server queries the database using the short code as the key to fetch the associated original URL.
  1. Redirection: Once the original URL is retrieved, the server sends an HTTP redirect response to the user's browser, instructing it to load the original URL.
  1. Error Handling: If the short URL does not exist or is expired, the server handles the error, often redirecting the user to an error page or a default landing page.

Scalability and Performance

Questions on "Scalability and Performance" are designed to evaluate a candidate's ability to design systems that can efficiently handle growth in user demand and data volume. They explore strategies for optimizing system resources and maintaining high performance under increasing loads.

How does the system scale with increasing numbers of users and links?

  1. Horizontal Scaling: Add more servers to distribute the load evenly. This can involve using load balancers to direct traffic to the least busy servers
  1. Database Scalability: Employ a distributed database system that can scale out by adding more nodes. This helps handle larger volumes of data and increases the capacity for simultaneous read/write operations.
  1. Caching: Implement caching strategies to reduce database load by storing frequently accessed URLs in faster-access storage layers like in-memory databases (e.g., Redis).
  1. Database Sharding: Partition the database into smaller, more manageable pieces called shards, spread across multiple servers to improve performance and reduce bottlenecks.
  1. Content Delivery Network (CDN): Use a CDN to serve static content and cache URLs geographically closer to the user to decrease latency and server load.
  1. Performance Optimization: Regularly optimize the code and database queries to handle operations more efficiently, reducing processing time and resource consumption.

What caching strategies are employed to improve performance?

  1. In-Memory Caching: Utilize in-memory data stores like Redis or Memcached to cache frequently accessed URLs directly in RAM, providing fast retrieval and reducing database load.
  1. Edge Caching: Use edge servers provided by a Content Delivery Network (CDN) to cache URLs geographically closer to the user. This reduces latency and speeds up the access time for users spread across different regions.
  1. Time-to-Live (TTL) Settings: Assign a Time-to-Live (TTL) to cache entries so that data is automatically refreshed after a certain period. This helps in maintaining the freshness of the data, especially for URLs that may expire or change.
  1. Cache Invalidation: Implement robust cache invalidation strategies to ensure that cached data is updated or removed when the original data changes or when URLs are deleted or modified.
  1. Pre-Caching: Anticipate high traffic for new URLs and pre-cache these entries shortly after creation, ensuring that they are available in the cache as soon as they start receiving traffic.
  1. Load-Aware Caching: Adaptively increase cache sizes or change caching logic based on current load and traffic patterns, optimizing resource usage during peak times.
  1. Distributed Caching: Use a distributed caching system to scale out the cache across multiple servers, preventing cache overload on a single server and balancing the load more effectively.
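In-memory caching is usually implemented as the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache. A rough sketch, with a plain dictionary standing in for the slower key-value database and a local Redis instance as the cache:

```python
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL = 24 * 60 * 60                        # keep hot links in the cache for one day

DATABASE = {"aZ3k9": "https://example.com/a"}   # stand-in for the key-value database

def resolve_with_cache(short_key: str) -> str | None:
    """Cache-aside lookup: try the cache first, fall back to the database."""
    url = cache.get(short_key)
    if url is not None:
        return url                               # cache hit, no database round trip
    url = DATABASE.get(short_key)                # cache miss: hit the slower database
    if url is not None:
        cache.set(short_key, url, ex=CACHE_TTL)  # populate the cache for the next reader
    return url
```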

How is high availability achieved in the system architecture?

  1. Redundancy: Implement redundant instances of critical components, such as web servers and databases, to ensure there is no single point of failure. If one component fails, another can take over without disrupting the service.
  1. Load Balancing: Use load balancers to distribute incoming traffic evenly across multiple servers. This not only helps in handling high traffic volumes efficiently but also provides failover capabilities in case one of the servers goes down.
  1. Failover Mechanisms: Set up automatic failover processes where if a primary component fails, the system automatically switches to a standby system. This transition is seamless to the end users.
  1. Database Replication: Utilize master-slave replication in databases to have backup copies of the data. In the event of a primary database failure, one of the replica databases can serve read queries or be promoted to the master to maintain the service's availability.
  1. Geographical Distribution: Deploy infrastructure across multiple physical locations to mitigate the risk of a regional outage affecting the entire service. This geographic distribution ensures that even if one data center fails, others can handle the load.
  1. Health Monitoring and Auto-Recovery: Continuously monitor the health of all system components using automated monitoring tools. Set up alerts and automated recovery scripts that can restart services or deploy additional resources when needed.
  1. Regular Updates and Patching: Keep all system components updated and patched to minimize vulnerabilities and ensure stability. Scheduled maintenance should be performed during off-peak hours to minimize impact on availability.
  1. Quality of Service (QoS) Rules: Implement QoS rules to prioritize critical operations and manage traffic under load, ensuring that essential services maintain high availability even during peak periods or under stress.

What are the backup and disaster recovery plans?

  1. Regular Data Backups: Schedule regular backups of all critical data, including database contents (shortened URLs and their mappings). These backups should be automated and occur at frequent intervals to minimize data loss.
  1. Offsite Storage: Store backup data in geographically diverse locations. This protects against data loss in the case of a physical disaster affecting the primary data center.
  1. Backup Testing: Regularly test backup files to ensure they are recoverable and complete. This testing should be part of the routine to catch issues with the backup process early.
  1. Disaster Recovery Plan: Maintain a detailed disaster recovery plan that outlines steps to restore operations after a system failure. This includes contact information for team members, detailed procedures for recovery, and a list of software and tools required for restoring the system.
  1. Recovery Time Objective (RTO) and Recovery Point Objective (RPO): Define clear RTO and RPO metrics to set expectations for recovery times and acceptable data loss in case of a disaster.
  1. High Availability Setup: Deploy critical components of the system in a high availability configuration across multiple data centers. This can involve using active-active or active-passive setups to ensure that if one site goes down, another can take over without interruption.
  1. Failover Mechanisms: Implement automatic failover to standby systems and data replication between primary and secondary sites to ensure that services can continue with minimal interruption in the event of hardware or software failures.
  1. Incident Response Team: Assemble a dedicated incident response team trained to handle disasters and system failures efficiently. This team should regularly participate in drills and updates to the disaster recovery plan.
  1. Cloud-Based Solutions: Consider using cloud services that provide built-in disaster recovery and data replication features to enhance resilience and reduce the overhead of managing physical infrastructure.

Security Concerns

Questions on "Security Concerns" assess a candidate’s ability to safeguard the system against potential threats and vulnerabilities. They focus on understanding the candidate’s strategies for ensuring data integrity, preventing unauthorized access, and managing risk throughout the system’s architecture.

How does the system handle security, particularly the risk of malicious URLs?

  1. Validation and Sanitization: Perform rigorous input validation and content sanitization to ensure URLs are free from malicious content and properly formatted to prevent security threats like XSS and SQL injection.
  1. URL Scanning: Integrate real-time and periodic scanning of URLs using services like Google Safe Browsing to detect and blacklist malicious links.
  1. Access Control: Enforce user authentication and implement rate limiting to manage and trace URL submissions, reducing the risk of abuse and DDoS attacks.
  1. Secure Data Handling: Use HTTPS for secure data transmission and employ encrypted storage solutions to protect sensitive information.
  1. Monitoring and Response: Monitor for unusual activity and maintain a rapid incident response plan to address potential security breaches or abuse effectively.

What measures are in place to prevent spamming and abuse of the service?

  1. Rate Limiting: Implement strict rate limiting on API calls and URL submissions to prevent overuse and reduce the risk of automated spam attacks.
  1. CAPTCHA Verification: Employ CAPTCHA challenges for creating and submitting URLs to verify human users and deter automated bots.
  1. Account Verification: Require email or phone verification for user accounts to discourage anonymous misuse and facilitate traceability of abusive actions.
  1. Blacklisting: Use dynamic blacklisting of IPs, domains, and users identified as sources of spam or malicious activity to immediately mitigate abuse.
  1. User Behavior Analysis: Monitor user behavior for patterns indicative of abuse, such as rapid submission of multiple URLs, and apply automatic restrictions or account suspensions based on predefined thresholds.
  1. API Security: Secure API endpoints with authentication tokens and encryption to prevent unauthorized access and exploitation by malicious actors.
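Rate limiting, the first measure above, is often implemented as a token bucket per client. A minimal in-memory sketch follows; a production system would typically keep these counters in a shared store such as Redis:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allow up to `capacity` URL submissions per client, refilled at `rate` tokens/second."""
    def __init__(self, capacity: float = 10, rate: float = 0.5):
        self.capacity = capacity
        self.rate = rate
        self.tokens = defaultdict(lambda: capacity)   # per-client remaining tokens
        self.updated = defaultdict(time.monotonic)    # per-client last refill timestamp

    def allow(self, client_id: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.updated[client_id]
        self.updated[client_id] = now
        # Refill tokens proportionally to elapsed time, capped at the bucket capacity.
        self.tokens[client_id] = min(self.capacity,
                                     self.tokens[client_id] + elapsed * self.rate)
        if self.tokens[client_id] >= 1:
            self.tokens[client_id] -= 1
            return True
        return False

limiter = TokenBucket()
print(limiter.allow("user-123"))   # True until the bucket is drained
```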

Alternative Features

Interviewers ask about possible extensions to assess a candidate's ability to enhance and differentiate a system with creative solutions that improve user experience and functionality. This can include any of the features listed earlier as alternative features.

How can the system support customizable link lifespan?

How to implement advanced analytics features?

 
 
 