Big Tech Coach

What is a CDN?

date

May 23, 2024

summary

Discover how Content Delivery Networks (CDNs) enhance global data delivery, optimizing performance for streaming and dynamic web applications.

In system design, delivering content to end users efficiently and reliably becomes harder as applications scale globally. Content Delivery Networks (CDNs) address this by serving data from locations close to the users who request it. This article looks at how CDNs work and why they are a critical component of systems that must serve content quickly and dependably, such as streaming services and dynamic web applications.

How Do CDNs Work?

Let's delve into how data is cached within a CDN. Essentially, the CDN functions as a gateway for all your inbound traffic, which is why, when creating a system diagram, you'd typically position the CDN between the client application and your main application server.

With that structural understanding in place, how does the caching process actually play out in the context of video streaming?

A user from a distant location sends a request to stream a movie.

This request first reaches the CDN. If the CDN doesn't have a cached version of the first chunk, it forwards the request to the origin server. The origin server retrieves the desired content and returns it to the CDN server nearest to that user, which passes it on to the user and keeps a copy.

Based on the specific caching policy, the CDN server either saves this copy on its disk or retains it in memory.

When another user from the same geographical area requests that particular movie, it's the CDN server – not the origin server – that fulfills the request.
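The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular CDN is implemented: `EdgeServer`, `origin`, and the chunk URL are hypothetical names, and the cache is a plain in-memory dictionary with no eviction or expiry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class EdgeServer:
    """Hypothetical CDN edge node that caches chunks fetched from an origin."""

    fetch_from_origin: Callable[[str], bytes]
    cache: Dict[str, bytes] = field(default_factory=dict)
    origin_requests: int = 0

    def get(self, url: str) -> bytes:
        # Cache hit: serve from the edge, the origin is not contacted.
        if url in self.cache:
            return self.cache[url]
        # Cache miss: forward to the origin, keep a copy, then respond.
        self.origin_requests += 1
        body = self.fetch_from_origin(url)
        self.cache[url] = body
        return body


def origin(url: str) -> bytes:
    # Stand-in for the origin server; a real one would read from storage.
    return f"content of {url}".encode()


edge = EdgeServer(fetch_from_origin=origin)
first = edge.get("/movies/42/chunk-0")   # first viewer in the region: miss
second = edge.get("/movies/42/chunk-0")  # second viewer: served by the edge
```

Both viewers receive the same bytes, but `edge.origin_requests` stays at 1 because the second request never leaves the edge.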

While it might seem like CDNs are primarily for media files, their utility extends far beyond that.

Caching of Websites

Contrary to popular belief, CDNs are frequently used to serve website data. Consider this: when a user accesses a website, they are essentially requesting a bundle of HTML, JavaScript, and CSS files, along with various assets such as SVG icons and images. Depending on a website's complexity, these bundled files can be sizeable, significantly impacting the site's initial loading time.

And here's a crucial point to note: even minor delays in loading can have outsized impacts. Research indicates that a mere one-second delay can result in a 7% decrease in conversions and an 11% drop in page views. By reducing network latency through CDNs, commercial websites can directly boost views and, by extension, revenue.

In theory, a CDN can be employed to cache and deliver an entire website. However, as a website evolves and matures, it often features dynamic content tailored to specific regions or individual users.

Take an e-commerce platform as an example. Such a site might offer product recommendations based on a user's shopping history. Or it might host time-sensitive promotions, like flash sales. Picture a scenario where a sale is scheduled to end at 1 pm, but due to cached data, users can still access the discounted prices until 4 pm. Such an oversight could result in substantial financial losses.
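The flash-sale problem can be reproduced with a small time-to-live cache. The sketch below is illustrative only: `TtlCache`, `price_from_origin`, the prices, and the three-hour lifetime are assumptions chosen to mirror the 1 pm / 4 pm scenario, and the clock is passed in explicitly so the behavior is deterministic.

```python
from datetime import datetime, timedelta


class TtlCache:
    """Hypothetical cache that keeps each entry for a fixed lifetime."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # key -> (value, time the value was stored)

    def get(self, key, now, load):
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # still within its lifetime: served from cache
        value = load(now)    # missing or expired: ask the origin again
        self.entries[key] = (value, now)
        return value


SALE_END = datetime(2024, 5, 23, 13, 0)  # the sale ends at 1 pm


def price_from_origin(now):
    # The origin always knows the correct price for the current time.
    return 70 if now < SALE_END else 100


cache = TtlCache(ttl=timedelta(hours=3))
key = "/product/1/price"
at_1259 = cache.get(key, datetime(2024, 5, 23, 12, 59), price_from_origin)
at_1500 = cache.get(key, datetime(2024, 5, 23, 15, 0), price_from_origin)
at_1600 = cache.get(key, datetime(2024, 5, 23, 16, 0), price_from_origin)
```

Here the price cached one minute before the sale ends is still served at 3 pm, and the correct price only returns once the entry expires just before 4 pm.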

Addressing Cached Data Updates

While caching certainly has its advantages, updating cached data can pose a challenge, especially for data that isn't highly dynamic but still requires updates.

Consider this scenario: you're part of a team developing an exciting web application. After completing the latest development sprint, you're eager to roll out new features to your audience.

Aware of the CDN in use, you access your cloud provider's CLI to invalidate cached assets before deploying the new updates to your origin server. This should, in theory, prompt users to fetch all files directly from the origin server. But the reality is slightly more complex.

Here's why: Internet Service Providers (ISPs) often cache popular websites. This not only saves them on network costs but also eases the load on their infrastructure. Similarly, user browsers cache data to improve load times and save on bandwidth.

Because of this multi-layered caching, even after you've invalidated your CDN cache there is no guarantee that all users will see the changes immediately.
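A toy model makes the layering concrete. In this sketch `CacheLayer` is a hypothetical class standing in for a browser cache, an ISP cache, and a CDN edge chained together; real caches also honor expiry headers, which are left out here to isolate the effect of invalidating only one layer.

```python
class CacheLayer:
    """Hypothetical cache layer (browser, ISP proxy, or CDN edge)."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # next layer's get, or the origin itself
        self.store = {}

    def get(self, url):
        if url not in self.store:
            self.store[url] = self.upstream(url)
        return self.store[url]

    def invalidate(self, url):
        self.store.pop(url, None)


site = {"/index.html": "v1"}


def origin(url):
    return site[url]


cdn = CacheLayer("cdn", origin)
isp = CacheLayer("isp", cdn.get)
browser = CacheLayer("browser", isp.get)

before = browser.get("/index.html")  # fills all three layers with "v1"
site["/index.html"] = "v2"           # deploy a new version to the origin
cdn.invalidate("/index.html")        # purge the only layer we control
after = browser.get("/index.html")   # answered by the browser's own copy

# A first-time visitor who goes straight to the CDN gets the new version.
new_visitor = CacheLayer("new-browser", cdn.get).get("/index.html")
```

The returning user still sees `v1` after the purge, while the new visitor sees `v2`, which is exactly the inconsistency described above.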

So, given these intricacies, how should you approach updates? Instead of merely invalidating caches when making site modifications, it's essential first to ensure you're using the right cache headers. Beyond that, one effective strategy is cache busting. Let's dive deeper into how this works.

Expires and Max-age

Modern websites often set caching headers by default. However, for optimal results, these headers should be tailored according to the specific use-case at hand.

One such header is the Expires header, which specifies the exact date and time after which a cached object is considered stale and must be fetched or revalidated again.

On the other hand, the max-age directive, which is set in the Cache-Control header, determines how long an object remains fresh in a cache. It's commonly set to 86400 seconds (which equates to one day). So, if a website update is deployed, caches that honor the directive will show it to all users after at most 24 hours.
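The difference between the two is that Expires is an absolute point in time, while max-age is a duration. The following sketch computes when a stored response stops being fresh. It is a simplification under stated assumptions: `freshness_deadline` is a hypothetical helper, header names are matched exactly as written, the max-age countdown starts at the moment the cache stored the response, and max-age takes precedence when both headers are present.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_deadline(
    headers: Mapping[str, str], stored_at: datetime
) -> Optional[datetime]:
    """Return when a stored response stops being fresh, or None if unknown."""
    cache_control = headers.get("Cache-Control", "")
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            # max-age is relative: a number of seconds after storage time.
            return stored_at + timedelta(seconds=int(value))
    if "Expires" in headers:
        # Expires is absolute: an HTTP date after which the copy is stale.
        return parsedate_to_datetime(headers["Expires"])
    return None


stored = datetime(2024, 5, 23, 12, 0, tzinfo=timezone.utc)
by_max_age = freshness_deadline({"Cache-Control": "public, max-age=86400"}, stored)
by_expires = freshness_deadline({"Expires": "Fri, 24 May 2024 12:00:00 GMT"}, stored)
```

With these inputs both calls describe the same deadline, noon UTC on the following day, expressed once as a duration and once as a date.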

By fine-tuning these headers, there's no need to manually invalidate the CDN cache: all users see the updated content once the stipulated time has passed. However, there's a caveat. These headers cannot push an update to every cache immediately, so new content does not appear instantly for all users. This is where the technique of Cache Busting becomes invaluable.

Cache Busting

Cache Busting, while intricate, offers an efficient way to instantly roll out updates to every user.

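Every time a new version of a web application is deployed, the links to its files and assets are modified, typically by embedding a version number or content hash in each file name. Because caches identify entries by URL, they treat a file with an altered URL as an entirely new file. For this to work, the HTML page that references those files must itself be cached only briefly or revalidated, so that users pick up the new links.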

As a result, the first user accessing the domain post-update triggers the CDN to fetch the revamped bundle directly from the origin server, courtesy of the changed URL. This ensures the immediate delivery of the updated version to all users.
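One common way to change the URL on every deployment is to embed a hash of the file's content in its name. The sketch below assumes that approach; `busted_name` and the example path are hypothetical, and in practice a build tool usually performs this step.

```python
import hashlib
from pathlib import PurePosixPath


def busted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Two deployments of the same file with different content.
old_url = busted_name("/static/app.js", b"console.log('v1');")
new_url = busted_name("/static/app.js", b"console.log('v2');")
```

Because the two URLs differ, every cache treats the new bundle as a file it has never seen, while an unchanged file keeps its name and stays cached.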

Remember, Cache Busting isn't just pertinent to websites. It's equally significant when caching media file chunks in a streaming context. This ensures users always get the latest, high-quality streaming content without delay.

Potential in Streaming Systems

Have you considered the immense potential for caching within the streaming system?

The metadata accompanying each media file, including details like the title, description, tags, and more, is crucial for generating each movie's unique page.

By caching these data elements, we could dramatically cut down on database reads and respond to user requests faster. This works because views are heavily skewed: on most streaming platforms, about 10% of the content typically garners 90% of all views, while the remaining 90% attracts only 10%. Caching the metadata of the popular titles alone would therefore cover most requests.
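As a rough back-of-the-envelope check, the skew translates directly into saved database reads. The request count below is an assumed figure, `database_reads` is a hypothetical helper, and the calculation ignores the initial misses needed to fill the cache.

```python
def database_reads(total_requests: int, hit_ratio: float) -> int:
    """Requests that still reach the database once the cache takes its share."""
    return round(total_requests * (1 - hit_ratio))


# Assumed workload: 1,000,000 metadata requests.
# Caching the popular 10% of titles serves about 90% of them.
reads_without_cache = database_reads(1_000_000, hit_ratio=0.0)
reads_with_cache = database_reads(1_000_000, hit_ratio=0.9)
```

Under these assumptions the database handles roughly 100,000 reads instead of 1,000,000.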

While caching metadata offers a tangible boost in performance, the more pressing concern remains minimizing the latency of the actual media file. Fortunately, a caching technique exists that brings our media files closer to end users: the Content Delivery Network, or CDN.

As discussed earlier in this lecture, a significant portion of the total latency is determined by the physical distance the content has to travel from our server to the end user. Therefore, regardless of the current throughput and available bandwidth, it's always beneficial to minimize the distance our data must cover.

This is precisely the problem CDNs are designed to address.
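To get a feel for how much distance matters, the propagation delay can be estimated from the distance alone. This sketch assumes signals travel through optical fiber at roughly 200,000 km/s (about two thirds of the speed of light in a vacuum) and ignores routing detours, queuing, and processing time, so real round trips are longer. The distances are illustrative.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumed signal speed in optical fiber


def round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time caused purely by distance."""
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


far_origin_ms = round_trip_ms(10_000)  # origin on another continent
nearby_edge_ms = round_trip_ms(100)    # CDN server in the user's region
```

Under these assumptions the distant origin costs about 100 ms per round trip and the nearby edge about 1 ms, a gap that no amount of extra bandwidth can close.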

Advantages and Limitations

Let's conclude by summarizing the advantages and limitations of CDNs.

Advantages

Leveraging CDNs can markedly boost performance in a couple of key areas:

By serving content from proximate servers, latency is minimized, leading to an improved user experience.

A CDN also lightens the load on the system's primary servers, since a significant portion of requests is handled by the CDN.

Limitations

If not configured appropriately, the content served may become outdated.

Immediate cache updates introduce an extra layer of complexity during deployment.

CDNs are primarily tailored for static content universally served to all users; hence, user-specific or time-sensitive content doesn't mesh well.

CDN pricing is usually based on traffic and can seem expensive, so it's crucial to weigh these costs against the cost of serving the same traffic without a CDN.

Not every website benefits from a CDN. For sites where most users hail from a specific region, CDNs might offer minimal advantages. Conversely, for those serving a global audience, CDNs can be a pivotal asset.

Summary

That wraps up this lecture. You now possess a sound understanding of caching and its pivotal role in designing sophisticated systems. In the context of our ongoing streaming system architecture, integrating a CDN is non-negotiable.

But we have yet another challenge to surmount. All video metadata is stashed in the object store, complicating user searches for their desired movies. The underlying reason? Object stores look up data by key, much like key-value stores, and thus don't support query languages.

A sprawling movie repository loses its sheen if users struggle to pinpoint their movie of choice. All our painstaking efforts to cut down latency would be for naught.

In our next lecture, we'll address this quandary and explore a formidable new system component that underpins state-of-the-art search functionalities.
