Introduction
Netflix is one of the Big Tech companies that provides a streaming service that offers a wide variety of TV shows, movies, anime, documentaries, and more on thousands of internet-connected.
Netflix is available across many countries, and they offer different movies/shows to each country. It is important to take note of that they offer limited TV shows for some countries. For example, some Korean Dramas are only available in South Korea.
In this article, I will share my analysis of how their database would be structured. I will analyze their pages, such as:
I am a freshman at university, not a professional software engineer, so there could be a misinterpretation. It is my pleasure if you can leave some comments on how you would alter what I have done or which part I can make improvements.
Initial Thought Process
This is the brainstorming I have done. The first table to be created is the Film. Under the film, there are season table, episode table, trailers & teasers table, similar movies, etc. So, the foreign keys could be EPISODE_ID, LANG_ID, COUNTRY_ID, SEASON_ID, GENRE_ID, etc.
One thing I noticed at the brainstorming stage is that Big Tech companies like Netflix would value users' activities, such as how much time they spent on using their app. So, you will be able to notice that I included attributes for user activities.
Other than that, afilm must have a brief storyline, and each episode must have a brief storyline as well. Each film and its episode would have thumbnails as well.
Moreover, the actress can be managed by two tables. Actor and Cast Members. Each actor would have movies that they have participated, so the Actor table would have a CAST_MEMBER_ID as a foreign key.
Database Structure
Entity Relationship Diagram (ERD)
I converted the tables into the ERD to show the relationship between each entity.
Since Netflix would have many tables for their system, there are many entities, so the photo is a bit unclear to see. I am sorry about that.
What I learned
I learned how companies store users' data in their database and use it to enhance the user experience. For example, RESULT_EXISTS attribute in SearchResult table (right bottom corner) determines if the matching movie/TV show could be found. If you have used Netflix, you would encounter a situation where you cannot find a show. By analyzing the data in the RESULT_EXISTS attribute, Netflix can decide if they need to release a new show. They would want to do that if a million of users have searched it, and were not able to find the show.
Moreover, there are two attributes of the SearchResult table, which are EPISODE_CLICK_DATE and EPLISODE_EXIT_DATE. These two attributes store the data of when the user clicked on a certain episode/film and when they exited by searching. This allows Netflix to know how much time the user has remained on a certain page.
Now I can see how companies would make business decisions by looking at the data of the user activities. Nowadays, it is called big data.
Challenges
Netflix is used by people from different countries, hence, it has different a subscription price, available movie, language and policies. So, I had to create extra tables, such as country, subscriptionPrice, filmAvailableCountry, and filmLanguage.
It was challenging because I had never done anything like this before. However, it gave me a better idea of how services like Netflix would have such database structure. I can use this knowledge to create a better DB design for my next practice.
Additionals
While I was thinking about how their structure would be like, I got curious about how Netflix stores all the videos and handles a huge amount of traffic. Especially, videos require a large storage amount, and Netflix has millions of videos that have a quite long running time. So, I did some research.
Here is what I found:
Distributed File System (DFS): Netflix likely employs a distributed file system, such as the Hadoop Distributed File System (HDFS) or a custom-built solution. These file systems allow large files (such as videos) to be divided into smaller chunks and distributed across multiple servers and nodes.
Content Distribution Network (CDN): You might have heard of this term. CDNs consist of geographically distributed servers strategically placed in data centers across different regions. How this works is.. when a user requests a video, the CDN identifies the closest servers around the user. This reduces latency and improves video streaming performance.
Caching: Caching refers to saving the frequently visited or watched videos, so that the user does not need to download it every time. Caching is also often used for websites. The browser will cache the website that you frequently visit. Caching can result in reducing the load on the database and improves streaming performance by serving content from local storage.