Scaling a System from Zero to Millions of Users


Table of Contents
  1. Single server setup
  2. DNS Hierarchy
  3. Database
  4. Which databases to use?
  5. Vertical scaling vs horizontal scaling
  6. Load balancer
  7. Database replication
  8. What if one of the databases goes offline?
  9. Cache
  10. Considerations for using cache
  11. Content Delivery Network (CDN)
  12. Considerations for using CDN
  13. Stateful vs Stateless Web Tier Architecture
  14. Stateful Web Tier
  15. Stateless Web Tier
  16. Data centers
  17. Challenges in Multi-Data Center Setup
  18. Message queue
  19. Pros and Cons of Message Queues
  20. Scaling with Logging, Metrics, and Automation
  21. Database scaling - Database Sharding
  22. Scaling System to Support Millions of Users

Single server setup

The figure below illustrates a single-server setup where all components, including the web application, database, and cache, are hosted on a single server.

[Figure: single-server setup]

DNS Hierarchy
[Figure: DNS hierarchy]

Database
[Figure: database tier]

Which databases to use?

Vertical scaling vs horizontal scaling

Vertical scaling, referred to as scale-up, means adding more power (CPU, RAM, etc.) to your servers. Horizontal scaling, referred to as scale-out, allows you to scale by adding more servers to your pool of resources.

When traffic is low, vertical scaling is a great option, and its simplicity is its main advantage. Unfortunately, it comes with serious limitations: there is a hard limit on how much CPU and memory a single server can hold, and a single server offers no failover or redundancy, so the whole application goes down if that server does.

Horizontal scaling is more desirable for large-scale applications because of these limitations.


Load balancer

When many users access the web server simultaneously and it reaches its load limit, users generally experience slower responses or fail to connect at all. A load balancer is a common technique to address these problems: it evenly distributes incoming traffic among the web servers defined in a load-balanced set.

[Figure: load balancer in front of web servers]

In the setup shown in the figure, users connect to the load balancer's public IP, so the web servers are no longer directly reachable by clients, which improves security. The load balancer communicates with the web servers over private IPs, which are accessible only within the same network. This setup improves web-tier availability and resolves the failover problem: if one server goes offline, traffic is rerouted to another server, keeping the website online. As traffic grows, you can add more servers to the pool, and the load balancer distributes requests among them automatically.
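
To make the distribution policy concrete, here is a minimal round-robin sketch in Python. The class name `RoundRobinBalancer` and the private IPs are hypothetical; real load balancers (e.g., NGINX, HAProxy, AWS ELB) also perform health checks, connection handling, and more.

```python
import itertools

class RoundRobinBalancer:
    """Toy load balancer: distributes requests evenly across a server pool."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def next_server(self):
        # Each request goes to the next server in the rotation.
        return next(self._cycle)

    def add_server(self, server):
        # As traffic grows, new servers join the rotation.
        self.servers.append(server)
        self._cycle = itertools.cycle(self.servers)

balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
```

With two servers, consecutive requests alternate between them; adding a third server simply widens the rotation.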


Database replication

The current design has one database, so it does not support failover and redundancy. Database replication is a common technique to address those problems.

A master database handles write operations such as insert, delete, and update, while slave databases receive copies of the master's data and serve only read operations. Since most applications require far more reads than writes, there are typically more slave databases than master databases. The figure below illustrates a master database with multiple slaves.

[Figure: database replication with one master and multiple slaves]

Database replication enhances performance by distributing read operations across multiple slave nodes, increases reliability by preserving data across multiple locations, and ensures high availability by keeping the system operational even if one database goes offline.
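
The read/write split described above can be sketched as a small router. The class name `ReplicatedDatabase` is hypothetical; in practice this routing is done by a database driver, proxy, or ORM, and the slaves here are just labels standing in for real replicas.

```python
import itertools

class ReplicatedDatabase:
    """Routes writes to the master and spreads reads across slave replicas."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._read_cycle = itertools.cycle(self.slaves)

    def route(self, operation):
        # insert/update/delete must go to the master; reads go to slaves
        # in round-robin fashion, spreading the read load.
        if operation in ("insert", "update", "delete"):
            return self.master
        return next(self._read_cycle)
```

Because reads dominate in most applications, adding more slaves directly increases read throughput without touching the master.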


What if one of the databases goes offline?

If a slave database fails, it's replaced with a new one, and read operations are redirected to other healthy slaves. If the master database goes offline, a slave is promoted to master, but data recovery may be needed to ensure it's up to date. Promoting a new master is complex and may require additional replication methods.
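
The two failure cases above can be summarized in a small sketch. The class name `ReplicationCluster` is hypothetical, and promoting "the first healthy slave" is a simplification: as noted, a real promotion must first ensure the slave's data is up to date.

```python
class ReplicationCluster:
    """Toy failover logic for a master/slave replication setup."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def handle_failure(self, node):
        if node == self.master:
            # Master down: promote the first remaining slave to master.
            # (Real systems must also catch the slave up on missing writes.)
            self.master = self.slaves.pop(0)
        else:
            # Slave down: drop it; reads shift to the remaining slaves
            # until a replacement replica is provisioned.
            self.slaves.remove(node)
```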

[Figure: database replication after failover]

Let us take a look at the design:


Cache

It's time to enhance load and response times by adding a cache layer and moving static content (like JavaScript, CSS, images, and videos) to a Content Delivery Network (CDN).

A cache is a temporary storage that keeps frequently accessed or resource-intensive data in memory, speeding up subsequent requests. Instead of repeatedly querying the database, cached data can be quickly retrieved, improving performance. The cache tier, a faster layer than the database, reduces database workload and can be scaled independently for better system efficiency.
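
The read path just described is the classic cache-aside pattern. In this sketch, plain dicts stand in for the cache (e.g., Memcached/Redis) and the database, and the function name `get_user` is hypothetical.

```python
def get_user(user_id, cache, db):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:          # cache miss
        user = db[user_id]    # expensive database query
        cache[key] = user     # populate the cache for subsequent requests
    return user
```

The first request for a user pays the database cost; every later request for the same user is served from memory.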

[Figure: cache tier between web servers and database]
Considerations for using cache

Content Delivery Network (CDN)

A CDN is a network of servers located in different regions that delivers static content like images, videos, and JavaScript files. When a user visits a website, the closest CDN server provides the content; the closer a user is to that server, the faster the site loads. For instance, users in Delhi will receive content faster from a CDN server in Mumbai than from one in Europe.

[Figure: CDN serving users from the nearest edge server]
Considerations for using CDN
[Figure: design with CDN and cache]
  1. Static assets (JS, CSS, images, etc.) are no longer served by web servers. They are fetched from the CDN for better performance.
  2. The database load is lightened by caching data.

Stateful vs Stateless Web Tier Architecture
Stateful Web Tier
[Figure: stateful architecture]
Stateless Web Tier
[Figure: stateless architecture]

We move session data from the web tier to a persistent data store, such as a relational database, Memcached/Redis, or NoSQL database. NoSQL is often preferred for its scalability. Autoscaling allows web servers to be added or removed automatically based on traffic. With session data centralized, autoscaling the web tier becomes straightforward, as servers can be adjusted in response to traffic load.
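
A shared session store is the key to this design: any web server can look up any user's session, so requests can be routed to any server. Below, a dict stands in for the persistent store (e.g., Redis or a NoSQL database), and the class name `SessionStore` is hypothetical.

```python
import json

class SessionStore:
    """Toy shared session store; a dict stands in for Redis/NoSQL."""

    def __init__(self):
        self._store = {}

    def save(self, session_id, data):
        # Serialize so the session survives outside any one web server's
        # process memory; every server reads and writes the same store.
        self._store[session_id] = json.dumps(data)

    def load(self, session_id):
        raw = self._store.get(session_id)
        return json.loads(raw) if raw is not None else None
```

Because no server holds session state locally, servers can be added or removed by autoscaling without logging any user out.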

[Figure: stateless web tier with shared session store]

Data centers

The figure shows a setup with two data centers. Normally, users are routed to the nearest data center using geoDNS, which directs traffic based on their location (e.g., x% to US-East and (100 – x)% to US-West).
If a data center goes offline, all traffic is redirected to the remaining healthy data center. For example, if data center 2 (US-West) goes down, all traffic is routed to data center 1 (US-East).
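
The routing-plus-failover rule can be sketched as a small function. The data center names and the `route_request` function are hypothetical; real geoDNS services (e.g., Amazon Route 53) implement this with health checks and routing policies.

```python
def route_request(region, healthy):
    """Route by geography, falling back when a data center is offline."""
    preferred = {"us-east": "dc1-us-east", "us-west": "dc2-us-west"}
    target = preferred.get(region, "dc1-us-east")
    if target not in healthy:
        # Failover: send traffic to a remaining healthy data center.
        target = sorted(healthy)[0]
    return target
```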

[Figure: multi-data center setup with geoDNS routing]
Challenges in Multi-Data Center Setup

Message queue

A message queue is a system component that enables asynchronous communication by storing and managing messages between producers and consumers. Producers send messages to the queue, while consumers retrieve and process them independently, allowing for scalable and reliable operation in applications. This decoupling helps handle variable workloads and ensures that tasks are processed even if components are temporarily unavailable.
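
The producer/consumer decoupling can be demonstrated with Python's standard-library `queue.Queue` as an in-process stand-in for a real broker like RabbitMQ or Kafka; the `worker` function and task strings are illustrative only.

```python
import queue
import threading

def worker(q, results):
    # Consumer: pull tasks until a sentinel None signals shutdown.
    while True:
        task = q.get()
        if task is None:
            break
        results.append(task.upper())  # stand-in for real processing
        q.task_done()

q = queue.Queue()
results = []
consumer = threading.Thread(target=worker, args=(q, results))
consumer.start()

# Producer: publish work without waiting for the consumer to process it.
for msg in ["resize photo", "send email"]:
    q.put(msg)

q.put(None)      # sentinel: tell the consumer to stop
consumer.join()  # wait for processing to finish
```

The producer returns immediately after `put`, while the consumer drains the queue at its own pace; with a real broker, the two would run on different machines and scale independently.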

Pros and Cons of Message Queues
Pros
Cons

Popular message queues include RabbitMQ, Kafka, Amazon SQS, and ActiveMQ.


Scaling with Logging, Metrics, and Automation

As your website grows, investing in logging, metrics, and automation becomes essential:

[Figure: logging, metrics, and automation]

The figure shows an updated design that includes a message queue for better system resilience, along with logging, monitoring, metrics, and automation tools to enhance overall management.


Database scaling - Database Sharding

Database sharding is a horizontal scaling technique where a large database is divided into smaller, more manageable pieces called "shards." Each shard is an independent database that stores a subset of the data.

               

User data is allocated to a database server based on user IDs. Whenever you access data, a hash function is used to find the corresponding shard. In our example, user_id % 4 is used as the hash function. If the result equals 0, shard 0 is used to store and fetch data; if the result equals 1, shard 1 is used. The same logic applies to the other shards.
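
The user_id % 4 scheme from the example can be sketched directly; here each "shard" is a dict standing in for an independent database server, and the function names are hypothetical.

```python
# Each "shard" is a dict standing in for an independent database server.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    """The example's hash function: user_id % 4 selects the shard."""
    return user_id % NUM_SHARDS

def save_user(user_id, data):
    shards[shard_for(user_id)][user_id] = data

def fetch_user(user_id):
    return shards[shard_for(user_id)].get(user_id)
```

Note that a simple modulo scheme makes resharding painful: changing NUM_SHARDS remaps almost every key, which is why techniques like consistent hashing exist.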

[Figure: database sharding with user_id % 4]

Scaling System to Support Millions of Users

Putting it all together, the techniques covered in this article combine into a design that can serve millions of users:
  1. Keep the web tier stateless.
  2. Build redundancy at every tier.
  3. Cache data as much as possible.
  4. Serve static assets from a CDN.
  5. Scale the data tier with replication and sharding.
  6. Support multiple data centers.
  7. Decouple components with message queues.
  8. Monitor the system and invest in logging, metrics, and automation.

[Figure: final design supporting millions of users]

Reference: System Design Interview: An Insider’s Guide by Alex Xu