Building Blocks Of System Design

Domain Name System (DNS)

DNS (Domain Name System) is a hierarchical system that translates human-readable domain names (like www.example.com) into IP addresses (like 192.0.2.1).
It functions like a phonebook of the internet, making it easier for users to access websites without remembering complex IP addresses.
DNS servers store records that map domain names to IP addresses and other data, so clients can find the address to connect to; routing the traffic itself is handled by the IP network.
DNS resolution involves several steps: a recursive resolver queries the root, top-level domain (TLD), and authoritative name servers in turn to retrieve the IP address for a domain, and typically caches the answer.
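
The resolution steps above can be sketched with a toy, in-memory hierarchy. The zone data, the server names, and the `resolve` function are hypothetical and only illustrate the chain of referrals from root to TLD to authoritative server; a real resolver speaks the DNS protocol over the network.

```python
# Toy model of DNS resolution. All zone data below is made up for
# illustration, and no network traffic is involved.
ROOT = {"com": "tld-com"}                               # root refers to TLD servers
TLD = {"tld-com": {"example.com": "ns-example"}}        # TLD refers to authoritative servers
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.1"}}


def resolve(name):
    """Follow referrals root -> TLD -> authoritative; return (ip, servers_contacted)."""
    name = name.rstrip(".")
    labels = name.split(".")
    tld_label = labels[-1]
    domain = ".".join(labels[-2:])

    path = ["root"]
    tld_server = ROOT.get(tld_label)
    if tld_server is None:
        return None, path

    path.append(tld_server)
    auth_server = TLD[tld_server].get(domain)
    if auth_server is None:
        return None, path

    path.append(auth_server)
    return AUTHORITATIVE[auth_server].get(name), path


ip, path = resolve("www.example.com")
print(ip, path)
```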

Cloud services:

Amazon Route 53 (AWS)
Google Cloud DNS
Azure DNS (Microsoft Azure)
Cloudflare DNS
Alibaba Cloud DNS

Load Balancer

A load balancer distributes incoming network traffic across multiple servers according to an algorithm such as round robin or least connections.
It improves resource utilization and prevents any single server from being overloaded.
It enhances reliability by running health checks and routing traffic away from failed or unhealthy servers.
It improves availability and response times by spreading requests across healthy servers.
It can operate at Layer 4 (transport, e.g., TCP or UDP) or Layer 7 (application, e.g., HTTP), depending on the traffic it handles.
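
As a minimal sketch of the behaviour described above, the class below rotates through servers in round-robin order and skips any that are marked unhealthy. The class name, the server names, and the manual `mark_down` call are assumptions for illustration; real load balancers discover health through periodic probes.

```python
from itertools import count


class RoundRobinBalancer:
    """Round-robin selection that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._counter = count()

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, skipping unhealthy ones.
        for _ in range(len(self.servers)):
            server = self.servers[next(self._counter) % len(self.servers)]
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
lb.mark_down("app-2")
print([lb.pick() for _ in range(4)])  # app-2 never appears while it is down
```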

Cloud services:

Amazon Elastic Load Balancing (ELB) - AWS
Google Cloud Load Balancing - Google Cloud Platform
Azure Load Balancer - Microsoft Azure
Cloudflare Load Balancing
Alibaba Cloud SLB (Server Load Balancer)

Database

Data Management: Involves storing, retrieving, modifying, and deleting data as part of various data-processing tasks.
Types of Databases: Includes relational databases (e.g., SQL databases) and non-relational databases (e.g., NoSQL databases).
Database Replication: Ensures data redundancy and availability by copying data across multiple database instances.
Partitioning: Distributes data across multiple servers or locations to optimize performance and scalability.
Distributed Databases: Store and serve data across multiple sites or nodes, allowing for improved access and performance.
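
Partitioning and replication can be combined in a small sketch: a hash of the key selects a shard, and each write is copied to every replica of that shard. `shard_for` and `ShardedStore` are hypothetical names, the dictionaries stand in for database instances, and the synchronous copy is a simplification of real replication protocols.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    def __init__(self, num_shards, replicas=2):
        # shards[i][r] is replica r of shard i; each replica is a plain dict.
        self.shards = [[{} for _ in range(replicas)] for _ in range(num_shards)]

    def put(self, key, value):
        # Write to every replica of the owning shard (synchronous replication).
        for replica in self.shards[shard_for(key, len(self.shards))]:
            replica[key] = value

    def get(self, key, replica=0):
        # Any replica of the owning shard can serve the read.
        return self.shards[shard_for(key, len(self.shards))][replica].get(key)


store = ShardedStore(num_shards=4, replicas=2)
store.put("user:42", {"name": "Ada"})
print(store.get("user:42", replica=1))
```

Note that a plain modulo scheme remaps most keys when the number of shards changes, which is one reason production systems often prefer consistent hashing or range-based partitioning.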

Cloud services:

Amazon RDS
Google Cloud SQL
Azure Cosmos DB
Amazon DynamoDB
Google Firestore
IBM Cloud Databases
Amazon Aurora
MongoDB Atlas
Firebase Realtime Database

Key-Value Store

A key-value store is a type of non-relational database that uses a simple key-value pair structure for storing data. Here are some key points:

Simplicity: Key-value stores offer a straightforward way to store and retrieve data, making them easy to use.
Scalability: They are designed to scale horizontally, allowing for the handling of large volumes of data across multiple servers.
Performance: Key-value stores provide fast data access due to their minimalistic design.
Flexibility: They support various data types and can be easily configured to meet specific application needs.
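
The key-value model can be illustrated with a minimal in-memory sketch. The `KeyValueStore` class and its optional per-key time-to-live are assumptions for illustration, not the interface of any particular product.

```python
import time


class KeyValueStore:
    """In-memory key-value store with an optional per-key TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def put(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value

    def delete(self, key):
        """Return True if the key existed."""
        return self._data.pop(key, None) is not None


kv = KeyValueStore()
kv.put("session:abc", {"user_id": 7}, ttl=30)
print(kv.get("session:abc"))
```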

Cloud services:

Amazon DynamoDB
Redis
Google Cloud Firestore
Azure Cosmos DB
IBM Cloudant

Content Delivery Network (CDN)

A CDN is a distributed network of servers that delivers content to users based on their geographic location.
It reduces latency by caching content closer to the end-user, improving load times for websites and applications.
CDNs enhance the reliability and availability of content by distributing the load across multiple servers.
They help mitigate traffic spikes and DDoS attacks by distributing the network load.
CDNs are essential for optimizing performance in media delivery, such as video streaming and gaming.
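
The caching behaviour described above can be sketched as an edge node that serves content locally until a time-to-live expires and otherwise fetches it from the origin. `EdgeCache`, `origin_fetch`, and the TTL value are hypothetical; real CDNs also honour HTTP cache headers and route users to a nearby edge.

```python
import time


class EdgeCache:
    """One CDN edge node: serve from cache while fresh, else fetch from origin."""

    def __init__(self, origin_fetch, ttl_seconds=60, clock=time.monotonic):
        self._origin_fetch = origin_fetch  # callable: path -> content
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}                   # path -> (content, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, path):
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Missing or expired: go back to the origin and refresh the cache.
        self.misses += 1
        content = self._origin_fetch(path)
        self._cache[path] = (content, now + self._ttl)
        return content


edge = EdgeCache(lambda path: f"<contents of {path}>", ttl_seconds=60)
edge.get("/logo.png")
edge.get("/logo.png")
print(edge.hits, edge.misses)
```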

Cloud services:

Amazon CloudFront
Microsoft Azure CDN
Google Cloud CDN
Akamai
Cloudflare
Fastly
StackPath

Sequencer

Purpose: A sequencer is designed to generate unique identifiers (IDs) while ensuring the preservation of causality in distributed systems.
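Techniques: It can utilize various methods for ID generation, including UUIDs (Universally Unique Identifiers), which guarantee uniqueness but, in their random form, carry no ordering, and Snowflake-style generators, popularized by Twitter, which embed a timestamp so that IDs are roughly time-sortable.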
Scalability: The system must be scalable to handle a high volume of requests without collisions in ID generation.
Performance: The ID generation process should be efficient to minimize latency in applications requiring rapid ID creation.
Causality Maintenance: Ensuring that the order of ID generation reflects the order of events in the system is crucial for maintaining consistency.
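
A Snowflake-style generator can be sketched as follows. The bit layout (timestamp, 10-bit worker ID, 12-bit sequence) follows the commonly described scheme, while the epoch value and the class name are arbitrary assumptions. IDs from one worker are strictly increasing; across workers they are ordered only as precisely as the machines' clocks agree, so this approximates causality rather than guaranteeing it.

```python
import threading
import time


class SnowflakeGenerator:
    EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (an assumption)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeGenerator(worker_id=5)
first, second = gen.next_id(), gen.next_id()
print(first < second)
```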

Cloud services:

Amazon DynamoDB (atomic counters; it has no built-in auto-increment feature)
Google Cloud Firestore
Azure Cosmos DB with Unique ID generation
Alibaba Cloud Table Store
FaunaDB with unique ID capabilities

Service Monitoring Overview

Importance: Monitoring systems are essential for analyzing the performance of distributed systems.
Alerting: They provide alerts to stakeholders in case of issues, enabling timely responses.
Components: Both server-side and client-side monitoring systems are utilized.
Tools: Popular tools include Amazon CloudWatch, Prometheus, and Grafana for effective monitoring.
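
The alerting idea above can be sketched as a rule that fires when the average of the most recent samples crosses a threshold. `ThresholdAlert`, the window size, and the latency figures are assumptions for illustration; tools such as Prometheus express comparable rules in their own configuration languages.

```python
from collections import deque


class ThresholdAlert:
    """Fire when the average of the last `window` samples exceeds `threshold`."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def observe(self, value):
        """Record a sample; return True if the alert condition currently holds."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and sum(self.samples) / len(self.samples) > self.threshold


# Hypothetical latency samples in milliseconds.
alert = ThresholdAlert(threshold=200, window=3)
print([alert.observe(ms) for ms in [100, 120, 130, 500, 600]])
```

Averaging over a window rather than alerting on a single sample is a simple way to avoid paging stakeholders for one-off spikes.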

Cloud services:

Amazon CloudWatch
Google Cloud Observability (formerly Operations Suite and Stackdriver)
Microsoft Azure Monitor
Prometheus
Grafana
Datadog
New Relic
Splunk
Elastic Observability
AppDynamics

Distributed Caching

Definition: Distributed caching involves multiple cache servers that work together to store and retrieve frequently accessed data, improving performance and reducing latency.
Benefits: Enhances application speed, reduces database load, and provides a scalable solution to manage large volumes of data access.
Coordination: Cache servers coordinate to ensure consistency and reliability of cached data across different nodes.
Tools: Popular tools for implementing distributed caching include in-memory data stores like Redis and Memcached.
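
One common way for clients to agree on which cache server owns a key is consistent hashing, sketched below. This is one possible technique rather than how every cache cluster works; the `HashRing` class, the node names, and the number of virtual nodes are assumptions.

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes for a smoother key distribution."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))
```

The benefit over a simple modulo scheme is that removing a node only remaps the keys that node owned; keys on the remaining nodes stay where they are.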

Cloud services:

Amazon ElastiCache (Redis & Memcached)
Google Cloud Memorystore (Redis)
Azure Cache for Redis
IBM Cloud Databases for Redis
Redis Enterprise Cloud

Distributed Messaging Queue

A distributed messaging queue allows communication between producers and consumers across multiple servers.
It enhances system scalability by enabling independent scaling of components.
Reliability is improved as messages can be stored and processed even if one part of the system is down.
Common use cases include event-driven architectures, microservices, and data streaming.
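
The reliability property above, that a message survives a failing consumer, can be sketched with explicit acknowledgements. This single-process `MessageQueue` class is a hypothetical stand-in; real brokers persist messages and redeliver them after a timeout rather than on an explicit `nack`.

```python
import itertools
from collections import deque


class MessageQueue:
    """In-memory queue with acknowledgements and redelivery."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # delivered but not yet acknowledged
        self._ids = itertools.count(1)

    def send(self, body):
        msg_id = next(self._ids)
        self._ready.append((msg_id, body))
        return msg_id

    def receive(self):
        """Hand the next message to a consumer, or return None if empty."""
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        # Processing succeeded: the message is gone for good.
        self._in_flight.pop(msg_id, None)

    def nack(self, msg_id):
        # Processing failed: put the message back for another attempt.
        if msg_id in self._in_flight:
            self._ready.append((msg_id, self._in_flight.pop(msg_id)))


queue = MessageQueue()
queue.send("resize image 17")
msg_id, body = queue.receive()
queue.nack(msg_id)       # the consumer crashed; the message is not lost
print(queue.receive())
```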

Cloud services:

Amazon SQS
Apache Kafka
Google Cloud Pub/Sub
Microsoft Azure Service Bus
RabbitMQ on Cloud
IBM Cloud Message Queue
Red Hat AMQ
Redis Streams

Publish-Subscribe System

The publish-subscribe (pub-sub) system is an asynchronous messaging pattern.
It allows multiple producers (publishers) to send messages to multiple consumers (subscribers).
Pub-sub systems decouple message senders from receivers, enhancing scalability.
Commonly used in serverless and microservices architectures to enable event-driven communication.
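
The decoupling described above can be sketched with a tiny in-process broker: publishers know only a topic name, and every subscriber to that topic receives each message. The `Broker` class and the topic name are hypothetical, and delivery here is synchronous for simplicity, whereas real pub-sub systems deliver asynchronously.

```python
from collections import defaultdict


class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, ()))
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
emails, audit_log = [], []
broker.subscribe("order.created", emails.append)
broker.subscribe("order.created", audit_log.append)
broker.publish("order.created", {"order_id": 1})
print(emails, audit_log)
```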

Cloud services:

Google Cloud Pub/Sub
Amazon Simple Notification Service (SNS)
Azure Service Bus
Apache Kafka (available on various cloud platforms)
IBM Cloud Event Streams

Rate Limiter Overview

Definition: A rate limiter controls the number of requests a client (for example, a user, API key, or IP address) can make to a service within a specified time frame.
Purpose: It protects services from being overwhelmed by excessive requests, ensuring fair usage and availability.
Implementation: Can be integrated into APIs using gateways that provide built-in rate-limiting features.
Benefits: Enhances security, prevents abuse, and improves service performance by managing traffic effectively.
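
One widely used way to enforce such a limit is the token bucket algorithm, sketched below; the source does not prescribe an algorithm, so this is one option among several (fixed window and sliding window are others). The `TokenBucket` class and the rate and capacity values are assumptions.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self, cost=1):
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._updated) * self.rate)
        self._updated = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# One bucket per client; the client key here is a hypothetical API key.
buckets = {"api-key-123": TokenBucket(rate=5, capacity=10)}
print(buckets["api-key-123"].allow())
```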

Cloud services:

Amazon API Gateway
Google Cloud Endpoints
Azure API Management
Cloudflare Rate Limiting
IBM API Gateway

Distributed Search

Definition: A system designed to process user queries and return relevant content quickly.
Key Components: Includes crawling, indexing, and searching to ensure efficient data retrieval.
Scalability: Supports handling large volumes of data across multiple servers, enhancing performance.
Real-time Results: Provides timely responses to user queries, improving user experience.
Common Tools: Elasticsearch and Amazon OpenSearch are widely used for building distributed search systems.
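
The indexing and searching components can be illustrated with a minimal inverted index, the data structure that maps each term to the documents containing it. The `InvertedIndex` class, the tokenizer, and the AND-only query semantics are simplifying assumptions; a distributed system would split such an index into shards and merge the per-shard results.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document IDs

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for term in self.tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return IDs of documents containing every term in the query."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Distributed search with Elasticsearch")
index.add(2, "Distributed caching with Redis")
print(index.search("distributed redis"))
```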

Cloud services:

Amazon OpenSearch Service
Elasticsearch Service on Elastic Cloud
Azure AI Search (formerly Azure Cognitive Search)
Google Cloud Search

Distributed Logging

Definition: Distributed logging involves collecting logs from multiple services or applications running in different environments.
Importance: Efficient logging is crucial in distributed systems to monitor application performance and troubleshoot issues.
Challenges: High log volume generates heavy I/O that can become a bottleneck, so optimizing log collection and storage is essential.
Solutions: Cloud-based logging tools streamline the logging process, ensuring scalability and reliability.
Usage: Logs can be aggregated, analyzed, and visualized for better insights into system behavior.
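
One way to reduce the I/O cost mentioned above is to buffer structured records and ship them in batches. The sketch below assumes a hypothetical `sink` callable standing in for a log collector; the class name, field names, and batch size are illustrative.

```python
import json


class BatchingLogger:
    """Buffer JSON log lines and send them to a sink in batches."""

    def __init__(self, service, sink, batch_size=3):
        self.service = service
        self._sink = sink              # callable that receives a list of JSON lines
        self._batch_size = batch_size
        self._buffer = []

    def log(self, level, message, **fields):
        record = {"service": self.service, "level": level, "message": message, **fields}
        self._buffer.append(json.dumps(record, sort_keys=True))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # One write per batch instead of one write per log line.
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []


batches = []
logger = BatchingLogger("checkout", sink=batches.append, batch_size=3)
for attempt in range(4):
    logger.log("INFO", "payment attempt", attempt=attempt)
logger.flush()
print([len(batch) for batch in batches])
```

Batching trades a little latency (and the risk of losing unflushed records on a crash) for far fewer writes.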

Cloud services:

Amazon CloudWatch Logs
Google Cloud Logging
Elasticsearch (with Kibana for visualization)
Azure Monitor Logs
Splunk

Distributed Tracing

Definition: Distributed tracing is a method used to monitor and track the flow of requests through various services in a microservices architecture.
Purpose: It helps developers identify performance bottlenecks, latency issues, and failures within complex distributed systems.
Granularity: Provides detailed insights into how long requests take at each service level, enabling effective troubleshooting and optimization.
Visualization: Generates trace visualizations to understand the path of requests and the interactions between different services.
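
The core idea, that every unit of work records a span carrying a shared trace ID and a pointer to its parent, can be sketched in a single process. The `Tracer` class and its field names are hypothetical; in a real system the trace and parent IDs travel between services in request headers, for example via OpenTelemetry instrumentation.

```python
import time
import uuid
from contextlib import contextmanager


class Tracer:
    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._stack = []      # currently open spans, innermost last
        self.finished = []    # completed spans, in order of completion

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        record = {
            "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": parent["span_id"] if parent else None,
            "name": name,
            "start": self._clock(),
        }
        self._stack.append(record)
        try:
            yield record
        finally:
            self._stack.pop()
            record["duration"] = self._clock() - record["start"]
            self.finished.append(record)


tracer = Tracer()
with tracer.span("GET /checkout"):
    with tracer.span("inventory-service"):
        pass
    with tracer.span("payment-service"):
        pass
print([(s["name"], s["parent_id"] is not None) for s in tracer.finished])
```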

Cloud services:

AWS X-Ray
Google Cloud Trace
Zipkin
OpenTelemetry
Azure Monitor

Distributed Task Scheduling

Distributed task scheduling involves managing and allocating resources to tasks across multiple nodes or systems.
The scheduler ensures that both task-level objectives (e.g., timely completion) and system-level goals (e.g., load balancing) are achieved.
It is essential for automating workflows, optimizing performance, and managing dependencies between tasks.
Common in cloud environments, it allows tasks to be distributed across different regions, improving scalability and fault tolerance.
Used for large-scale applications with complex workflows and data pipelines.
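
A scheduler that respects dependencies while balancing load can be sketched as follows: tasks are visited in dependency order and each is given to the worker that becomes free first. The `schedule` function, the task names, and the durations are assumptions, and this greedy strategy is only one simple policy. It uses `graphlib`, available in Python 3.9 and later.

```python
import heapq
from graphlib import TopologicalSorter


def schedule(tasks, deps, workers):
    """tasks: {name: duration}; deps: {name: set of prerequisite names}.

    Returns {name: (worker, start, end)}.
    """
    order = TopologicalSorter({t: deps.get(t, set()) for t in tasks}).static_order()
    free_at = [(0, worker) for worker in workers]  # (time worker is free, worker)
    heapq.heapify(free_at)
    plan = {}
    for task in order:
        # A task may start only after all of its prerequisites have finished.
        ready = max((plan[d][2] for d in deps.get(task, ())), default=0)
        free, worker = heapq.heappop(free_at)      # earliest-available worker
        start = max(ready, free)
        end = start + tasks[task]
        plan[task] = (worker, start, end)
        heapq.heappush(free_at, (end, worker))
    return plan


tasks = {"extract": 2, "transform": 3, "load": 1, "report": 1}
deps = {"transform": {"extract"}, "load": {"transform"}, "report": {"extract"}}
for name, slot in schedule(tasks, deps, ["node-1", "node-2"]).items():
    print(name, slot)
```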

Cloud services:

AWS Step Functions
Apache Airflow (Managed on Google Cloud, AWS, etc.)
Google Cloud Tasks
Azure Logic Apps
Apache Kafka (for event-driven scheduling)

Blob Store

Definition: A storage solution specifically designed for unstructured data, including multimedia files, images, videos, and binary executables.
Characteristics: Blob storage is optimized for storing large amounts of data and allows for efficient access and retrieval.
Accessibility: Typically accessed via REST APIs, making it suitable for web and mobile applications.
Scalability: Can scale to accommodate vast amounts of data without the need for extensive infrastructure management.
Use Cases: Commonly used for backup and disaster recovery, big data analytics, and content distribution.
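
The put and get operations that a blob store exposes can be sketched with an in-memory stand-in. The `BlobStore` class, the chunk size, and the checksum return value are assumptions for illustration; real services expose comparable operations over REST and store the chunks durably across many machines.

```python
import hashlib


class BlobStore:
    """In-memory blob store that splits each object into fixed-size chunks."""

    def __init__(self, chunk_size=4):
        self._chunk_size = chunk_size
        self._blobs = {}  # key -> list of byte chunks

    def put(self, key, data):
        """Store bytes under a key and return a SHA-256 checksum of the content."""
        self._blobs[key] = [
            data[i:i + self._chunk_size] for i in range(0, len(data), self._chunk_size)
        ]
        return hashlib.sha256(data).hexdigest()

    def get(self, key, start=0, end=None):
        """Return the whole blob, or the byte range [start, end)."""
        return b"".join(self._blobs[key])[start:end]

    def delete(self, key):
        self._blobs.pop(key, None)


store = BlobStore(chunk_size=4)
checksum = store.put("videos/intro.bin", b"0123456789")
print(store.get("videos/intro.bin", start=2, end=6), checksum[:8])
```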

Cloud services:

Amazon S3 (Simple Storage Service)
Microsoft Azure Blob Storage
Google Cloud Storage
IBM Cloud Object Storage
DigitalOcean Spaces