Building Blocks Of System Design

Domain Name System (DNS)
- DNS (Domain Name System) is a hierarchical system that translates human-readable domain names (like www.example.com) into IP addresses (like 192.0.2.1).
- It functions like a phonebook of the internet, making it easier for users to access websites without remembering complex IP addresses.
- DNS servers store records that map domain names to IP addresses (along with other record types), so clients can locate the server they need to contact.
- DNS resolution involves several steps, including contacting root, top-level domain (TLD), and authoritative name servers to retrieve the correct IP address for a domain.
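
As a rough illustration of the resolution steps above, the sketch below walks the root, TLD, and authoritative lookups using in-memory dictionaries. The server names (`tld-com`, `ns-example`) and the `resolve` function are hypothetical stand-ins; a real resolver sends network queries and honors record TTLs.

```python
# Hypothetical in-memory "servers"; real resolvers query these over the network.
ROOT = {"com": "tld-com"}                          # root: TLD -> TLD server
TLD = {"tld-com": {"example.com": "ns-example"}}   # TLD server: domain -> authoritative server
AUTHORITATIVE = {"ns-example": {"www.example.com": "192.0.2.1"}}


def resolve(name: str, cache: dict) -> str:
    """Resolve a name by walking root -> TLD -> authoritative, caching the answer."""
    name = name.rstrip(".").lower()
    if name in cache:                  # a cached answer skips the whole walk
        return cache[name]
    labels = name.split(".")
    tld_server = ROOT[labels[-1]]      # step 1: ask the root about the TLD
    domain = ".".join(labels[-2:])     # assumes a two-label domain such as example.com
    auth_server = TLD[tld_server][domain]        # step 2: ask the TLD server
    address = AUTHORITATIVE[auth_server][name]   # step 3: ask the authoritative server
    cache[name] = address
    return address
```

Calling `resolve("www.example.com", {})` follows all three steps; a second call with the same cache dictionary returns the stored address without repeating them.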
Cloud services:
- Amazon Route 53 (AWS)
- Google Cloud DNS
- Azure DNS (Microsoft Azure)
- Cloudflare DNS
- Alibaba Cloud DNS
Load Balancer
- A load balancer distributes incoming network traffic across multiple servers according to a policy such as round robin or least connections.
- It ensures optimal resource utilization and prevents overloading any single server.
- Enhances system reliability by automatically routing traffic away from failed or unhealthy servers.
- Improves availability and response times, because requests are spread out and no single server becomes a bottleneck.
- Can operate at Layer 4 (transport, e.g., TCP) or Layer 7 (application, e.g., HTTP), depending on the type of traffic.
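
One common distribution policy is round robin with health checks. The following is a minimal sketch under that assumption; the class name `RoundRobinBalancer` and its methods are illustrative and do not correspond to any particular product's API.

```python
class RoundRobinBalancer:
    """Cycle through servers in order, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

In practice the `mark_down` and `mark_up` calls would be driven by periodic health probes rather than invoked by hand.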
Cloud services:
- Amazon Elastic Load Balancing (ELB) - AWS
- Google Cloud Load Balancing - Google Cloud Platform
- Azure Load Balancer - Microsoft Azure
- Cloudflare Load Balancing
- Alibaba Cloud SLB (Server Load Balancer)
Database
- Data Management: Involves storing, retrieving, modifying, and deleting data as part of various data-processing tasks.
- Types of Databases: Includes relational (SQL) databases, which store data in tables with a defined schema, and non-relational (NoSQL) databases, such as document, key-value, wide-column, and graph stores.
- Database Replication: Ensures data redundancy and availability by copying data across multiple database instances.
- Partitioning: Distributes data across multiple servers or locations to optimize performance and scalability.
- Distributed Databases: Store data across multiple sites or nodes, allowing for improved access and performance.
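
Partitioning and replication can be combined, as in the sketch below: keys are hashed to pick a shard, and each shard copies every write to its replicas. This is an illustrative in-memory model only; `shard_for`, `ReplicatedShard`, and `PartitionedStore` are hypothetical names, and replication here is synchronous for simplicity.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Hash partitioning: map a key to a shard index deterministically."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ReplicatedShard:
    """One partition: a primary copy plus replicas kept in sync on every write."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]

    def write(self, key, value):
        self.primary[key] = value
        for replica in self.replicas:  # synchronous replication (an assumption)
            replica[key] = value

    def read(self, key, replica_index=None):
        source = self.primary if replica_index is None else self.replicas[replica_index]
        return source.get(key)


class PartitionedStore:
    """Routes each key to the shard chosen by shard_for."""

    def __init__(self, num_shards=3):
        self.shards = [ReplicatedShard() for _ in range(num_shards)]

    def write(self, key, value):
        self.shards[shard_for(key, len(self.shards))].write(key, value)

    def read(self, key, replica_index=None):
        return self.shards[shard_for(key, len(self.shards))].read(key, replica_index)
```

Reads can be served from a replica (`replica_index=0`) to offload the primary, which is one reason replication helps availability as well as redundancy.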
Cloud services:
- Amazon RDS
- Google Cloud SQL
- Azure Cosmos DB
- Amazon DynamoDB
- Google Firestore
- IBM Cloud Databases
- Amazon Aurora
- MongoDB Atlas
- Firebase Realtime Database
Key-Value Store
A key-value store is a type of non-relational database that uses a simple key-value pair structure for storing data. Here are some key points:
- Simplicity: Key-value stores offer a straightforward way to store and retrieve data, making them easy to use.
- Scalability: They are designed to scale horizontally, allowing for the handling of large volumes of data across multiple servers.
- Performance: Key-value stores provide fast data access due to their minimalistic design.
- Flexibility: They support various data types and can be easily configured to meet specific application needs.
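
The interface of a key-value store is small enough to sketch in a few lines. The example below is a single-process illustration, not a distributed implementation; the optional `ttl_seconds` expiry and the injectable `clock` parameter are assumptions added for demonstration.

```python
import time


class KeyValueStore:
    """Minimal in-memory key-value store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def put(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value

    def delete(self, key):
        """Return True if the key existed."""
        return self._data.pop(key, None) is not None
```

Values can be any Python object here, which mirrors the point above that key-value stores place few constraints on what is stored under a key.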
Cloud services:
- Amazon DynamoDB
- Redis
- Google Cloud Firestore
- Azure Cosmos DB
- IBM Cloudant
Content Delivery Network (CDN)
- A CDN is a distributed network of servers that delivers content to users based on their geographic location.
- It reduces latency by caching content closer to the end-user, improving load times for websites and applications.
- CDNs enhance the reliability and availability of content by distributing the load across multiple servers.
- They help mitigate traffic spikes and DDoS attacks by distributing the network load.
- CDNs are essential for optimizing performance in media delivery, such as video streaming and gaming.
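
The caching behavior described above can be sketched as an edge server that fetches from the origin only on a cache miss. The names `EdgeServer` and `nearest_edge`, and the `distances` table used to pick an edge, are hypothetical; real CDNs typically route users with DNS or anycast and also expire cached objects.

```python
class EdgeServer:
    """An edge node that caches content fetched from the origin."""

    def __init__(self, region, origin):
        self.region = region
        self.origin = origin       # dict standing in for the origin server
        self.cache = {}
        self.origin_fetches = 0    # counts cache misses

    def get(self, path):
        if path not in self.cache:             # miss: go back to the origin once
            self.cache[path] = self.origin[path]
            self.origin_fetches += 1
        return self.cache[path]                # hit: served from the edge


def nearest_edge(edges, user_region, distances):
    """Pick the edge with the smallest (assumed) distance to the user."""
    return min(edges, key=lambda edge: distances[(user_region, edge.region)])
```

After the first request for a path, later requests through the same edge are answered locally, which is where the latency reduction comes from.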
Cloud services:
- Amazon CloudFront
- Microsoft Azure CDN
- Google Cloud CDN
- Akamai
- Cloudflare
- Fastly
- StackPath
Sequencer
- Purpose: A sequencer is designed to generate unique identifiers (IDs) while ensuring the preservation of causality in distributed systems.
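- Techniques: It can use various methods for ID generation, including UUIDs (Universally Unique Identifiers) and Snowflake-style ID generators, which were popularized by Twitter. Random UUIDs are unique but not time-ordered, so on their own they do not capture causality; Snowflake-style IDs embed a timestamp and are roughly time-ordered.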
- Scalability: The system must be scalable to handle a high volume of requests without collisions in ID generation.
- Performance: The ID generation process should be efficient to minimize latency in applications requiring rapid ID creation.
- Causality Maintenance: Ensuring that the order of ID generation reflects the order of events in the system is crucial for maintaining consistency.
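
A Snowflake-style generator can be sketched as follows, assuming the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits. The epoch constant below is an arbitrary choice for illustration, and `SnowflakeGenerator` is a hypothetical name. IDs from one worker are strictly increasing; across workers, ordering is only as good as clock synchronization.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch (assumption for this sketch)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the worker ID is part of every ID, two workers never collide even when they generate IDs in the same millisecond.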
Cloud services:
- Amazon DynamoDB (no built-in auto-increment; sequential counters are usually built with atomic counter updates)
- Google Cloud Firestore
- Azure Cosmos DB with Unique ID generation
- Alibaba Cloud Table Store
- FaunaDB with unique ID capabilities
Service Monitoring
- Importance: Monitoring systems are essential for analyzing the performance of distributed systems.
- Alerting: They provide alerts to stakeholders in case of issues, enabling timely responses.
- Components: Both server-side and client-side monitoring systems are utilized.
- Tools: Popular tools include Amazon CloudWatch, Prometheus, and Grafana for effective monitoring.
Cloud services:
- Amazon CloudWatch
- Google Cloud Operations Suite (formerly Stackdriver)
- Microsoft Azure Monitor
- Prometheus
- Grafana
- Datadog
- New Relic
- Splunk
- Elastic Observability
- AppDynamics
Distributed Caching
- Definition: Distributed caching involves multiple cache servers that work together to store and retrieve frequently accessed data, improving performance and reducing latency.
- Benefits: Enhances application speed, reduces database load, and provides a scalable solution to manage large volumes of data access.
- Coordination: Keys are partitioned across cache nodes (commonly with consistent hashing), and entries are expired or invalidated so that nodes do not serve stale data.
- Tools: Popular tools for implementing distributed caching include in-memory data stores like Redis and Memcached.
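
One widely used way to decide which cache server owns a key is consistent hashing, sketched below. This is offered as an illustration of the idea rather than a description of how Redis or Memcached clients work internally; `ConsistentHashRing` and the `vnodes` parameter are hypothetical names.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Maps keys to cache nodes so that adding or removing a node moves few keys."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes  # virtual nodes per server, to smooth the distribution
        self._ring = []        # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring position at or after the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._ring, (_hash(key),))
        return self._ring[index % len(self._ring)][1]
```

When a node is removed, only the keys that lived on that node are reassigned; keys on the remaining nodes stay where they were, which keeps most of the cache warm.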
Cloud services:
- Amazon ElastiCache (Redis & Memcached)
- Google Cloud Memorystore (Redis)
- Azure Cache for Redis
- IBM Cloud Databases for Redis
- Redis Enterprise Cloud
Distributed Messaging Queue
- A distributed messaging queue allows communication between producers and consumers across multiple servers.
- It enhances system scalability by enabling independent scaling of components.
- Reliability is improved as messages can be stored and processed even if one part of the system is down.
- Common use cases include event-driven architectures, microservices, and data streaming.
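
The reliability point above usually rests on acknowledgements: a message is removed only after a consumer confirms it was processed. The sketch below models that in a single process; `MessageQueue` and `requeue_unacked` are hypothetical names, and real brokers typically redeliver after a timeout rather than on an explicit call.

```python
import itertools
from collections import deque


class MessageQueue:
    """In-memory queue with acknowledgements and redelivery of unacked messages."""

    def __init__(self):
        self._ready = deque()   # messages waiting for a consumer
        self._in_flight = {}    # message_id -> body, delivered but not yet acked
        self._ids = itertools.count(1)

    def send(self, body):
        message_id = next(self._ids)
        self._ready.append((message_id, body))
        return message_id

    def receive(self):
        if not self._ready:
            return None
        message_id, body = self._ready.popleft()
        self._in_flight[message_id] = body
        return message_id, body

    def ack(self, message_id):
        self._in_flight.pop(message_id, None)

    def requeue_unacked(self):
        """Put unacknowledged messages back at the front, e.g. after a consumer crash."""
        pending = sorted(self._in_flight.items())
        self._ready.extendleft(reversed(pending))  # keeps original order at the front
        self._in_flight.clear()
```

Producers and consumers only share the queue, so either side can be scaled or restarted without the other noticing.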
Cloud services:
- Amazon SQS
- Apache Kafka
- Google Cloud Pub/Sub
- Microsoft Azure Service Bus
- RabbitMQ on Cloud
- IBM Cloud Message Queue
- Red Hat AMQ
- Redis Streams
Publish-Subscribe System
- The publish-subscribe (pub-sub) system is an asynchronous messaging pattern.
- It allows multiple producers (publishers) to send messages to multiple consumers (subscribers).
- Pub-sub systems decouple message senders from receivers, enhancing scalability.
- Commonly used in serverless and microservices architectures to enable event-driven communication.
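
The decoupling described above can be illustrated with a tiny in-process broker: publishers know only the topic name, never the subscribers. `Broker` is a hypothetical class, and delivery here is synchronous for simplicity, whereas real pub-sub systems deliver asynchronously over the network.

```python
from collections import defaultdict


class Broker:
    """Routes each published message to every handler subscribed to its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handler callables

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)  # number of subscribers that received the message
```

Unlike a work queue, where each message goes to one consumer, every subscriber of a topic receives its own copy of each message.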
Cloud services:
- Google Cloud Pub/Sub
- Amazon Simple Notification Service (SNS)
- Azure Service Bus
- Apache Kafka (available on various cloud platforms)
- IBM Cloud Event Streams
Rate Limiter
- Definition: A rate limiter controls the number of requests a client (for example, a user, IP address, or API key) can make to a service within a specified time frame.
- Purpose: It protects services from being overwhelmed by excessive requests, ensuring fair usage and availability.
- Implementation: Can be integrated into APIs using gateways that provide built-in rate-limiting features.
- Benefits: Enhances security, prevents abuse, and improves service performance by managing traffic effectively.
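
Token bucket is one common algorithm for enforcing such limits; the source does not name a specific algorithm, so this is an assumed choice. In the sketch below, `TokenBucket` and `RateLimiter` are hypothetical names, and the `clock` parameter exists only so the behavior can be exercised without waiting.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that passed, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client so that limits are enforced independently."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self._args = (capacity, refill_per_second, clock)
        self._buckets = {}

    def allow(self, client_id):
        bucket = self._buckets.setdefault(client_id, TokenBucket(*self._args))
        return bucket.allow()
```

A gateway would call `allow(client_id)` before forwarding each request and respond with an error such as HTTP 429 when it returns `False`.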
Cloud services:
- Amazon API Gateway
- Google Cloud Endpoints
- Azure API Management
- Cloudflare Rate Limiting
- IBM API Gateway
Distributed Search
- Definition: A system designed to process user queries and return relevant content quickly.
- Key Components: Includes crawling, indexing, and searching to ensure efficient data retrieval.
- Scalability: Supports handling large volumes of data across multiple servers, enhancing performance.
- Real-time Results: Provides timely responses to user queries, improving user experience.
- Common Tools: Elasticsearch and Amazon OpenSearch are widely used for building distributed search systems.
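
The indexing and searching components can be sketched with an inverted index that is split into shards by document. This is a simplified illustration, not how Elasticsearch or OpenSearch are implemented; `InvertedIndex`, `tokenize`, and `search_shards` are hypothetical names, and ranking is omitted.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """One index shard: maps each term to the set of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return documents on this shard that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


def search_shards(shards, query):
    """Scatter the query to every shard and merge (gather) the results."""
    results = set()
    for shard in shards:
        results |= shard.search(query)
    return results
```

Because each document lives entirely on one shard, every shard can answer the query independently and the results only need to be merged.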
Cloud services:
- Amazon OpenSearch Service
- Elasticsearch Service on Elastic Cloud
- Azure AI Search (formerly Azure Cognitive Search)
- Google Cloud Search
Distributed Logging
- Definition: Distributed logging involves collecting logs from multiple services or applications running in different environments.
- Importance: Efficient logging is crucial in distributed systems to monitor application performance and troubleshoot issues.
- Challenges: High I/O operations can lead to bottlenecks; thus, optimizing log collection and storage is essential.
- Solutions: Cloud-based logging tools streamline the logging process, ensuring scalability and reliability.
- Usage: Logs can be aggregated, analyzed, and visualized for better insights into system behavior.
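
One way to reduce the I/O cost mentioned above is to buffer structured log records and ship them in batches. The sketch below assumes a `sink` callable that receives a list of JSON lines (for example, a function that posts them to a log collector); `BatchingLogShipper` and its parameters are hypothetical names.

```python
import json


class BatchingLogShipper:
    """Buffers structured log records and sends them to a sink in batches."""

    def __init__(self, service, sink, batch_size=100):
        self.service = service
        self.sink = sink            # callable taking a list of JSON strings (assumed)
        self.batch_size = batch_size
        self._buffer = []

    def log(self, level, message, **fields):
        record = {"service": self.service, "level": level, "message": message, **fields}
        self._buffer.append(json.dumps(record, sort_keys=True))
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; call on shutdown so no records are lost."""
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []       # start a new list; the sink keeps the old one
```

Tagging every record with the service name (and, in practice, a request ID) is what makes logs from many services searchable once they are aggregated.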
Cloud services:
- Amazon CloudWatch Logs
- Google Cloud Logging
- Elasticsearch (with Kibana for visualization)
- Azure Monitor Logs
- Splunk
Distributed Tracing
- Definition: Distributed tracing is a method used to monitor and track the flow of requests through various services in a microservices architecture.
- Purpose: It helps developers identify performance bottlenecks, latency issues, and failures within complex distributed systems.
- Granularity: Provides detailed insights into how long requests take at each service level, enabling effective troubleshooting and optimization.
- Visualization: Generates trace visualizations to understand the path of requests and the interactions between different services.
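
The core data model behind tracing is a tree of timed spans that share a trace ID. The sketch below records spans within a single process; `Tracer` and `Span` are hypothetical names, not the OpenTelemetry API, and in a real system the trace and parent IDs would be propagated between services in request headers.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str              # shared by every span in one request
    span_id: str
    parent_id: Optional[str]   # None for the root span
    name: str
    start: float
    end: float = 0.0

    @property
    def duration(self):
        return self.end - self.start


class Tracer:
    def __init__(self):
        self.finished = []  # completed spans, innermost first
        self._stack = []    # currently open spans

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.perf_counter(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)
```

Nesting `with tracer.span("checkout"):` around `with tracer.span("charge-card"):` yields two spans with the same trace ID, and the parent links are what a trace viewer uses to draw the request path.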
Cloud services:
- AWS X-Ray
- Google Cloud Trace
- Zipkin
- OpenTelemetry
- Azure Monitor
Distributed Task Scheduling
- Distributed task scheduling involves managing and allocating resources to tasks across multiple nodes or systems.
- The scheduler ensures that both task-level objectives (e.g., timely completion) and system-level goals (e.g., load balancing) are achieved.
- It is essential for automating workflows, optimizing performance, and managing dependencies between tasks.
- Common in cloud environments, it allows tasks to be distributed across different regions, improving scalability and fault tolerance.
- Used for large-scale applications with complex workflows and data pipelines.
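
The two goals above, respecting task dependencies and balancing load, can be sketched together: tasks are ordered topologically, then each is assigned to the currently least-loaded worker. The `schedule` function and its inputs are hypothetical, every dependency is assumed to be a known task, and the sketch ignores start times, retries, and failures that a real scheduler must handle.

```python
import heapq
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule(tasks, dependencies, workers):
    """Return a list of (task, worker) pairs in an order that respects dependencies.

    tasks: dict mapping task name -> estimated cost
    dependencies: dict mapping task name -> set of prerequisite task names
    workers: list of worker names
    """
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles

    heap = [(0, worker) for worker in workers]  # (current load, worker)
    heapq.heapify(heap)

    plan = []
    for task in order:
        load, worker = heapq.heappop(heap)      # least-loaded worker so far
        plan.append((task, worker))
        heapq.heappush(heap, (load + tasks[task], worker))
    return plan
```

For a small pipeline such as extract, transform, load, the plan always lists `extract` before `transform` and `transform` before `load`, while spreading the work across the available workers.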
Cloud services:
- AWS Step Functions
- Apache Airflow (Managed on Google Cloud, AWS, etc.)
- Google Cloud Tasks
- Azure Logic Apps
- Apache Kafka (for event-driven scheduling)
Blob Store
- Definition: A storage solution specifically designed for unstructured data, including multimedia files, images, videos, and binary executables.
- Characteristics: Blob storage is optimized for storing large amounts of data and allows for efficient access and retrieval.
- Accessibility: Typically accessed via REST APIs, making it suitable for web and mobile applications.
- Scalability: Can scale to accommodate vast amounts of data without the need for extensive infrastructure management.
- Use Cases: Commonly used for backup and disaster recovery, big data analytics, and content distribution.
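
Blob stores expose a small set of operations on objects addressed by bucket and key. The sketch below models those operations in memory to show the shape of the interface; `BlobStore` and its method names are hypothetical and loosely mirror the PUT, GET, HEAD, and DELETE requests a REST API would offer.

```python
import hashlib


class BlobStore:
    """In-memory stand-in for an object store: buckets hold opaque byte blobs."""

    def __init__(self):
        self._buckets = {}  # bucket -> {key -> object record}

    def put(self, bucket, key, data, content_type="application/octet-stream"):
        checksum = hashlib.sha256(data).hexdigest()
        self._buckets.setdefault(bucket, {})[key] = {
            "data": bytes(data),
            "content_type": content_type,
            "size": len(data),
            "sha256": checksum,
        }
        return checksum  # lets the caller verify the upload

    def get(self, bucket, key):
        return self._buckets[bucket][key]["data"]

    def head(self, bucket, key):
        """Return metadata only, without the blob contents."""
        record = self._buckets[bucket][key]
        return {name: value for name, value in record.items() if name != "data"}

    def list_keys(self, bucket, prefix=""):
        return sorted(k for k in self._buckets.get(bucket, {}) if k.startswith(prefix))

    def delete(self, bucket, key):
        return self._buckets.get(bucket, {}).pop(key, None) is not None
```

Keys are flat strings; a prefix such as `videos/` only looks like a folder, which is why listing is done by prefix rather than by directory.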
Cloud services:
- Amazon S3 (Simple Storage Service)
- Microsoft Azure Blob Storage
- Google Cloud Storage
- IBM Cloud Object Storage
- DigitalOcean Spaces