Why there is a server between the client and the encoder

Security & Authentication – Clients should not directly access internal encoding infrastructure; the server enforces auth, rate-limiting, validation, and quota checks.
Load Balancing & Orchestration – Server decides which encoder worker should process the video; encoders usually run in a distributed cluster.
Metadata Handling – Server extracts and stores metadata (title, description, tags, thumbnails) in a Metadata DB while passing video to encoder.
Asynchronous Processing – Encoding is long-running, so once the upload is accepted the server enqueues a job in a message queue and returns immediately; encoder workers pick up jobs asynchronously.
Error Handling & Retries – If encoding fails, the server can retry, redirect to another encoder, or notify the client.
Separation of Concerns – Server acts as API + control plane; encoder acts as data plane (CPU/GPU processing); both can scale independently.
Audit & Analytics – Server logs uploads and processing details for monitoring, billing, and abuse prevention.
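The split between the server (control plane) and the encoder (data plane) can be sketched as follows. This is a minimal in-memory illustration, not a production design: `job_queue`, `metadata_db`, `handle_upload`, `encoder_worker`, and the `"valid-token"` check are all hypothetical stand-ins for a real message queue, metadata store, and auth layer.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for the message queue and the metadata DB.
job_queue = queue.Queue()
metadata_db = {}


def handle_upload(token: str, title: str, raw_bytes: bytes) -> dict:
    """Control plane: authenticate, validate, record metadata, enqueue, return."""
    if token != "valid-token":            # placeholder auth check
        return {"error": "unauthorized"}
    if not raw_bytes:                     # placeholder validation
        return {"error": "empty file"}
    video_id = f"vid_{uuid.uuid4().hex[:8]}"
    metadata_db[video_id] = {"title": title, "status": "processing"}
    job_queue.put(video_id)               # an encoder worker picks this up later
    return {"videoId": video_id, "status": "processing"}


def encoder_worker() -> None:
    """Data plane: pull jobs and mark them ready (real encoding is omitted)."""
    while True:
        video_id = job_queue.get()
        if video_id is None:              # sentinel value stops the worker
            job_queue.task_done()
            break
        metadata_db[video_id]["status"] = "ready"
        job_queue.task_done()


worker = threading.Thread(target=encoder_worker, daemon=True)
worker.start()
response = handle_upload("valid-token", "My Travel Vlog", b"\x00\x01")
job_queue.join()                          # wait until the queued job is processed
job_queue.put(None)
worker.join()
```

The client receives its response before any encoding happens; the status in the metadata store changes only after a worker has processed the job.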

API design

1. Upload Video API

Method: POST /api/v1/videos
Headers:
- Authorization: Bearer <token>
- Content-Type: multipart/form-data

Body (multipart form fields, shown in JSON-like notation):

{
    "title": "My Travel Vlog",
    "description": "Exploring Bali",
    "tags": ["travel", "vlog", "bali"],
    "file": <binary_video_file>
}

Response:

{
    "videoId": "vid_12345",
    "status": "processing"
}
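Because the request uses multipart/form-data, the text fields and the binary file travel as separate parts rather than as a single JSON document. The sketch below shows one way a client might build such a body using only the standard library. `encode_multipart` is a hypothetical helper, and sending the tags list as JSON text is an assumption, not something the API above specifies.

```python
import json
import uuid


def encode_multipart(fields: dict, file_name: str, file_bytes: bytes):
    """Return (content_type, body) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Assumption: non-string values (e.g. the tags list) are sent as JSON text.
        text = value if isinstance(value, str) else json.dumps(value)
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + text.encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))   # closing boundary
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


content_type, body = encode_multipart(
    {"title": "My Travel Vlog", "description": "Exploring Bali",
     "tags": ["travel", "vlog", "bali"]},
    "vlog.mp4",
    b"\x00\x01\x02",
)
```

The returned `content_type` goes in the Content-Type header so the server can find the boundary that separates the parts.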

2. Stream Video API

Method: GET /api/v1/videos/{videoId}/stream
Headers:
- Authorization: Bearer <token> (optional for public videos)
Query Params: quality=720p, format=HLS
Response: Returns .m3u8 playlist or video chunks (served via CDN)
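For HLS, the .m3u8 response is typically a master playlist that lists one variant stream per quality level. The following sketch generates such a playlist; the function name `build_master_playlist`, the rendition values, and the `{videoId}/{height}p/index.m3u8` path layout are assumptions made for illustration.

```python
def build_master_playlist(video_id: str, renditions: list) -> str:
    """Build an HLS master playlist with one variant entry per rendition."""
    lines = ["#EXTM3U"]
    for r in renditions:
        bandwidth, width, height = r["bandwidth"], r["width"], r["height"]
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        # Hypothetical path layout for each rendition's media playlist.
        lines.append(f"{video_id}/{height}p/index.m3u8")
    return "\n".join(lines) + "\n"


playlist = build_master_playlist(
    "vid_12345",
    [
        {"bandwidth": 1400000, "width": 854, "height": 480},
        {"bandwidth": 2800000, "width": 1280, "height": 720},
    ],
)
```

The player reads this playlist once, then chooses among the listed variants as its measured bandwidth changes.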

3. Search Videos API

Method: GET /api/v1/videos/search
Query Params:
- q=travel vlog
- page=1
- limit=20

Response:

{
    "results": [
        {
            "videoId": "vid_12345",
            "title": "My Travel Vlog",
            "thumbnailUrl": "https://cdn.example.com/thumbs/vid_12345.jpg",
            "views": 100000
        }
    ]
}
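How `q`, `page`, and `limit` interact can be shown with a small in-memory sketch. A real deployment would query a search index; the matching rule here (every query term must appear in the title), the ordering by views, and the extra `page` and `total` fields in the result are assumptions for illustration.

```python
def search_videos(catalog: list, q: str, page: int = 1, limit: int = 20) -> dict:
    """Return one page of videos whose title contains every query term."""
    terms = q.lower().split()
    matches = [
        v for v in catalog
        if all(term in v["title"].lower() for term in terms)
    ]
    matches.sort(key=lambda v: v["views"], reverse=True)   # most viewed first
    start = (page - 1) * limit                             # pages are 1-based
    return {
        "results": matches[start:start + limit],
        "page": page,
        "total": len(matches),
    }


catalog = [
    {"videoId": "vid_1", "title": "My Travel Vlog", "views": 100000},
    {"videoId": "vid_2", "title": "Bali Travel Vlog", "views": 250000},
    {"videoId": "vid_3", "title": "Cooking Basics", "views": 900000},
]
first_page = search_videos(catalog, "travel vlog", page=1, limit=1)
second_page = search_videos(catalog, "travel vlog", page=2, limit=1)
```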

4. Like / Dislike Video API

Method:
- POST /api/v1/videos/{videoId}/like
- POST /api/v1/videos/{videoId}/dislike
Headers: Authorization: Bearer <token>

Response:

{
    "videoId": "vid_12345",
    "likes": 1023,
    "dislikes": 45
}
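One detail the endpoints above leave open is what happens when the same user likes a video twice, or switches from like to dislike. A common approach, sketched here as an assumption rather than as the documented behavior, is to store at most one reaction per (video, user) pair so that repeated calls are idempotent. `ReactionStore` is a hypothetical in-memory class.

```python
class ReactionStore:
    """Keeps at most one reaction per (video, user) pair."""

    def __init__(self):
        self._reactions = {}   # (video_id, user_id) -> "like" or "dislike"

    def react(self, video_id: str, user_id: str, reaction: str) -> dict:
        if reaction not in ("like", "dislike"):
            raise ValueError(f"unknown reaction: {reaction}")
        # Overwriting makes repeated likes idempotent and lets a user switch.
        self._reactions[(video_id, user_id)] = reaction
        return self.counts(video_id)

    def counts(self, video_id: str) -> dict:
        likes = dislikes = 0
        for (vid, _user), reaction in self._reactions.items():
            if vid != video_id:
                continue
            if reaction == "like":
                likes += 1
            else:
                dislikes += 1
        return {"videoId": video_id, "likes": likes, "dislikes": dislikes}


store = ReactionStore()
store.react("vid_12345", "usr_1", "like")
store.react("vid_12345", "usr_1", "like")        # duplicate like, no double count
store.react("vid_12345", "usr_2", "like")
result = store.react("vid_12345", "usr_2", "dislike")   # usr_2 switches
```

At scale the counts would be maintained incrementally rather than recomputed on every call; the scan here only keeps the sketch short.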

5. Comment on Video API

Method: POST /api/v1/videos/{videoId}/comments
Headers: Authorization: Bearer <token>

Body:

{
    "comment": "Amazing video! Keep it up 👏"
}

Response:

{
    "commentId": "cmt_98765",
    "videoId": "vid_12345",
    "userId": "usr_1111",
    "comment": "Amazing video! Keep it up 👏",
    "createdAt": "2025-08-18T10:00:00Z"
}

6. Get Thumbnails API

Method: GET /api/v1/videos/{videoId}/thumbnails

Response:

{
    "thumbnails": [
        {"resolution": "120x90", "url": "https://cdn.example.com/thumbs/vid_12345_120x90.jpg"},
        {"resolution": "480x360", "url": "https://cdn.example.com/thumbs/vid_12345_480x360.jpg"},
        {"resolution": "1280x720", "url": "https://cdn.example.com/thumbs/vid_12345_1280x720.jpg"}
    ]
}

Detail Design and Flow Using AWS

📌 Overview

High-level YouTube-like architecture implemented on AWS
Goals: durability, scalability, low latency, global delivery
Core components: CloudFront, ALB, Web/App servers, S3, MediaConvert, DynamoDB, RDS

🏗️ Components & Responsibilities

User → Browser or mobile client
Amazon CloudFront (CDN) → Edge caching for low latency
ALB (Application Load Balancer) → Routes traffic, SSL termination
Web Servers → Serve UI, session, static content
Application Servers → Business logic, uploads, metadata
S3 Upload Bucket → Temporary raw video uploads
AWS MediaConvert → Transcoding to multiple bitrates
S3 Blob Storage → Stores transcoded video assets
DynamoDB → Video metadata, encoding status
RDS/DynamoDB → User profiles, watch history

▶️ Watch Flow

User requests a video (client → CloudFront)
CloudFront checks cache:
- Hit → Serve directly from edge
- Miss → Forward to origin (S3 or ALB)
App server fetches metadata (DynamoDB/RDS)
Generates signed URLs / playback manifest
Client streams video via CloudFront (adaptive bitrate)
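The hit/miss behavior in the watch flow can be illustrated with a small TTL cache placed in front of an origin fetch function. This is a conceptual sketch only; `EdgeCache`, its parameters, and the fixed TTL are hypothetical and do not reflect how CloudFront is implemented or configured.

```python
import time


class EdgeCache:
    """Serve from cache while an entry is fresh; otherwise fetch from origin."""

    def __init__(self, origin_fetch, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._origin_fetch = origin_fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}       # path -> (body, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[1] > now:
            self.hits += 1                      # hit: serve directly from the edge
            return entry[0]
        self.misses += 1                        # miss: forward to the origin
        body = self._origin_fetch(path)
        self._store[path] = (body, now + self._ttl)
        return body


origin_calls = []


def fetch_from_origin(path: str) -> bytes:
    origin_calls.append(path)
    return b"segment:" + path.encode("utf-8")


cache = EdgeCache(fetch_from_origin, ttl_seconds=30.0)
cache.get("/vid_12345/720p/seg0.ts")   # miss, goes to origin
cache.get("/vid_12345/720p/seg0.ts")   # hit, origin is not contacted
```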

⬆️ Upload Flow

User initiates an upload (client → ALB → App server)
App server generates pre-signed S3 upload URL
User uploads raw video → S3 Upload Bucket
App server updates metadata (status = uploaded)
Triggers MediaConvert job → encodes video → stores in S3 Blob Storage
Metadata updated (status = ready, add final URLs)
CloudFront serves transcoded video to users
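The status field that the upload flow updates can be modeled as a small state machine, which makes it easier to reject out-of-order updates. The statuses `uploaded`, `processing`, and `ready` appear in this document; `pending`, `failed`, and the exact transition table are assumptions added for the sketch.

```python
# Allowed status transitions for a video's metadata record.
ALLOWED_TRANSITIONS = {
    "pending": {"uploaded"},              # pre-signed URL issued, upload not finished
    "uploaded": {"processing"},           # raw file is in the upload bucket
    "processing": {"ready", "failed"},    # transcoding job is running
    "failed": {"processing"},             # a failed job may be retried
    "ready": set(),                       # terminal: final URLs are available
}


def advance(record: dict, new_status: str) -> dict:
    """Move a metadata record to new_status, rejecting illegal transitions."""
    current = record["status"]
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current} -> {new_status}")
    record["status"] = new_status
    return record


record = {"videoId": "vid_12345", "status": "pending"}
for step in ("uploaded", "processing", "ready"):
    advance(record, step)
```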

💾 Data & Storage Patterns

S3 → Large objects, lifecycle to Glacier for cold storage
DynamoDB → Metadata (videoId → JSON metadata)
RDS → Relational data (users, transactions, watch history)
Use signed URLs & IAM roles for secure access

⚡ Scalability & Reliability

Autoscale App/Web servers (EKS, EC2 ASG)
CloudFront reduces latency and origin load
S3 multipart/resumable uploads for large files
Asynchronous transcoding with queues (SQS/EventBridge)
RDS with Multi-AZ, read replicas, backups

🔒 Security & Monitoring

HTTPS everywhere (TLS via CloudFront/ALB)
IAM roles (no static keys)
Audit & threat detection: CloudTrail + GuardDuty
Monitoring: CloudWatch metrics, logs, alarms
Protection: WAF, rate limiting, bot detection
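Rate limiting is often implemented with a token bucket: each client gets a bucket that refills at a steady rate and allows short bursts up to its capacity. The sketch below is a generic illustration of that technique, not a description of how AWS WAF enforces its rules; `TokenBucket` and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._tokens = capacity
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


fake_now = [0.0]                          # controllable clock for the demo
bucket = TokenBucket(rate=1.0, capacity=2.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(3)]    # third request exceeds the burst
fake_now[0] = 1.0                             # one second later, one token is back
after_wait = bucket.allow()
```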

📌 Fulfilling requirements using AWS

⚡ Low Latency / Smooth Streaming

CloudFront (CDN) → Edge caching near ISPs ensures low latency video delivery.
S3 + EFS + DynamoDB → Different storage for blobs, metadata, thumbnails for optimized access.
ElastiCache (Redis/Memcached) → Distributed caching for metadata & frequently accessed content.
CloudFront edge locations + Regional Edge Caches → Serve most-viewed content quickly.

📈 Scalability

Auto Scaling Groups (EC2/EKS) → Horizontally scale web & app servers.
Aurora / DynamoDB → Scale beyond traditional MySQL limits (Aurora auto-scaling, DynamoDB partitions).
S3 → Virtually unlimited object storage, automatically scales with demand.

✅ Availability

Multi-AZ RDS / DynamoDB Global Tables → Redundancy across AZs & regions.
S3 Cross-Region Replication → Ensures data durability & disaster recovery.
Route 53 + Global Accelerator → Traffic steering to healthy regions.
ALB/NLB → Exclude unhealthy servers automatically.

🔒 Reliability

DynamoDB Partitioning → Data sharding avoids bottlenecks.
Redundant Hardware in AWS AZs → Built-in fault tolerance.
CloudWatch + Health Checks → Heartbeat monitoring, auto-removal of faulty nodes.
Hash-based partitioning (DynamoDB partition keys, consistent hashing in cache clients for ElastiCache) → Smooth scaling, balanced load distribution.
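Consistent hashing keeps most keys on the same node when a node is added or removed, which is why it gives smooth scaling. The ring below is a generic sketch of the technique with virtual nodes; it is not the internal algorithm of DynamoDB or ElastiCache, and `HashRing` and the node names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []                    # sorted list of (hash, node)
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str, vnodes: int = 100) -> None:
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get(self, key: str) -> str:
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"vid_{i}" for i in range(200)]
before = {k: ring.get(k) for k in keys}
ring.remove("cache-b")
after = {k: ring.get(k) for k in keys}
# Only keys that lived on the removed node should have moved.
moved = [k for k in keys if before[k] != after[k]]
```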

Followup Questions and Answers

1. Why do we need a server between the client and the encoder?

The server acts as a secure gateway performing authentication, authorization, and throttling.
It orchestrates encoding jobs asynchronously, extracts and stores metadata, manages retries upon failure, and decouples encoding workload from client requests for better scalability.

2. How does the video upload process work end-to-end?

The client first asks the Application Server to start an upload, and the server responds with a pre-signed S3 upload URL.
The client then uploads the raw video directly to the S3 Upload Bucket using that URL.
The Application Server updates metadata to 'uploaded' state, triggers AWS MediaConvert for transcoding, and stores encoded output in Blob Storage.
Finally, metadata is updated with ready status and CDN URLs.

3. How is scalability achieved in this architecture?

Scalability comes from AWS Auto Scaling Groups for Web and Application servers, partitioned and globally distributed databases (DynamoDB, Aurora), and a highly scalable object store (S3).
CDN (CloudFront) caches content globally, reducing origin load and latency.
Asynchronous processing via SQS/EventBridge further decouples services, allowing independent scaling.

4. How is low latency and smooth streaming ensured?

CloudFront's edge locations cache video chunks close to users, minimizing latency.
Adaptive bitrate streaming is supported through AWS MediaConvert, allowing clients to switch video quality based on network conditions.
ElastiCache (Redis/Memcached) speeds up metadata retrieval.

5. What databases are used and why?

DynamoDB stores metadata and encoding status because of its low latency and scalable key-value capabilities.
Amazon RDS/Aurora stores relational data such as user profiles and transactions, benefiting from ACID compliance and multi-AZ reliability.

6. How is data durability and availability ensured?

S3 is designed for 11 nines (99.999999999%) of durability; cross-region replication adds disaster recovery on top of that.
RDS operates in multi-AZ mode with automated backups and failovers.
DynamoDB global tables replicate data across regions.
Health checks and auto-scaling maintain availability.

7. How is security managed in this architecture?

All API endpoints use HTTPS with TLS.
IAM roles grant least privilege access to resources.
Pre-signed URLs provide time-limited access for uploads.
AWS WAF protects APIs.
CloudTrail and GuardDuty monitor and alert for suspicious activities.
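The idea behind time-limited signed URLs can be shown with a small HMAC-based sketch: the server signs the path together with an expiry time, and the verifier rejects anything expired or tampered with. This is a stand-in to illustrate the concept, not the signing scheme that S3 pre-signed URLs or CloudFront signed URLs actually use; `SECRET`, `sign_url`, and `verify_url` are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"   # hypothetical key; a real system would not hard-code this


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: float = None) -> str:
    """Append an expiry timestamp and a signature to the path."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: float = None) -> bool:
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(parsed.path, expires))


url = sign_url("/uploads/vid_12345.mp4", ttl_seconds=300, now=1000.0)
valid_now = verify_url(url, now=1100.0)       # within the 300-second window
valid_later = verify_url(url, now=2000.0)     # after expiry
```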

8. How does the system handle video encoding failures?

Encoding jobs are processed asynchronously with reliable queues.
On failure, jobs can be retried or redirected to alternate encoding workers.
The Application Server monitors job statuses and notifies clients or triggers alerts for manual intervention.
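The retry-and-redirect behavior described above can be sketched as a loop that rotates through workers and backs off between attempts. The function `run_with_retries`, its parameters, the exponential backoff policy, and the worker names are assumptions for illustration; MediaConvert and SQS have their own retry and dead-letter mechanisms that are not modeled here.

```python
import time


def run_with_retries(encode, job, workers, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Try the job on successive workers, backing off exponentially between attempts."""
    last_error = None
    for attempt in range(max_attempts):
        worker = workers[attempt % len(workers)]     # redirect to the next worker
        try:
            output = encode(job, worker)
            return {"status": "ready", "output": output, "attempts": attempt + 1}
        except Exception as exc:                     # broad catch keeps the sketch short
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(base_delay * 2 ** attempt)     # 0.5s, 1s, 2s, ...
    # All attempts failed: surface the error so the client can be notified.
    return {"status": "failed", "error": str(last_error), "attempts": max_attempts}


def flaky_encode(job, worker):
    """Demo encoder: the first worker always fails, the others succeed."""
    if worker == "encoder-1":
        raise RuntimeError("encoder-1 crashed")
    return f"{job} encoded on {worker}"


delays = []
outcome = run_with_retries(
    flaky_encode, "vid_12345", ["encoder-1", "encoder-2"], sleep=delays.append
)
```

Passing `sleep=delays.append` records the backoff delays instead of actually waiting, which keeps the demo fast.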

9. Why use both S3 Upload Bucket and Blob Storage?

The Upload Bucket temporarily holds raw user uploads separate from encoded video files.
This separation helps manage lifecycle policies, access controls, and provides controlled ingestion for the encoding pipeline.
Blob Storage holds encoded, optimized video chunks ready for streaming.

10. How are user interactions such as likes and comments handled?

These are typically stored in DynamoDB for low latency writes and reads.
Transactional operations requiring strong consistency might use RDS.
Caching with ElastiCache improves performance for frequently accessed data.

11. What kind of CDN caching strategy is used?

CloudFront uses edge caching with regional edge caches.
Popular videos are quickly available without hitting origin.
Cache invalidation ensures that updated content replaces stale copies at the edge.
Signed URLs restrict unauthorized CDN access.

12. How does the system support different video qualities and formats?

AWS MediaConvert transcodes uploaded videos into multiple resolutions and streaming formats (HLS/DASH).
Video manifests (.m3u8 for HLS, .mpd for DASH) allow client players to select appropriate streams dynamically based on bandwidth.

Design Youtube/Netflix
