Chat Application: WhatsApp


Table of Contents
  1. Functional Requirements
  2. Non-Functional Requirements
  3. Estimation
  4. High Level Design
  5. Followup Questions and Answers
  6. Interesting Facts about WhatsApp

Functional Requirements

Non-Functional Requirements

Estimation

As of early 2025, WhatsApp reportedly handles 100 billion messages per day. This figure includes text messages, media, and voice/video calls.
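As a rough back-of-the-envelope check, the daily figure can be converted into a per-second rate. The sketch below assumes traffic is spread evenly over the day. The peak factor of 3 is a purely illustrative assumption, not a reported number.

```python
MESSAGES_PER_DAY = 100_000_000_000  # reported daily volume quoted above
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds

# Average rate if traffic were perfectly uniform across the day.
avg_per_second = MESSAGES_PER_DAY / SECONDS_PER_DAY

# Hypothetical peak-to-average ratio, chosen only for illustration.
ASSUMED_PEAK_FACTOR = 3
peak_per_second = avg_per_second * ASSUMED_PEAK_FACTOR

print(f"average: {avg_per_second:,.0f} messages/s")
print(f"assumed peak: {peak_per_second:,.0f} messages/s")
```

Under these assumptions the average works out to roughly 1.16 million messages per second, which is the number the rest of the design has to sustain.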

Storage Estimation:

Number of servers assumption:

High Level Design

  1. Client Layer

  2. Connection with a WebSocket server:

  3. Message Flow (1:1 chat)
  4. The system performs the following steps to send messages from user A to user B:


  5. Send or receive media files:

  6. Support for group messages:

Non-functional Requirements

Followup Questions and Answers
1. Why do we need a WebSocket Manager when WebSocket servers can talk to each other?
  1. Trap: Interviewer checks if you understand scaling and mapping of billions of connections.
  2. Answer: Without a centralized manager, servers don’t know which user is connected to which server. A WebSocket Manager (with Redis/Consistent Hashing) efficiently maintains user-to-server mapping, enabling quick message routing. Direct server-to-server queries would add latency and complexity at scale.
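The routing decision described in this answer can be sketched as a single lookup in a user-to-server map. This is a minimal in-memory stand-in, not the actual WhatsApp design: `ConnectionRegistry`, `route`, and the server names are hypothetical, and a real deployment would back the map with Redis or a similar store.

```python
from typing import Dict, Optional


class ConnectionRegistry:
    """Hypothetical in-memory stand-in for the manager's user-to-server map."""

    def __init__(self) -> None:
        self._user_to_server: Dict[str, str] = {}

    def register(self, user_id: str, server_id: str) -> None:
        # Called when a user opens a WebSocket on a given server.
        self._user_to_server[user_id] = server_id

    def unregister(self, user_id: str) -> None:
        # Called on disconnect; an unknown user is silently ignored.
        self._user_to_server.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[str]:
        # Returns the server holding the connection, or None if offline.
        return self._user_to_server.get(user_id)


def route(registry: ConnectionRegistry, recipient_id: str) -> str:
    """Decide where a message goes using one lookup, with no server-to-server queries."""
    server_id = registry.lookup(recipient_id)
    if server_id is None:
        return "store-for-offline-delivery"
    return f"forward-to:{server_id}"


registry = ConnectionRegistry()
registry.register("userB", "ws-server-7")
print(route(registry, "userB"))
print(route(registry, "userC"))
```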

2. How do you ensure message ordering across servers?
  1. Trap: Interviewer wants to see if you rely only on Kafka or think about sequencing.
  2. Answer: Use a sequencer (centralized or distributed with logical clocks) to assign increasing message IDs. Kafka preserves ordering within a partition, so partition messages by (chatId or groupId). This ensures strict FIFO order for each conversation or group.
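To make the two ideas concrete, here is a small sketch of per-chat sequencing combined with key-based partitioning. It assumes a single-process sequencer and uses SHA-256 as a stand-in hash, which is not Kafka's own partitioner. `NUM_PARTITIONS`, `partition_for`, `Sequencer`, and `stamp` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Tuple

NUM_PARTITIONS = 8  # assumed partition count, for illustration only


def partition_for(chat_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # A stable hash sends every message of one chat to the same partition,
    # so per-partition ordering becomes per-chat ordering.
    digest = hashlib.sha256(chat_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class Sequencer:
    """Hypothetical single-process sequencer with one counter per chat."""

    def __init__(self) -> None:
        self._last: DefaultDict[str, int] = defaultdict(int)

    def next_id(self, chat_id: str) -> int:
        # Strictly increasing within a chat; chats do not share counters.
        self._last[chat_id] += 1
        return self._last[chat_id]


def stamp(sequencer: Sequencer, chat_id: str) -> Tuple[int, int]:
    """Return (partition, sequence number) for the next message of a chat."""
    return partition_for(chat_id), sequencer.next_id(chat_id)


seq = Sequencer()
first = stamp(seq, "chat-42")
second = stamp(seq, "chat-42")
print(first, second)
```

A distributed sequencer would need its counters to survive restarts and failover, which this sketch deliberately leaves out.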

3. What happens if the WebSocket Manager itself fails?
  1. Trap: Checks if you thought about single point of failure.
  2. Answer: The WebSocket Manager must be replicated with leader election (e.g., Raft, ZooKeeper, or etcd). WebSocket servers cache recent mappings, so even if the manager is temporarily unavailable, they can still route messages until it recovers.

4. How do you scale group chats with millions of users (like WhatsApp Broadcast)?
  1. Trap: They want to test group fan-out design.
  2. Answer: Don’t push to every user directly from one server. Instead:
    • Use Kafka (or Pulsar) for fan-out via partitions.
    • Each user’s WebSocket server consumes messages relevant to its connected users.
    • For very large groups, use sharding + multicast trees to reduce fan-out cost.
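One way to picture the reduced fan-out is to group recipients by the server that holds their connection, so the sender emits one batch per server rather than one message per user. The sketch below assumes the user-to-server map is available as a plain dictionary. `plan_fanout` and the sample IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def plan_fanout(
    members: List[str], user_to_server: Dict[str, str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group online recipients by server and collect offline ones separately."""
    batches: Dict[str, List[str]] = defaultdict(list)
    offline: List[str] = []
    for user_id in members:
        server_id = user_to_server.get(user_id)
        if server_id is None:
            offline.append(user_id)  # handled by the offline-delivery path
        else:
            batches[server_id].append(user_id)
    return dict(batches), offline


mapping = {"alice": "ws-1", "bob": "ws-2", "carol": "ws-1"}
batches, offline = plan_fanout(["alice", "bob", "carol", "dave"], mapping)
print(batches)
print(offline)
```

With this grouping, the number of cross-server sends is bounded by the number of servers hosting group members, not by the group size.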

5. Why store messages in MySQL and not just Kafka?
  1. Trap: Checks if you confuse persistence with pub-sub.
  2. Answer: Kafka is a log, not a long-term store. MySQL (or Cassandra) provides durable, queryable history for compliance, replay, and recovery. Kafka typically retains messages only for a short, configured retention period, while MySQL keeps them for as long as the retention policy requires (e.g., 30 days or more).

6. How do you deliver messages to offline users?
  1. Trap: Interviewer checks push notification handling.
  2. Answer: Store undelivered messages in a durable DB (MySQL/Cassandra). While the user is offline, trigger a push notification via APNs/FCM. When the user reconnects, fetch the pending messages from the DB and deliver them. Expire undelivered messages according to policy (e.g., 30 days).
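The store-then-drain behaviour can be sketched with an in-memory queue per user. This is an illustration under simplifying assumptions: `OfflineStore` is a hypothetical stand-in for the durable DB, push notifications are omitted, and the 30-day retention value is taken from the policy mentioned above.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Tuple

RETENTION_SECONDS = 30 * 24 * 60 * 60  # 30-day expiry policy


class OfflineStore:
    """Hypothetical in-memory stand-in for the durable pending-message table."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[Tuple[float, str, str]]] = defaultdict(deque)

    def save(self, user_id: str, message_id: str, payload: str,
             now: Optional[float] = None) -> None:
        # Record the arrival time so expired messages can be dropped later.
        now = time.time() if now is None else now
        self._pending[user_id].append((now, message_id, payload))

    def drain(self, user_id: str, now: Optional[float] = None) -> List[Tuple[str, str]]:
        # Called on reconnect: return unexpired messages in arrival order
        # and clear the user's queue.
        now = time.time() if now is None else now
        queue = self._pending.pop(user_id, deque())
        return [(mid, payload) for ts, mid, payload in queue
                if now - ts <= RETENTION_SECONDS]


store = OfflineStore()
store.save("userB", "m1", "first", now=0)
store.save("userB", "m2", "second", now=100)
delivered = store.drain("userB", now=RETENTION_SECONDS + 50)
print(delivered)
```

Passing `now` explicitly keeps the example deterministic. Here the first message has aged past the retention window by the time the user reconnects, so only the second is returned.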

7. How do you achieve low latency across continents?
  1. Trap: Latency ≠ just WebSocket optimization.
  2. Answer: Deploy geo-distributed WebSocket servers. Use GeoDNS / Anycast for nearest-region routing. Redis clusters and Kafka clusters replicated across regions (with conflict resolution) ensure messages stay close to users.

8. How do you ensure exactly-once delivery?
  1. Trap: Messaging systems usually give at-least-once or at-most-once.
  2. Answer: Use unique message IDs plus deduplication on the receiver side, so that processing a message is idempotent. Store delivery receipts. If a duplicate arrives, ignore it based on its messageId. This simulates exactly-once semantics on top of at-least-once infrastructure.
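A minimal sketch of receiver-side deduplication follows. `DedupReceiver` is a hypothetical class, and the set of seen IDs is unbounded here, whereas a real client would need to cap or expire it.

```python
from typing import List, Set


class DedupReceiver:
    """Hypothetical receiver that applies each messageId at most once."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.delivered: List[str] = []

    def receive(self, message_id: str, payload: str) -> bool:
        # Returns True if the message was new and delivered,
        # False if it was a redelivery and therefore ignored.
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.delivered.append(payload)
        return True


receiver = DedupReceiver()
results = [
    receiver.receive("m1", "hello"),
    receiver.receive("m1", "hello"),  # retry from an at-least-once transport
    receiver.receive("m2", "world"),
]
print(results, receiver.delivered)
```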

9. Why use Redis for caching user-server mappings?
  1. Trap: Checks if you know alternatives.
  2. Answer: Redis provides in-memory, distributed, and fast lookups (sub-ms). Alternatives like Consistent Hashing or gossip protocols exist, but Redis supports TTLs, pub-sub, and replication, making it ideal.
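The TTL behaviour mentioned here matters because a crashed server cannot clean up its own entries, so mappings should expire unless refreshed. The sketch below imitates that with a small in-memory class. `TtlMapping` and its method names are hypothetical, and in Redis the equivalent effect would come from setting keys with an expiry rather than from application code like this.

```python
import time
from typing import Dict, Optional, Tuple


class TtlMapping:
    """Hypothetical in-memory imitation of a key-value map with per-key expiry."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # user -> (server, expires_at)

    def heartbeat(self, user_id: str, server_id: str,
                  now: Optional[float] = None) -> None:
        # Each heartbeat rewrites the entry and pushes its expiry forward.
        now = time.monotonic() if now is None else now
        self._entries[user_id] = (server_id, now + self._ttl)

    def lookup(self, user_id: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(user_id)
        if entry is None:
            return None
        server_id, expires_at = entry
        if now >= expires_at:
            # Stale entry: the server stopped refreshing it, so drop it.
            del self._entries[user_id]
            return None
        return server_id


mapping = TtlMapping(ttl_seconds=30)
mapping.heartbeat("userB", "ws-server-7", now=0)
print(mapping.lookup("userB", now=10))
print(mapping.lookup("userB", now=31))
```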

10. How do you handle media messages differently from text?
  1. Trap: They check if you know bandwidth/storage issues.
  2. Answer: Media files are uploaded to Blob storage/CDN. Only references (fileId, hash) are shared via WebSocket/DB. This avoids duplicating heavy payloads in message queues or DB.
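The split between heavy payload and lightweight reference can be sketched as follows. `BlobStore` and `MediaRef` are hypothetical, and using the SHA-256 digest as the `file_id` is an assumption made for brevity. A real system would upload to object storage and could use any identifier scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MediaRef:
    """Small reference that travels through the message path instead of the bytes."""
    file_id: str
    sha256: str
    size_bytes: int


class BlobStore:
    """Hypothetical in-memory stand-in for blob storage."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> MediaRef:
        digest = hashlib.sha256(data).hexdigest()
        # Content-addressed: identical uploads are stored only once.
        self._blobs.setdefault(digest, data)
        return MediaRef(file_id=digest, sha256=digest, size_bytes=len(data))

    def get(self, ref: MediaRef) -> bytes:
        data = self._blobs[ref.file_id]
        # The receiver verifies integrity against the hash in the reference.
        if hashlib.sha256(data).hexdigest() != ref.sha256:
            raise ValueError("media content does not match its reference hash")
        return data


blobs = BlobStore()
photo = b"\x89PNG...fake image bytes..."
ref = blobs.put(photo)
chat_message = {"type": "media", "file_id": ref.file_id, "size": ref.size_bytes}
print(chat_message["size"], blobs.get(ref) == photo)
```

Only `chat_message` would pass through the WebSocket and queue path. The image bytes stay in the blob store.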

11. How do you prevent one slow consumer in a group from delaying others?
  1. Trap: Classic backpressure problem.
  2. Answer: Use a separate Kafka consumer group per user so that each user’s queue is independent. A slow consumer does not affect the others because each consumer group tracks its own offsets.
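The reason independent offsets isolate slow consumers can be shown with a toy shared log. `SharedLog` is a hypothetical in-memory model of the idea, not Kafka's client API: every consumer reads the same records, but each one advances its own position.

```python
from typing import Dict, List


class SharedLog:
    """Hypothetical append-only log with one read offset per consumer."""

    def __init__(self) -> None:
        self._records: List[str] = []
        self._offsets: Dict[str, int] = {}  # consumer_id -> next index to read

    def append(self, record: str) -> None:
        self._records.append(record)

    def poll(self, consumer_id: str, max_records: int = 10) -> List[str]:
        # Reading advances only this consumer's offset.
        start = self._offsets.get(consumer_id, 0)
        batch = self._records[start:start + max_records]
        self._offsets[consumer_id] = start + len(batch)
        return batch

    def lag(self, consumer_id: str) -> int:
        # Number of records this consumer has not read yet.
        return len(self._records) - self._offsets.get(consumer_id, 0)


log = SharedLog()
for i in range(5):
    log.append(f"msg-{i}")

fast_batch = log.poll("fast-user", max_records=5)
slow_batch = log.poll("slow-user", max_records=1)
print(log.lag("fast-user"), log.lag("slow-user"))
```

The fast consumer finishes the backlog while the slow one is still on the first record, and neither read affects the other's position.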

12. How do you ensure security (end-to-end encryption) with this architecture?
  1. Trap: Tests if you confuse encryption in transit with E2E.
  2. Answer: Messages are encrypted on the sender’s device with keys derived from the recipient’s public keys (Signal Protocol). WebSocket servers, Kafka, and MySQL see only ciphertext. Only the recipient’s device can decrypt, which is what makes this end-to-end encryption and not merely encryption in transit.

Interesting Facts about WhatsApp
History

In January 2008, Jan Koum, who worked at Yahoo, tried to get a job at Facebook but didn’t make it. Instead of giving up, he kept going. The next year, he got an iPhone and saw how powerful the new App Store could be. This gave him an idea. Along with some old friends from Yahoo, he created WhatsApp, a simple app to send messages without the heavy costs of SMS. WhatsApp became super popular, with a million people signing up every single day!


Tech Stack:

Don’t reinvent the wheel every time:

ejabberd is an open-source real-time messaging server written in Erlang. WhatsApp was built on top of ejabberd, and the team rewrote some of its core components to meet their needs.
In addition, WhatsApp leveraged third-party services such as Google Push to provide push notifications.


Scalability

They kept the team size small - 32 engineers.

WhatsApp is one of the most successful instant messengers in the market. In 2014, Facebook acquired WhatsApp for a whopping 19 billion USD. According to Forbes, Jan Koum has a net worth of 14 billion USD in 2023.

This lean yet powerful technology stack lets WhatsApp handle billions of messages daily with great speed, reliability, and uptime.


Resources: