Design a Real-Time Chat Application (Slack/Discord) - Frontend System Design Interview Guide

Hard

Design a production-ready real-time chat application like Slack, Discord, or Microsoft Teams with channels, direct messages, presence indicators, and rich media support.

Backend as Black Box: Assume you have APIs for messages and real-time updates. Focus on the frontend architecture.

Key Challenges

This problem explores real-time communication challenges:

  • Message Delivery: Send, receive, and sync messages reliably
  • Message States: Sent → Delivered → Read with visual indicators
  • Presence System: Online/Away/Offline status with typing indicators
  • Offline Support: Queue messages, sync on reconnect
  • Rich Content: Attachments, embeds, reactions, threads

When users send messages, they expect instant delivery, real-time presence updates, and reliable offline support, even in channels with thousands of users. This solution designs a real-time chat application that handles message delivery states, presence, and offline queuing while keeping clear boundaries between server state and UI state. The key insight: a good chat app uses optimistic updates for instant feedback, cursor pagination for stable history, and graceful reconnection handling.

HLD interview focus: Requirements, architecture, tradeoffs, data flow, and scaling decisions. Any implementation snippets shown are optional unless explicitly asked.

I'll start by defining what makes a great chat experience: what questions does a user need answered as they send and receive messages? Then I'll design the architecture that handles message delivery states, presence, and offline queuing. Finally, I'll draw clear boundaries between server state and UI state.

Why this approach?

Most candidates build a "message list with WebSocket." Strong candidates build a "chat experience": a system that handles delivery states, presence updates, offline queuing, and graceful reconnection. The difference is thinking about the entire messaging lifecycle, not just real-time updates.

Products like Slack and Discord don't just show messages; they handle delivery states, typing indicators, presence, offline queuing, and thread replies. The same principles apply here.

Before designing anything, let's define what success looks like.

Requirements Exploration Questions

Discovery
What types of conversations?
  • Direct messages (1:1)
  • Group chats (small groups)
  • Channels (large, many members)
What message types?
  • Text messages
  • Rich text (formatting, links)
  • File attachments (images, docs)
  • Reactions and threads
What real-time features?
  • Message delivery status
  • Typing indicators
  • Presence (online/away/offline)
  • Read receipts

Functional Requirements

Must Have

MVP (Core Features - What I'd Design First):

  • Send and receive messages in real-time
  • Message status (sending → sent → delivered → read)
  • Channels and direct messages (1:1 and groups)
  • Presence indicators (online/away/offline)
  • Typing indicators with debouncing
  • Offline message queue and sync
  • Clear loading/empty/error states
  • Basic accessibility and keyboard support

Advanced Features (Add If Time Permits):

  • Edit and delete messages
  • Thread replies and reactions
  • File attachments and rich media
  • Message search and history
  • Custom status and presence

Non-Functional Requirements

Quality Bar

Performance:

  • Message appears < 100ms (optimistic)
  • Sync latency < 200ms
  • Support 10K+ users per channel
  • Scroll 100K+ messages smoothly

Reliability:

  • No message loss
  • Offline message queue
  • Reconnection with sync
  • Message ordering guaranteed

Scalability:

  • Handle 1000+ messages/sec per channel
  • Presence updates for 10K+ users
  • Efficient message history loading

Accessibility:

  • Keyboard navigation
  • Screen reader support
  • High contrast mode

Security & Compliance:

  • Strict authn/authz checks on every write path
  • Input validation plus XSS/CSRF protections
  • TLS in transit and secure session/token handling

Observability:

  • Track p95 latency, error rate, and retry rate
  • Log critical client/server sync failures
  • Alert on sustained degradation and queue backlog growth

WebSocket vs REST

Feature            WebSocket           REST
New messages       ✓ Real-time push    ✗ Polling
Typing indicators  ✓ Immediate         ✗ Too slow
Presence updates   ✓ Push              ✗ Polling
Message history    Not needed          ✓ Paginated
Search             Not needed          ✓ Server-side
File upload        ✗ Use REST          ✓ Multipart

Best approach: WebSocket for real-time + REST for historical data

Tradeoffs & Comparisons

This part defines the entity and interface contracts, plus the client cache and reconciliation model, with the backend treated as a black box.

1) Component prop interfaces (boundaries)

Define clear boundaries between thread rendering, composer interactions, and presence UI.

  • ChatShellProps: active workspace/channel context and high-level navigation callbacks
  • MessageListProps: normalized message ids + render state for virtualized history
  • MessageComposerProps: draft value, send handler, attachment and mention entry points
  • PresenceRailProps: online status and typing participants for the current channel

2) Hook interfaces (consumption contracts)

Use hook return contracts to describe React consumption without binding to transport internals.

Core Data Structures

Message:

interface Message {
  id: string;              // Client-generated UUID
  channelId: string;
  senderId: string;
  content: string;
  createdAt: Date;
  status: MessageStatus;
  
  // Delivery tracking
  deliveredTo: string[];   // User IDs who received
  readBy: string[];        // User IDs who read
  
  // Rich content
  attachments?: Attachment[];
  replyTo?: string;        // Parent message ID for threads
  reactions?: Reaction[];
  edited?: boolean;
  editedAt?: Date;
}

type MessageStatus = 
  | 'sending'     // Optimistic, not yet confirmed
  | 'sent'        // Server acknowledged receipt
  | 'delivered'   // Recipient(s) received
  | 'read'        // Recipient(s) opened
  | 'failed';     // Send failed
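Because receipts can arrive out of order (a MESSAGE_READ may land before the matching MESSAGE_DELIVERED), one way to apply status updates is to treat the states as a ranked progression that never moves backward. The sketch below assumes that rule, plus two assumptions about 'failed': it can only replace 'sending', and a failed message accepts any later state (a retry or a late server ack).

```typescript
type MessageStatus = 'sending' | 'sent' | 'delivered' | 'read' | 'failed';

// Rank of each forward-moving state; 'failed' is handled separately below.
const STATUS_RANK: Record<Exclude<MessageStatus, 'failed'>, number> = {
  sending: 0,
  sent: 1,
  delivered: 2,
  read: 3,
};

// Returns the status to store after an update arrives.
function advanceStatus(current: MessageStatus, incoming: MessageStatus): MessageStatus {
  // A message the server already acknowledged cannot fail afterwards.
  if (incoming === 'failed') return current === 'sending' ? 'failed' : current;
  // A failed message moves on via a retry ('sending') or a late ack ('sent', ...).
  if (current === 'failed') return incoming;
  // Otherwise only move forward; stale or out-of-order events are ignored.
  return STATUS_RANK[incoming] > STATUS_RANK[current] ? incoming : current;
}
```

For example, a late 'delivered' event for a message already marked 'read' leaves it at 'read'.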

Channel:

interface Channel {
  id: string;
  type: 'channel' | 'dm' | 'group';
  name: string;
  members: string[];
  lastMessage?: Message;
  unreadCount: number;
  lastReadAt: Date;
}

Presence:

interface UserPresence {
  userId: string;
  status: 'online' | 'away' | 'offline';
  lastSeen: Date;
  customStatus?: string;
}
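Presence depends on heartbeats (the design later targets knowing status "within 60 seconds"), so a client can guard against a lost USER_OFFLINE event by treating stale presence as offline. A sketch under two assumptions: lastSeen is refreshed by every heartbeat, and 60 seconds is the staleness window.

```typescript
type PresenceStatus = 'online' | 'away' | 'offline';

interface UserPresence {
  userId: string;
  status: PresenceStatus;
  lastSeen: Date; // assumption: refreshed on every heartbeat
  customStatus?: string;
}

// Status to render: if heartbeats stopped (e.g. a laptop went to sleep before a
// USER_OFFLINE event was sent), fall back to 'offline' after staleAfterMs.
function displayStatus(p: UserPresence, now: Date, staleAfterMs = 60_000): PresenceStatus {
  const silentForMs = now.getTime() - p.lastSeen.getTime();
  return silentForMs > staleAfterMs ? 'offline' : p.status;
}
```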

Event Types

type ChatEvent =
  // Messages
  | { type: 'MESSAGE_SENT'; payload: Message }
  | { type: 'MESSAGE_DELIVERED'; payload: { messageId: string; userId: string } }
  | { type: 'MESSAGE_READ'; payload: { messageId: string; userId: string } }
  | { type: 'MESSAGE_EDITED'; payload: { messageId: string; content: string } }
  | { type: 'MESSAGE_DELETED'; payload: { messageId: string } }
  | { type: 'REACTION_ADDED'; payload: { messageId: string; reaction: string; userId: string } }
  
  // Presence
  | { type: 'USER_ONLINE'; payload: { userId: string } }
  | { type: 'USER_OFFLINE'; payload: { userId: string } }
  | { type: 'USER_AWAY'; payload: { userId: string } }
  | { type: 'TYPING_START'; payload: { userId: string; channelId: string } }
  | { type: 'TYPING_STOP'; payload: { userId: string; channelId: string } };
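Patch-by-id updates can be expressed as a pure reducer over a normalized cache. This sketch covers only the message events from ChatEvent, trims Message to the fields it touches, and assumes new messages arrive in display order (a real client would sort by a server-assigned timestamp or sequence).

```typescript
interface Message {
  id: string;
  channelId: string;
  senderId: string;
  content: string;
  createdAt: Date;
  status: 'sending' | 'sent' | 'delivered' | 'read' | 'failed';
  deliveredTo: string[];
  readBy: string[];
  edited?: boolean;
}

// Message-lifecycle subset of ChatEvent.
type ChatMessageEvent =
  | { type: 'MESSAGE_SENT'; payload: Message }
  | { type: 'MESSAGE_DELIVERED'; payload: { messageId: string; userId: string } }
  | { type: 'MESSAGE_READ'; payload: { messageId: string; userId: string } }
  | { type: 'MESSAGE_EDITED'; payload: { messageId: string; content: string } }
  | { type: 'MESSAGE_DELETED'; payload: { messageId: string } };

interface ChannelState {
  byId: Record<string, Message>;
  orderedIds: string[]; // oldest → newest
}

const addOnce = (ids: string[], id: string): string[] => (ids.includes(id) ? ids : [...ids, id]);

function applyEvent(state: ChannelState, event: ChatMessageEvent): ChannelState {
  switch (event.type) {
    case 'MESSAGE_SENT': {
      // Idempotent upsert: a server echo of an optimistic message (same client ID) replaces it in place.
      const msg = event.payload;
      const isNew = !(msg.id in state.byId);
      return {
        byId: { ...state.byId, [msg.id]: msg },
        orderedIds: isNew ? [...state.orderedIds, msg.id] : state.orderedIds,
      };
    }
    case 'MESSAGE_DELIVERED':
    case 'MESSAGE_READ': {
      const { messageId, userId } = event.payload;
      const msg = state.byId[messageId];
      if (!msg) return state; // not loaded locally; nothing to patch
      const patched: Message =
        event.type === 'MESSAGE_READ'
          ? { ...msg, readBy: addOnce(msg.readBy, userId) }
          : { ...msg, deliveredTo: addOnce(msg.deliveredTo, userId) };
      return { ...state, byId: { ...state.byId, [messageId]: patched } };
    }
    case 'MESSAGE_EDITED': {
      const { messageId, content } = event.payload;
      const msg = state.byId[messageId];
      if (!msg) return state;
      return { ...state, byId: { ...state.byId, [messageId]: { ...msg, content, edited: true } } };
    }
    case 'MESSAGE_DELETED': {
      const { messageId } = event.payload;
      if (!(messageId in state.byId)) return state;
      const byId = { ...state.byId };
      delete byId[messageId];
      return { byId, orderedIds: state.orderedIds.filter((id) => id !== messageId) };
    }
  }
}
```

Returning new objects for only the touched entities lets memoized message rows skip re-rendering when unrelated messages change.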

Optimistic Update Flow

User clicks Send
            │
            ▼
┌─────────────────────────┐
│ Generate client-side ID │  ← UUID for idempotency
└───────────┬─────────────┘
            ▼
┌─────────────────────────┐
│ Add to UI immediately   │  ← status: 'sending'
└───────────┬─────────────┘
            ▼
┌─────────────────────────┐
│ Send via WebSocket      │  ← Include client ID
└───────────┬─────────────┘
     ┌──────┴──────┐
     │             │
     ▼             ▼
  Success       Failure
     │             │
     ▼             ▼
  'sent'       'failed'
  + server ID   + retry option
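The flow above might be wired up roughly as follows. This is a sketch under stated assumptions: `send` stands for a hypothetical transport call that resolves when the server acknowledges the client ID and rejects on error or timeout, and `upsert` writes into whatever client cache the app uses. It keeps the client-generated ID after the ack; if the server assigns its own ID, storing it alongside (rather than swapping the key the UI renders by) avoids remounting the row.

```typescript
type SendStatus = 'sending' | 'sent' | 'failed';

interface OutgoingMessage {
  id: string; // client-generated; reused on retry so the server can dedupe
  channelId: string;
  content: string;
  status: SendStatus;
}

interface SendDeps {
  // Hypothetical transport: resolves on server ack, rejects on error/timeout.
  send: (msg: { id: string; channelId: string; content: string }) => Promise<void>;
  // Writes the message into the client cache so the UI re-renders.
  upsert: (msg: OutgoingMessage) => void;
  // ID factory; defaults to the Web Crypto UUID generator.
  newId?: () => string;
}

async function sendOptimistic(channelId: string, content: string, deps: SendDeps): Promise<OutgoingMessage> {
  const id = deps.newId ? deps.newId() : crypto.randomUUID();
  const pending: OutgoingMessage = { id, channelId, content, status: 'sending' };
  deps.upsert(pending); // shown immediately, before any network round trip
  try {
    await deps.send({ id, channelId, content });
    const sent: OutgoingMessage = { ...pending, status: 'sent' };
    deps.upsert(sent);
    return sent;
  } catch {
    const failed: OutgoingMessage = { ...pending, status: 'failed' };
    deps.upsert(failed); // UI shows a retry affordance
    return failed;
  }
}

// Retrying reuses the original ID, so a send that actually reached the server is not duplicated.
function retrySend(failed: OutgoingMessage, deps: SendDeps): Promise<OutgoingMessage> {
  return sendOptimistic(failed.channelId, failed.content, { ...deps, newId: () => failed.id });
}
```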

Client cache shape (recommended)

  • entitiesById: Record<ID, Entity>
  • orderedIds: ID[] for rendering order
  • pageInfo/cursor metadata for pagination or range loading
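A minimal sketch of that cache shape, assuming createdAt is stored as epoch milliseconds (ideally server-assigned) and that IDs break ties so ordering is deterministic. Re-sorting the whole list on every merge is O(n log n) and is kept for clarity; a production cache would usually insert pages at either end instead.

```typescript
interface CachedMessage {
  id: string;
  createdAt: number; // epoch ms, assumed server-assigned
  content: string;
}

interface MessageCache {
  entitiesById: Record<string, CachedMessage>;
  orderedIds: string[]; // render order, oldest → newest
  nextCursor: string | null; // cursor for the next (older) history page
  hasMore: boolean;
}

// Merge a history page or a batch of live messages: dedupe by id, keep chronological order.
function mergeMessages(cache: MessageCache, incoming: CachedMessage[]): MessageCache {
  const entitiesById = { ...cache.entitiesById };
  for (const m of incoming) entitiesById[m.id] = m; // newer copy wins (e.g. an edit)
  const ids = Array.from(new Set([...cache.orderedIds, ...incoming.map((m) => m.id)]));
  ids.sort((a, b) => {
    const byTime = entitiesById[a].createdAt - entitiesById[b].createdAt;
    if (byTime !== 0) return byTime;
    return a < b ? -1 : a > b ? 1 : 0; // deterministic tie-break on id
  });
  return { ...cache, entitiesById, orderedIds: ids };
}
```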

Consistency & reconciliation rules

  • Make writes idempotent where retries are possible.
  • Apply realtime updates with version/event ordering checks.
  • Prefer server-authoritative reconciliation after optimistic mutations.

Component Boundaries

ChatShell

Owns channel selection, layout orchestration, and cross-panel callbacks.

MessageList

Renders virtualized history from normalized ids and emits anchor/load-more events.

MessageComposer

Handles draft editing, attachment intake, and send intent.

PresenceRail

Displays online state, typing indicators, and participant metadata.

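chat-component-interfaces.ts
// Assumed shared types module (path illustrative). PresenceMember is not defined in
// this guide; for example, UserPresence plus display fields such as name and avatar.
import type { Message, PresenceMember } from './chat-types';

export interface ChatShellProps {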
  workspaceId: string;
  activeChannelId: string;
  onSelectChannel: (channelId: string) => void;
}

export interface MessageListProps {
  messageIds: string[];
  messagesById: Record<string, Message>;
  hasOlder: boolean;
  onLoadOlder: () => void;
  onJumpToMessage: (messageId: string) => void;
}

export interface MessageComposerProps {
  draft: string;
  setDraft: (value: string) => void;
  onSend: (payload: { text: string; attachmentIds?: string[]; replyToId?: string }) => Promise<void>;
  onAttach: (files: File[]) => Promise<string[]>;
}

export interface PresenceRailProps {
  members: PresenceMember[];
  typingUserIds: string[];
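}

chat-hook-contracts.ts
// Message is assumed to come from a shared types module (path illustrative).
import type { Message } from './chat-types';

export interface UseChannelMessagesResult {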
  messageIds: string[];
  messagesById: Record<string, Message>;
  hasOlder: boolean;
  loadOlder: () => void;
  isLoading: boolean;
}

export interface UseSendMessageResult {
  send: (payload: { text: string; attachmentIds?: string[]; replyToId?: string }) => Promise<void>;
  pendingCount: number;
}

export function useChannelMessages(_channelId: string): UseChannelMessagesResult {
  throw new Error('Contract-only snippet');
}

export function useSendMessage(_channelId: string): UseSendMessageResult {
  throw new Error('Contract-only snippet');
}

React interfaces & integration patterns (props, hooks, callbacks).

This section covers API contracts and React consumption patterns.

API contracts (Backend as black box)

REST API Endpoints:

Channels & Messages:

GET /api/channels
Response: { channels: Channel[]; }

GET /api/channels/:channelId/messages?cursor={cursor}&limit={limit}
Response: {
  messages: Message[];
  nextCursor: string | null;
  hasMore: boolean;
}

POST /api/channels/:channelId/messages
{
  id: string;                     // Client-generated UUID (idempotency key)
  content: string;
  replyTo?: string;               // Parent message ID
  attachments?: string[];         // Attachment IDs
}

Response: { message: Message; }

PUT /api/messages/:messageId
{
  content: string;
}

Response: { message: Message; }

DELETE /api/messages/:messageId
Response: { success: boolean; }
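On the client, the cursor endpoint is typically wrapped in a loader that remembers the cursor and ignores duplicate scroll-triggered calls. A sketch, assuming a hypothetical `fetchPage` wrapper around GET /api/channels/:channelId/messages that passes `cursor` (null for the newest page) and `limit`; the default page size of 50 is arbitrary.

```typescript
interface Page<T> {
  messages: T[];
  nextCursor: string | null;
  hasMore: boolean;
}

// Hypothetical wrapper around GET /api/channels/:channelId/messages?cursor=&limit=
type FetchPage<T> = (channelId: string, cursor: string | null, limit: number) => Promise<Page<T>>;

function createHistoryLoader<T>(channelId: string, fetchPage: FetchPage<T>, limit = 50) {
  let cursor: string | null = null; // null = start from the newest page
  let hasMore = true;
  let inFlight: Promise<T[]> | null = null;

  // Loads the next page of older messages; concurrent calls share one request.
  function loadOlder(): Promise<T[]> {
    if (!hasMore) return Promise.resolve([]);
    if (inFlight) return inFlight;
    inFlight = fetchPage(channelId, cursor, limit).then(
      (page) => {
        inFlight = null;
        cursor = page.nextCursor;
        hasMore = page.hasMore && page.nextCursor !== null;
        return page.messages;
      },
      (err: unknown) => {
        inFlight = null; // allow a retry after a failed request
        throw err;
      },
    );
    return inFlight;
  }

  return {
    loadOlder,
    get hasMore(): boolean {
      return hasMore;
    },
  };
}
```

Because the cursor comes from the server rather than an offset, new messages arriving at the bottom do not shift the pages being loaded at the top.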

Presence & Typing:

GET /api/channels/:channelId/presence
Response: {
  users: UserPresence[];
  typing: TypingUser[];
}

POST /api/channels/:channelId/typing
{
  action: 'start' | 'stop';
}

Response: { success: boolean; }

File Upload:

POST /api/upload
FormData: { file: File; }

Response: {
  attachmentId: string;
  url: string;
  filename: string;
  size: number;
  mimeType: string;
}

WebSocket Protocol:

// Connection
{
  type: 'CONNECT';
  token: string;                  // Auth token
}

// Message Events
{
  type: 'MESSAGE_SENT';
  payload: Message;
}

{
  type: 'MESSAGE_DELIVERED';
  payload: { messageId: string; userId: string; }
}

{
  type: 'MESSAGE_READ';
  payload: { messageId: string; userId: string; }
}

// Presence Events
{
  type: 'USER_ONLINE' | 'USER_OFFLINE' | 'USER_AWAY';
  payload: { userId: string; }
}

{
  type: 'TYPING_START' | 'TYPING_STOP';
  payload: { userId: string; channelId: string; }
}
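On the sending side, TYPING_START and TYPING_STOP are usually rate-limited rather than emitted per keystroke. A sketch, assuming a `send` callback that emits the corresponding socket event: 'start' is throttled (re-sent at most every 3 s so receivers' own timeouts stay fresh) and 'stop' fires after 5 s without keystrokes or when the message is sent. Both intervals are illustrative, and receivers should still expire typing indicators on their own in case a TYPING_STOP is lost.

```typescript
type TypingAction = 'start' | 'stop'; // maps to TYPING_START / TYPING_STOP

interface TypingNotifierOptions {
  send: (action: TypingAction) => void;
  now?: () => number; // injectable clock (ms)
  setTimer?: (fn: () => void, ms: number) => () => void; // returns a cancel function
  startThrottleMs?: number;
  idleStopMs?: number;
}

class TypingNotifier {
  private typing = false;
  private lastStartAt = 0;
  private cancelIdle: (() => void) | null = null;
  private readonly opts: TypingNotifierOptions;
  private readonly now: () => number;
  private readonly setTimer: (fn: () => void, ms: number) => () => void;

  constructor(opts: TypingNotifierOptions) {
    this.opts = opts;
    this.now = opts.now ?? (() => Date.now());
    this.setTimer =
      opts.setTimer ??
      ((fn: () => void, ms: number) => {
        const handle = setTimeout(fn, ms);
        return () => clearTimeout(handle);
      });
  }

  // Call on every keystroke in the composer.
  onKeystroke(): void {
    const t = this.now();
    // Throttle 'start': send when typing begins, then at most once per window.
    if (!this.typing || t - this.lastStartAt >= (this.opts.startThrottleMs ?? 3000)) {
      this.opts.send('start');
      this.typing = true;
      this.lastStartAt = t;
    }
    // Debounce 'stop': each keystroke pushes the idle deadline back.
    this.cancelIdle?.();
    this.cancelIdle = this.setTimer(() => this.stop(), this.opts.idleStopMs ?? 5000);
  }

  // Call when the idle timer fires, the message is sent, or the composer is cleared.
  stop(): void {
    this.cancelIdle?.();
    this.cancelIdle = null;
    if (this.typing) {
      this.typing = false;
      this.opts.send('stop');
    }
  }
}
```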

Type definitions used in contracts (wire format: a subset of the client models above, with dates serialized as strings)

interface Channel {
  id: string;
  name: string;
  type: 'dm' | 'group' | 'channel';
}

interface Message {
  id: string;
  channelId: string;
  senderId: string;
  content: string;
  createdAt: string;
  status: 'sending' | 'sent' | 'delivered' | 'read' | 'failed';
}

interface UserPresence {
  userId: string;
  status: 'online' | 'away' | 'offline';
  lastSeen?: string;
}

interface TypingUser {
  userId: string;
  channelId: string;
}
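Because socket frames arrive as untrusted JSON, it helps to validate them and convert wire types (date strings) into client types (Date) at a single boundary. A sketch for MESSAGE_SENT frames only, assuming createdAt is an ISO 8601 string; the drop-on-invalid policy is also an assumption, and a real client would likely log dropped frames for observability.

```typescript
interface WireMessage {
  id: string;
  channelId: string;
  senderId: string;
  content: string;
  createdAt: string; // assumed ISO 8601
  status: 'sending' | 'sent' | 'delivered' | 'read' | 'failed';
}

// Client-side shape: same fields, but createdAt parsed once at the boundary.
type ClientMessage = Omit<WireMessage, 'createdAt'> & { createdAt: Date };

const STATUSES = new Set<string>(['sending', 'sent', 'delivered', 'read', 'failed']);

function isWireMessage(value: unknown): value is WireMessage {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === 'string' &&
    typeof v.channelId === 'string' &&
    typeof v.senderId === 'string' &&
    typeof v.content === 'string' &&
    typeof v.createdAt === 'string' &&
    typeof v.status === 'string' &&
    STATUSES.has(v.status)
  );
}

// Parses one raw socket frame; returns null for malformed or non-MESSAGE_SENT frames.
function parseMessageSentFrame(raw: string): ClientMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const frame = data as { type?: unknown; payload?: unknown };
  if (frame.type !== 'MESSAGE_SENT' || !isWireMessage(frame.payload)) return null;
  const createdAt = new Date(frame.payload.createdAt);
  if (Number.isNaN(createdAt.getTime())) return null;
  return { ...frame.payload, createdAt };
}
```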

3) Integration patterns (React wiring)

  • Data down, events up: list/composer components emit intent; hooks own side effects.
  • Optimistic messaging: insert local message with pending state, reconcile on ack/failure.
  • Realtime patching: update delivery/read states in-place by message id.
  • Scroll continuity: preserve viewport anchor while loading older history.

Integration Patterns

Optimistic delivery

Render pending message immediately, rollback or reconcile on server response.

Patch-by-id updates

Apply delivery/read/reaction updates directly to cached entities.

Reconnect-safe queues

Replay unsent actions after reconnect using idempotency keys.

Viewport stability

Keep scroll anchor stable while prepending older messages.
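Viewport stability when prepending comes down to one adjustment: the browser keeps scrollTop fixed while content is inserted above it, so the view jumps unless scrollTop grows by the inserted height. A sketch using a structural container type (any HTMLElement satisfies it); in React this would run in a layout effect, before paint. Browsers that support CSS scroll anchoring (`overflow-anchor`) may already compensate, so make sure only one mechanism applies.

```typescript
// Minimal shape of a scroll container; any HTMLElement satisfies it.
interface ScrollContainer {
  scrollTop: number;
  readonly scrollHeight: number;
}

// New scrollTop that keeps the same message in view after content is added above.
function anchoredScrollTop(prevScrollTop: number, prevScrollHeight: number, nextScrollHeight: number): number {
  return prevScrollTop + (nextScrollHeight - prevScrollHeight);
}

// Capture measurements, insert older messages synchronously, then restore the anchor.
function prependPreservingScroll(container: ScrollContainer, prepend: () => void): void {
  const prevTop = container.scrollTop;
  const prevHeight = container.scrollHeight;
  prepend();
  container.scrollTop = anchoredScrollTop(prevTop, prevHeight, container.scrollHeight);
}
```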

Virtual Scrolling for Message History

Problem: a channel with 100K messages produces a massive DOM, which causes scroll jank.

Solution: Only render visible messages + buffer

┌────────────────────────────────┐
│  Old messages (not rendered)   │  ← Saved in memory
├────────────────────────────────┤
│  Buffer zone (10 messages)     │  ← Smooth scrolling
├────────────────────────────────┤
│  ██████████████████████████████│
│  ██ VISIBLE VIEWPORT ██████████│  ← Actually in DOM
│  ██████████████████████████████│
├────────────────────────────────┤
│  Buffer zone (10 messages)     │  ← Smooth scrolling
├────────────────────────────────┤
│  New messages (not rendered)   │  ← Lazy load on scroll
└────────────────────────────────┘

Key behaviors:

  • Start at bottom (newest messages)
  • Load older on scroll up
  • Auto-scroll when at bottom + new message
  • "Jump to bottom" button when scrolled up
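The windowing math behind those behaviors can be sketched with prefix sums over row heights, since chat rows vary in height. This assumes heights are measured (or estimated) per message; the buffer of 10 rows matches the diagram, and the 40 px near-bottom threshold for auto-scroll is arbitrary.

```typescript
// offsets[i] = top of row i in px; offsets[rowCount] = total content height.
function buildOffsets(heights: number[]): number[] {
  const offsets = [0];
  for (const h of heights) offsets.push(offsets[offsets.length - 1] + h);
  return offsets;
}

// Index of the row containing pixel y (binary search: last i with offsets[i] <= y).
function rowAt(offsets: number[], y: number): number {
  let lo = 0;
  let hi = offsets.length - 2; // last row index
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (offsets[mid] <= y) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

// Rows to mount: the visible ones plus `buffer` rows above and below. `end` is exclusive.
function visibleRange(offsets: number[], scrollTop: number, viewportHeight: number, buffer = 10) {
  const rowCount = offsets.length - 1;
  if (rowCount <= 0) return { start: 0, end: 0 };
  const first = rowAt(offsets, scrollTop);
  const last = rowAt(offsets, scrollTop + viewportHeight);
  return { start: Math.max(0, first - buffer), end: Math.min(rowCount, last + buffer + 1) };
}

// Auto-scroll on a new message only when the user is already at (or near) the bottom.
function isNearBottom(scrollTop: number, viewportHeight: number, scrollHeight: number, thresholdPx = 40): boolean {
  return scrollHeight - (scrollTop + viewportHeight) <= thresholdPx;
}
```

Rows outside [start, end) are not mounted; spacers of height offsets[start] above and offsets[rowCount] - offsets[end] below keep the scrollbar size accurate.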

WebSocket Reconnection Strategy

Connection lost
            │
            ▼
┌─────────────────────────┐
│ Attempt 1: Wait 1s      │
└───────────┬─────────────┘
            │ Failed
            ▼
┌─────────────────────────┐
│ Attempt 2: Wait 2s      │
└───────────┬─────────────┘
            │ Failed
            ▼
┌─────────────────────────┐
│ Attempt 3: Wait 4s      │  ← Exponential backoff
└───────────┬─────────────┘
            │ Failed
       ... continue ...
            ▼
┌─────────────────────────┐
│ Max: Wait 30s           │  ← Cap at 30 seconds
│ + jitter (0-5s random)  │  ← Prevent thundering herd
└─────────────────────────┘
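The schedule above can be sketched as a delay function plus a retry loop. The base, cap, and 5 s jitter ceiling come from the diagram; scaling the jitter to the current delay (so early attempts stay fast) is an assumption of this sketch, as is the injectable `sleep`.

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;
const MAX_JITTER_MS = 5_000;

// Delay before reconnect attempt `attempt` (1-based): 1s, 2s, 4s, ... capped at 30s,
// plus random jitter of up to min(5s, base) so clients don't reconnect in lockstep.
function reconnectDelayMs(attempt: number, random: () => number = Math.random): number {
  const base = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** (attempt - 1));
  const jitter = random() * Math.min(MAX_JITTER_MS, base);
  return Math.round(base + jitter);
}

// Waits before each attempt, per the diagram; resolves with the attempt number that succeeded.
async function connectWithBackoff(
  connect: () => Promise<void>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  maxAttempts = Infinity,
): Promise<number> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    await sleep(reconnectDelayMs(attempt));
    try {
      await connect();
      return attempt;
    } catch {
      // fall through to the next, longer wait
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts`);
}
```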

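On reconnect:

  1. Re-authenticate
  2. Resume subscriptions, buffering incoming events until the sync completes
  3. Fetch missed messages (since lastMessageId) and merge them with the buffered events, deduplicating by ID
  4. Update presence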
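A sketch of the reconnect sequence, with every dependency hypothetical (`reauthenticate`, `subscribe`, `fetchSince`, `refreshPresence`, `apply`). It subscribes before fetching, so messages that arrive while the catch-up request is in flight are buffered instead of missed, then drops buffered events the fetch already returned.

```typescript
interface SyncedMessage {
  id: string;
  content: string;
}

interface ResyncDeps {
  reauthenticate: () => Promise<void>;
  // Subscribes to live message events; returns an unsubscribe function.
  subscribe: (onMessage: (m: SyncedMessage) => void) => () => void;
  // Messages after lastMessageId, oldest first (e.g. via the REST history endpoint).
  fetchSince: (lastMessageId: string | null) => Promise<SyncedMessage[]>;
  refreshPresence: () => Promise<void>;
  // Merges messages into the client cache (assumed to dedupe by id as well).
  apply: (messages: SyncedMessage[]) => void;
}

async function resyncAfterReconnect(lastMessageId: string | null, deps: ResyncDeps): Promise<() => void> {
  await deps.reauthenticate();

  // Buffer live events until the catch-up fetch completes.
  let buffer: SyncedMessage[] | null = [];
  const unsubscribe = deps.subscribe((m) => {
    if (buffer) buffer.push(m);
    else deps.apply([m]);
  });

  const missed = await deps.fetchSince(lastMessageId);
  const fetchedIds = new Set(missed.map((m) => m.id));
  const liveOnly = buffer.filter((m) => !fetchedIds.has(m.id)); // skip events the fetch already returned
  buffer = null; // later events go straight to apply()
  deps.apply([...missed, ...liveOnly]);

  await deps.refreshPresence();
  return unsubscribe;
}
```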

Offline Message Queue

User goes offline
        │
        ▼
┌─────────────────────────────────┐
│ User sends message              │
│                                 │
│ 1. Add to UI (status: sending)  │
│ 2. Queue in IndexedDB           │
│ 3. Show offline indicator       │
└─────────────────────────────────┘
        │ Network returns
        ▼
┌─────────────────────────────────┐
│ Process queue in order          │
│                                 │
│ 1. Send oldest first            │
│ 2. Update status on confirm     │
│ 3. Handle duplicates (by ID)    │
│ 4. Clear from queue             │
└─────────────────────────────────┘

Queue structure:

interface QueuedMessage {
  id: string;           // Client-generated
  channelId: string;
  content: string;
  queuedAt: Date;
  retryCount: number;
}
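The queue-processing steps can be sketched as a flush routine. Assumptions: `QueueStore` is a hypothetical wrapper that an app would back with IndexedDB, `send` resolves on server ack, and the server treats a repeated message ID as a no-op. The routine stops at the first failure so later messages are not delivered ahead of earlier ones; updating the UI status on confirm, and deciding when to give up and mark a message 'failed', are left out.

```typescript
interface QueuedMessage {
  id: string; // client-generated; reused on every retry
  channelId: string;
  content: string;
  queuedAt: Date;
  retryCount: number;
}

// Hypothetical persistence layer (IndexedDB in the browser).
interface QueueStore {
  list(): Promise<QueuedMessage[]>;
  remove(id: string): Promise<void>;
  update(msg: QueuedMessage): Promise<void>;
}

async function flushQueue(
  store: QueueStore,
  send: (msg: QueuedMessage) => Promise<void>,
): Promise<{ sent: number; remaining: number }> {
  // Oldest first, so the channel sees messages in the order they were written.
  const queue = (await store.list()).sort((a, b) => a.queuedAt.getTime() - b.queuedAt.getTime());
  let sent = 0;
  for (const msg of queue) {
    try {
      await send(msg);
      await store.remove(msg.id); // clear only after the server confirmed
      sent++;
    } catch {
      await store.update({ ...msg, retryCount: msg.retryCount + 1 });
      break; // keep this and later messages queued for the next flush
    }
  }
  return { sent, remaining: queue.length - sent };
}
```

If `send` succeeds but `remove` fails, the message is re-sent on the next flush; the server-side dedupe by ID keeps that safe.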

Performance Targets

Metric             Target             Technique
Message send       < 100ms perceived  Optimistic update
Message receive    < 200ms            WebSocket push
Channel switch     < 300ms            Cached messages
History load       < 500ms            Cursor pagination
Reconnect          < 3s               Exponential backoff
Memory (10K msgs)  < 50MB             Virtual scroll

Why This Design Works

Event-Driven Architecture
  • Clear message flow - Events represent all state changes
  • Easy to extend - Add new event types for new features
  • Debuggable - Log events to trace issues
Client-Generated IDs
  • Optimistic updates - Show before server confirms
  • Idempotency - Retry without duplicates
  • Offline support - Create IDs without server
Message Status Progression
  • User feedback - Know if message was received
  • Trust building - See when message was read
  • Error recovery - Clear indication of failures
Presence Heartbeats
  • Battery efficient - Don't poll constantly
  • Accurate status - Know within 60 seconds
  • Graceful degradation - Works with intermittent connection
Virtual Scrolling
  • Handle long histories - 100K+ messages without DOM bloat
  • Smooth scrolling - Only visible items in DOM
  • Bounded DOM size - Node count stays roughly constant (loaded message data still grows)

Key Takeaways

  1. Client-generated message IDs enable optimistic updates, idempotent retries, and offline support
  2. Message status progression (sending → sent → delivered → read) builds trust
  3. Presence uses periodic heartbeats and pushed status changes instead of polling
  4. Typing indicators need throttling (start) and an inactivity timeout (stop)
  5. WebSocket + REST hybrid: real-time for new messages, REST for history and search
  6. Virtual scroll is essential for channels with thousands of messages
  7. Group consecutive messages from the same sender for a cleaner UI
  8. Reconnection with sync: exponential backoff with jitter, then fetch missed messages by lastMessageId
