Data Normalization: Organizing State for Performance

Data Normalization Map

State Shape

- Entities: lookup tables by id (O(1) access, single source of truth)
- Relations: order arrays of ids (stable UI ordering, cheap relationship updates)
- Selectors: derive view models (avoid deep traversals, memoize expensive joins)
- Trade-offs: shape complexity (more upfront modeling, lower long-term mutation cost)

Core Lens: Optimize state updates by separating entity storage from ordering and view concerns.

Flow: Ingest → Normalize → Reference by id → Update once

Learn how to structure nested data using normalized lookup tables for O(1) access, easy updates, and better performance in complex applications.


Why Normalized State?

Nested data structures become harder to read and update as an application grows. Comparing the two shapes shows what normalization buys you:

Nested: Hard to Update

// ❌ Nested: Hard to update, duplicated data
{
  columns: [
    {
      id: 'col-1',
      issues: [
        { id: 1, title: 'Fix bug', assignee: 'Alice' },
        { id: 2, title: 'Add feature', assignee: 'Bob' }
      ]
    }
  ]
}

// To update issue #1's title:
// 1. Find the column
// 2. Find the issue in nested array
// 3. Update it (complex immutable update)
// 4. If issue appears in multiple places, update all

✗ O(n) lookup time (must search through arrays)
✗ Complex immutable updates (every level of nesting must be copied)
✗ Data duplication (if the same issue appears in more than one place)
✗ Hard to maintain a single source of truth
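
To make the cost concrete, here is a minimal sketch of steps 1 to 3 as an immutable update on the nested shape. The type names (`NestedIssue`, `NestedColumn`, `NestedBoard`) and the function `renameIssueNested` are hypothetical, introduced only for this illustration.

```typescript
// Hypothetical types mirroring the nested shape above.
interface NestedIssue { id: number; title: string; assignee: string }
interface NestedColumn { id: string; issues: NestedIssue[] }
interface NestedBoard { columns: NestedColumn[] }

// Immutable rename on the nested shape. Every column and every issue is
// visited, because the issue's location is not known in advance.
function renameIssueNested(
  board: NestedBoard,
  issueId: number,
  title: string
): NestedBoard {
  return {
    ...board,
    columns: board.columns.map(column => ({
      ...column,
      issues: column.issues.map(issue =>
        issue.id === issueId ? { ...issue, title } : issue
      ),
    })),
  };
}

const nested: NestedBoard = {
  columns: [
    {
      id: 'col-1',
      issues: [
        { id: 1, title: 'Fix bug', assignee: 'Alice' },
        { id: 2, title: 'Add feature', assignee: 'Bob' },
      ],
    },
  ],
};
const renamedNested = renameIssueNested(nested, 1, 'New title');
```

Note that this version also recreates every column object, even columns that do not contain the issue, which can trigger unnecessary re-renders in reference-equality-based UIs.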

Normalized: Easy Updates

// ✅ Normalized: O(1) lookups, easy updates
{
  issues: {
    '1': { id: 1, title: 'Fix bug', assignee: 'Alice' },
    '2': { id: 2, title: 'Add feature', assignee: 'Bob' }
  },
  issuesByColumn: {
    'col-1': ['1', '2']
  }
}

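// To update issue #1's title:
// state.issues['1'].title = 'New title'   (mutable store or Immer-style draft)
// With plain immutable state, copy only `issues` and that one issue object.
// Either way: direct access by key, no searching needed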

✓ O(1) lookup time (direct key access)
✓ Simple immutable updates (flat structure)
✓ Single source of truth (one issue object)
✓ Easy to reference from multiple places
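
For comparison, the same rename on the normalized shape can be sketched as follows. This assumes plain immutable state (no Immer); `NormalizedBoard` and `renameIssue` are hypothetical names used only for this example.

```typescript
// Hypothetical types for the normalized shape above.
interface Issue { id: number; title: string; assignee: string }
interface NormalizedBoard {
  issues: Record<string, Issue>;
  issuesByColumn: Record<string, string[]>;
}

// Immutable rename: one key lookup, two shallow copies, no searching.
function renameIssue(
  state: NormalizedBoard,
  issueId: string,
  title: string
): NormalizedBoard {
  const existing = state.issues[issueId];
  if (!existing) return state; // unknown id: keep the same reference
  return {
    ...state,
    issues: { ...state.issues, [issueId]: { ...existing, title } },
  };
}

const board: NormalizedBoard = {
  issues: {
    '1': { id: 1, title: 'Fix bug', assignee: 'Alice' },
    '2': { id: 2, title: 'Add feature', assignee: 'Bob' },
  },
  issuesByColumn: { 'col-1': ['1', '2'] },
};
const renamedBoard = renameIssue(board, '1', 'New title');
```

Only `issues` and the renamed issue get new references; `issuesByColumn` and every other issue object are reused as-is.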

Normalized Structure Pattern

A normalized state structure consists of three main parts:

1. Lookup Tables (Entities)

Store all entities in flat objects keyed by ID. This provides O(1) access to any entity.

// All entities stored by ID
{
  issues: {
    '1': { id: '1', title: 'Fix bug', assigneeId: 'alice' },
    '2': { id: '2', title: 'Add feature', assigneeId: 'bob' }
  },
  users: {
    'alice': { id: 'alice', name: 'Alice' },
    'bob': { id: 'bob', name: 'Bob' }
  },
  columns: {
    'col-1': { id: 'col-1', name: 'To Do' },
    'col-2': { id: 'col-2', name: 'In Progress' }
  }
}
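
Lookup tables are usually built from arrays returned by an API. A minimal sketch of that conversion, assuming every entity has a string `id` field (`indexById` is a hypothetical helper name):

```typescript
// Build a lookup table keyed by id from a list of entities.
function indexById<T extends { id: string }>(
  items: readonly T[]
): Record<string, T> {
  const table: Record<string, T> = {};
  for (const item of items) {
    table[item.id] = item; // a later duplicate id overwrites the earlier entry
  }
  return table;
}

const userTable = indexById([
  { id: 'alice', name: 'Alice' },
  { id: 'bob', name: 'Bob' },
]);
```

After this, `userTable['alice']` is a single property access rather than an array search.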

2. Relationship Maps (Order Arrays)

Use arrays of IDs to represent relationships and maintain order. This separates data from structure.

// Relationships stored separately
{
  issuesByColumn: {
    'col-1': ['1', '2'],  // Issues in "To Do" column
    'col-2': []           // Issues in "In Progress" column
  },
  columnOrder: ['col-1', 'col-2'],  // Column display order
  issuesByAssignee: {
    'alice': ['1'],
    'bob': ['2']
  }
}
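
The entities and relationship maps can be produced in one pass over a nested response. The sketch below assumes a hypothetical API payload where each column carries its issues inline (`ApiColumn`, `ApiIssue`, and `normalizeBoard` are illustrative names, not part of any library):

```typescript
// Hypothetical nested payload shape.
interface ApiIssue { id: string; title: string; assigneeId: string }
interface ApiColumn { id: string; name: string; issues: ApiIssue[] }

interface NormalizedResult {
  issues: Record<string, ApiIssue>;
  columns: Record<string, { id: string; name: string }>;
  issuesByColumn: Record<string, string[]>;
  columnOrder: string[];
  issuesByAssignee: Record<string, string[]>;
}

// One pass: entities go into lookup tables, ordering goes into id arrays.
function normalizeBoard(apiColumns: readonly ApiColumn[]): NormalizedResult {
  const result: NormalizedResult = {
    issues: {},
    columns: {},
    issuesByColumn: {},
    columnOrder: [],
    issuesByAssignee: {},
  };
  for (const column of apiColumns) {
    result.columns[column.id] = { id: column.id, name: column.name };
    result.columnOrder.push(column.id);
    result.issuesByColumn[column.id] = column.issues.map(issue => issue.id);
    for (const issue of column.issues) {
      // Index by assignee only the first time an issue id is seen.
      if (result.issues[issue.id] === undefined) {
        const assigned = result.issuesByAssignee[issue.assigneeId] ?? [];
        assigned.push(issue.id);
        result.issuesByAssignee[issue.assigneeId] = assigned;
      }
      result.issues[issue.id] = issue;
    }
  }
  return result;
}

const normalized = normalizeBoard([
  {
    id: 'col-1',
    name: 'To Do',
    issues: [
      { id: '1', title: 'Fix bug', assigneeId: 'alice' },
      { id: '2', title: 'Add feature', assigneeId: 'bob' },
    ],
  },
  { id: 'col-2', name: 'In Progress', issues: [] },
]);
```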

3. UI State (Separate from Data)

Keep UI-specific state separate from normalized data. This makes state management cleaner.

// UI state separate from data
{
  selectedIssueId: '1',  // or null when nothing is selected
  dragState: { issueId: '1', sourceColumn: 'col-1' },  // or null when not dragging
  filter: { assignee: 'alice', status: 'todo' },
  isLoading: false,
  error: null
}

Complete Example

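// Entity and UI types, matching the example objects shown earlier
interface Issue { id: string; title: string; assigneeId: string }
interface Column { id: string; name: string }
interface User { id: string; name: string }
interface DragState { issueId: string; sourceColumn: string }
interface FilterState { assignee: string; status: string }

interface BoardState {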
  // Lookup tables for O(1) access
  issues: Record<string, Issue>;
  columns: Record<string, Column>;
  users: Record<string, User>;
  
  // Order arrays for rendering
  columnOrder: string[];
  issuesByColumn: Record<string, string[]>;
  
  // UI state
  selectedIssueId: string | null;
  dragState: DragState | null;
  filter: FilterState;
}
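
Components rarely consume this shape directly. A selector typically joins the tables back into a view model. The following is a minimal sketch under assumed entity shapes; `BoardData`, `ColumnView`, and `selectBoardView` are hypothetical names, and `'Unassigned'` is an illustrative fallback.

```typescript
// Assumed entity shapes, based on the earlier example objects.
interface Issue { id: string; title: string; assigneeId: string }
interface Column { id: string; name: string }
interface User { id: string; name: string }

// The data portion of the board state (UI state omitted for brevity).
interface BoardData {
  issues: Record<string, Issue>;
  columns: Record<string, Column>;
  users: Record<string, User>;
  columnOrder: string[];
  issuesByColumn: Record<string, string[]>;
}

interface IssueView { id: string; title: string; assigneeName: string }
interface ColumnView { id: string; name: string; issues: IssueView[] }

// Rebuild the nested shape the UI wants, following the order arrays.
function selectBoardView(state: BoardData): ColumnView[] {
  const view: ColumnView[] = [];
  for (const columnId of state.columnOrder) {
    const column = state.columns[columnId];
    if (!column) continue; // skip dangling column ids
    const issues: IssueView[] = [];
    for (const issueId of state.issuesByColumn[columnId] ?? []) {
      const issue = state.issues[issueId];
      if (!issue) continue; // skip dangling issue ids
      const assigneeName = state.users[issue.assigneeId]?.name ?? 'Unassigned';
      issues.push({ id: issue.id, title: issue.title, assigneeName });
    }
    view.push({ id: column.id, name: column.name, issues });
  }
  return view;
}

const boardView = selectBoardView({
  issues: {
    '1': { id: '1', title: 'Fix bug', assigneeId: 'alice' },
    '2': { id: '2', title: 'Add feature', assigneeId: 'bob' },
  },
  columns: {
    'col-1': { id: 'col-1', name: 'To Do' },
    'col-2': { id: 'col-2', name: 'In Progress' },
  },
  users: {
    alice: { id: 'alice', name: 'Alice' },
    bob: { id: 'bob', name: 'Bob' },
  },
  columnOrder: ['col-1', 'col-2'],
  issuesByColumn: { 'col-1': ['1', '2'], 'col-2': [] },
});
```

The join lives in one function, so the stored state stays flat while each view gets the nested shape it needs.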

Benefits of Normalization

⚡ Performance

• O(1) lookups instead of O(n) searches
• Faster updates (direct key access)
• Better memoization (stable references)
• Reduced re-renders (isolated updates)

🔧 Maintainability

• Simple immutable updates
• Single source of truth
• Easy to debug (flat structure)
• Predictable state shape

🔄 Flexibility

• Easy to reorder (just update array)
• Simple filtering (filter IDs, then lookup)
• Multiple views of same data
• Easy to add relationships

📦 Scalability

• Handles large datasets efficiently
• No nested traversal overhead
• Memory efficient (no duplication)
• Works well with virtualization

Real-World Examples

Kanban Board (Jira-like)

Perfect for drag-and-drop boards with thousands of issues:

// Normalized state for Kanban board
{
  issues: {
    '1': { id: '1', title: 'Fix bug', status: 'todo' },
    '2': { id: '2', title: 'Add feature', status: 'in-progress' }
  },
  issuesByColumn: {
    'todo': ['1'],
    'in-progress': ['2'],
    'done': []
  },
  columnOrder: ['todo', 'in-progress', 'done']
}

// Moving issue from 'todo' to 'in-progress':
// 1. Remove '1' from issuesByColumn['todo']
// 2. Add '1' to issuesByColumn['in-progress']
// 3. Update issue status: issues['1'].status = 'in-progress'
//    (status duplicates the column membership, so apply all three steps together)
// Simple and fast, with no deep nesting to traverse
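
The three steps above can be sketched as a single immutable update. This assumes, as in the example state, that column IDs double as status values; `KanbanState` and `moveIssue` are hypothetical names, and the optional `toIndex` parameter is an addition for drop-position support.

```typescript
interface KanbanIssue { id: string; title: string; status: string }
interface KanbanState {
  issues: Record<string, KanbanIssue>;
  issuesByColumn: Record<string, string[]>;
  columnOrder: string[];
}

// Move an issue between columns (or within one) without mutating state.
function moveIssue(
  state: KanbanState,
  issueId: string,
  from: string,
  to: string,
  toIndex?: number // insert position in the target column; defaults to the end
): KanbanState {
  const issue = state.issues[issueId];
  const source = state.issuesByColumn[from];
  const target = state.issuesByColumn[to];
  if (!issue || !source || !target || !source.includes(issueId)) return state;

  const nextSource = source.filter(id => id !== issueId);
  // When reordering inside one column, insert into the already-filtered list.
  const base = from === to ? nextSource : target;
  const index =
    toIndex === undefined
      ? base.length
      : Math.max(0, Math.min(toIndex, base.length));
  const nextTarget = [...base.slice(0, index), issueId, ...base.slice(index)];

  return {
    ...state,
    issues: { ...state.issues, [issueId]: { ...issue, status: to } },
    issuesByColumn: {
      ...state.issuesByColumn,
      [from]: nextSource,
      [to]: nextTarget, // wins over [from] when both keys are the same column
    },
  };
}

const kanban: KanbanState = {
  issues: {
    '1': { id: '1', title: 'Fix bug', status: 'todo' },
    '2': { id: '2', title: 'Add feature', status: 'in-progress' },
  },
  issuesByColumn: { todo: ['1'], 'in-progress': ['2'], done: [] },
  columnOrder: ['todo', 'in-progress', 'done'],
};
const movedKanban = moveIssue(kanban, '1', 'todo', 'in-progress');
```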

Social Media Feed (Twitter-like)

Efficient for feeds with nested comments and relationships:

// Normalized state for social feed
{
  tweets: {
    't1': { id: 't1', text: 'Hello', authorId: 'u1' },
    't2': { id: 't2', text: 'World', authorId: 'u2' }
  },
  users: {
    'u1': { id: 'u1', name: 'Alice' },
    'u2': { id: 'u2', name: 'Bob' }
  },
  comments: {
    'c1': { id: 'c1', text: 'Nice!', tweetId: 't1', authorId: 'u2' }
  },
  commentsByTweet: {
    't1': ['c1'],
    't2': []
  },
  feedOrder: ['t1', 't2']  // Timeline order
}

// Adding a comment:
// 1. Add comment to comments lookup
// 2. Add comment ID to commentsByTweet array
// No need to find and update nested tweet object!
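
A minimal sketch of those two steps as an immutable update, assuming the comment shape from the example state (`FeedComment`, `FeedState`, and `addComment` are hypothetical names):

```typescript
interface FeedComment { id: string; text: string; tweetId: string; authorId: string }
interface FeedState {
  comments: Record<string, FeedComment>;
  commentsByTweet: Record<string, string[]>;
}

// Add a comment entity and register its id under the parent tweet.
function addComment(state: FeedState, comment: FeedComment): FeedState {
  if (state.comments[comment.id]) return state; // ignore duplicate ids
  const existingIds = state.commentsByTweet[comment.tweetId] ?? [];
  return {
    ...state,
    comments: { ...state.comments, [comment.id]: comment },
    commentsByTweet: {
      ...state.commentsByTweet,
      [comment.tweetId]: [...existingIds, comment.id],
    },
  };
}

const feed: FeedState = {
  comments: {
    c1: { id: 'c1', text: 'Nice!', tweetId: 't1', authorId: 'u2' },
  },
  commentsByTweet: { t1: ['c1'], t2: [] },
};
const feedWithReply = addComment(feed, {
  id: 'c2',
  text: 'Agreed',
  tweetId: 't1',
  authorId: 'u1',
});
```

The tweet entities themselves are never touched, so components that render a tweet's text do not see a new reference.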

E-commerce Product List

Great for product catalogs with categories and filters:

// Normalized state for product catalog
{
  products: {
    'p1': { id: 'p1', name: 'Laptop', price: 999, categoryId: 'c1' },
    'p2': { id: 'p2', name: 'Mouse', price: 29, categoryId: 'c1' }
  },
  categories: {
    'c1': { id: 'c1', name: 'Electronics' },
    'c2': { id: 'c2', name: 'Accessories' }
  },
  productsByCategory: {
    'c1': ['p1', 'p2'],
    'c2': []
  },
  filteredProductIds: ['p1', 'p2'],  // After applying filters
  cartItems: ['p1']  // Product IDs in cart
}

// Filtering products:
// 1. Filter product IDs based on criteria
// 2. Update filteredProductIds array
// 3. Render using filtered IDs (lookup from products)
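
The filtering steps might look like the sketch below. The filter criteria (`categoryId`, `maxPrice`) and the names `ProductFilter` and `filterProductIds` are assumptions chosen for illustration.

```typescript
interface Product { id: string; name: string; price: number; categoryId: string }
interface ProductFilter { categoryId?: string; maxPrice?: number }

// Return the ids of products matching the filter; entities are not copied.
function filterProductIds(
  products: Record<string, Product>,
  productsByCategory: Record<string, string[]>,
  filter: ProductFilter
): string[] {
  // Start from the category's id list when possible, to avoid a full scan.
  const candidateIds =
    filter.categoryId !== undefined
      ? (productsByCategory[filter.categoryId] ?? [])
      : Object.keys(products);
  return candidateIds.filter(id => {
    const product = products[id];
    if (!product) return false; // dangling id
    return filter.maxPrice === undefined || product.price <= filter.maxPrice;
  });
}

const products: Record<string, Product> = {
  p1: { id: 'p1', name: 'Laptop', price: 999, categoryId: 'c1' },
  p2: { id: 'p2', name: 'Mouse', price: 29, categoryId: 'c1' },
};
const productsByCategory: Record<string, string[]> = { c1: ['p1', 'p2'], c2: [] };

const cheapElectronics = filterProductIds(products, productsByCategory, {
  categoryId: 'c1',
  maxPrice: 100,
});
```

The result is an array of IDs, which can be stored as `filteredProductIds` or computed on demand in a memoized selector.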

Trade-offs & When to Use

Use Normalization When:

✓ Large datasets (1000+ items)
✓ Frequent updates to nested data
✓ Multiple views of same data
✓ Complex relationships (many-to-many)
✓ Performance is critical
✓ Real-time updates (WebSocket, etc.)

Skip Normalization When:

✗ Small, simple datasets (<100 items)
✗ Data is mostly read-only
✗ Simple parent-child relationships
✗ Prototyping or MVP stage
✗ Overhead not worth the benefits

⚠️ Common Pitfalls

Over-normalization: Don't normalize simple, small datasets. The overhead of maintaining lookup tables isn't worth it.
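Forgetting to update relationships: When you add, move, or delete an entity, update every relationship array that references its ID. Editing an entity's own fields needs no relationship changes, unless a field (such as status) duplicates a relationship.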
Not denormalizing for display: Sometimes you need to denormalize for specific views. That's okay! Normalization is a tool, not a rule.
Complex selectors: You may need helper functions to reconstruct nested views from normalized data. This is normal and expected.
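
Deletion is where stale relationships most often slip in. A minimal sketch of a delete that cleans up every reference, assuming the Kanban-style state used earlier (`IssueStore` and `removeIssue` are hypothetical names):

```typescript
interface StoredIssue { id: string; title: string }
interface IssueStore {
  issues: Record<string, StoredIssue>;
  issuesByColumn: Record<string, string[]>;
  selectedIssueId: string | null;
}

// Remove an issue and every id reference to it.
function removeIssue(state: IssueStore, issueId: string): IssueStore {
  if (!state.issues[issueId]) return state; // nothing to remove

  const remainingIssues: Record<string, StoredIssue> = {};
  for (const [id, issue] of Object.entries(state.issues)) {
    if (id !== issueId) remainingIssues[id] = issue;
  }

  const issuesByColumn: Record<string, string[]> = {};
  for (const [columnId, ids] of Object.entries(state.issuesByColumn)) {
    // Reuse arrays that never contained the id, so their references stay stable.
    issuesByColumn[columnId] = ids.includes(issueId)
      ? ids.filter(id => id !== issueId)
      : ids;
  }

  return {
    ...state,
    issues: remainingIssues,
    issuesByColumn,
    selectedIssueId:
      state.selectedIssueId === issueId ? null : state.selectedIssueId,
  };
}

const store: IssueStore = {
  issues: {
    '1': { id: '1', title: 'Fix bug' },
    '2': { id: '2', title: 'Add feature' },
  },
  issuesByColumn: { todo: ['1'], 'in-progress': ['2'], done: [] },
  selectedIssueId: '1',
};
const storeAfterDelete = removeIssue(store, '1');
```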

Implementation Tips

1. Use TypeScript for Type Safety

interface NormalizedState<T> {
  entities: Record<string, T>;
  ids: string[];
}

// Usage
type IssueState = NormalizedState<Issue>;

2. Create Helper Functions

// Helper to get entities by IDs
function getEntitiesByIds<T>(
  entities: Record<string, T>,
  ids: string[]
): T[] {
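  // Skip ids with no matching entity; unlike filter(Boolean), this keeps
  // falsy-but-valid entities and narrows the result type to T[]
  return ids
    .map(id => entities[id])
    .filter((entity): entity is T => entity !== undefined);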
}

// Helper to add entity
function addEntity<T>(
  state: NormalizedState<T>,
  entity: T,
  id: string
): NormalizedState<T> {
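  return {
    entities: { ...state.entities, [id]: entity },
    // Append the id only if it is new, so re-adding replaces the entity
    // without duplicating its id in the order array
    ids: state.ids.includes(id) ? state.ids : [...state.ids, id]
  };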
}

3. Use Memoization for Derived Data

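import { useMemo } from 'react';

// Memoize filtered/sorted views (call inside a component or custom hook)
const filteredIssues = useMemo(() => {
  const ids = state.issuesByColumn[columnId] || [];
  return ids
    .map(id => state.issues[id])
    // Optional chaining skips ids whose issue is missing from the lookup table
    .filter(issue => issue?.priority === 'high');
}, [state.issues, state.issuesByColumn, columnId]);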
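
Outside React, the same idea can be sketched as a selector that caches its last result and recomputes only when an argument's reference changes. `memoizeByReference` and `selectHighPriority` are hypothetical names; libraries such as Reselect offer more complete versions of this pattern.

```typescript
// Cache the most recent call; recompute only if some argument changed identity.
function memoizeByReference<Args extends unknown[], R>(
  compute: (...args: Args) => R
): (...args: Args) => R {
  let lastArgs: Args | undefined;
  let lastResult: R | undefined;
  return (...args: Args): R => {
    const previous = lastArgs;
    if (
      previous !== undefined &&
      previous.length === args.length &&
      args.every((arg, i) => Object.is(arg, previous[i]))
    ) {
      return lastResult as R;
    }
    lastArgs = args;
    lastResult = compute(...args);
    return lastResult;
  };
}

interface PrioritizedIssue { id: string; title: string; priority: 'low' | 'high' }

let computeCount = 0; // counts real computations, for demonstration only
const selectHighPriority = memoizeByReference(
  (
    issues: Record<string, PrioritizedIssue>,
    ids: readonly string[]
  ): PrioritizedIssue[] => {
    computeCount += 1;
    const result: PrioritizedIssue[] = [];
    for (const id of ids) {
      const issue = issues[id];
      if (issue && issue.priority === 'high') result.push(issue);
    }
    return result;
  }
);

const issueTable: Record<string, PrioritizedIssue> = {
  '1': { id: '1', title: 'Fix bug', priority: 'high' },
  '2': { id: '2', title: 'Add feature', priority: 'low' },
};
const columnIds = ['1', '2'];
const firstCall = selectHighPriority(issueTable, columnIds);
const secondCall = selectHighPriority(issueTable, columnIds); // served from cache
```

This works because immutable updates on normalized state leave untouched tables and arrays with the same reference, so unrelated updates do not invalidate the cache.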

Key Takeaways

Normalize for scale: Use normalized state when dealing with large, frequently-updated datasets.
Lookup tables + order arrays: Store entities by ID, relationships as arrays of IDs.
O(1) access: Direct key access is faster than searching nested arrays.
Single source of truth: Each entity exists once, referenced by ID everywhere.
Simple updates: Updating normalized state is straightforward and predictable.
