Data Normalization: Organizing State for Performance

Medium

Learn how to structure nested data using normalized lookup tables for O(1) access, easy updates, and better performance in complex applications.

What is Data Normalization?

Data normalization is the practice of storing data in a flat, lookup-table structure instead of nested objects. This pattern is essential for managing complex state in applications like Kanban boards, social feeds, and project management tools.

Instead of deeply nested structures, which are hard to update and tend to duplicate data, normalized state uses:

  • Lookup tables: Objects keyed by ID for O(1) access
  • Order arrays: Arrays of IDs to maintain relationships and ordering
  • Separation of concerns: Entity data is stored separately from UI state

Why Normalized State?

Nested data structures become problematic as applications scale. Here's why normalization is essential:

Nested: Hard to Update

// ❌ Nested: Hard to update, duplicated data
{
  columns: [
    {
      id: 'col-1',
      issues: [
        { id: 1, title: 'Fix bug', assignee: 'Alice' },
        { id: 2, title: 'Add feature', assignee: 'Bob' }
      ]
    }
  ]
}

// To update issue #1's title:
// 1. Find the column
// 2. Find the issue in nested array
// 3. Update it (complex immutable update)
// 4. If issue appears in multiple places, update all
  • O(n) lookup time (must search through arrays)
  • Complex immutable updates (deep nesting)
  • Data duplication (same issue in multiple columns)
  • Hard to maintain single source of truth
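To make step 3 concrete, here is a minimal TypeScript sketch of that immutable update on the nested shape. `NestedIssue`, `NestedColumn`, `NestedBoard`, and `renameIssueNested` are hypothetical names for illustration. Every column has to be scanned, and the matching column and its `issues` array are copied on the way down:

```typescript
// Hypothetical types mirroring the nested example above.
interface NestedIssue {
  id: number;
  title: string;
  assignee: string;
}

interface NestedColumn {
  id: string;
  issues: NestedIssue[];
}

interface NestedBoard {
  columns: NestedColumn[];
}

// Immutable title update on the nested shape: scan every column, then
// copy the column and its issues array on the path to the changed issue.
function renameIssueNested(
  board: NestedBoard,
  issueId: number,
  title: string
): NestedBoard {
  return {
    ...board,
    columns: board.columns.map(column =>
      column.issues.some(issue => issue.id === issueId)
        ? {
            ...column,
            issues: column.issues.map(issue =>
              issue.id === issueId ? { ...issue, title } : issue
            ),
          }
        : column // columns without the issue keep their reference
    ),
  };
}

const nestedBoard: NestedBoard = {
  columns: [
    {
      id: 'col-1',
      issues: [
        { id: 1, title: 'Fix bug', assignee: 'Alice' },
        { id: 2, title: 'Add feature', assignee: 'Bob' },
      ],
    },
  ],
};

const renamedNested = renameIssueNested(nestedBoard, 1, 'Fix login bug');
```

Because it maps over every column, this version also updates duplicate copies of an issue, but only by visiting every issue on the board.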

Normalized: Easy Updates

// ✅ Normalized: O(1) lookups, easy updates
{
  issues: {
    '1': { id: '1', title: 'Fix bug', assignee: 'Alice' },
    '2': { id: '2', title: 'Add feature', assignee: 'Bob' }
  },
  issuesByColumn: {
    'col-1': ['1', '2']
  }
}

// To update issue #1's title:
// state.issues['1'] is reached by key; an immutable update copies only
// the issues table and that one issue
// Simple, direct, no searching needed
  • O(1) lookup time (direct key access)
  • Simple immutable updates (flat structure)
  • Single source of truth (one issue object)
  • Easy to reference from multiple places
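The same title change against the normalized shape, as a minimal sketch with hypothetical `Issue`, `NormalizedBoard`, and `renameIssue` names. The lookup is a single key access; the spread still makes a shallow copy of the `issues` table, but nothing is searched and nothing deeper than one issue is copied:

```typescript
// Hypothetical types mirroring the normalized example above.
interface Issue {
  id: string;
  title: string;
  assignee: string;
}

interface NormalizedBoard {
  issues: Record<string, Issue>;
  issuesByColumn: Record<string, string[]>;
}

// One key lookup finds the issue; only the `issues` table and that
// single issue are copied. `issuesByColumn` keeps its reference.
function renameIssue(
  board: NormalizedBoard,
  issueId: string,
  title: string
): NormalizedBoard {
  const existing = board.issues[issueId];
  if (!existing) return board; // unknown ID: return state unchanged
  return {
    ...board,
    issues: { ...board.issues, [issueId]: { ...existing, title } },
  };
}

const board: NormalizedBoard = {
  issues: {
    '1': { id: '1', title: 'Fix bug', assignee: 'Alice' },
    '2': { id: '2', title: 'Add feature', assignee: 'Bob' },
  },
  issuesByColumn: { 'col-1': ['1', '2'] },
};

const renamed = renameIssue(board, '1', 'Fix login bug');
```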

Normalized Structure Pattern

A normalized state structure consists of three main parts:

1. Lookup Tables (Entities)

Store all entities in flat objects keyed by ID. This provides O(1) access to any entity.

// All entities stored by ID
{
  issues: {
    '1': { id: '1', title: 'Fix bug', assigneeId: 'alice' },
    '2': { id: '2', title: 'Add feature', assigneeId: 'bob' }
  },
  users: {
    'alice': { id: 'alice', name: 'Alice' },
    'bob': { id: 'bob', name: 'Bob' }
  },
  columns: {
    'col-1': { id: 'col-1', name: 'To Do' },
    'col-2': { id: 'col-2', name: 'In Progress' }
  }
}
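Server responses often arrive as arrays, so a small helper that builds one of these tables can be useful. A minimal sketch, assuming every entity has a string `id`; `HasId` and `toLookup` are hypothetical names:

```typescript
// Any entity with a string `id` can be indexed.
interface HasId {
  id: string;
}

// Build an ID-keyed lookup table from an array.
// If two items share an ID, the later one wins.
function toLookup<T extends HasId>(items: T[]): Record<string, T> {
  const table: Record<string, T> = {};
  for (const item of items) {
    table[item.id] = item;
  }
  return table;
}

interface User {
  id: string;
  name: string;
}

const usersFromApi: User[] = [
  { id: 'alice', name: 'Alice' },
  { id: 'bob', name: 'Bob' },
];

const users = toLookup(usersFromApi);
// users['bob'].name === 'Bob'
```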

2. Relationship Maps (Order Arrays)

Use arrays of IDs to represent relationships and maintain order. This separates data from structure.

// Relationships stored separately
{
  issuesByColumn: {
    'col-1': ['1', '2'],  // Issues in "To Do" column
    'col-2': []           // Issues in "In Progress" column
  },
  columnOrder: ['col-1', 'col-2'],  // Column display order
  issuesByAssignee: {
    'alice': ['1'],
    'bob': ['2']
  }
}
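A relationship map such as `issuesByAssignee` repeats information already stored in each issue's `assigneeId`, so one option is to derive it from the entities instead of maintaining it by hand. A sketch with a hypothetical `groupIdsBy` helper, which keeps the order of the ID list it is given:

```typescript
interface Issue {
  id: string;
  title: string;
  assigneeId: string;
}

// Group issue IDs by a key taken from each issue, preserving the
// order of `issueIds` and skipping IDs with no matching entity.
function groupIdsBy(
  issues: Record<string, Issue>,
  issueIds: string[],
  key: (issue: Issue) => string
): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const id of issueIds) {
    const issue = issues[id];
    if (!issue) continue; // dangling ID
    const groupKey = key(issue);
    if (!groups[groupKey]) groups[groupKey] = [];
    groups[groupKey].push(id);
  }
  return groups;
}

const issues: Record<string, Issue> = {
  '1': { id: '1', title: 'Fix bug', assigneeId: 'alice' },
  '2': { id: '2', title: 'Add feature', assigneeId: 'bob' },
};

const issuesByAssignee = groupIdsBy(issues, ['1', '2'], issue => issue.assigneeId);
// { alice: ['1'], bob: ['2'] }
```

`issuesByColumn` usually can't be derived this way, because the order within a column is user-controlled data rather than a property of the issues.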

3. UI State (Separate from Data)

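Keep UI-specific state (selection, drag state, filters, loading flags) separate from normalized data. Changing a selection or starting a drag then leaves the entity tables untouched, so anything that depends only on entity data keeps stable references.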

// UI state separate from data
{
  selectedIssueId: '1',                                // or null when nothing is selected
  dragState: { issueId: '1', sourceColumn: 'col-1' },  // or null when no drag is active
  filter: { assignee: 'alice', status: 'todo' },
  isLoading: false,
  error: null
}
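A tiny sketch of why the split helps, with hypothetical `AppState` and `selectIssue` names: changing the selection produces a new root object, but the `issues` table keeps its reference, so memoized selectors or components that depend only on entity data can skip work:

```typescript
interface Issue {
  id: string;
  title: string;
}

interface AppState {
  issues: Record<string, Issue>;
  selectedIssueId: string | null;
}

// Selecting an issue touches only UI state; `issues` is carried over
// by reference through the spread.
function selectIssue(state: AppState, issueId: string | null): AppState {
  return { ...state, selectedIssueId: issueId };
}

const before: AppState = {
  issues: { '1': { id: '1', title: 'Fix bug' } },
  selectedIssueId: null,
};

const after = selectIssue(before, '1');
```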

Complete Example

interface BoardState {
  // Lookup tables for O(1) access
  issues: Record<string, Issue>;
  columns: Record<string, Column>;
  users: Record<string, User>;
  
  // Order arrays for rendering
  columnOrder: string[];
  issuesByColumn: Record<string, string[]>;
  
  // UI state
  selectedIssueId: string | null;
  dragState: DragState | null;
  filter: FilterState;
}
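`BoardState` refers to `Issue`, `Column`, `User`, `DragState`, and `FilterState` without defining them. The sketch below fills them in with hypothetical, deliberately minimal fields (the filter keeps only an assignee) and adds a hypothetical `selectVisibleIssues` selector that rebuilds what one column renders:

```typescript
// Hypothetical supporting types; real apps will have more fields.
interface Issue {
  id: string;
  title: string;
  assigneeId: string | null;
}

interface Column {
  id: string;
  name: string;
}

interface User {
  id: string;
  name: string;
}

interface DragState {
  issueId: string;
  sourceColumn: string;
}

// Simplified filter: only an assignee, null meaning "show everyone".
interface FilterState {
  assignee: string | null;
}

interface BoardState {
  issues: Record<string, Issue>;
  columns: Record<string, Column>;
  users: Record<string, User>;
  columnOrder: string[];
  issuesByColumn: Record<string, string[]>;
  selectedIssueId: string | null;
  dragState: DragState | null;
  filter: FilterState;
}

// Rebuild the list a column renders: follow the order array into the
// lookup table, skip dangling IDs, then apply the assignee filter.
function selectVisibleIssues(state: BoardState, columnId: string): Issue[] {
  const { assignee } = state.filter;
  return (state.issuesByColumn[columnId] ?? [])
    .map(id => state.issues[id])
    .filter(issue => issue !== undefined)
    .filter(issue => assignee === null || issue.assigneeId === assignee);
}

const board: BoardState = {
  issues: {
    '1': { id: '1', title: 'Fix bug', assigneeId: 'alice' },
    '2': { id: '2', title: 'Add feature', assigneeId: 'bob' },
  },
  columns: { 'col-1': { id: 'col-1', name: 'To Do' } },
  users: {
    alice: { id: 'alice', name: 'Alice' },
    bob: { id: 'bob', name: 'Bob' },
  },
  columnOrder: ['col-1'],
  issuesByColumn: { 'col-1': ['1', '2'] },
  selectedIssueId: null,
  dragState: null,
  filter: { assignee: 'alice' },
};

const visible = selectVisibleIssues(board, 'col-1'); // only issue '1'
```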

Benefits of Normalization

⚡ Performance

  • O(1) lookups instead of O(n) searches
  • Faster updates (direct key access)
  • Better memoization (stable references)
  • Reduced re-renders (isolated updates)

🔧 Maintainability

  • Simple immutable updates
  • Single source of truth
  • Easy to debug (flat structure)
  • Predictable state shape

🔄 Flexibility

  • Easy to reorder (just update array)
  • Simple filtering (filter IDs, then lookup)
  • Multiple views of same data
  • Easy to add relationships

📦 Scalability

  • Handles large datasets efficiently
  • No nested traversal overhead
  • Memory efficient (no duplication)
  • Works well with virtualization

Real-World Examples

Kanban Board (Jira-like)

Perfect for drag-and-drop boards with thousands of issues:

// Normalized state for Kanban board
{
  issues: {
    '1': { id: '1', title: 'Fix bug', status: 'todo' },
    '2': { id: '2', title: 'Add feature', status: 'in-progress' }
  },
  issuesByColumn: {
    'todo': ['1'],
    'in-progress': ['2'],
    'done': []
  },
  columnOrder: ['todo', 'in-progress', 'done']
}

// Moving issue from 'todo' to 'in-progress':
// 1. Remove '1' from issuesByColumn['todo']
// 2. Add '1' to issuesByColumn['in-progress']
// 3. Set issues['1'].status to 'in-progress' (status mirrors the column
//    here, so both must change together)
// Simple, fast, no deep nesting!
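The three steps above, written as one immutable update. This is a sketch under the example's own assumption that column IDs double as status values; `KanbanState` and `moveIssue` are hypothetical names, and `toIndex` lets the same function handle reordering within a column:

```typescript
interface Issue {
  id: string;
  title: string;
  status: string;
}

interface KanbanState {
  issues: Record<string, Issue>;
  issuesByColumn: Record<string, string[]>;
  columnOrder: string[];
}

// Move an issue to position `toIndex` in `toColumn` (clamped to the list).
// Within one column, `toIndex` counts positions after the issue is removed.
function moveIssue(
  state: KanbanState,
  issueId: string,
  fromColumn: string,
  toColumn: string,
  toIndex: number
): KanbanState {
  const issue = state.issues[issueId];
  const source = state.issuesByColumn[fromColumn];
  if (!issue || !source || !source.includes(issueId)) return state;

  // 1. Remove the ID from the source column (new array).
  const newSource = source.filter(id => id !== issueId);

  // 2. Insert the ID into the destination column (new array).
  const newTarget =
    fromColumn === toColumn
      ? newSource
      : [...(state.issuesByColumn[toColumn] ?? [])];
  const index = Math.max(0, Math.min(toIndex, newTarget.length));
  newTarget.splice(index, 0, issueId);

  return {
    ...state,
    // 3. Keep the issue's status in sync with its new column.
    issues: { ...state.issues, [issueId]: { ...issue, status: toColumn } },
    issuesByColumn: {
      ...state.issuesByColumn,
      [fromColumn]: newSource,
      [toColumn]: newTarget,
    },
  };
}

const kanban: KanbanState = {
  issues: {
    '1': { id: '1', title: 'Fix bug', status: 'todo' },
    '2': { id: '2', title: 'Add feature', status: 'in-progress' },
  },
  issuesByColumn: { todo: ['1'], 'in-progress': ['2'], done: [] },
  columnOrder: ['todo', 'in-progress', 'done'],
};

const moved = moveIssue(kanban, '1', 'todo', 'in-progress', 0);
// moved.issuesByColumn: { todo: [], 'in-progress': ['1', '2'], done: [] }
```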

Social Media Feed (Twitter-like)

Efficient for feeds with nested comments and relationships:

// Normalized state for social feed
{
  tweets: {
    't1': { id: 't1', text: 'Hello', authorId: 'u1' },
    't2': { id: 't2', text: 'World', authorId: 'u2' }
  },
  users: {
    'u1': { id: 'u1', name: 'Alice' },
    'u2': { id: 'u2', name: 'Bob' }
  },
  comments: {
    'c1': { id: 'c1', text: 'Nice!', tweetId: 't1', authorId: 'u2' }
  },
  commentsByTweet: {
    't1': ['c1'],
    't2': []
  },
  feedOrder: ['t1', 't2']  // Timeline order
}

// Adding a comment:
// 1. Add comment to comments lookup
// 2. Add comment ID to commentsByTweet array
// No need to find and update nested tweet object!
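The two steps as a minimal sketch. `TweetComment`, `FeedState`, and `addComment` are hypothetical names (the type avoids `Comment`, which the DOM already defines), and `FeedState` shows only the comment slices; in a full state, `tweets` and `users` would be carried over untouched by the spread:

```typescript
// Hypothetical comment type.
interface TweetComment {
  id: string;
  text: string;
  tweetId: string;
  authorId: string;
}

interface FeedState {
  comments: Record<string, TweetComment>;
  commentsByTweet: Record<string, string[]>;
}

function addComment(state: FeedState, comment: TweetComment): FeedState {
  const ids = state.commentsByTweet[comment.tweetId] ?? [];
  return {
    ...state,
    // 1. Add the comment to the lookup table.
    comments: { ...state.comments, [comment.id]: comment },
    // 2. Append its ID to the tweet's list, skipping duplicates.
    commentsByTweet: {
      ...state.commentsByTweet,
      [comment.tweetId]: ids.includes(comment.id) ? ids : [...ids, comment.id],
    },
  };
}

const feed: FeedState = {
  comments: { c1: { id: 'c1', text: 'Nice!', tweetId: 't1', authorId: 'u2' } },
  commentsByTweet: { t1: ['c1'], t2: [] },
};

const updatedFeed = addComment(feed, {
  id: 'c2',
  text: 'Thanks!',
  tweetId: 't1',
  authorId: 'u1',
});
```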

E-commerce Product List

Great for product catalogs with categories and filters:

// Normalized state for product catalog
{
  products: {
    'p1': { id: 'p1', name: 'Laptop', price: 999, categoryId: 'c1' },
    'p2': { id: 'p2', name: 'Mouse', price: 29, categoryId: 'c1' }
  },
  categories: {
    'c1': { id: 'c1', name: 'Electronics' },
    'c2': { id: 'c2', name: 'Accessories' }
  },
  productsByCategory: {
    'c1': ['p1', 'p2'],
    'c2': []
  },
  filteredProductIds: ['p1', 'p2'],  // After applying filters
  cartItems: ['p1']  // Product IDs in cart
}

// Filtering products:
// 1. Filter product IDs based on criteria
// 2. Update filteredProductIds array
// 3. Render using filtered IDs (lookup from products)
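A sketch of the three filtering steps, with hypothetical `CatalogState`, `applyFilter`, and `visibleProducts` names and a caller-supplied predicate:

```typescript
interface Product {
  id: string;
  name: string;
  price: number;
  categoryId: string;
}

interface CatalogState {
  products: Record<string, Product>;
  productsByCategory: Record<string, string[]>;
  filteredProductIds: string[];
}

// Steps 1-2: filter a category's IDs by looking each product up,
// then store the surviving IDs.
function applyFilter(
  state: CatalogState,
  categoryId: string,
  predicate: (product: Product) => boolean
): CatalogState {
  const candidateIds = state.productsByCategory[categoryId] ?? [];
  const filteredProductIds = candidateIds.filter(id => {
    const product = state.products[id];
    return product !== undefined && predicate(product);
  });
  return { ...state, filteredProductIds };
}

// Step 3: resolve the filtered IDs back to products for rendering.
function visibleProducts(state: CatalogState): Product[] {
  return state.filteredProductIds
    .map(id => state.products[id])
    .filter(product => product !== undefined);
}

const catalog: CatalogState = {
  products: {
    p1: { id: 'p1', name: 'Laptop', price: 999, categoryId: 'c1' },
    p2: { id: 'p2', name: 'Mouse', price: 29, categoryId: 'c1' },
  },
  productsByCategory: { c1: ['p1', 'p2'], c2: [] },
  filteredProductIds: ['p1', 'p2'],
};

const cheap = applyFilter(catalog, 'c1', product => product.price < 100);
```

Because `filteredProductIds` is derived from `products` and `productsByCategory`, another option is to skip storing it and compute it in a memoized selector, which removes one piece of state that can fall out of sync.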

Trade-offs & When to Use

Use Normalization When:

  • Large datasets (roughly 1000+ items)
  • Frequent updates to nested data
  • Multiple views of same data
  • Complex relationships (many-to-many)
  • Performance is critical
  • Real-time updates (WebSocket, etc.)

Skip Normalization When:

  • Small, simple datasets (roughly under 100 items)
  • Data is mostly read-only
  • Simple parent-child relationships
  • Prototyping or MVP stage
  • The bookkeeping overhead outweighs the benefits

⚠️ Common Pitfalls

  • Over-normalization: Don't normalize simple, small datasets. The overhead of maintaining lookup tables isn't worth it.
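  • Forgetting to update relationships: When an entity is added, deleted, or moved, or when a field a relationship map is keyed on changes (such as assigneeId), update every relationship array that references it. Otherwise the arrays drift out of sync or keep dangling IDs that no longer resolve.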
  • Not denormalizing for display: Sometimes you need to denormalize for specific views. That's okay! Normalization is a tool, not a rule.
  • Complex selectors: You may need helper functions to reconstruct nested views from normalized data. This is normal and expected.
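For the relationship pitfall, a deletion helper that cleans up every reference in one place is a common safeguard. A minimal sketch with hypothetical `BoardState` and `removeIssue` names, which also clears a selection that points at the deleted issue:

```typescript
interface Issue {
  id: string;
  title: string;
}

interface BoardState {
  issues: Record<string, Issue>;
  issuesByColumn: Record<string, string[]>;
  selectedIssueId: string | null;
}

// Delete an issue and every reference to it, so no relationship
// array or UI field is left pointing at a missing entity.
function removeIssue(state: BoardState, issueId: string): BoardState {
  // Drop the entity from the lookup table (on a copy).
  const issues = { ...state.issues };
  delete issues[issueId];

  // Remove the ID from every column list that contains it.
  const issuesByColumn: Record<string, string[]> = {};
  for (const columnId of Object.keys(state.issuesByColumn)) {
    const ids = state.issuesByColumn[columnId];
    issuesByColumn[columnId] = ids.includes(issueId)
      ? ids.filter(id => id !== issueId)
      : ids; // untouched lists keep their reference
  }

  return {
    ...state,
    issues,
    issuesByColumn,
    // Clear UI state that referred to the deleted issue.
    selectedIssueId:
      state.selectedIssueId === issueId ? null : state.selectedIssueId,
  };
}

const boardBefore: BoardState = {
  issues: {
    '1': { id: '1', title: 'Fix bug' },
    '2': { id: '2', title: 'Add feature' },
  },
  issuesByColumn: { todo: ['1', '2'], done: [] },
  selectedIssueId: '1',
};

const boardAfter = removeIssue(boardBefore, '1');
```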

Implementation Tips

1. Use TypeScript for Type Safety

interface NormalizedState<T> {
  entities: Record<string, T>;
  ids: string[];
}

// Usage
type IssueState = NormalizedState<Issue>;

2. Create Helper Functions

// Helper to get entities by IDs, skipping IDs with no matching entity
function getEntitiesByIds<T>(
  entities: Record<string, T | undefined>,
  ids: string[]
): T[] {
  return ids
    .map(id => entities[id])
    .filter((entity): entity is T => entity !== undefined);
}

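// Helper to add (or replace) an entity
function addEntity<T>(
  state: NormalizedState<T>,
  entity: T,
  id: string
): NormalizedState<T> {
  return {
    entities: { ...state.entities, [id]: entity },
    // Append the ID only if it is new, so re-adding doesn't duplicate it
    ids: id in state.entities ? state.ids : [...state.ids, id]
  };
}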
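In the same style, update and remove helpers for `NormalizedState<T>` might look like the following. `updateEntity` and `removeEntity` are hypothetical names, not a library API (Redux Toolkit's `createEntityAdapter` provides similar operations if you prefer not to hand-roll them):

```typescript
interface NormalizedState<T> {
  entities: Record<string, T>;
  ids: string[];
}

// Hypothetical helper: shallow-merge `changes` into one entity.
// `ids` keeps its reference, since an update changes neither
// membership nor order.
function updateEntity<T extends object>(
  state: NormalizedState<T>,
  id: string,
  changes: Partial<T>
): NormalizedState<T> {
  if (!(id in state.entities)) return state; // unknown ID: no-op
  const existing = state.entities[id];
  return {
    ...state,
    entities: { ...state.entities, [id]: { ...existing, ...changes } },
  };
}

// Hypothetical helper: remove an entity from both the table and the ID list.
function removeEntity<T>(state: NormalizedState<T>, id: string): NormalizedState<T> {
  if (!(id in state.entities)) return state; // unknown ID: no-op
  const entities = { ...state.entities };
  delete entities[id];
  return { entities, ids: state.ids.filter(existingId => existingId !== id) };
}

interface Tag {
  id: string;
  label: string;
}

const tags: NormalizedState<Tag> = {
  entities: { a: { id: 'a', label: 'A' }, b: { id: 'b', label: 'B' } },
  ids: ['a', 'b'],
};

const relabeled = updateEntity(tags, 'a', { label: 'Alpha' });
const withoutB = removeEntity(relabeled, 'b');
```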

3. Use Memoization for Derived Data

import { useMemo } from 'react';

// Inside a component: memoize filtered/sorted views
const filteredIssues = useMemo(() => {
  const ids = state.issuesByColumn[columnId] || [];
  return ids
    .map(id => state.issues[id])
    .filter(issue => issue !== undefined && issue.priority === 'high');
}, [state.issues, state.issuesByColumn, columnId]);
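`useMemo` only caches within one component instance. For selectors shared across components, the same idea can be expressed as a function that remembers its last inputs and result, which is the approach libraries such as Reselect take. The factory below is a hypothetical hand-rolled sketch, not Reselect's API:

```typescript
interface Issue {
  id: string;
  title: string;
  priority: 'low' | 'high';
}

interface ColumnState {
  issues: Record<string, Issue>;
  issuesByColumn: Record<string, string[]>;
}

// Shared fallback so a missing column doesn't create a fresh [] (and a
// cache miss) on every call.
const EMPTY_IDS: string[] = [];

// Hypothetical factory: the returned selector caches its last result and
// recomputes only when `issues` or this column's ID array changes by reference.
function createHighPrioritySelector(columnId: string) {
  let lastIssues: Record<string, Issue> | undefined;
  let lastIds: string[] | undefined;
  let lastResult: Issue[] = [];

  return (state: ColumnState): Issue[] => {
    const ids = state.issuesByColumn[columnId] ?? EMPTY_IDS;
    if (state.issues === lastIssues && ids === lastIds) {
      return lastResult; // inputs unchanged: return the same array
    }
    lastIssues = state.issues;
    lastIds = ids;
    lastResult = ids
      .map(id => state.issues[id])
      .filter(issue => issue !== undefined && issue.priority === 'high');
    return lastResult;
  };
}

const selectHighInTodo = createHighPrioritySelector('todo');

const boardState: ColumnState = {
  issues: {
    '1': { id: '1', title: 'Fix bug', priority: 'high' },
    '2': { id: '2', title: 'Add feature', priority: 'low' },
  },
  issuesByColumn: { todo: ['1', '2'] },
};

const first = selectHighInTodo(boardState);
const second = selectHighInTodo({ ...boardState }); // new root, same tables: cache hit
```

Returning the identical array on a cache hit is what lets downstream memoized components skip re-rendering.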

Key Takeaways

  • Normalize for scale: Use normalized state when dealing with large, frequently-updated datasets.
  • Lookup tables + order arrays: Store entities by ID, relationships as arrays of IDs.
  • O(1) access: Direct key access is faster than searching nested arrays.
  • Single source of truth: Each entity exists once, referenced by ID everywhere.
  • Simple updates: Updating normalized state is straightforward and predictable.