DynamoDB Best Practices

August 23, 2025 | 30 min read
Database Architecture · AWS · DynamoDB · NoSQL

title: "DynamoDB Preferred Patterns" date: "2025-08-23" readTime: "30 min read" tags: ["Database Architecture", "AWS", "DynamoDB", "NoSQL"] description: "Learnings from leading and building 0-1 at Amazon: idempotent keys, fat GSIs, access pattern modeling, live migrations, and defeating hot partitions." status: "published"

These approaches prevent outages and data loss, and let systems scale cost-efficiently.

1. Choose Idempotent Primary Keys Over Complex Sort Keys

The Problem

Sort keys with timestamps seem logical but hide a gotcha: an async writer that stamps items with Date.now() can't be idempotent, because the original timestamp can't be recovered on retry without an extra query, which introduces race conditions. Condition expressions can mitigate the races, but you're still paying for that extra query, plus the retries that follow every failed condition check. Those failures also slow convergence and drag down SQS throughput: failed messages only reappear after the visibility timeout, so consumers effectively back off when they see errors.

The Solution

Design primary keys that can be deterministically derived from your incoming payload. This enables idempotency asserted via condition expressions and simplifies conflict resolution.

❌ Anti-Pattern: Time-based Sort Key

// This breaks idempotency!
const item = {
  userId: event.userId,
  timestamp: Date.now(), // Can't recover this value!
  eventType: 'USER_ACTION',
  data: event.data
};

// Retry will create duplicate with different timestamp
await dynamodb.putItem({ TableName: 'Events', Item: item });

✅ Pattern: Deterministic Primary Key

// Idempotent by design
const item = {
  // Primary key derived from payload
  eventId: `${event.userId}#${event.actionId}`,
  userId: event.userId,
  timestamp: event.timestamp || Date.now(),
  eventType: 'USER_ACTION',
  data: event.data
};

// Use a conditional expression for conflict resolution
const params = {
  TableName: 'Events',
  Item: item,
  ConditionExpression: 'attribute_not_exists(eventId)' // Fail on duplicate
  // OR: drop the condition and rely on the deterministic key, using
  // ReturnValues: 'ALL_OLD' to detect (and succeed through) duplicates
};
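With the conditional variant, a retry surfaces as a ConditionalCheckFailedException that you can safely swallow. A minimal sketch, assuming the same loose dynamodb document-client wrapper used in the examples above (the error name matches the AWS SDK v3 shape):

// Treat a failed condition check as idempotent success
async function putEventIdempotent(item) {
  try {
    await dynamodb.putItem({
      TableName: 'Events',
      Item: item,
      ConditionExpression: 'attribute_not_exists(eventId)'
    });
  } catch (error) {
    // The event already exists, so the retry is a no-op success
    if (error.name === 'ConditionalCheckFailedException') return;
    throw error;
  }
}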

Key Takeaway: Your primary key should be a pure function of your input data. If you can't recreate it from the payload alone, you've lost convenient idempotency during writes. Use GSIs for your read access patterns, and accept the extra GetItem calls when you need strongly consistent reads, since GSI reads are always eventually consistent.

2. Default to "Fat" GSIs (ProjectionType: ALL) Until Cost Forces Otherwise

The Trade-off

"Thin" GSIs project only specific attributes, saving storage but requiring complex code and often resulting in higher read costs due to secondary lookups. "Fat" GSIs with PROJECT_ALL simplify development and often reduce total costs.

Cost Analysis

Thin GSI Cost Model

// This approach costs: 1 RCU (query) + 25-50 RCUs (batch get) = up to ~51 RCUs total

// Step 1: Query GSI (1 RCU for up to 4KB)
const queryResult = await dynamodb.query({
  TableName: 'Users',
  IndexName: 'ByCreatedDate',
  KeyConditionExpression: 'accountId = :accountId',
  ExpressionAttributeValues: { ':accountId': accountId },
  ProjectionExpression: 'userId, createdAt', // Only these fields available
  Limit: 50
});

// Step 2: Batch get full records (0.5-1 RCU per item, depending on read consistency!)
const batchGetParams = {
  RequestItems: {
    'Users': {
      Keys: queryResult.Items.map(item => ({
        accountId: item.accountId,
        userId: item.userId
      }))
    }
  }
};

Fat GSI Cost Model

// This approach costs: ~3-5 RCUs (depending on item size) - 90% cost reduction!

// Single query returns all data
const queryResult = await dynamodb.query({
  TableName: 'Users',
  IndexName: 'ByCreatedDate',
  KeyConditionExpression: 'accountId = :accountId',
  ExpressionAttributeValues: { ':accountId': accountId },
  Limit: 50
});

// All fields available immediately - no secondary lookup needed
const users = queryResult.Items;
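For reference, the "fat" part is just the projection setting on the index. A minimal table-definition sketch for the ByCreatedDate index assumed above (key schema and attribute types are illustrative):

// Hypothetical CreateTable params showing a fat GSI
const usersTableParams = {
  TableName: 'Users',
  AttributeDefinitions: [
    { AttributeName: 'accountId', AttributeType: 'S' },
    { AttributeName: 'userId', AttributeType: 'S' },
    { AttributeName: 'createdAt', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'accountId', KeyType: 'HASH' },
    { AttributeName: 'userId', KeyType: 'RANGE' }
  ],
  GlobalSecondaryIndexes: [{
    IndexName: 'ByCreatedDate',
    KeySchema: [
      { AttributeName: 'accountId', KeyType: 'HASH' },
      { AttributeName: 'createdAt', KeyType: 'RANGE' }
    ],
    Projection: { ProjectionType: 'ALL' } // the "fat" part
  }],
  BillingMode: 'PAY_PER_REQUEST'
};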

Development Cost

Thin GSI Complexity:

  • Separate DTO for each GSI projection
  • Complex type management
  • Orchestration logic for secondary fetches
  • Higher error rates from partial data

Fat GSI Simplicity:

  • Single entity type across all indexes
  • Straightforward queries
  • No secondary lookups
  • Predictable performance

When to Consider Thin GSIs

  • Storage costs exceed $1000/month for a single table
  • Items contain large binary data (images, documents)
  • Clear separation between "index fields" and "detail fields"
  • You need strongly consistent reads: GSI reads are always eventually consistent (typically ~500ms to ~1 second behind), so the workaround is a keys-only GSI plus a strongly consistent GetItem on the base table

Consider offloading large data to S3 (keeping only a pointer in the item) before implementing thin GSIs!
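A minimal sketch of that S3 offload, assuming an s3 client wrapper in the same loose style as dynamodb (the bucket name and payloadLocation attribute are illustrative):

// Store the heavy payload in S3; keep only a small pointer in DynamoDB
// so every GSI projection stays cheap
async function writeUserWithLargePayload(user, payload) {
  const s3Key = `user-payloads/${user.userId}`;
  await s3.putObject({
    Bucket: 'my-app-large-payloads', // hypothetical bucket
    Key: s3Key,
    Body: payload
  });

  await dynamodb.putItem({
    TableName: 'Users',
    Item: {
      accountId: user.accountId,
      userId: user.userId,
      payloadLocation: `s3://my-app-large-payloads/${s3Key}` // pointer only
    }
  });
}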

3. Model for Access Patterns, Not Entities (Unlike an RDBMS)

The Paradigm Shift

RDBMS thinking starts with entities and relationships. DynamoDB thinking starts with queries. Design your table structure by listing every access pattern first, then work backwards, beginning with the writes. Some access patterns are genuinely hard to model, such as faceted "filters" on e-commerce pages; DynamoDB may not be your best choice if you adamantly require page numbers over large search results, or have high dimensionality in your search terms.

Example: User/Account System

Access Patterns First

// List your access patterns
const accessPatterns = [
  "Get user by userId",
  "List all users in an account",
  "List users by creation date (newest first)",
  "Get account by accountId",
  "List user's recent activities",
  "Get activity by activityId"
];

Single Table Design

// Primary Key Design
interface TableDesign {
  PK: string;    // Partition Key
  SK: string;    // Sort Key
  GSI1PK?: string;
  GSI1SK?: string;
  Type: string;
  // ... other attributes
}

// Records
const account: TableDesign = {
  PK: 'ACCT#123',
  SK: 'METADATA',
  Type: 'Account',
  accountName: 'Acme Corp',
  plan: 'Enterprise'
};

const user: TableDesign = {
  PK: 'ACCT#123',
  SK: 'USER#456',
  GSI1PK: 'USER#456',
  GSI1SK: 'PROFILE',
  Type: 'User',
  email: 'john@acme.com',
  createdAt: '2024-01-15T10:00:00Z'
};

const activity: TableDesign = {
  PK: 'USER#456',
  SK: 'ACTIVITY#2024-01-15T10:30:00Z#789',
  GSI1PK: 'ACCT#123',
  GSI1SK: 'ACTIVITY#2024-01-15T10:30:00Z',
  Type: 'Activity',
  action: 'LOGIN',
  ip: '192.168.1.1'
};

Query Patterns

// 1. Get user by userId (goes through the GSI, so IndexName is required)
await dynamodb.query({
  IndexName: 'GSI1',
  KeyConditionExpression: 'GSI1PK = :userId AND GSI1SK = :profile',
  ExpressionAttributeValues: {
    ':userId': 'USER#456',
    ':profile': 'PROFILE'
  }
});

// 2. List all users in account
await dynamodb.query({
  KeyConditionExpression: 'PK = :accountId AND begins_with(SK, :userPrefix)',
  ExpressionAttributeValues: {
    ':accountId': 'ACCT#123',
    ':userPrefix': 'USER#'
  }
});

// 3. List user's recent activities (sorted by time)
await dynamodb.query({
  KeyConditionExpression: 'PK = :userId AND begins_with(SK, :activityPrefix)',
  ExpressionAttributeValues: {
    ':userId': 'USER#456',
    ':activityPrefix': 'ACTIVITY#'
  },
  ScanIndexForward: false // Newest first
});
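The remaining patterns follow the same shapes; for example, "Get account by accountId" is a direct key lookup on the account's metadata item:

// 4. Get account by accountId
await dynamodb.query({
  KeyConditionExpression: 'PK = :accountId AND SK = :metadata',
  ExpressionAttributeValues: {
    ':accountId': 'ACCT#123',
    ':metadata': 'METADATA'
  }
});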

Design Process

  1. List every access pattern your application needs
  2. Group related items that are queried together
  3. Design composite keys that enable your queries
  4. Use GSIs for alternative access patterns
  5. Denormalize data to avoid joins

4. Master Live Migrations Without Downtime

The Challenge

Unlike RDBMS migrations, DynamoDB doesn't support transactional schema changes. You need backwards-compatible live migrations where both data models coexist during the transition.

The Six-Step Migration Process

Step 1: Create V2 Table

// Define new table with improved schema
const v2TableParams = {
  TableName: 'UserDataV2',
  AttributeDefinitions: [ // Required for every key attribute
    { AttributeName: 'entityId', AttributeType: 'S' },
    { AttributeName: 'sortKey', AttributeType: 'S' },
    { AttributeName: 'tenantId', AttributeType: 'S' },
    { AttributeName: 'timestamp', AttributeType: 'S' }
  ],
  KeySchema: [
    { AttributeName: 'entityId', KeyType: 'HASH' },
    { AttributeName: 'sortKey', KeyType: 'RANGE' }
  ],
  GlobalSecondaryIndexes: [{
    IndexName: 'ByTimestamp',
    KeySchema: [
      { AttributeName: 'tenantId', KeyType: 'HASH' },
      { AttributeName: 'timestamp', KeyType: 'RANGE' }
    ],
    Projection: { ProjectionType: 'ALL' }
  }],
  BillingMode: 'PAY_PER_REQUEST'
};

Step 2: Implement Dual Writing

class DataService {
  private dualWriteEnabled = false;
  
  async writeItem(data: UserData): Promise<void> {
    // Always write to V1
    await this.writeToV1(data);
    
    // Conditionally write to V2
    if (this.dualWriteEnabled || await this.featureFlag.isEnabled('dual-write')) {
      try {
        await this.writeToV2(this.transformToV2Schema(data));
      } catch (error) {
        // Log but don't fail - V1 is still primary
        logger.error('V2 write failed', { error, data });
        metrics.increment('v2.write.failed');
      }
    }
  }
  
  private transformToV2Schema(v1Data: V1UserData): V2UserData {
    return {
      entityId: `USER#${v1Data.userId}`,
      sortKey: `PROFILE#${v1Data.accountId}`,
      tenantId: v1Data.accountId,
      timestamp: v1Data.createdAt,
      ...v1Data
    };
  }
}

Step 3: Idempotent Backfill

class MigrationBackfill {
  async backfillToV2(): Promise<void> {
    let lastEvaluatedKey = undefined;
    let totalMigrated = 0;
    
    do {
      // Scan V1 table in segments for parallel processing
      const scanResult = await dynamodb.scan({
        TableName: 'UserDataV1',
        Limit: 100,
        ExclusiveStartKey: lastEvaluatedKey,
        // Use segments for parallel scanning (Segment/TotalSegments are
        // numbers, so parse the env vars)
        Segment: Number(process.env.SEGMENT_NUMBER),
        TotalSegments: Number(process.env.TOTAL_SEGMENTS)
      });
      
      // Transform and write in batches
      const v2Items = scanResult.Items.map(item => this.transformToV2Schema(item));
      
      await this.batchWriteIdempotent(v2Items);
      totalMigrated += v2Items.length;
      
      // Progress tracking
      if (totalMigrated % 1000 === 0) {
        logger.info('Migration progress', { totalMigrated });
      }
      
      lastEvaluatedKey = scanResult.LastEvaluatedKey;
    } while (lastEvaluatedKey);
  }
  
  private async batchWriteIdempotent(items: V2UserData[]): Promise<void> {
    // Note: BatchWriteItem does not support ConditionExpression, and an
    // unconditional put could clobber a newer dual-written item with stale
    // scanned data - so use individual conditional puts instead.
    for (const item of items) {
      try {
        await dynamodb.putItem({
          TableName: 'UserDataV2',
          Item: item,
          ConditionExpression: 'attribute_not_exists(entityId)' // Idempotent
        });
      } catch (error: any) {
        if (error.name !== 'ConditionalCheckFailedException') throw error;
        // Already migrated (or dual-written) - safe to skip
      }
    }
  }
}
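The Segment/TotalSegments pair is what lets the backfill fan out. A sketch of a coordinator launching one worker process per segment (the worker script path is hypothetical):

import { fork } from 'child_process';

// Launch one backfill worker per scan segment
const TOTAL_SEGMENTS = 8; // illustrative - size to table size and write capacity

for (let segment = 0; segment < TOTAL_SEGMENTS; segment++) {
  fork('./backfill-worker.js', [], {
    env: {
      ...process.env,
      SEGMENT_NUMBER: String(segment),
      TOTAL_SEGMENTS: String(TOTAL_SEGMENTS)
    }
  });
}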

Step 4: Implement Read Switching

class DataService {
  async readUser(userId: string): Promise<UserData> {
    const readFromV2 = await this.featureFlag.isEnabled('read-from-v2');
    
    if (readFromV2) {
      try {
        const v2Data = await this.readFromV2(userId);
        
        // Validation: compare with V1 during transition
        if (await this.featureFlag.isEnabled('validate-reads')) {
          const v1Data = await this.readFromV1(userId);
          if (!this.dataMatches(v1Data, v2Data)) {
            logger.error('Data mismatch', { userId, v1Data, v2Data });
            metrics.increment('migration.mismatch');
          }
        }
        
        return v2Data;
      } catch (error) {
        // Fallback to V1 on any error
        logger.error('V2 read failed, falling back', { error, userId });
        return this.readFromV1(userId);
      }
    }
    
    return this.readFromV1(userId);
  }
}

Step 5 & 6: Complete Migration

// Step 5: Full activation (via feature flags)
{
  "dual-write": true,
  "read-from-v2": true,
  "validate-reads": false, // Turn off after confidence
  "v1-writes-enabled": false // Stop V1 writes
}

// Step 6: Archive old table
class TableArchiver {
  async archiveV1Table(): Promise<void> {
    // Export to S3 for backup (requires point-in-time recovery enabled on the table)
    await dynamodb.exportTableToPointInTime({
      TableArn: 'arn:aws:dynamodb:region:account:table/UserDataV1',
      S3Bucket: 'company-dynamodb-archives',
      S3Prefix: 'UserDataV1/final-export/'
    });
    
    // Wait for export completion
    await this.waitForExportCompletion();
    
    // Delete table after verification
    await dynamodb.deleteTable({ TableName: 'UserDataV1' });
  }
}

Migration Best Practices

  • Use feature flags for each migration phase
  • Monitor error rates and latencies at each step
  • Implement comparison validation during dual reads
  • Keep backfill jobs idempotent and resumable
  • Plan rollback procedures for each phase (see the flag sketch after this list)
  • Archive old data before deletion

5. Defeating Hot Partitions

Symptoms

Hot partitions occur when one partition receives disproportionate traffic, causing throttling even when your table has plenty of capacity.

A single hot partition can bring down entire services!

Natural Distribution Through Better Modeling

The best solution isn't the random write-sharding you may read about; it's modeling your data so load distributes naturally.

Think of each distinct partition key as a potential point of failure, and minimize how many users share one where you can.

❌ Anti-Pattern: Account-Level Partitioning

// Problem: All customer activity in an account hits the same partition
// Enterprise accounts with thousands of customers create hot partitions
{
  PK: `ACCOUNT#${accountId}`,           // Hot partition for large accounts!
  SK: `CUSTOMER#${customerId}#ACTION#${actionId}`,
  customerId: customerId,
  actionType: 'purchase',
  timestamp: '2024-01-15T10:00:00Z'
}

// Reading all activity for an account requires scanning one huge partition
const getAccountActivity = async (accountId: string) => {
  return dynamodb.query({
    KeyConditionExpression: 'PK = :pk',
    ExpressionAttributeValues: {
      ':pk': `ACCOUNT#${accountId}`  // Could be millions of items!
    }
  });
};

✅ Pattern: Customer-Level Partitioning

Note that in this example we'd still be relying on DynamoDB's asynchronous GSI replication to absorb a potential hot partition in the index if a single account sustains more than ~1,000 writes per second (the per-partition write limit). For most workloads, that's a realistic ceiling to accept.

// Solution: Partition by customer, use GSI for account-level queries
{
  PK: `CUSTOMER#${customerId}`,         // Naturally distributed!
  SK: `ACTION#${timestamp}#${actionId}`,
  GSI1PK: `ACCOUNT#${accountId}`,
  GSI1SK: `CUSTOMER#${customerId}`,
  actionType: 'purchase',
  timestamp: '2024-01-15T10:00:00Z'
}

// Customer queries hit their own partition
const getCustomerActivity = async (customerId: string) => {
  return dynamodb.query({
    KeyConditionExpression: 'PK = :pk',
    ExpressionAttributeValues: {
      ':pk': `CUSTOMER#${customerId}`  // Isolated partition
    }
  });
};

// Account-level queries use GSI (read-heavy, not write-heavy)
const getAccountCustomers = async (accountId: string) => {
  return dynamodb.query({
    TableName: 'CustomerData',
    IndexName: 'GSI1',
    KeyConditionExpression: 'GSI1PK = :pk',
    ExpressionAttributeValues: {
      ':pk': `ACCOUNT#${accountId}`
    }
  });
};

Real-World Example: Event Streaming

// ❌ Anti-Pattern: Partition by event source
{
  PK: `SOURCE#payment-service`,  // One service processes millions of events
  SK: `EVENT#${timestamp}#${eventId}`,
  eventData: {...}
}

// ✅ Pattern: Partition by natural business entity
{
  PK: `MERCHANT#${merchantId}`,  // Events naturally distributed by merchant
  SK: `EVENT#${timestamp}#${eventId}`,
  GSI1PK: `DATE#${date}`,        // Time-based queries via GSI, can go more granular if you reach bottlenecks
  GSI1SK: `TIME#${timestamp}#MERCHANT#${merchantId}`, // natural sub-sortation with ISO timestamps
  source: 'payment-service',
  eventData: {...}
}

Key Strategies

  • Identify your natural distribution key (customer, user, device, merchant)
  • Avoid grouping too many entities under one partition key
  • Use GSIs for aggregate queries, not for high-volume writes
  • Monitor partition metrics to validate distribution (see the sketch below)
  • Consider access patterns: write-heavy paths need the best distribution
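A minimal monitoring sketch, assuming the AWS SDK v3 CloudWatch client: it sums the table's WriteThrottleEvents metric, often the first symptom of a hot partition (the table name and time window are illustrative). CloudWatch Contributor Insights for DynamoDB can then surface the specific hot keys.

import { CloudWatchClient, GetMetricStatisticsCommand } from '@aws-sdk/client-cloudwatch';

const cloudwatch = new CloudWatchClient({});

// Sum write throttle events over the last hour - nonzero throttling on a
// table with spare overall capacity usually means a hot partition
async function checkWriteThrottles(tableName: string): Promise<number> {
  const result = await cloudwatch.send(new GetMetricStatisticsCommand({
    Namespace: 'AWS/DynamoDB',
    MetricName: 'WriteThrottleEvents',
    Dimensions: [{ Name: 'TableName', Value: tableName }],
    StartTime: new Date(Date.now() - 60 * 60 * 1000),
    EndTime: new Date(),
    Period: 300, // 5-minute buckets
    Statistics: ['Sum']
  }));
  return (result.Datapoints ?? []).reduce((sum, dp) => sum + (dp.Sum ?? 0), 0);
}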