Cheat Sheet: Kinesis Data Streams - Scaling & Parallel Processing
Scaling an Amazon Kinesis Data Stream is crucial for handling changes in data throughput and ensuring that your data processing applications can keep up. Scaling is centered around managing the number of shards in a stream.
The Shard: The Unit of Scale
A shard is the base throughput unit of a Kinesis data stream. Understanding its limits is key to scaling:
- Ingestion (Write) Capacity: 1 MB/second OR 1,000 records/second.
- Egress (Read) Capacity (Standard Consumer): 2 MB/second, shared across all consumers of the shard.
- Egress (Read) Capacity (Enhanced Fan-Out Consumer): 2 MB/second of dedicated throughput per registered consumer.
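These limits make initial shard sizing a back-of-the-envelope calculation. The sketch below is illustrative only; the workload figures are hypothetical placeholders you would replace with your own measurements.

```python
import math

# Hypothetical workload figures -- replace with your own measurements.
expected_write_mb_per_sec = 4.5      # aggregate ingest volume
expected_records_per_sec = 3_200     # aggregate ingest record count
expected_read_mb_per_sec = 7.0       # aggregate read volume (standard consumers)

# Per-shard limits from the list above.
shards_for_write = max(
    math.ceil(expected_write_mb_per_sec / 1.0),   # 1 MB/s write per shard
    math.ceil(expected_records_per_sec / 1000),   # 1,000 records/s per shard
)
shards_for_read = math.ceil(expected_read_mb_per_sec / 2.0)  # 2 MB/s shared read per shard

print("Minimum shards needed:", max(shards_for_write, shards_for_read))
```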
You must monitor two key metrics in CloudWatch to know when to scale:
- WriteProvisionedThroughputExceeded: Indicates you are trying to write data faster than the shard's capacity.
- ReadProvisionedThroughputExceeded: Indicates your standard consumers are trying to read data faster than the shard's shared capacity.
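As an illustration, here is a minimal boto3 sketch that pulls both throttle metrics; the stream name is a placeholder, and the one-hour window and five-minute period are arbitrary choices.

```python
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

for metric in ("WriteProvisionedThroughputExceeded", "ReadProvisionedThroughputExceeded"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Kinesis",
        MetricName=metric,
        Dimensions=[{"Name": "StreamName", "Value": "my-stream"}],  # placeholder name
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,            # 5-minute buckets
        Statistics=["Sum"],
    )
    throttled = sum(dp["Sum"] for dp in stats["Datapoints"])
    print(metric, "throttled requests in the last hour:", throttled)
```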
Resharding: How to Scale a Stream
Resharding is the process of increasing or decreasing the number of shards in a stream to adapt to changes in data flow rate.
1. Shard Splitting (Scaling Out)
- What it is: Dividing a single shard into two new "child" shards. This increases the total number of shards in the stream, thereby increasing its total capacity.
- When to use it:
  - When your data ingestion rate is increasing and you are hitting the write limits of a shard (WriteProvisionedThroughputExceeded).
  - When a "hot shard" (a shard receiving a disproportionate amount of data due to a popular partition key) is causing a bottleneck. Splitting this shard allows the data for that partition key range to be distributed across two new shards.
- Process: A shard split is a pairwise operation. The parent shard is closed and two new child shards are created. The parent shard's data is preserved until its retention period expires, but all new data for its partition key range is routed to the two child shards.
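A rough boto3 sketch of an even split is shown below. The stream name is a placeholder, and picking the first open shard is purely for illustration; in practice you would target the hot shard identified from your metrics.

```python
import boto3

kinesis = boto3.client("kinesis")
stream = "my-stream"  # placeholder

shards = kinesis.list_shards(StreamName=stream)["Shards"]
# Keep only open shards (closed shards have an EndingSequenceNumber and cannot be split).
open_shards = [s for s in shards if "EndingSequenceNumber" not in s["SequenceNumberRange"]]
shard = open_shards[0]  # illustration only -- normally you would pick the hot shard

low = int(shard["HashKeyRange"]["StartingHashKey"])
high = int(shard["HashKeyRange"]["EndingHashKey"])

# Splitting at the midpoint halves a uniformly hot shard; for a single hot key,
# choose a split point that isolates that key's hash range instead.
kinesis.split_shard(
    StreamName=stream,
    ShardToSplit=shard["ShardId"],
    NewStartingHashKey=str((low + high) // 2),
)
```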
2. Shard Merging (Scaling In)
- What it is: Combining two adjacent shards into a single new "child" shard. This decreases the total number of shards, reducing the cost of the stream.
- When to use it:
  - When your data ingestion rate has decreased and you have "cold" shards that are underutilized.
  - To reduce costs when you are over-provisioned for your workload.
- Process: A shard merge is also a pairwise operation. The two parent shards are closed, and a single new child shard is created to handle all new data for the combined partition key range of its parents.
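A minimal boto3 sketch of a merge follows. The stream name and shard IDs are placeholders; the two shards must cover adjacent hash key ranges.

```python
import boto3

kinesis = boto3.client("kinesis")

kinesis.merge_shards(
    StreamName="my-stream",                       # placeholder
    ShardToMerge="shardId-000000000000",          # placeholder shard IDs;
    AdjacentShardToMerge="shardId-000000000001",  # they must be adjacent
)

# The stream status is UPDATING while the merge completes, then returns to ACTIVE.
summary = kinesis.describe_stream_summary(StreamName="my-stream")
print(summary["StreamDescriptionSummary"]["StreamStatus"])
```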
Parallel Processing: Consuming the Data
Scaling your stream's capacity is only half the battle; you also need to scale your consumer applications to process the data in parallel.
How Consumers Map to Shards
The fundamental rule of Kinesis consumption is: A single shard can only be processed by one consumer instance (or task) at a time within a given consumer application.
This one-to-one mapping is what enables safe, ordered, parallel processing. Your application's overall processing capacity is directly related to the number of shards in the stream.
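As a quick illustration (the stream name is a placeholder), the stream's open shard count is the upper bound on how many consumer tasks one application can run in parallel:

```python
import boto3

kinesis = boto3.client("kinesis")
summary = kinesis.describe_stream_summary(StreamName="my-stream")["StreamDescriptionSummary"]

# One shard can be worked by only one task per application, so this is the ceiling.
print("Max parallel consumer tasks per application:", summary["OpenShardCount"])
```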
Scaling with Different Consumer Types
1. AWS Lambda
- How it works: When you use a Kinesis stream as an event source for a Lambda function, AWS manages the consumer logic for you. Lambda polls each shard in your stream and invokes a separate instance of your function for each shard that has new data.
- Scaling:
  - The number of concurrent Lambda executions is directly tied to the number of shards.
  - To increase processing parallelism, you must increase the number of shards in the stream (via shard splitting). At the default ParallelizationFactor of 1, a 10-shard stream can have at most 10 concurrent Lambda invocations processing it.
  - You can also tune BatchSize and ParallelizationFactor to process multiple batches from a single shard in parallel, up to a factor of 10.
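A hedged boto3 sketch of wiring a stream to a Lambda function is shown below; the function name, stream ARN, and tuning values are placeholders, not recommendations.

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",  # placeholder ARN
    FunctionName="my-consumer-function",  # placeholder name
    StartingPosition="LATEST",
    BatchSize=200,             # records handed to each invocation
    ParallelizationFactor=5,   # up to 10 concurrent batches per shard
)
```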
2. Kinesis Client Library (KCL) Applications
- What it is: A library that helps you build consumer applications running on platforms like EC2 or ECS. The KCL handles complex tasks like tracking shards, coordinating between multiple consumer instances (workers), and adapting to resharding events.
- How it works:
  - The KCL uses a DynamoDB table to track shard leases and coordinate workers.
  - Each worker instance pulls data from one or more shards. The KCL ensures that each shard is leased to only one worker at a time within the application.
- Scaling:
  - You can scale your processing by adding more worker instances (e.g., more EC2 instances). The KCL will automatically rebalance the shard leases across the available workers.
  - The maximum number of useful worker instances is equal to the number of shards. If you have 10 shards, you can have at most 10 workers processing in parallel; an 11th worker would sit idle, as there would be no free shard for it to lease.
  - To scale beyond the number of shards, you must increase the shard count in the stream itself.
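For a quick view of how leases are currently distributed, you can scan the KCL's DynamoDB lease table. This is a rough sketch under the assumption that the table is named after the application and uses the KCL's usual leaseKey/leaseOwner attributes; treat the names as illustrative.

```python
import boto3

# Placeholder table name -- the KCL creates a lease table named after the application.
table = boto3.resource("dynamodb").Table("my-kcl-application")

for lease in table.scan()["Items"]:
    # One item per shard: which worker currently holds the lease for it.
    print(lease["leaseKey"], "->", lease.get("leaseOwner", "unassigned"))
```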