DynamoDB Scan vs Query

Core Distinction at a Glance

Query is a targeted search. It finds items that share a specific partition key value, optionally narrowed by a condition on the sort key. It is fast and efficient because it reads only the matching items.
Scan is a full table read. It examines every single item in a table or secondary index. It is slow and inefficient for large tables.

The `Query` Operation

A Query operation finds items in a table or a secondary index that have a specific partition key. You can optionally provide a condition for the sort key to retrieve a range of items.

How It Works

- Requires a KeyConditionExpression: You must specify the partition key value. This allows DynamoDB to go directly to the physical partition where the data is stored and retrieve it.
- Efficient and Fast: Because it doesn't need to look at every item, a Query is highly efficient. Its performance scales with the amount of data that matches the key condition, not the size of the table.
- RCU Cost: The Read Capacity Units (RCUs) consumed are based on the total size of the items that match the key condition. If you add a FilterExpression, it is applied after the items are read, so filtered-out items still count toward the cost.
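
As a rough sketch of the points above, the helper below issues a Query with a `KeyConditionExpression` on the partition key and follows `LastEvaluatedKey` until every page has been read. The helper name, the string-typed key, and the `Orders` / `customer_id` names in the usage note are assumptions for illustration. `client` is expected to behave like a boto3 low-level DynamoDB client (`boto3.client("dynamodb")`).

```python
def query_partition(client, table_name, key_name, key_value):
    """Return all items whose partition key equals key_value (string key assumed)."""
    params = {
        "TableName": table_name,
        # The equality condition on the partition key is mandatory for Query.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": key_name},
        "ExpressionAttributeValues": {":pk": {"S": key_value}},
    }
    items = []
    while True:
        page = client.query(**params)
        items.extend(page.get("Items", []))
        # One call returns at most 1 MB of data; LastEvaluatedKey signals more pages.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        params["ExclusiveStartKey"] = last_key
```

With a real client this might be called as `query_partition(boto3.client("dynamodb"), "Orders", "customer_id", "C-1001")`.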

When to Use `Query`

- Always, if possible. Your data model should be designed around using the Query operation for your application's primary data access patterns.
- When you need to fetch a single item by its full primary key. (GetItem is the more direct call for this case, but a Query that specifies both the partition key and the sort key also works.)
- When you need to fetch a collection of items that share the same partition key (e.g., all orders for a specific customer).
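
To illustrate the "all orders for a specific customer" case, here is a minimal sketch that builds Query parameters combining a partition key equality with a sort key range. The table name `Orders`, the attributes `customer_id` and `order_date`, and the ISO-8601 date strings are all hypothetical; the resulting dictionary is in the shape accepted by the boto3 low-level client's `query` call.

```python
def build_orders_query(customer_id, start_date, end_date):
    """Build Query parameters for one customer's orders within a date range."""
    return {
        "TableName": "Orders",  # hypothetical table
        # Equality on the partition key, range condition on the sort key.
        "KeyConditionExpression": (
            "customer_id = :cid AND order_date BETWEEN :start AND :end"
        ),
        "ExpressionAttributeValues": {
            ":cid": {"S": customer_id},
            ":start": {"S": start_date},
            ":end": {"S": end_date},
        },
    }


params = build_orders_query("C-1001", "2024-01-01", "2024-03-31")
```

The parameters could then be passed along as `client.query(**params)`. ISO-8601 strings are assumed here because they sort lexicographically in date order, which is what makes the `BETWEEN` range meaningful on a string sort key.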

The `Scan` Operation

A Scan operation examines every item in a table or secondary index. By default, it returns all data attributes for every item.

How It Works

- Reads Everything: A Scan iterates through the entire table, item by item.
- Inefficient and Slow: As the table grows, the Scan operation becomes progressively slower and more expensive. It can easily consume all of your table's provisioned read capacity, impacting other critical application traffic.
- RCU Cost: The RCUs consumed are based on the total size of every item scanned, regardless of whether the items are returned in the final result.
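
The cost difference can be made concrete with a little arithmetic. The sketch below assumes the standard read capacity rule: the total size of the items read is rounded up to the next 4 KB, and an eventually consistent read costs half as much as a strongly consistent one. The table size (1,000,000 items of 1 KB) and the partition size (20 items of 1 KB) are made-up numbers, and the function name is hypothetical.

```python
import math


def estimate_rcus(bytes_read, strongly_consistent=False):
    """Estimate RCUs for a Query or Scan request from the bytes it reads."""
    units = math.ceil(bytes_read / 4096)  # total size rounded up to 4 KB
    # Eventually consistent reads (the default) cost half as much.
    return units if strongly_consistent else units / 2


KB = 1024
scan_cost = estimate_rcus(1_000_000 * KB)  # Scan reads the whole table
query_cost = estimate_rcus(20 * KB)        # Query reads one partition's items

print(scan_cost, query_cost)  # 125000.0 2.5
```

This is only an approximation for the Scan: a real Scan over a table this large is split into many pages of up to 1 MB, and each page is rounded separately. The order of magnitude is what matters here.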

The Danger of `FilterExpression` with `Scan`

You can use a FilterExpression with a Scan to discard some data from the results. However, this does nothing for performance or cost. The filter is applied after the data has been read, so it only reduces the amount of data sent back to the client. You still pay the RCU cost for reading every single item, even the ones that are filtered out.
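
One way to see this effect is to compare the `ScannedCount` and `Count` fields that each Scan response carries: the first counts items read before filtering, the second counts items left after filtering. The following is a minimal sketch, assuming a hypothetical string attribute named `status` and a `client` that behaves like a boto3 low-level DynamoDB client.

```python
def scan_with_filter(client, table_name, status):
    """Scan with a filter and report (items read, items returned)."""
    params = {
        "TableName": table_name,
        "FilterExpression": "#st = :st",
        # STATUS is a DynamoDB reserved word, so a name placeholder is used.
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {":st": {"S": status}},
    }
    scanned = returned = 0
    while True:
        page = client.scan(**params)
        scanned += page["ScannedCount"]  # items read before the filter
        returned += page["Count"]        # items remaining after the filter
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return scanned, returned
        params["ExclusiveStartKey"] = last_key
```

If the first number is far larger than the second, most of the read capacity was spent on items the application never saw.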

When to Use `Scan`

Use with extreme caution. A Scan should almost never be used in a production application that requires scalable performance.
Acceptable uses:
- On very small tables (e.g., a few dozen items).
- For one-time administrative tasks, like exporting the entire table to S3 for analytics.

Detailed Comparison Table

Feature	`Query`	`Scan`
Primary Purpose	To find specific items based on a key.	To read every item in the table.
Required Parameter	`KeyConditionExpression` (must specify the partition key).	None (reads the whole table).
Performance	Very Fast. Goes directly to the data.	Slow. Must inspect every item.
Scalability	Excellent. Performance depends on result size, not table size.	Poor. Performance degrades as table size increases.
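RCU Cost	Consumes RCUs based on the size of the items read (those matching the key condition), before any filter is applied.	Consumes RCUs based on the size of every item scanned, which for a full Scan is the entire table.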
Use in Production	Preferred and recommended for all primary access patterns.	Strongly discouraged. Can cause performance bottlenecks.

Rule of Thumb

Always design your tables to use the Query operation.

This means identifying your application's access patterns first and creating primary keys and secondary indexes that allow you to fetch the data you need with a Query. If you find yourself needing to use a Scan to answer a common question from your application, it is a strong sign that your data model needs to be revised.
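
As an example of revising the model rather than scanning, suppose the application often asks "which orders are pending?". A sketch of one possible answer is a global secondary index keyed on the status attribute, queried directly. The table name `Orders`, the index name `status-index`, and the `status` attribute are all hypothetical; `IndexName` is the request parameter that points a Query at a secondary index.

```python
def build_status_index_query(status):
    """Build Query parameters that read one status value from a secondary index."""
    return {
        "TableName": "Orders",          # hypothetical table
        "IndexName": "status-index",    # hypothetical GSI with status as partition key
        "KeyConditionExpression": "#st = :st",
        # STATUS is a DynamoDB reserved word, so a name placeholder is used.
        "ExpressionAttributeNames": {"#st": "status"},
        "ExpressionAttributeValues": {":st": {"S": status}},
    }


pending_params = build_status_index_query("PENDING")
```

Passed to `client.query(**pending_params)`, this reads only the index entries for the requested status instead of every item in the table.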

