AWS Application Services

Amazon SQS

Updated June 24, 2025

What is Amazon SQS?

Amazon SQS (Simple Queue Service) is one of the foundational AWS services. It provides a secure, durable, and highly available hosted queue for storing messages as they travel between different components of an application. By using a queue, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be always available. SQS allows you to decouple application components so they can run and fail independently, increasing the overall fault tolerance of the system.


Core Concepts: The Queue Workflow

SQS operates on a producer-consumer model:

  1. Producer (Component 1): An application component sends a message to an SQS queue. The message is then stored redundantly across multiple Availability Zones.

  2. Queue: The message is held in the queue until a consumer is ready to process it.

  3. Consumer (Component 2): Another application component (the consumer) periodically polls the queue to retrieve messages for processing.

  4. Process & Delete: The consumer processes the message and then deletes it from the queue to prevent it from being processed again.

This simple workflow is the key to decoupling. The producer doesn't need to know where the consumer is or if it's currently running; it only needs to send the message to the queue.
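Here is a minimal sketch of the send, receive, and delete steps using boto3. The queue URL and the process function are placeholders for illustration; it assumes AWS credentials are already configured.

```python
import boto3

sqs = boto3.client("sqs")
# Hypothetical queue URL used for illustration.
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"

def process(body):
    # Application-specific work would go here.
    print("processing:", body)

# Producer: send a message to the queue.
sqs.send_message(QueueUrl=queue_url, MessageBody='{"task": "resize", "image_id": "42"}')

# Consumer: poll the queue for up to 10 messages.
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)

for message in response.get("Messages", []):
    process(message["Body"])
    # Delete the message so it is not delivered again.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```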


SQS Queue Types: Standard vs. FIFO

Choosing the right queue type is critical and depends entirely on your application's requirements.

Standard Queues (Default)

  • Ordering: Provides best-effort ordering. The order in which messages are delivered is not guaranteed to be the same as the order in which they were sent.

  • Delivery: Provides at-least-once delivery. A message will be delivered at least once, but occasionally, a duplicate message might be delivered. Your application must be idempotent (i.e., processing the same message multiple times should not have adverse effects).

  • Throughput: Offers nearly unlimited throughput.

  • Use Case: The best choice for most scenarios. Ideal for high-throughput applications that can handle out-of-order and occasionally duplicate messages, such as batch processing, logging, or background tasks.

FIFO (First-In-First-Out) Queues

  • Ordering: Provides strict, first-in-first-out ordering. Messages are processed in the exact order they were sent. Ordering is scoped by the Message Group ID: all messages within the same group are delivered in order, while messages in different groups can be processed independently (see the sketch after this list).

  • Delivery: Provides exactly-once processing. SQS does not introduce duplicates: a message sent again within the 5-minute deduplication interval (identified by a deduplication ID or content-based deduplication) is accepted but not delivered a second time.

  • Throughput: High throughput, but with default limits that are lower than Standard queues.

  • Use Case: Essential for applications where message order and the absence of duplicates are business-critical. Examples include financial transactions, user-submitted commands that must be executed in sequence, or inventory management systems.
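A brief sketch of sending to a FIFO queue follows; the queue URL, group ID, and deduplication ID are illustrative, and FIFO queue names must end in .fifo.

```python
import boto3

sqs = boto3.client("sqs")
# Hypothetical FIFO queue.
fifo_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders.fifo"

# Messages sharing a MessageGroupId are delivered in the order they were sent.
# MessageDeduplicationId (or content-based deduplication on the queue) prevents duplicates.
sqs.send_message(
    QueueUrl=fifo_url,
    MessageBody='{"order_id": "1001", "action": "charge"}',
    MessageGroupId="customer-42",
    MessageDeduplicationId="order-1001-charge",
)
```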


Key Features & Mechanisms

Polling: Short vs. Long

  • Short Polling (Default): SQS queries only a subset of its servers to see if a message is available. This means you might get an empty response even if messages exist on other servers. It's faster but less efficient.

  • Long Polling (ReceiveMessageWaitTimeSeconds > 0): SQS queries all of its servers and waits for a message to arrive in the queue if it's empty, up to a specified timeout (max 20 seconds). Long polling is the preferred method as it reduces the number of empty responses, minimizes cost, and improves efficiency.
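For example, long polling can be enabled per request with WaitTimeSeconds, or made the queue default via the ReceiveMessageWaitTimeSeconds attribute. The queue URL below is a placeholder.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # hypothetical

# Per-request long polling: wait up to 20 seconds for a message to arrive.
response = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)

# Queue-level default: every ReceiveMessage call long-polls unless overridden.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},
)
```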

Visibility Timeout

This is the core mechanism that prevents messages from being processed by multiple consumers at the same time.

  1. A consumer polls and receives a message.

  2. The message becomes invisible in the queue for the duration of the Visibility Timeout (default 30 seconds, max 12 hours).

  3. The consumer processes the message and deletes it.

  4. If the consumer fails to delete the message before the timeout expires (e.g., it crashes), the message becomes visible again in the queue for another consumer to process.
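A sketch of a consumer working with the visibility timeout, including extending it for a long-running task; the timeout values and queue URL are illustrative.

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # hypothetical

# Hide each received message for 60 seconds while this consumer works on it.
response = sqs.receive_message(QueueUrl=queue_url, VisibilityTimeout=60, MaxNumberOfMessages=1)

for message in response.get("Messages", []):
    # If processing runs long, extend the timeout so another consumer
    # doesn't receive the same message while we're still working.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
        VisibilityTimeout=300,
    )
    # ... process the message ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```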

Dead-Letter Queues (DLQs)

A DLQ is a separate queue used to handle messages that fail processing.

  • How it works: You configure a redrive policy on your source queue. If a message is received from the source queue a specified number of times (the maxReceiveCount) without being deleted, SQS automatically moves it to the designated DLQ.

  • Purpose: This isolates problematic messages, allowing you to inspect them later for debugging without blocking the main queue. It's a critical tool for building resilient systems.
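A minimal sketch of attaching a redrive policy to a source queue; the queue URL and DLQ ARN are placeholders.

```python
import boto3, json

sqs = boto3.client("sqs")
source_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # hypothetical
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"               # hypothetical

# After 5 failed receives without a delete, SQS moves the message to the DLQ.
sqs.set_queue_attributes(
    QueueUrl=source_url,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": "5",
        })
    },
)
```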

Delay Queues & Message Timers

  • Delay Queues: A queue-level setting that postpones the delivery of all new messages to the queue for a specified duration (0 seconds to 15 minutes).

  • Message Timers: A message-level setting that allows the producer to specify a delay for a specific message before it becomes visible in the queue.
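Both settings use DelaySeconds: a queue-level delay is set as an attribute, while a message timer is set per message. A sketch with placeholder values:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # hypothetical

# Delay queue: every new message is hidden for 60 seconds after arrival.
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"DelaySeconds": "60"})

# Message timer: this particular message is hidden for 120 seconds instead.
sqs.send_message(QueueUrl=queue_url, MessageBody="delayed task", DelaySeconds=120)
```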


Security

  • Encryption:

    • In-Transit: Messages are encrypted using TLS/SSL.

    • At-Rest: SQS supports Server-Side Encryption (SSE) using AWS KMS keys. You can use the AWS-managed SQS key or your own customer-managed key (CMK).

  • Access Control:

    • IAM Policies: Use identity-based policies to control which users or roles can perform SQS actions (e.g., sqs:SendMessage, sqs:ReceiveMessage).

    • Resource-Based Policies (Queue Policies): Attach policies directly to an SQS queue to grant cross-account access or allow other AWS services (like SNS or S3) to send messages to it.
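As an illustrative sketch, both SSE-KMS and a queue policy can be applied through set_queue_attributes. The KMS key alias, queue ARN, and topic ARN below are placeholders; the policy grants an SNS topic permission to send messages to the queue.

```python
import boto3, json

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # hypothetical
queue_arn = "arn:aws:sqs:us-east-1:123456789012:example-queue"                # hypothetical
topic_arn = "arn:aws:sns:us-east-1:123456789012:example-topic"                # hypothetical

sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        # At-rest encryption with a customer-managed KMS key.
        "KmsMasterKeyId": "alias/my-sqs-key",
        # Resource-based policy allowing an SNS topic to deliver messages.
        "Policy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "sns.amazonaws.com"},
                "Action": "sqs:SendMessage",
                "Resource": queue_arn,
                "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
            }],
        }),
    },
)
```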


SQS vs. SNS: The Key Difference

This is a common point of confusion.

  • SNS (Pub/Sub): A single message is pushed to many different subscribers. It's a "fanout" system.

  • SQS (Queue): Messages are placed in a queue and pulled by consumers for processing; each message is delivered to a single consumer rather than broadcast. It's designed for decoupling producers from consumers.

They are often used together in the fanout pattern: an SNS topic pushes a message to multiple SQS queues, allowing different systems to process the same event in parallel, with the durability and reliability of a queue.
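A sketch of the fanout wiring, assuming the topic and queue already exist (the ARNs are placeholders, and the queue's resource policy, as shown in the Security section, must allow the topic to send):

```python
import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:123456789012:order-events"    # hypothetical
queue_arn = "arn:aws:sqs:us-east-1:123456789012:billing-queue"   # hypothetical

# Subscribe the SQS queue to the SNS topic; every message published to
# the topic is pushed into this queue (and any other subscribed queues).
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

sns.publish(TopicArn=topic_arn, Message='{"order_id": "1001", "event": "created"}')
```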


Common Use Cases

  • Decouple Microservices: Allow a web server to hand off a task (like resizing an image) to a background worker service, ensuring the web server remains responsive.

  • Batch Processing: Queue up a large number of tasks (e.g., processing records from a file) and have a fleet of EC2 instances or Lambda functions process them in parallel.

  • Order Processing: Buffer incoming orders from a high-traffic e-commerce site to be processed reliably by a backend system, preventing overload during traffic spikes.

  • Workflow Automation: Manage tasks in a multi-step workflow. One component completes its work and sends a message to a queue, which triggers the next component in the process.