What is Amazon SWF?
Amazon Simple Workflow Service (SWF) is a fully managed state tracker and task coordinator for cloud applications. It helps developers build, run, and scale background jobs that have parallel or sequential steps. You can think of SWF as a "state machine in the cloud" that reliably tracks the state of your process, ensuring that a task is assigned only once and is never lost. It's designed to orchestrate long-running processes (up to one year) and is particularly effective for workflows that involve human interaction.
SWF vs. AWS Step Functions: Which to Choose?
This is the most important consideration when evaluating SWF today. For most new applications, AWS Step Functions is the recommended service for orchestrating workflows.
- Amazon SWF (The "Engine"):
  - Control: Offers maximum control and flexibility. You write the "decider" logic in your programming language of choice; the decider polls SWF for the workflow state and decides what to do next.
  - Complexity: More complex to use. It requires you to run and manage fleets of "deciders" and "activity workers."
  - Use Case: Best for unique or complex orchestration patterns that don't fit the Step Functions model, or for workflows involving external signals and human interaction where you need fine-grained programmatic control.
- AWS Step Functions (The "Visual Workflow Service"):
  - Ease of Use: Significantly easier to get started with. You define your workflow visually or with a JSON-based state machine definition (Amazon States Language).
  - Integration: Offers deep, native integration with other AWS services (Lambda, SNS, SQS, etc.), making it easy to build serverless workflows.
  - Control: Less programmatic control than SWF, because the service itself manages the state transitions.
  - Use Case: The preferred choice for orchestrating microservices, automating IT processes, and building data processing pipelines.
Verdict: Use Step Functions by default. Only consider SWF if you have a specific requirement for the level of control that SWF provides and Step Functions cannot meet.
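To make the Step Functions side of the comparison concrete, here is a minimal Amazon States Language definition built as a Python dictionary and serialized to JSON. It is a sketch only: the state names, the retry settings, and the Lambda ARN are hypothetical placeholders.

```python
import json

# Hypothetical Lambda ARN; replace it with a function ARN from your own account.
PAYMENT_FN_ARN = "arn:aws:lambda:us-east-1:123456789012:function:processPayment"


def build_order_state_machine(payment_fn_arn: str) -> str:
    """Return a minimal Amazon States Language definition as a JSON string."""
    definition = {
        "Comment": "Minimal order workflow (illustrative sketch)",
        "StartAt": "ProcessPayment",
        "States": {
            # A Task state invokes a resource; Step Functions handles the transition.
            "ProcessPayment": {
                "Type": "Task",
                "Resource": payment_fn_arn,
                # Retries are declared in the definition, not coded in a decider.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "OrderComplete",
            },
            # Succeed is a terminal state that ends the execution successfully.
            "OrderComplete": {"Type": "Succeed"},
        },
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(build_order_state_machine(PAYMENT_FN_ARN))
```

A definition like this is what you would pass as the `definition` argument of the Step Functions `CreateStateMachine` API (for example through boto3's `create_state_machine`, together with an IAM role ARN). There is no decider to run: the retry policy and the transition from `ProcessPayment` to `OrderComplete` are executed by the service.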
Core Concepts: The Actors in SWF
An SWF workflow is powered by three types of programmatic actors that you develop and run.
- Workflow Starter: Any application component that initiates a workflow execution. This could be a web server after a user places an order, a Lambda function triggered by an S3 upload, or an administrator running a command-line tool.
- Decider: The "brain" of the workflow. The decider is a program you write that polls SWF for "decision tasks." A decision task tells the decider about the current state of the workflow by including its workflow history (delivered in pages when the history is long). Based on this history, the decider determines the next action, such as:
  - Scheduling an activity task.
  - Starting a timer.
  - Waiting for an external signal (in practice, scheduling no new work until a signal event appears in the history).
  - Completing or failing the workflow.
- Activity Worker: A program you write that performs a specific job in your workflow. An activity worker polls SWF for "activity tasks." When it receives a task, it executes its function (e.g., charge a credit card, transcode a video file, call a third-party API) and reports the result (success or failure) back to SWF.
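The decider's job can be sketched as a pure function from history events to decisions, plus a thin loop that polls and responds. This is a minimal sketch, assuming the dictionary shapes that boto3's SWF client returns from `poll_for_decision_task` and accepts in `respond_decision_task_completed`, and assuming `swf` is a `boto3.client("swf")`. The `processPayment` activity name follows this article's example; the version, activity ID, task list name, and timeout values are hypothetical. Only success or failure of a single activity is handled.

```python
from typing import Any, Dict, List

# Hypothetical identifiers; they must match the activity type you registered.
ACTIVITY_TYPE = {"name": "processPayment", "version": "1.0"}
ACTIVITY_TASK_LIST = "payment-activities"


def decide(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Inspect the workflow history and return the next list of SWF decisions."""
    for e in events:
        # The payment activity succeeded: complete the workflow with its result.
        if e["eventType"] == "ActivityTaskCompleted":
            result = e["activityTaskCompletedEventAttributes"].get("result", "")
            return [{
                "decisionType": "CompleteWorkflowExecution",
                "completeWorkflowExecutionDecisionAttributes": {"result": result},
            }]
        # The payment activity failed: fail the workflow and pass on the reason.
        if e["eventType"] == "ActivityTaskFailed":
            attrs = e["activityTaskFailedEventAttributes"]
            return [{
                "decisionType": "FailWorkflowExecution",
                "failWorkflowExecutionDecisionAttributes": {
                    "reason": attrs.get("reason", "activity failed"),
                    "details": attrs.get("details", ""),
                },
            }]

    # Nothing scheduled yet (new workflow): schedule the payment activity.
    if not any(e["eventType"] == "ActivityTaskScheduled" for e in events):
        started = next(e for e in events if e["eventType"] == "WorkflowExecutionStarted")
        wf_input = started["workflowExecutionStartedEventAttributes"].get("input", "")
        return [{
            "decisionType": "ScheduleActivityTask",
            "scheduleActivityTaskDecisionAttributes": {
                "activityType": ACTIVITY_TYPE,
                "activityId": "payment-1",
                "taskList": {"name": ACTIVITY_TASK_LIST},
                "input": wf_input,
                # Timeouts are strings of seconds; they may be omitted only if
                # defaults were set when the activity type was registered.
                "scheduleToCloseTimeout": "300",
                "scheduleToStartTimeout": "60",
                "startToCloseTimeout": "240",
                "heartbeatTimeout": "NONE",
            },
        }]

    # Activity still in progress: make no decisions and wait for the next event.
    return []


def run_decider_once(swf, domain: str, task_list: str) -> bool:
    """Long-poll for one decision task, page through its history, and respond."""
    task = swf.poll_for_decision_task(domain=domain, taskList={"name": task_list})
    if not task.get("taskToken"):
        return False  # the long poll expired with no decision task
    token = task["taskToken"]
    events = list(task.get("events", []))
    page_token = task.get("nextPageToken")
    # Long histories arrive in pages; fetch the rest before deciding.
    while page_token:
        page = swf.poll_for_decision_task(
            domain=domain, taskList={"name": task_list}, nextPageToken=page_token)
        events.extend(page.get("events", []))
        page_token = page.get("nextPageToken")
    swf.respond_decision_task_completed(taskToken=token, decisions=decide(events))
    return True
```

Keeping `decide` free of I/O makes it easy to unit-test against recorded histories: the same history always yields the same decisions, which fits SWF's model of handing the decider the history with every decision task.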
How a Workflow Execution Works
- A Workflow Starter sends a "start workflow" request to SWF.
- SWF receives the request and schedules a new decision task, placing it on the workflow's decision task list.
- A Decider polls the decision task list, receives the decision task, and examines the workflow history. For a new workflow, the history contains only the initial events (such as WorkflowExecutionStarted and DecisionTaskScheduled), so the decider can see that no work has been done yet.
- The Decider makes a decision, for example, "Schedule activity: processPayment." It sends this decision back to SWF.
- SWF receives the decision and creates a new activity task. It places this task on a separate task list for activities.
- An Activity Worker responsible for payments polls the activity task list, receives the processPayment task, and executes its business logic.
- The Activity Worker reports the result back to SWF (e.g., "payment successful").
- SWF receives the result and creates a new decision task, placing it on the decision task list to inform the Decider that the payment activity is complete.
- This cycle continues until the Decider decides to complete the workflow.
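The Workflow Starter and Activity Worker sides of this cycle can be sketched as follows. This is a minimal sketch, assuming `swf` is a `boto3.client("swf")` (the methods called are boto3 SWF operations); the workflow type `OrderWorkflow`, the task list names, the timeouts, and the `process_payment` handler are hypothetical and must match whatever you register in your own domain.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical names; register matching types and task lists in your domain.
WORKFLOW_TYPE = {"name": "OrderWorkflow", "version": "1.0"}
DECISION_TASK_LIST = "order-decisions"


def start_order_workflow(swf, domain: str, order: Dict[str, Any]) -> str:
    """Workflow Starter: ask SWF to begin a new execution and return its runId."""
    response = swf.start_workflow_execution(
        domain=domain,
        workflowId=f"order-{order['orderId']}",  # must not collide with an open execution
        workflowType=WORKFLOW_TYPE,
        taskList={"name": DECISION_TASK_LIST},
        input=json.dumps(order),
        executionStartToCloseTimeout="86400",  # one day, as a string of seconds
        taskStartToCloseTimeout="60",
        childPolicy="TERMINATE",
    )
    return response["runId"]


def run_activity_worker_once(swf, domain: str, task_list: str,
                             handler: Callable[[Dict[str, Any]], Dict[str, Any]]) -> bool:
    """Activity Worker: poll for one task, run the handler, and report the outcome."""
    task = swf.poll_for_activity_task(domain=domain, taskList={"name": task_list})
    if not task.get("taskToken"):
        return False  # the long poll expired with no activity task
    try:
        result = handler(json.loads(task.get("input") or "{}"))
    except Exception as exc:  # report the failure to SWF instead of crashing the worker
        swf.respond_activity_task_failed(
            taskToken=task["taskToken"], reason=type(exc).__name__, details=str(exc))
    else:
        swf.respond_activity_task_completed(
            taskToken=task["taskToken"], result=json.dumps(result))
    return True


def process_payment(order: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical payment logic; a real worker would call a payment provider."""
    if order.get("amount", 0) <= 0:
        raise ValueError("amount must be positive")
    return {"orderId": order["orderId"], "status": "payment successful"}
```

In production each worker would call `run_activity_worker_once` in a loop; reporting failures through `respond_activity_task_failed` lets the decider see an `ActivityTaskFailed` event in the history and choose whether to retry or fail the workflow.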
Key Technical Concepts
- Domain: A logical container for your workflow resources, such as workflow types, activity types, and executions. Domains isolate your workflows from each other.
- Workflow History: A complete, ordered, and immutable record of every event that has occurred in a workflow execution since it was started. SWF maintains this history, and it's the source of truth for your decider.
- Task Lists: SWF uses task lists to distribute tasks to workers. You can think of them as dynamic queues. A decider polls a specific task list for decision tasks, and workers poll specific task lists for activity tasks. This allows for flexible routing of tasks to different worker fleets.
- Timers: A decider can start a timer. When the specified duration elapses, SWF records a "timer fired" event in the workflow history and schedules a new decision task so the decider can react. This is useful for implementing delays or timeouts.
- Signals: External applications can send a signal to a running workflow execution. The signal is recorded in the workflow history and triggers a new decision task, which allows you to inject information into the workflow after it has started. For example, a human manager could signal "approved" or "denied" for a task requiring manual intervention.
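Timers and signals are often combined for human approval: wait for a manager's signal, but give up after a deadline. This is a minimal sketch, assuming `swf` is a `boto3.client("swf")` and boto3-shaped history events; the signal name `approval`, the timer ID, the JSON payload, and the 48-hour deadline are hypothetical choices for illustration.

```python
import json
from typing import Any, Dict, List

# Hypothetical identifiers for this sketch.
APPROVAL_SIGNAL = "approval"
APPROVAL_TIMER_ID = "approval-timeout"


def send_approval(swf, domain: str, workflow_id: str, approved: bool) -> None:
    """Called by an external app (e.g., a manager's UI) to signal a running execution."""
    swf.signal_workflow_execution(
        domain=domain,
        workflowId=workflow_id,  # runId omitted: SWF targets the open run with this ID
        signalName=APPROVAL_SIGNAL,
        input=json.dumps({"approved": approved}),
    )


def decide_approval(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Decider logic: act on the approval signal, or fail if the timer fires first."""
    for e in events:  # history is in chronological order, so the first match wins
        if (e["eventType"] == "WorkflowExecutionSignaled"
                and e["workflowExecutionSignaledEventAttributes"]["signalName"] == APPROVAL_SIGNAL):
            payload = json.loads(e["workflowExecutionSignaledEventAttributes"].get("input") or "{}")
            if payload.get("approved"):
                return [{"decisionType": "CompleteWorkflowExecution",
                         "completeWorkflowExecutionDecisionAttributes": {"result": "approved"}}]
            return [{"decisionType": "FailWorkflowExecution",
                     "failWorkflowExecutionDecisionAttributes": {"reason": "denied"}}]
        if (e["eventType"] == "TimerFired"
                and e["timerFiredEventAttributes"]["timerId"] == APPROVAL_TIMER_ID):
            return [{"decisionType": "FailWorkflowExecution",
                     "failWorkflowExecutionDecisionAttributes": {"reason": "approval timed out"}}]

    # First decision: arm a 48-hour timer (duration is a string of seconds).
    if not any(e["eventType"] == "TimerStarted" for e in events):
        return [{"decisionType": "StartTimer",
                 "startTimerDecisionAttributes": {
                     "timerId": APPROVAL_TIMER_ID,
                     "startToFireTimeout": str(48 * 3600),
                 }}]

    # Timer running and no signal yet: schedule nothing and wait.
    return []
```

Because the decider walks the history in order and returns on the first matching event, whichever of the signal or the timer was recorded first decides the outcome. Note that "waiting" is simply returning an empty decision list.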
Common Use Cases
- Video Processing Pipeline: A workflow that orchestrates the steps of downloading a video, transcoding it into multiple formats in parallel, and then uploading the results to S3.
- E-commerce Order Fulfillment: A long-running workflow that manages an order from payment processing through inventory checks, packaging, and shipping to a final delivery confirmation. Steps can be automated or require human intervention (e.g., a manual fraud check).
- Data Analytics Pipeline: A workflow that starts when a new dataset arrives, provisions an EMR cluster to process the data, waits for the cluster to finish, and then tears it down to save costs.
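The video processing pipeline illustrates SWF's fan-out/fan-in pattern: a decider can return several ScheduleActivityTask decisions in one response, and SWF schedules them all at once so separate workers can process them in parallel. This is a minimal sketch assuming boto3-shaped history events; the activity names (`downloadVideo`, `transcodeVideo`, `uploadToS3`), the task list, the formats, and the version are hypothetical, and the decisions omit timeouts on the assumption that defaults were set when the activity types were registered.

```python
from typing import Any, Dict, List

# Hypothetical names for this sketch.
FORMATS = ["1080p", "720p", "480p"]
VERSION = "1.0"
TASK_LIST = "media-workers"


def schedule(name: str, activity_id: str, payload: str = "") -> Dict[str, Any]:
    """Build a ScheduleActivityTask decision (timeouts assumed registered as defaults)."""
    return {
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": name, "version": VERSION},
            "activityId": activity_id,
            "taskList": {"name": TASK_LIST},
            "input": payload,
        },
    }


def decide_video(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Download, fan out one transcode per format, fan in, then upload to S3."""
    # Map each ActivityTaskScheduled event ID to its activity ID.
    scheduled = {e["eventId"]: e["activityTaskScheduledEventAttributes"]["activityId"]
                 for e in events if e["eventType"] == "ActivityTaskScheduled"}
    # Completion events reference the scheduled event, so resolve them to activity IDs.
    completed = {scheduled[e["activityTaskCompletedEventAttributes"]["scheduledEventId"]]
                 for e in events if e["eventType"] == "ActivityTaskCompleted"}
    ids = set(scheduled.values())
    transcode_ids = [f"transcode-{fmt}" for fmt in FORMATS]

    if any(e["eventType"] == "ActivityTaskFailed" for e in events):
        return [{"decisionType": "FailWorkflowExecution",
                 "failWorkflowExecutionDecisionAttributes": {"reason": "activity failed"}}]
    if "upload" in completed:
        return [{"decisionType": "CompleteWorkflowExecution",
                 "completeWorkflowExecutionDecisionAttributes": {"result": "done"}}]
    if not ids:
        return [schedule("downloadVideo", "download")]
    if "download" in completed and not any(t in ids for t in transcode_ids):
        # Fan-out: one decision per format, all returned in a single response.
        return [schedule("transcodeVideo", t, fmt) for t, fmt in zip(transcode_ids, FORMATS)]
    if all(t in completed for t in transcode_ids) and "upload" not in ids:
        # Fan-in: upload only after every transcode has completed.
        return [schedule("uploadToS3", "upload")]
    return []  # work still in progress
```

Deriving state from the history each time, rather than keeping it in the decider's memory, means any decider instance in the fleet can handle the next decision task for this execution.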