AWS Database Services

Redis Append-Only Files vs Redis Replication

Updated June 21, 2025

Core Distinction at a Glance

AOF and replication solve different problems, and that distinction drives every other choice described here.

  • Append-Only File (AOF): This is a DURABILITY feature. Its goal is to limit data loss on a single node when the Redis process crashes or restarts.
  • Replication: This is a HIGH AVAILABILITY and READ SCALING feature. Its goal is to keep the service online if a primary node fails and to spread read traffic across multiple nodes.

In Amazon ElastiCache, the two cannot be combined once Multi-AZ is on: enabling Multi-AZ disables AOF. AWS recommends Multi-AZ with automatic failover (which relies on replication) as the way to get both durability and high availability.


Append-Only File (AOF)

AOF is a persistence mechanism that logs every write operation received by the server to a file.

How It Works

  1. As your application sends write commands (e.g., SET, INCR) to Redis, these commands are appended to the AOF file.
  2. If the Redis node restarts, the server replays the commands from the AOF log to rebuild the in-memory dataset, resulting in a "warm" cache.
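  3. How often the log is flushed to disk can be tuned with the appendfsync setting (always, after every write; everysec, once per second; or no, leaving the timing to the operating system), trading performance against durability.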
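
The log-and-replay cycle above can be sketched as a toy key-value store. This is a minimal illustration, not how Redis stores its AOF: the class name TinyAofStore, the one-command-per-line text format, and the fsync_always flag (standing in for the always setting) are all assumptions made for the example.

```python
import os
import tempfile


class TinyAofStore:
    """Toy store that appends every write to a log and replays it on start."""

    def __init__(self, path, fsync_always=True):
        self.path = path
        self.fsync_always = fsync_always  # hypothetical stand-in for "always"
        self.data = {}
        self._replay()  # rebuild in-memory state from any existing log
        self._log = open(path, "a", encoding="utf-8")

    def _apply(self, command, key, value=None):
        if command == "SET":
            self.data[key] = value
        elif command == "INCR":
            self.data[key] = str(int(self.data.get(key, "0")) + 1)

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                # Toy format: "COMMAND key [value]"; keys must not contain spaces.
                self._apply(*line.rstrip("\n").split(" ", 2))

    def execute(self, command, key, value=None):
        self._apply(command, key, value)
        entry = " ".join(p for p in (command, key, value) if p is not None)
        self._log.write(entry + "\n")
        self._log.flush()
        if self.fsync_always:
            os.fsync(self._log.fileno())  # force the write to disk before returning

    def close(self):
        self._log.close()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "appendonly.aof")
        store = TinyAofStore(path)
        store.execute("SET", "greeting", "hello")
        store.execute("INCR", "visits")
        store.execute("INCR", "visits")
        store.close()  # simulate a process restart
        restarted = TinyAofStore(path)  # state comes back from the log alone
        state = dict(restarted.data)
        restarted.close()
        return state
```

Calling demo() writes three commands, reopens the file, and returns the rebuilt dictionary. Note that the log lives on the same machine as the store, so losing that machine loses the log too.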

Key Characteristics

  • Pros:
    • Provides better durability than point-in-time snapshots (RDB).
    • Because the log is append-only, a crash can at worst leave a partially written final command, which makes the file resilient against corruption; it is also human-readable.
    • Redis can automatically rewrite the AOF in the background to keep its size manageable.
  • Cons & ElastiCache Limitations:
    • Can have a performance impact due to disk I/O.
    • AOF is disabled in ElastiCache when you enable Multi-AZ.
    • AOF does not protect against underlying hardware failure. If the server hosting the node fails, the AOF file is lost, and recovery is not possible. For this reason, AWS recommends using Multi-AZ replication instead.

Redis Replication

Replication creates copies of your data on one or more secondary "replica" nodes.

How It Works

  1. A replication group consists of a single primary (writer) node and up to five read-only replica nodes.
  2. Data written to the primary node is asynchronously copied to all read replicas.
  3. Applications can direct read traffic to the replicas to scale read performance.
  4. If the primary node fails, ElastiCache can automatically promote a read replica to become the new primary, providing high availability.
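
The steps above can be modeled in a few lines to show why asynchronous replication allows stale reads and a small loss window on failover. This is a toy sketch under stated assumptions: the Node and ReplicationGroup classes are hypothetical, replicas catch up only when sync is called, and the promotion rule (pick the most caught-up replica) is assumed for illustration rather than taken from ElastiCache internals.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.offset = 0  # number of log entries this node has applied


class ReplicationGroup:
    MAX_REPLICAS = 5  # one primary plus up to five replicas

    def __init__(self, replica_count):
        if not 0 <= replica_count <= self.MAX_REPLICAS:
            raise ValueError("expected between 0 and 5 replicas")
        self.log = []  # ordered writes accepted by the primary
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(1, replica_count + 1)]

    def write(self, key, value):
        # Writes are acknowledged by the primary alone (asynchronous replication).
        self.log.append((key, value))
        self.primary.data[key] = value
        self.primary.offset = len(self.log)

    def sync(self, replica):
        # Apply every write the replica has not seen yet.
        for key, value in self.log[replica.offset:]:
            replica.data[key] = value
        replica.offset = len(self.log)

    def read(self, replica, key):
        return replica.data.get(key)  # may be stale if the replica lags

    def failover(self):
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        promoted = max(self.replicas, key=lambda r: r.offset)
        self.replicas.remove(promoted)
        lost = len(self.log) - promoted.offset  # writes that never replicated
        self.log = self.log[:promoted.offset]
        self.primary = promoted
        return lost
```

For example, writing a key, syncing one replica, writing a second key, and then calling failover() promotes the synced replica and reports one lost write, which is the lag window described in the comparison further down.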

ElastiCache Replication Modes

  • Redis (Cluster Mode Disabled):

    • Architecture: A single primary node (one shard) and up to 5 read replicas. The primary holds the entire dataset, and each replica holds a full copy of it.
    • Use Case: Best for read-intensive applications where you need to scale reads but not writes. You scale compute vertically (by changing the node type).
  • Redis (Cluster Mode Enabled):

    • Architecture: Data is partitioned across multiple primary nodes (shards). Each shard can have its own read replicas.
    • Use Case: Best for large datasets or write-intensive applications. This mode allows for horizontal scaling of both reads (by adding replicas) and writes (by adding more shards).
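
To make the partitioning concrete, here is a sketch of how a key maps to a shard. Redis Cluster assigns each key to one of 16384 hash slots using CRC16 of the key (or of the text inside the first {...} hash tag). The function key_slot follows that rule; shard_for_slot is a hypothetical helper that assumes slots are split into equal contiguous ranges, which a real cluster is not required to do.

```python
import binascii

TOTAL_SLOTS = 16384


def key_slot(key: str) -> int:
    """Return the hash slot for a key, honoring {hash tags}."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the full key
            raw = raw[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(raw, 0) % TOTAL_SLOTS


def shard_for_slot(slot: int, shard_count: int) -> int:
    """Hypothetical layout: equal contiguous slot ranges per shard."""
    slots_per_shard = -(-TOTAL_SLOTS // shard_count)  # ceiling division
    return slot // slots_per_shard
```

Keys that share a hash tag, such as {user1000}.following and {user1000}.followers, land in the same slot and therefore on the same shard, which is what allows multi-key operations on them.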

Detailed Comparison Table

Feature | Append-Only File (AOF) | Replication (Multi-AZ in ElastiCache)
Primary Goal | Durability (on a single node) | High Availability & Read Scaling
Mechanism | Logs write commands to a file. | Copies data to replica nodes.
Protects Against | Redis process restart/crash. | Primary node failure, AZ failure.
Data Loss Risk | Data can be lost on hardware failure. | Minimal data loss (milliseconds of lag) on failover.
ElastiCache Implementation | Disabled when Multi-AZ is on. Not the recommended HA/DR solution. | Recommended HA/DR solution. Used by default for Multi-AZ deployments.
Performance | Can have I/O overhead. | Minimal impact on primary; scales read performance.
Failover | Not applicable. | Automatic promotion of a replica to primary.

Key Takeaway

For production workloads in Amazon ElastiCache, prefer replication with Multi-AZ enabled over AOF. Multi-AZ covers both high availability (automatic failover) and durability against node failure, which AOF alone cannot provide because its log is lost along with the node.
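
As a rough sketch of what that recommendation looks like in practice, the helper below builds the arguments for a Multi-AZ replication group. The parameter names are those of boto3's ElastiCache create_replication_group call as I understand them and should be checked against the current boto3 documentation; the function name, description text, and any values passed in are hypothetical.

```python
def multi_az_group_request(group_id, node_type, replica_count):
    """Build request arguments for a cluster-mode-disabled Multi-AZ group."""
    if not 1 <= replica_count <= 5:
        # Failover needs at least one replica; the document's limit is five.
        raise ValueError("Multi-AZ needs between 1 and 5 replicas")
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": "Multi-AZ group with automatic failover",
        "Engine": "redis",
        "CacheNodeType": node_type,
        "NumCacheClusters": replica_count + 1,  # primary plus replicas
        "AutomaticFailoverEnabled": True,
        "MultiAZEnabled": True,
    }
```

The resulting dictionary could then be passed as keyword arguments, for example boto3.client("elasticache").create_replication_group(**request), assuming credentials and networking are already configured.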