Core Concepts
- Buckets: The fundamental containers in S3 where you store data. Bucket names must be globally unique.
- Objects: The fundamental entities stored in S3. Objects consist of data and metadata. The data portion is opaque to S3.
- Keys: The unique identifier for an object within a bucket. The combination of a bucket, key, and version ID uniquely identifies an object.
- Durability and Availability: S3 is designed for 99.999999999% (11 9's) of durability. Data is redundantly stored across multiple facilities and devices within each facility.
- Consistency Model: Since December 2020, S3 provides strong read-after-write consistency for all object operations.
  - PUTs of new objects: after a successful write of a new object, you can immediately read it.
  - Overwrite PUTs and DELETEs: a subsequent read or list immediately reflects the change. The older eventual-consistency behavior for these operations no longer applies.
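
To make the 11 nines durability figure above concrete, the sketch below reads it as an average annual per-object loss probability of 1 - 0.99999999999. This is back-of-the-envelope arithmetic on the design target, not a guarantee, and the count of 10 million objects is an illustrative assumption.

```python
from fractions import Fraction

# Design target from the notes above: 99.999999999% durability per object per year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)
ANNUAL_LOSS_PROBABILITY = 1 - DURABILITY  # exactly 1e-11


def expected_annual_losses(object_count: int) -> Fraction:
    """Average number of objects lost per year under the design target."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_single_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


# Illustrative assumption: 10 million stored objects.
print(float(expected_annual_losses(10_000_000)))  # 0.0001
print(int(years_per_single_loss(10_000_000)))     # 10000
```

Exact fractions are used instead of floats so that the tiny probability does not pick up rounding error.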
S3 Storage Classes
| Storage Class | Use Case | Durability | Availability | Min. Storage Duration |
| :--- | :--- | :--- | :--- | :--- |
| S3 Standard | Frequently accessed data; default tier. | 11 9's | 99.99% | N/A |
| S3 Intelligent-Tiering | Data with unknown or changing access patterns. | 11 9's | 99.9% | N/A |
| S3 Standard-IA | Infrequently accessed data that needs rapid access. | 11 9's | 99.9% | 30 days |
| S3 One Zone-IA | Re-creatable, infrequently accessed data. | 99.999999999% (in one AZ) | 99.5% | 30 days |
| S3 Glacier Instant Retrieval | Long-term archive data with millisecond retrieval. | 11 9's | 99.9% | 90 days |
| S3 Glacier Flexible Retrieval | Long-term backups and archives (minutes-to-hours retrieval). | 11 9's | 99.99% (after retrieval) | 90 days |
| S3 Glacier Deep Archive | Lowest-cost storage for long-term retention (hours retrieval). | 11 9's | 99.99% (after retrieval) | 180 days |
| S3 Express One Zone | Performance-critical applications requiring single-digit millisecond latency. | 99.999999999% (in one AZ) | 99.95% | N/A |
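
The "Min. Storage Duration" column matters mostly for billing: an object deleted or transitioned early is generally still charged for the remaining days of the minimum. A minimal sketch of that rule, using the durations from the table; the function name is hypothetical, and the model ignores request fees, retrieval fees, and minimum billable object sizes.

```python
# Minimum storage durations in days, copied from the table above (0 = none).
MIN_STORAGE_DAYS = {
    "S3 Standard": 0,
    "S3 Intelligent-Tiering": 0,
    "S3 Standard-IA": 30,
    "S3 One Zone-IA": 30,
    "S3 Glacier Instant Retrieval": 90,
    "S3 Glacier Flexible Retrieval": 90,
    "S3 Glacier Deep Archive": 180,
    "S3 Express One Zone": 0,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days of storage billed when an object is removed after days_stored."""
    if days_stored < 0:
        raise ValueError("days_stored must be non-negative")
    # Early removal is billed as if the object had stayed for the minimum.
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(billable_days("S3 Standard-IA", 10))            # 30
print(billable_days("S3 Glacier Deep Archive", 200))  # 200
```
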
Versioning
- A bucket-level feature that allows you to keep multiple variants of an object in the same bucket.
- Protects you from accidental overwrites and deletions.
- Once enabled, versioning cannot be disabled on a bucket, only suspended.
- When you delete an object in a version-enabled bucket, S3 inserts a "delete marker" instead of permanently deleting the object. You can restore the object by deleting the delete marker.
- Versioning is a prerequisite for S3 Replication.
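
The delete-marker behavior described above can be illustrated with a small in-memory model. This is a toy sketch, not the S3 API: the class and method names are hypothetical, and version IDs are simple counters rather than the opaque strings S3 generates.

```python
import itertools
from typing import Optional

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class ToyVersionedBucket:
    """In-memory model of a version-enabled bucket (not the S3 API)."""

    def __init__(self) -> None:
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def _push(self, key: str, body: object) -> str:
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: bytes) -> str:
        """An overwrite adds a new version; older versions are kept."""
        return self._push(key, body)

    def delete(self, key: str) -> str:
        """A delete without a version ID only adds a delete marker."""
        return self._push(key, DELETE_MARKER)

    def delete_version(self, key: str, version_id: str) -> None:
        """Permanently remove one specific version (or delete marker)."""
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]

    def get(self, key: str) -> Optional[bytes]:
        """Return the current version, or None if it is a delete marker."""
        versions = self._versions.get(key, [])
        if not versions or versions[-1][1] is DELETE_MARKER:
            return None
        return versions[-1][1]


bucket = ToyVersionedBucket()
bucket.put("report.txt", b"draft")
bucket.put("report.txt", b"final")
marker_id = bucket.delete("report.txt")
print(bucket.get("report.txt"))  # None: the delete marker hides the object
bucket.delete_version("report.txt", marker_id)
print(bucket.get("report.txt"))  # b'final': removing the marker restores it
```
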
Data Encryption
Server-Side Encryption (SSE) - Encrypts data at rest in S3
- SSE-S3: S3 manages both the data keys and the root keys. Uses AES-256. The most straightforward option, and the default encryption applied to new objects since January 2023.
- SSE-KMS: AWS Key Management Service (KMS) manages the encryption keys. Provides an audit trail (via CloudTrail) of when your key was used and by whom. Offers more control and a separation of duties.
- SSE-C: You provide and manage your own encryption keys. S3 performs the encryption/decryption but does not store your key. You must send the key along with every request.
Client-Side Encryption
- You encrypt data on your client before uploading it to S3. S3 stores the encrypted data but has no knowledge of the keys used.
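
At the request level, the three server-side options differ mainly in which headers accompany an upload. The sketch below builds those headers with the standard library only; it does not send any request. The helper names and the KMS key ID passed in are hypothetical placeholders.

```python
import base64
import hashlib
import os


def sse_s3_headers() -> dict:
    """SSE-S3: ask S3 to encrypt with keys it manages."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers(kms_key_id: str) -> dict:
    """SSE-KMS: encrypt under the given KMS key."""
    return {
        "x-amz-server-side-encryption": "aws:kms",
        "x-amz-server-side-encryption-aws-kms-key-id": kms_key_id,
    }


def sse_c_headers(key: bytes) -> dict:
    """SSE-C: the 256-bit key travels with every request, plus its MD5."""
    if len(key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode("ascii"),
    }


customer_key = os.urandom(32)  # with SSE-C, keeping this key safe is your job
for name, value in sse_c_headers(customer_key).items():
    print(name, "=", value if name.endswith("algorithm") else "<redacted>")
```

Because S3 does not store an SSE-C key, the same three headers have to be supplied again on every later read of the object.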
Access Control & Security
- IAM Policies: Identity-based policies that define which S3 actions a user, group, or role can perform.
- Bucket Policies: Resource-based policies attached to a bucket to grant permissions to other accounts or IAM users. Commonly used for cross-account access.
- Access Control Lists (ACLs): A legacy mechanism to grant basic read/write permissions to other AWS accounts at the bucket or individual object level. It's recommended to use IAM and bucket policies instead.
- S3 Block Public Access: A set of four settings at the account or bucket level to prevent accidental public exposure of data. It's highly recommended to have this enabled.
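
As an illustration of a resource-based policy, the sketch below builds a bucket policy that grants another account read access to objects, along with the four Block Public Access settings. The bucket name and account ID are hypothetical placeholders, and the policy is deliberately minimal.

```python
import json

# Hypothetical names used only for illustration.
BUCKET = "example-bucket"
OTHER_ACCOUNT_ID = "111122223333"

cross_account_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCrossAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{OTHER_ACCOUNT_ID}:root"},
            "Action": ["s3:GetObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

# The four Block Public Access settings, all switched on.
block_public_access = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

print(json.dumps(cross_account_read_policy, indent=2))
```

For cross-account access, the other account typically still needs an IAM policy on its own side that allows the same action.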
S3 Replication (SRR & CRR)
- Automatically and asynchronously copies objects across buckets. Versioning must be enabled on both source and destination buckets.
- Cross-Region Replication (CRR): Replicates objects to a bucket in a different AWS Region. Used for disaster recovery, latency reduction, and compliance.
- Same-Region Replication (SRR): Replicates objects to a bucket in the same AWS Region. Used for log aggregation or to copy production data into separate development/testing accounts.
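
A replication setup boils down to a role plus one or more rules that name a destination bucket. The sketch below shows the general shape of such a configuration as a plain dictionary, with a small check for the versioning prerequisite. Bucket names, the role ARN, and the helper function are hypothetical; whether the result is CRR or SRR depends only on which Region the destination bucket lives in.

```python
# Hypothetical destination bucket and role ARN, for illustration only.
DESTINATION_BUCKET_ARN = "arn:aws:s3:::example-destination-bucket"
REPLICATION_ROLE_ARN = "arn:aws:iam::111122223333:role/example-replication-role"

replication_configuration = {
    "Role": REPLICATION_ROLE_ARN,  # role S3 assumes to copy objects
    "Rules": [
        {
            "ID": "replicate-everything",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # empty prefix matches every key
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {"Bucket": DESTINATION_BUCKET_ARN},
        }
    ],
}


def can_replicate(source_versioning: str, destination_versioning: str) -> bool:
    """Replication needs versioning enabled on both buckets."""
    return source_versioning == "Enabled" and destination_versioning == "Enabled"


print(can_replicate("Enabled", "Enabled"))    # True
print(can_replicate("Enabled", "Suspended"))  # False
```
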
S3 Lifecycle Policies
- Automates the management of your objects' lifecycles.
- You can define rules to automatically transition objects to a more cost-effective storage class after a certain period (e.g., move to S3 Standard-IA after 30 days, then to S3 Glacier Deep Archive after 180 days).
- You can also define rules to permanently expire (delete) objects or old object versions after a specified time.
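
The transition example above (Standard-IA at 30 days, Deep Archive at 180 days) can be written as a lifecycle rule. The sketch below expresses it as a plain dictionary and adds a small helper that reports where an object of a given age would sit. The rule ID, the `logs/` prefix, the 365-day expiration, and the 90-day noncurrent-version expiration are illustrative assumptions; the helper is a simplification, since real lifecycle actions run on S3's own schedule and may lag the configured day.

```python
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},     # hypothetical key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 365},       # assumed retention period
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Storage class a current object version would be in at a given age."""
    if age_days >= rule["Expiration"]["Days"]:
        return "EXPIRED"
    current = "STANDARD"  # assumes the object was uploaded to S3 Standard
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 200, 400):
    print(age, storage_class_at(age, rule))
```
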
S3 Object Lock & Glacier Vault Lock
- Provides Write-Once-Read-Many (WORM) storage, preventing objects from being deleted or overwritten for a fixed amount of time or indefinitely.
- Retention Modes:
  - Governance Mode: Users can't overwrite or delete an object version or alter its lock settings unless they have special permissions.
  - Compliance Mode: A protected object version can't be overwritten or deleted by any user, including the root user in your AWS account. Its retention mode and period can't be changed.
- Legal Hold: Provides the same protection as a retention period but has no expiration date. It remains in effect until explicitly removed.
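
The interaction between retention modes and legal holds can be summarized as a small decision rule. This is a toy sketch of the logic described above, not an S3 API: the function name and parameters are hypothetical, and the "special permissions" for Governance mode are collapsed into a single boolean (in practice this involves the `s3:BypassGovernanceRetention` permission plus an explicit bypass on the request).

```python
from datetime import datetime, timezone
from typing import Optional


def can_delete_version(
    mode: Optional[str],              # "GOVERNANCE", "COMPLIANCE", or None
    retain_until: Optional[datetime],
    legal_hold: bool,
    has_bypass_permission: bool,
    now: datetime,
) -> bool:
    """Toy decision rule for deleting a locked object version."""
    if legal_hold:
        return False  # a legal hold blocks deletion until it is removed
    if mode is None or retain_until is None or now >= retain_until:
        return True   # no retention, or the retention period has passed
    if mode == "GOVERNANCE":
        return has_bypass_permission
    return False      # COMPLIANCE: nobody can delete before retain_until


today = datetime(2025, 1, 1, tzinfo=timezone.utc)
next_year = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(can_delete_version("GOVERNANCE", next_year, False, True, today))  # True
print(can_delete_version("COMPLIANCE", next_year, False, True, today))  # False
```
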
Data Access & Performance
- Static Website Hosting: Configure an S3 bucket to serve static content (HTML, CSS, JS, images) directly from S3.
- Pre-signed URLs: Generate a URL that grants time-limited access to a private object in your bucket, using the permissions of whoever signed it.
- CloudFront with OAI/OAC: To serve private S3 content through the CloudFront CDN, use an Origin Access Identity (OAI, legacy) or Origin Access Control (OAC, recommended) so that users can reach the files only through CloudFront URLs, not the direct S3 URL.
- Multipart Upload: Recommended for uploading files larger than 100 MB. It breaks the file into parts that can be uploaded in parallel and retried individually, improving throughput and resilience.
- S3 Transfer Acceleration: Uses CloudFront's globally distributed edge locations to accelerate uploads to S3 over long distances.
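
To show how a multipart upload splits a file, the sketch below plans the byte ranges for each part while respecting S3's multipart limits (at most 10,000 parts, each between 5 MiB and 5 GiB except the last). The function name and the 16 MiB default part size are assumptions for illustration, and the notes' 100 MB guideline is treated as 100 MiB; no upload is performed.

```python
import math

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB          # every part except the last
MAX_PART_SIZE = 5 * 1024 * MIB   # 5 GiB
MAX_PARTS = 10_000
MULTIPART_THRESHOLD = 100 * MIB  # the 100 MB guideline, read as 100 MiB


def plan_parts(object_size: int, part_size: int = 16 * MIB) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges for each part of an upload."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object is too large for a single multipart upload")
    return [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]


size = 1024 * MIB  # a 1 GiB file
parts = plan_parts(size)
print(size > MULTIPART_THRESHOLD, len(parts))  # True 64
```

Each planned range can be uploaded independently, which is what makes parallel transfer and per-part retries possible.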