AWS DataSync

How It Works: The Core Workflow

DataSync operates on a simple, three-step process to move your data.

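Deploy the Agent: For transfers that involve on-premises or other non-AWS storage, you deploy a DataSync agent (a virtual machine) in your on-premises environment or in your AWS VPC. The agent is the worker that connects securely to your storage system to read or write data. Transfers between AWS storage services do not require an agent.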
Create a Task: Using the AWS Management Console, the AWS CLI, or the API, you define a data transfer task. A task specifies the source location, the destination location, and the configuration for the transfer.
Start and Monitor: You start the task to begin the data transfer. DataSync handles the entire process, including encryption, transfer optimization, and data integrity validation. You can monitor the progress, performance, and status through the AWS Console or via Amazon CloudWatch.
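Steps 2 and 3 above can be sketched with the boto3 DataSync client. This is a minimal illustration, assuming the agent is already activated and both locations already exist; the helper name `run_transfer` and the placeholder ARNs are hypothetical, while `create_task` and `start_task_execution` are the client operations it relies on.

```python
def run_transfer(client, source_location_arn, destination_location_arn, name):
    """Create a task between two existing locations and start one execution.

    `client` is expected to behave like boto3.client("datasync").
    Returns (task_arn, task_execution_arn).
    """
    # Step 2: define the task (source, destination, and a display name).
    task = client.create_task(
        SourceLocationArn=source_location_arn,
        DestinationLocationArn=destination_location_arn,
        Name=name,
    )
    # Step 3: start the transfer. Progress is checked separately.
    execution = client.start_task_execution(TaskArn=task["TaskArn"])
    return task["TaskArn"], execution["TaskExecutionArn"]


# Typical use (needs boto3 and AWS credentials; the ARNs are placeholders):
#   import boto3
#   client = boto3.client("datasync")
#   run_transfer(client, "<source-location-arn>", "<destination-location-arn>", "nightly-sync")
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.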

Agent: A virtual machine (VMware ESXi, Linux KVM, Microsoft Hyper-V, or Amazon EC2) that you deploy. The agent reads data from a source location and writes it to a destination location. It connects securely to the DataSync service in the cloud.
Location: A configuration that defines a source or destination endpoint for your data. This can be an on-premises file share or an AWS storage service.
Task: The core resource that defines a complete data transfer job. A task includes:
- A source and a destination location.
- Configuration settings for how the transfer should run, such as scheduling, bandwidth limits, file filtering, and how to handle metadata, permissions, and deleted files.
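To make the shape of a task concrete, the sketch below assembles the keyword arguments that boto3's `create_task` accepts, covering the metadata, permission, and deleted-file settings listed above. The function name `build_task_request` and its boolean parameters are assumptions made for illustration; the option keys and values (`VerifyMode`, `PreserveDeletedFiles`, `PosixPermissions`) come from the DataSync task `Options` structure.

```python
def build_task_request(source_arn, destination_arn, name, *,
                       keep_deleted_files=True,
                       preserve_permissions=True,
                       verify_whole_destination=True):
    """Return keyword arguments for boto3's datasync create_task call."""
    options = {
        # Verify the whole destination, or only the files this run transferred.
        "VerifyMode": ("POINT_IN_TIME_CONSISTENT" if verify_whole_destination
                       else "ONLY_FILES_TRANSFERRED"),
        # Keep or remove destination files that no longer exist at the source.
        "PreserveDeletedFiles": "PRESERVE" if keep_deleted_files else "REMOVE",
        # Copy POSIX permission bits, or leave them unmanaged.
        "PosixPermissions": "PRESERVE" if preserve_permissions else "NONE",
    }
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Options": options,
    }


# Usage (placeholder ARNs):
#   client.create_task(**build_task_request("<src-arn>", "<dst-arn>", "archive-job"))
```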

DataSync can transfer files between a variety of storage systems.

| Source Locations                       | Destination Locations                  |
| -------------------------------------- | -------------------------------------- |
| Network File System (NFS) Shares       | Amazon S3 (All Storage Classes)        |
| Server Message Block (SMB) Shares      | Amazon EFS                             |
| Self-managed Object Storage            | Amazon FSx for Windows File Server     |
| HDFS (as a source)                     | Amazon FSx for Lustre                  |
| AWS Snowcone                           | Amazon FSx for OpenZFS                 |
| Amazon S3                              | Amazon FSx for NetApp ONTAP            |
| Amazon EFS                             |                                        |
| Amazon FSx (all types)                 |                                        |

Automated & Simplified: DataSync automates the data movement process end to end. It replaces hand-written copy scripts and handles scheduling, monitoring, and data validation for you.
High-Performance Transfer:
- Uses a purpose-built, secure network protocol that is optimized for WAN transfers.
- Performs parallel transfers and can saturate a 10 Gbps network link with a single agent.
- Automatically scales cloud resources to handle high-volume transfers.
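As a rough planning aid for the throughput figures above, the following back-of-the-envelope calculation estimates how long a dataset takes to move over a link of a given speed. It is plain arithmetic, not a DataSync API; the function name and the `utilization` parameter are assumptions, and real transfers also depend on file sizes, source storage performance, and latency.

```python
def transfer_hours(data_terabytes, link_gbps, utilization=1.0):
    """Estimate transfer time in hours for a dataset over a network link.

    Uses decimal units: 1 TB = 1e12 bytes, 1 Gbps = 1e9 bits per second.
    `utilization` is the fraction of the link actually available (0 to 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    total_bits = data_terabytes * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    return total_bits / effective_bits_per_second / 3600


# 100 TB over a fully used 10 Gbps link: 8e14 bits / 1e10 bits/s = 80,000 s
print(round(transfer_hours(100, 10), 1))        # about 22.2 hours
print(round(transfer_hours(100, 10, 0.5), 1))   # about 44.4 hours at 50% utilization
```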
Robust Security:
- All data is encrypted in transit using TLS 1.2.
- Supports encryption at rest on AWS storage services like Amazon S3 (SSE), Amazon EFS (KMS), and Amazon FSx.
- Performs data integrity checks both in transit and at rest to ensure your data arrives intact.
Flexible Task Configuration:
- Scheduling: Configure tasks to run periodically to detect and synchronize changes from source to destination.
- Filtering: Include or exclude specific files and folders from the transfer based on patterns.
- Bandwidth Throttling: Set limits on the network bandwidth used by the agent to minimize impact on other applications.
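The three settings above map onto fields of an existing task that can be changed with boto3's `update_task`. The sketch below builds such a request under a few assumptions: the helper names are hypothetical, the schedule is written as a cron expression, exclude patterns are joined with `|` into a single `SIMPLE_PATTERN` filter, and a `BytesPerSecond` of -1 is taken to mean no limit.

```python
def mbps_to_bytes_per_second(mbps):
    """Convert megabits per second to bytes per second (decimal units)."""
    return int(mbps * 1_000_000 / 8)


def build_update_request(task_arn, *, schedule_expression,
                         exclude_patterns=(), limit_mbps=None):
    """Return keyword arguments for boto3's datasync update_task call."""
    request = {
        "TaskArn": task_arn,
        # Scheduling: when the task should run, e.g. "cron(0 2 * * ? *)".
        "Schedule": {"ScheduleExpression": schedule_expression},
        # Bandwidth throttling: -1 is assumed here to mean "no limit".
        "Options": {
            "BytesPerSecond": (-1 if limit_mbps is None
                               else mbps_to_bytes_per_second(limit_mbps)),
        },
    }
    if exclude_patterns:
        # Filtering: several patterns are combined into one "|"-delimited string.
        request["Excludes"] = [{
            "FilterType": "SIMPLE_PATTERN",
            "Value": "|".join(exclude_patterns),
        }]
    return request


# Usage (placeholder ARN): run nightly at 02:00 UTC, skip temp files, cap at 100 Mbps.
#   client.update_task(**build_update_request(
#       "<task-arn>", schedule_expression="cron(0 2 * * ? *)",
#       exclude_patterns=["*.tmp", "/scratch"], limit_mbps=100))
```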
Comprehensive Monitoring:
- Track task execution and performance directly in the AWS Management Console.
- Integrates with Amazon CloudWatch for logs, metrics, and events, allowing for detailed monitoring and alerting.
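Besides the console and CloudWatch, a script can poll a task execution's status directly. The following is a minimal sketch, assuming `describe` is a callable with the same keyword interface as boto3's `describe_task_execution` and that `SUCCESS` and `ERROR` are the terminal statuses of interest; the function name, polling interval, and poll limit are illustrative choices.

```python
import time

TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(describe, task_execution_arn, *,
                       poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a task execution until it reaches a terminal status.

    `describe` is called as describe(TaskExecutionArn=...) and must return
    a mapping with a "Status" key. Raises TimeoutError after `max_polls`.
    """
    for _ in range(max_polls):
        status = describe(TaskExecutionArn=task_execution_arn)["Status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)  # still queued, preparing, transferring, or verifying
    raise TimeoutError(f"{task_execution_arn} did not finish after {max_polls} polls")


# Usage with a real client (placeholder ARN):
#   wait_for_execution(client.describe_task_execution, "<task-execution-arn>")
```

For unattended alerting, CloudWatch alarms or events are usually a better fit than a long-running poll loop; this helper suits short scripts and tests.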

Data Migration: Fast and simple migration of active file data from on-premises file servers into Amazon S3, EFS, or FSx.
Archiving Cold Data: Move cold data or backups from on-premises storage directly to durable and low-cost archival storage classes like S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive.
Data Protection & Replication: Set up scheduled tasks to replicate on-premises data to AWS for backup and disaster recovery purposes.
Hybrid Cloud Workflows: Accelerate data transfers for hybrid cloud workflows that require moving large datasets to AWS for processing, analysis, or machine learning.
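For the archiving use case, the destination's storage class is chosen when the S3 location is defined. The sketch below builds the arguments for boto3's `create_location_s3`, assuming an existing bucket and an IAM role that DataSync can assume; the function name and its parameters are hypothetical. In this API, `GLACIER` corresponds to S3 Glacier Flexible Retrieval and `DEEP_ARCHIVE` to S3 Glacier Deep Archive.

```python
def build_archive_location_request(bucket_arn, access_role_arn, *,
                                   prefix="/", deep_archive=True):
    """Return keyword arguments for boto3's datasync create_location_s3 call."""
    return {
        "S3BucketArn": bucket_arn,
        # Folder (prefix) inside the bucket that DataSync writes to.
        "Subdirectory": prefix,
        # Objects are written directly into the chosen archival class.
        "S3StorageClass": "DEEP_ARCHIVE" if deep_archive else "GLACIER",
        # IAM role DataSync assumes to access the bucket.
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


# Usage (placeholder ARNs):
#   client.create_location_s3(**build_archive_location_request(
#       "<bucket-arn>", "<role-arn>", prefix="/backups"))
```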