Updated June 23, 2025

AWS Data Exchange Cheat Sheet

AWS Data Exchange is a service that makes it easy to find, subscribe to, and use third-party data in the cloud. It is a data marketplace where qualified data providers can offer their datasets to AWS customers (subscribers).

Core Concepts

  • Data Provider: An AWS account or organization that owns data and is qualified to publish it on AWS Data Exchange. Providers are responsible for maintaining and updating their data.

  • Data Subscriber: An AWS customer who finds and subscribes to data products in the catalog.

  • Product: The unit that data providers offer in AWS Data Exchange. A product contains one or more datasets. Providers set the price and terms of use for their products.

  • Dataset: A container for data. A dataset is made up of revisions.

  • Revision: A container for one or more data assets. Revisions are how providers update a dataset over time. When a provider adds new data, they create a new revision.

  • Asset: The actual data. Depending on the dataset type, an asset is a file (an S3 object), an Amazon Redshift datashare, or an Amazon API Gateway API.

  • Job: An asynchronous task for exporting or importing data assets. For example, when you export a revision's assets to an S3 bucket, AWS Data Exchange creates a job to track this export.
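The containment hierarchy above (a product holds datasets, a dataset holds revisions, a revision holds assets) can be pictured as a small in-memory model. This is a toy sketch, not the AWS Data Exchange SDK: the class and method names (`Asset`, `Revision`, `DataSet`, `Product`, `finalize`, `latest_finalized`) are hypothetical, and the model only encodes two rules stated above: revisions hold assets, and a finalized revision can no longer change.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    # The actual data, e.g. one S3 object in a file-based dataset.
    name: str


@dataclass
class Revision:
    # A container for assets; new data arrives as a new revision.
    revision_id: str
    assets: List[Asset] = field(default_factory=list)
    finalized: bool = False

    def add_asset(self, asset: Asset) -> None:
        # Assets may only be added while the revision is still open.
        if self.finalized:
            raise ValueError(f"revision {self.revision_id} is finalized")
        self.assets.append(asset)

    def finalize(self) -> None:
        # After this, add_asset refuses further changes.
        self.finalized = True


@dataclass
class DataSet:
    # A dataset is an ordered history of revisions.
    name: str
    revisions: List[Revision] = field(default_factory=list)

    def latest_finalized(self) -> Revision:
        # Walk backwards to find the newest finalized revision.
        for revision in reversed(self.revisions):
            if revision.finalized:
                return revision
        raise LookupError(f"{self.name} has no finalized revisions")


@dataclass
class Product:
    # The unit providers list; it bundles one or more datasets.
    name: str
    data_sets: List[DataSet] = field(default_factory=list)


# Usage: one product, one dataset, two revisions (only the first finalized).
prices = DataSet("daily-prices")
rev1 = Revision("rev-1")
rev1.add_asset(Asset("2025-06-01.csv"))
rev1.finalize()
rev2 = Revision("rev-2")
rev2.add_asset(Asset("2025-06-02.csv"))
prices.revisions.extend([rev1, rev2])
product = Product("Example Market Data", [prices])
```

Here `product.data_sets[0].latest_finalized()` returns `rev1`, because `rev2` is still open.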

How It Works

The workflow is straightforward for both providers and subscribers.

For Data Providers:

  1. Become a Provider: Apply and get qualified by AWS to become a data provider.

  2. Create a Dataset: Create a new dataset that will contain your data.

  3. Create a Revision and Add Assets: Create a new revision within your dataset and import your data assets into it (e.g., import objects from an Amazon S3 bucket you own, or upload files from your computer).

  4. Finalize the Revision: Finalize the revision to make it immutable. AWS Data Exchange automatically scans the data for malware and sensitive information.

  5. Publish a Product: Create a data product containing your dataset(s).

  6. Create an Offer: Define the terms and pricing for your product. Offers can be public (listed in the AWS Marketplace catalog) or private (extended to specific AWS accounts).
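Steps 2 to 4 can also be scripted. The sketch below assumes the caller passes in a boto3 `dataexchange` client (for example `boto3.client("dataexchange")`) and that the files already sit in an S3 bucket you own; the helper names `import_job_details`, `wait_for_job`, and `publish_revision` are hypothetical. It uses the `CreateDataSet`, `CreateRevision`, `CreateJob` (type `IMPORT_ASSETS_FROM_S3`), `StartJob`, `GetJob`, and `UpdateRevision` operations. Steps 5 and 6 (product and offer) are typically done in the console or through the AWS Marketplace Catalog API and are not shown.

```python
import time
from typing import Any, Dict, List, Tuple

# Job states after which a job no longer changes.
TERMINAL_STATES = {"COMPLETED", "ERROR", "CANCELLED", "TIMED_OUT"}


def import_job_details(data_set_id: str, revision_id: str,
                       bucket: str, keys: List[str]) -> Dict[str, Any]:
    """Details for an IMPORT_ASSETS_FROM_S3 job: each S3 object becomes an asset."""
    return {
        "ImportAssetsFromS3": {
            "DataSetId": data_set_id,
            "RevisionId": revision_id,
            "AssetSources": [{"Bucket": bucket, "Key": key} for key in keys],
        }
    }


def wait_for_job(client: Any, job_id: str, poll_seconds: float = 5.0) -> str:
    """Poll an asynchronous job until it reaches a terminal state."""
    while True:
        state = client.get_job(JobId=job_id)["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)


def publish_revision(client: Any, name: str, bucket: str,
                     keys: List[str]) -> Tuple[str, str]:
    """Steps 2-4: create a dataset and revision, import assets, finalize."""
    data_set = client.create_data_set(
        AssetType="S3_SNAPSHOT",  # file-based dataset
        Name=name,
        Description=f"File-based dataset {name}",
    )
    revision = client.create_revision(DataSetId=data_set["Id"])
    job = client.create_job(
        Type="IMPORT_ASSETS_FROM_S3",
        Details=import_job_details(data_set["Id"], revision["Id"], bucket, keys),
    )
    client.start_job(JobId=job["Id"])
    state = wait_for_job(client, job["Id"])
    if state != "COMPLETED":
        raise RuntimeError(f"import job {job['Id']} ended in state {state}")
    # Once finalized, the revision's set of assets can no longer be edited.
    client.update_revision(
        DataSetId=data_set["Id"], RevisionId=revision["Id"], Finalized=True
    )
    return data_set["Id"], revision["Id"]
```

Taking the client as a parameter keeps the function testable with a stub and leaves credentials and Region selection to the caller.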

For Data Subscribers:

  1. Find Data: Browse the AWS Data Exchange catalog to find relevant data products.

  2. Subscribe: Subscribe to a product. Some products are free, while others have a subscription fee.

  3. Access Data: Once subscribed, you gain access to the datasets within that product.

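  4. Export or Query Data: You can export the data assets from the dataset's revisions to your own Amazon S3 bucket, either manually or automatically as new revisions are published. For Amazon Redshift datasets, you query the shared data directly; for API datasets, you call the provider's API through AWS Data Exchange.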

  5. Use the Data: Use the data in your S3 bucket with other AWS services like Athena, Redshift Spectrum, QuickSight, or SageMaker for analysis and machine learning.
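On the subscriber side, the export in step 4 is itself an asynchronous job. A minimal sketch, assuming an injected boto3 `dataexchange` client; `latest_revision_id` and `export_revision` are hypothetical helper names, only the first page of `ListDataSetRevisions` results is read, and the `KeyPattern` value is one illustrative choice among the key placeholders AWS Data Exchange supports (check the export documentation for the full list).

```python
import time
from typing import Any, Dict

# Job states after which a job no longer changes.
TERMINAL_STATES = {"COMPLETED", "ERROR", "CANCELLED", "TIMED_OUT"}


def latest_revision_id(client: Any, data_set_id: str) -> str:
    """Pick the newest revision of an entitled dataset (first page only)."""
    revisions = client.list_data_set_revisions(DataSetId=data_set_id)["Revisions"]
    return max(revisions, key=lambda r: r["CreatedAt"])["Id"]


def export_revision(client: Any, data_set_id: str, revision_id: str,
                    bucket: str, poll_seconds: float = 5.0) -> str:
    """Copy one revision's assets into the subscriber's own S3 bucket."""
    details: Dict[str, Any] = {
        "ExportRevisionsToS3": {
            "DataSetId": data_set_id,
            "RevisionDestinations": [{
                "RevisionId": revision_id,
                "Bucket": bucket,
                # Illustrative object keys: <revision id>/<asset name>
                "KeyPattern": "${Revision.Id}/${Asset.Name}",
            }],
        }
    }
    job = client.create_job(Type="EXPORT_REVISIONS_TO_S3", Details=details)
    client.start_job(JobId=job["Id"])
    # Export jobs run asynchronously; poll until a terminal state.
    while True:
        state = client.get_job(JobId=job["Id"])["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

Once the job reports `COMPLETED`, the objects are ordinary S3 objects in your bucket and can be queried with Athena or loaded by other services.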

Key Features

  • Centralized Catalog: A single place to find data from hundreds of qualified data providers across various industries like financial services, healthcare, retail, and more.

  • Direct AWS Integration: Data is delivered directly into AWS, so it works with AWS analytics and machine learning services without custom ingestion pipelines for each data source.

  • Managed Data Delivery: When a provider publishes a new revision to a dataset that subscribers are entitled to, AWS Data Exchange sends an event to Amazon EventBridge (formerly CloudWatch Events). Subscribers can use these events to automate workflows that process new data as it arrives.

  • Flexible Licensing and Pricing: Providers can offer data under public or private offers with flexible pricing models (e.g., monthly/annual subscription, pay-as-you-go).

  • Secure and Compliant: AWS Data Exchange scans data for security threats. Data is encrypted in transit and at rest.
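The revision-published notifications described under Managed Data Delivery can trigger automation through an EventBridge rule. This sketch assumes the event's `detail-type` is "Revision Published To Data Set", that `resources` holds the dataset ID, and that `detail.RevisionIds` lists the new revisions; verify these against the AWS Data Exchange EventBridge event reference before relying on them. `revision_published_pattern` and `handler` are hypothetical names; the pattern string is what you would pass as `EventPattern` to `put_rule` on a boto3 `events` client.

```python
import json
from typing import Any, Dict, List

# Assumed detail-type for "new revision published" events.
REVISION_PUBLISHED = "Revision Published To Data Set"


def revision_published_pattern(data_set_id: str) -> str:
    """EventBridge event pattern matching new revisions on one dataset."""
    return json.dumps({
        "source": ["aws.dataexchange"],
        "detail-type": [REVISION_PUBLISHED],
        "resources": [data_set_id],
    })


def handler(event: Dict[str, Any], context: Any = None) -> List[str]:
    """Lambda-style rule target: return the new revision IDs in the event."""
    if event.get("detail-type") != REVISION_PUBLISHED:
        return []
    return list(event.get("detail", {}).get("RevisionIds", []))
```

A real handler would typically pass each returned revision ID to an export job like the one sketched for subscribers.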

Types of Datasets

AWS Data Exchange supports three main types of datasets:

  1. File-Based Datasets: The most common type. Providers import files (CSV, JSON, Parquet, etc.) from Amazon S3 as assets. Subscribers export these files to their own S3 buckets.

  2. Amazon Redshift Datasets: Providers share read-only tables, views, and schemas through Amazon Redshift datashares. Subscribers query this data directly from their own Redshift RA3 clusters or Redshift Serverless workgroups without copying it, so their queries always see the provider's current data.

  3. API Datasets: Providers offer access to APIs hosted on Amazon API Gateway. AWS Data Exchange handles authentication and billing, so subscribers call the third-party API through a consistent AWS SDK experience.
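For API datasets, the subscriber's request goes through the AWS Data Exchange `SendApiAsset` operation rather than straight to the provider's endpoint. A minimal sketch, assuming an injected boto3 `dataexchange` client and a provider API that returns JSON; `api_request` and `call_provider_api` are hypothetical helpers, and the dataset, revision, and asset IDs and the path are placeholders you would take from your entitled dataset.

```python
import json
from typing import Any, Dict, Optional


def api_request(data_set_id: str, revision_id: str, asset_id: str, path: str,
                method: str = "GET",
                query: Optional[Dict[str, str]] = None,
                body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build keyword arguments for the SendApiAsset operation."""
    kwargs: Dict[str, Any] = {
        "DataSetId": data_set_id,
        "RevisionId": revision_id,
        "AssetId": asset_id,
        "Method": method,
        "Path": path,
    }
    if query:
        kwargs["QueryStringParameters"] = query
    if body is not None:
        # SendApiAsset takes the request body as a string.
        kwargs["Body"] = json.dumps(body)
        kwargs["RequestHeaders"] = {"Content-Type": "application/json"}
    return kwargs


def call_provider_api(client: Any, **request: Any) -> Any:
    """Send the request through AWS Data Exchange and decode a JSON reply."""
    response = client.send_api_asset(**api_request(**request))
    return json.loads(response["Body"])
```

Because AWS Data Exchange signs and meters the call, the subscriber never handles a provider-issued API key in this flow.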

Billing

  • Subscribers: Pay the subscription price set by the data provider, plus standard AWS fees for any services they use to store, process, or analyze the data (e.g., S3 storage costs, Athena query costs).

  • Providers: Pay for the S3 storage used by their data products and a fulfillment fee for each paid subscription.
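A worked example of how a subscriber's monthly bill adds up. Every number here is hypothetical and chosen for easy arithmetic, not a current AWS or provider price; `monthly_subscriber_cost` is an illustrative helper that covers only the subscription fee, S3 storage, and Athena scan charges (it ignores data transfer, taxes, and any other services).

```python
from decimal import Decimal


def monthly_subscriber_cost(subscription_fee: Decimal,
                            stored_gb: Decimal, s3_rate_per_gb: Decimal,
                            scanned_tb: Decimal, athena_rate_per_tb: Decimal) -> Decimal:
    """Provider's subscription price plus the AWS usage it drives."""
    storage = stored_gb * s3_rate_per_gb        # S3 storage for exported data
    queries = scanned_tb * athena_rate_per_tb   # Athena bills by data scanned
    return subscription_fee + storage + queries


# Hypothetical inputs only; look up real prices for your offer and Region.
total = monthly_subscriber_cost(
    subscription_fee=Decimal("1000"),
    stored_gb=Decimal("100"), s3_rate_per_gb=Decimal("0.02"),
    scanned_tb=Decimal("2"), athena_rate_per_tb=Decimal("5"),
)
```

With these inputs the total is 1,000 + 100 × 0.02 + 2 × 5 = 1,012 per month; `Decimal` avoids floating-point rounding in currency math.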