The Problem It Solves
Traditionally, getting an ML prediction for data stored in a database required a multi-step, often slow, process:
- Build a custom application to extract data from the database.
- Send the data in batches to a hosted ML model endpoint.
- Receive the predictions back in the application.
- Store the predictions or join them with the original data.
Aurora ML simplifies this entire workflow into a single SQL query, eliminating the need for custom applications or data movement.
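For contrast, the four traditional steps above can be sketched in a few lines of Python. This is a minimal local sketch, not an Aurora or SageMaker client: an in-memory SQLite database stands in for the production database, and `fake_endpoint`, the `transactions` table, and the scoring rule are all hypothetical names invented for illustration.

```python
import sqlite3


def fake_endpoint(batch):
    """Hypothetical stand-in for a hosted model endpoint.

    Takes a list of (id, amount) rows and returns one score per row.
    """
    return [1.0 if amount > 1000 else 0.0 for (_row_id, amount) in batch]


def batches(rows, size):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def run_pipeline(conn, batch_size=2):
    # Step 1: extract the data from the database.
    rows = conn.execute(
        "SELECT id, amount FROM transactions ORDER BY id"
    ).fetchall()
    calls = 0
    for batch in batches(rows, batch_size):
        # Steps 2 and 3: send one batch to the endpoint, receive predictions.
        scores = fake_endpoint(batch)
        calls += 1
        # Step 4: store the predictions next to the original rows.
        conn.executemany(
            "UPDATE transactions SET fraud_score = ? WHERE id = ?",
            [(score, row_id) for (row_id, _amount), score in zip(batch, scores)],
        )
    conn.commit()
    return calls


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, amount REAL, fraud_score REAL)"
)
conn.executemany(
    "INSERT INTO transactions (id, amount) VALUES (?, ?)",
    [(1, 25.0), (2, 4800.0), (3, 310.0)],
)

endpoint_calls = run_pipeline(conn)
scored = conn.execute(
    "SELECT id, fraud_score FROM transactions ORDER BY id"
).fetchall()
print(endpoint_calls, scored)
```

Even in this toy form, the application owns extraction, batching, and write-back. That is the glue code the single-query approach is meant to remove.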
Supported AWS Services & Use Cases
Aurora ML is natively integrated with two key AWS Machine Learning services:
1. Amazon SageMaker
- What it is: A fully managed service for building, training, and deploying any kind of ML model.
- Aurora Integration: Allows you to invoke your custom SageMaker models directly from Aurora.
- Common Use Cases:
- Fraud Detection: Pass transaction data to a classification model to get a real-time fraud score.
- Predicting Customer Churn: Analyze customer activity data to predict which customers are likely to leave.
- Product Recommendations: Recommend products to users based on their browsing history and past purchases.
- Credit Risk Assessment: Evaluate loan applications by passing applicant data to a risk model.
2. Amazon Comprehend
- What it is: A Natural Language Processing (NLP) service that uses ML to find insights in text.
- Aurora Integration: Allows you to analyze text stored in your database columns.
- Common Use Cases:
- Sentiment Analysis: Determine if a product review or social media comment stored in a VARCHAR column is positive, negative, neutral, or mixed. This is the most common use case.
- Entity Detection: Extract key entities like people, places, and brands from text.
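The idea of scoring a text column from inside a query can be tried locally. The sketch below is a simulation only: SQLite stands in for Aurora, and `detect_sentiment` is a hypothetical keyword matcher registered as a SQL function, not Amazon Comprehend. The `reviews` table and its contents are likewise made up for illustration.

```python
import sqlite3


def detect_sentiment(text):
    """Hypothetical keyword-based stand-in for an ML sentiment model."""
    lowered = text.lower()
    positive = any(word in lowered for word in ("great", "love", "excellent"))
    negative = any(word in lowered for word in ("broken", "terrible", "refund"))
    if positive and negative:
        return "MIXED"
    if positive:
        return "POSITIVE"
    if negative:
        return "NEGATIVE"
    return "NEUTRAL"


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name (one argument).
conn.create_function("detect_sentiment", 1, detect_sentiment)

conn.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, review_text VARCHAR(500))"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [
        (1, "Great blender, I love it"),
        (2, "Arrived broken, I want a refund"),
        (3, "Love the color but the lid is broken"),
        (4, "It is a blender"),
    ],
)

# The prediction appears as an ordinary column in the result set.
# On Aurora MySQL the analogous call is, to my understanding, the built-in
# aws_comprehend_detect_sentiment(review_text, 'en'); verify the exact
# name and arguments against the documentation for your engine version.
sentiments = conn.execute(
    "SELECT id, detect_sentiment(review_text) AS sentiment "
    "FROM reviews ORDER BY id"
).fetchall()
print(sentiments)
```

The four labels mirror the positive, negative, neutral, and mixed outcomes described above. A real model would return them based on learned features rather than keyword matches.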
How It Works: The High-Level Workflow
- Grant Permissions: An administrator grants the Aurora DB cluster permission to access SageMaker and/or Comprehend via an IAM role.
- Define a Function: A database user defines a SQL function that points to a specific SageMaker model endpoint. For Comprehend, the function wraps a Comprehend action (e.g., sentiment analysis); depending on the engine, this may already be available as a built-in function.
- Invoke via SQL: The user calls the newly defined function within a standard SQL query, passing one or more table columns as inputs.
- Batch & Predict: Aurora automatically gathers the data from the query, calls the AWS ML service in an optimized batch format, and gets the predictions.
- Return Results: The predictions are returned to the user as a new column or value within the query results.
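The first step, granting permissions, usually comes down to an IAM policy attached to the role that the cluster assumes. The following is a minimal sketch of such a policy, built as a Python dictionary and printed as JSON. The account ID, region, and endpoint name in the ARN are hypothetical placeholders, and the exact set of actions your setup needs should be checked against the AWS documentation.

```python
import json

# Hypothetical ARN: replace the region, account ID, and endpoint name.
ENDPOINT_ARN = "arn:aws:sagemaker:us-east-1:123456789012:endpoint/fraud-model"


def build_aurora_ml_policy(endpoint_arn):
    """Return an IAM policy document covering the two integrations."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the cluster's role call one specific SageMaker endpoint.
                "Sid": "InvokeSageMakerEndpoint",
                "Effect": "Allow",
                "Action": "sagemaker:InvokeEndpoint",
                "Resource": endpoint_arn,
            },
            {
                # Lets the cluster's role request sentiment analysis,
                # one document at a time or in batches.
                "Sid": "DetectSentimentWithComprehend",
                "Effect": "Allow",
                "Action": [
                    "comprehend:DetectSentiment",
                    "comprehend:BatchDetectSentiment",
                ],
                "Resource": "*",
            },
        ],
    }


policy = build_aurora_ml_policy(ENDPOINT_ARN)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Scoping the SageMaker statement to a single endpoint ARN, rather than `*`, keeps the database from invoking models it was never meant to reach.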
Key Benefits
- Real-Time Predictions: Because Aurora calls the ML service directly and batches requests for you, the integration adds little latency, so you can enrich your application data with ML predictions in real time.
- Simplified Architecture: No need for "middleman" applications or complex data pipelines to move data for inference.
- Improved Security: Data doesn't have to leave your VPC to get predictions (when using VPC Endpoints for the ML services), enhancing your security posture.
- Ease of Use: Any developer or DBA who knows SQL can add ML capabilities to an application without needing deep ML expertise.
Compatibility & Pricing
- Supported Engines:
- Aurora MySQL (version 2.07.0 and higher, compatible with MySQL 5.7)
- Aurora PostgreSQL (version 1.1 and higher, compatible with PostgreSQL 10 and 11+)
- Pricing: There is no additional charge for the Aurora Machine Learning feature itself. You only pay for the underlying usage of the SageMaker or Comprehend services that you invoke.