Redacting PII Using S3 Object Lambda

Primary Use Case: Dynamic Data Redaction

A common use case for S3 Object Lambda is automatically redacting Personally Identifiable Information (PII) or other sensitive data from a dataset before it is returned to an application.

The Problem: You have a single, authoritative dataset in S3 (e.g., a CSV file with customer information), but you need to serve a "clean" version to an analytics application or a less-privileged user without the sensitive fields (like social security numbers or email addresses).
The Solution: S3 Object Lambda intercepts the GET request for the object, runs a Lambda function to remove or mask the sensitive data in-memory, and returns the redacted version to the user. The original, unaltered object in your S3 bucket remains untouched.
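As a minimal sketch of the column-based redaction described above, the transformation step could look like the function below. It assumes the object is UTF-8 CSV text with a header row, and that the sensitive columns are named `ssn` and `email`; those column names, the `SENSITIVE_COLUMNS` constant and the `redact_csv_columns` helper are hypothetical, not part of any AWS API.

```python
import csv
import io

# Hypothetical column names; adjust to your dataset's actual header.
SENSITIVE_COLUMNS = {"ssn", "email"}


def redact_csv_columns(csv_text, sensitive_columns=SENSITIVE_COLUMNS, placeholder="[REDACTED]"):
    """Return a copy of csv_text with every value in the sensitive columns replaced."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None:
        # Empty input: nothing to redact.
        return csv_text
    output = io.StringIO()
    # lineterminator="\n" avoids the csv module's default "\r\n" line endings.
    writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column in sensitive_columns:
            if column in row:
                row[column] = placeholder
        writer.writerow(row)
    return output.getvalue()


if __name__ == "__main__":
    sample = "name,email,ssn\nAlice,alice@example.com,123-45-6789\n"
    print(redact_csv_columns(sample))
```

Redacting by column name keeps the row structure intact, so downstream CSV parsers in the analytics application keep working; it does not catch PII that leaks into free-text columns.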

The Architecture and Data Flow

S3 Object Lambda is easiest to understand by following a single request through its lifecycle.

Request Initiation: An application makes a standard GetObject request, but instead of targeting the S3 bucket directly, it targets a unique S3 Object Lambda Access Point.
Lambda Invocation: Amazon S3 intercepts this request and automatically invokes the associated AWS Lambda function.
Lambda Receives Event: S3 sends an event payload to the Lambda function. This event contains context about the request, including a pre-signed URL which provides temporary, secure access to the original object in the source S3 bucket.
Lambda Fetches Original Object: The Lambda function's code uses the pre-signed URL to download the full, original object from S3.
Data Transformation: The code performs the necessary data transformation in memory. For PII redaction, this would involve finding and replacing sensitive data with a placeholder like [REDACTED].
Data Returned to S3: The Lambda function sends the transformed (redacted) data back to S3 using the WriteGetObjectResponse API, passing along the route and token it received in the event.
Response to Application: S3 streams the transformed data from the Lambda function back to the application that made the original request. To the application, this entire process is transparent; it simply receives the object data it requested.
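For the transformation step, a pattern-based alternative to column redaction is to scan the text for values that look like PII. The sketch below is illustrative only: the two regular expressions (a US SSN in NNN-NN-NNNN form and a simple email shape) are simplified assumptions and will miss many real-world formats, and `PII_PATTERNS` and `redact_text` are hypothetical names.

```python
import re

# Simplified, illustrative patterns; real PII detection needs far broader coverage.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN written as NNN-NN-NNNN
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # basic email address shape
]


def redact_text(text, placeholder="[REDACTED]"):
    """Replace every match of each PII pattern in text with the placeholder."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


if __name__ == "__main__":
    print(redact_text("Contact alice@example.com, SSN 123-45-6789."))
```

Because this runs in memory on every GET, the cost of the patterns is paid per request; keeping them precompiled at module level, as here, avoids recompiling them on each invocation of a warm Lambda.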

Key Components to Configure

To set up S3 Object Lambda, you need to configure four main AWS resources:

The Source S3 Bucket: This bucket holds your original, unmodified data.
A Supporting S3 Access Point: A standard S3 Access Point that provides access to the source bucket. The Object Lambda Access Point will be layered on top of this.
The AWS Lambda Function: This contains your custom Python, Node.js, or other code that will perform the data transformation.
The S3 Object Lambda Access Point: This is the final, unique endpoint that your applications will use to request data. This is where you connect your Lambda function with the supporting S3 Access Point.
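The Object Lambda Access Point is what ties the other components together. As a sketch, assuming you create it with boto3's `s3control` client and its `create_access_point_for_object_lambda` operation, the `Configuration` argument could be built like this. The helper name, account ID, access point names and ARNs are hypothetical placeholders.

```python
def build_object_lambda_configuration(supporting_access_point_arn, function_arn):
    """Build the Configuration argument for s3control.create_access_point_for_object_lambda."""
    return {
        # The standard access point that fronts the source bucket.
        "SupportingAccessPoint": supporting_access_point_arn,
        "TransformationConfigurations": [
            {
                # Invoke the Lambda function only for GetObject requests.
                "Actions": ["GetObject"],
                "ContentTransformation": {
                    "AwsLambda": {"FunctionArn": function_arn},
                },
            }
        ],
    }


# Usage (requires AWS credentials; all identifiers below are placeholders):
# import boto3
# s3control = boto3.client("s3control")
# s3control.create_access_point_for_object_lambda(
#     AccountId="111122223333",
#     Name="customers-redacted",
#     Configuration=build_object_lambda_configuration(
#         "arn:aws:s3:us-east-1:111122223333:accesspoint/customers-ap",
#         "arn:aws:lambda:us-east-1:111122223333:function:redact-pii",
#     ),
# )
```

Applications then pass the Object Lambda Access Point's ARN as the `Bucket` argument of an ordinary `get_object` call instead of the bucket name.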

Inside the Lambda Function

Your Lambda function's code is the heart of the process.

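Event Payload: The event object passed to your Lambda handler contains a getObjectContext field, which holds three crucial pieces of information:
- inputS3Url: The pre-signed URL used to fetch the original object.
- outputRoute: A routing token that you must pass back to S3 (as RequestRoute) when you return the transformed data.
- outputToken: An opaque token that S3 uses to match your response to the original request; you pass it back as RequestToken.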
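To make the event shape concrete, here is a minimal sketch that pulls the getObjectContext fields out of an event. `SAMPLE_EVENT` is a trimmed-down, hypothetical event with placeholder values (real events carry additional fields such as userRequest and configuration), and `ObjectContext` and `parse_object_context` are illustrative names, not part of any AWS SDK.

```python
from typing import NamedTuple


class ObjectContext(NamedTuple):
    input_s3_url: str  # pre-signed URL for the original object
    output_route: str  # passed back as RequestRoute
    output_token: str  # passed back as RequestToken


def parse_object_context(event):
    """Extract the getObjectContext fields from an S3 Object Lambda event."""
    ctx = event["getObjectContext"]
    return ObjectContext(ctx["inputS3Url"], ctx["outputRoute"], ctx["outputToken"])


# Hypothetical, trimmed-down event; all values are placeholders.
SAMPLE_EVENT = {
    "getObjectContext": {
        "inputS3Url": "https://example.invalid/presigned-url",
        "outputRoute": "io-example-route",
        "outputToken": "example-token",
    }
}

if __name__ == "__main__":
    print(parse_object_context(SAMPLE_EVENT))
```

A small fixture like this is also handy for exercising the handler locally before deploying it.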
Required IAM Permission: The Lambda function's execution role must have the s3-object-lambda:WriteGetObjectResponse permission to be able to send the processed data back to S3.
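As a sketch, the permission could be granted through an inline policy like the one below, expressed here as a Python dict and serialized to JSON. It uses `"Resource": "*"` for simplicity, which is an assumption you may be able to narrow; the execution role will typically also need CloudWatch Logs permissions if the function writes logs. The constant name is hypothetical.

```python
import json

# Minimal inline policy for the Lambda execution role (sketch; scope down if possible).
WRITE_GET_OBJECT_RESPONSE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3-object-lambda:WriteGetObjectResponse",
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    # Print the policy document as JSON, e.g. to paste into the IAM console.
    print(json.dumps(WRITE_GET_OBJECT_RESPONSE_POLICY, indent=2))
```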

Conceptual Code Logic (Python):

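```python
import urllib.request

import boto3

s3 = boto3.client('s3')


def lambda_handler(event, context):
    # 1. Get the request context from the event.
    #    Named object_context so it does not shadow the Lambda `context` argument.
    object_context = event['getObjectContext']
    s3_url = object_context['inputS3Url']
    output_route = object_context['outputRoute']
    output_token = object_context['outputToken']

    # 2. Fetch the original object from S3 via the pre-signed URL.
    #    urllib is in the standard library; `requests` is not included
    #    in the Lambda Python runtime by default.
    with urllib.request.urlopen(s3_url) as response:
        original_object = response.read().decode('utf-8')

    # 3. Perform your data transformation (placeholder: replace a literal marker string).
    redacted_object = original_object.replace("sensitive-data", "[REDACTED]")

    # 4. Send the transformed object back to S3 Object Lambda.
    s3.write_get_object_response(
        Body=redacted_object.encode('utf-8'),
        RequestRoute=output_route,
        RequestToken=output_token,
    )

    # The caller receives the WriteGetObjectResponse body, not this return value.
    return {'status_code': 200}
```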
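The handler above assumes every step succeeds. A sketch of one way to also report failures is shown below: if fetching or transforming fails, the function still calls WriteGetObjectResponse, this time with StatusCode, ErrorCode and ErrorMessage instead of a Body, so the caller gets an error response. The `make_handler` factory, the injectable `fetch` and `transform` parameters, and the "ProcessingFailed" error code string are hypothetical choices made so the logic can be tested without AWS.

```python
import urllib.error
import urllib.request


def make_handler(s3_client, fetch=None, transform=lambda text: text):
    """Build a Lambda handler; s3_client would normally be boto3.client('s3')."""
    if fetch is None:
        def fetch(url):
            # Default fetcher: download the original object via the pre-signed URL.
            with urllib.request.urlopen(url) as response:
                return response.read().decode("utf-8")

    def handler(event, context):
        ctx = event["getObjectContext"]
        route, token = ctx["outputRoute"], ctx["outputToken"]
        try:
            body = transform(fetch(ctx["inputS3Url"]))
        except Exception as exc:
            # Report the failure to S3 so the caller gets an error response.
            s3_client.write_get_object_response(
                RequestRoute=route,
                RequestToken=token,
                StatusCode=500,
                ErrorCode="ProcessingFailed",  # hypothetical error code string
                ErrorMessage=str(exc),
            )
            return {"status_code": 500}
        s3_client.write_get_object_response(
            Body=body.encode("utf-8"), RequestRoute=route, RequestToken=token
        )
        return {"status_code": 200}

    return handler
```

Injecting the client and the fetcher is a design choice for testability: a fake object with a `write_get_object_response` method can stand in for boto3 during local tests.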

Other Common Use Cases

Beyond PII redaction, S3 Object Lambda is useful for many other on-the-fly transformations:

Dynamically resizing images or adding watermarks.
Converting data formats, such as XML to JSON.
Compressing or decompressing data.
Enriching data by combining it with information from other data sources.
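These cases reuse the same handler structure; only the transformation step changes. For instance, an XML-to-JSON conversion could be a function like the one below. It assumes a flat, two-level layout (a root element containing record elements whose children are simple fields), which is a hypothetical simplification; nested or attribute-heavy XML would need more handling. `xml_records_to_json` is an illustrative name.

```python
import json
import xml.etree.ElementTree as ET


def xml_records_to_json(xml_text):
    """Convert <root><record><field>value</field>...</record>...</root> to a JSON array."""
    root = ET.fromstring(xml_text)
    # One dict per record element, mapping each child tag to its text.
    records = [{child.tag: child.text for child in record} for record in root]
    return json.dumps(records)


if __name__ == "__main__":
    print(xml_records_to_json(
        "<records><record><name>Alice</name><city>Paris</city></record></records>"
    ))
```

When the format changes like this, the handler would likely also pass ContentType="application/json" to write_get_object_response so clients interpret the body correctly.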