Updated June 23, 2025

Amazon MSK (Managed Streaming for Apache Kafka) Cheat Sheet

Amazon MSK is a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data. Apache Kafka is an open-source platform for building real-time streaming data pipelines and applications.

Core Apache Kafka Concepts

  • Broker: A Kafka server that stores data. An MSK cluster is composed of multiple broker nodes.

  • Topic: A category or feed name to which records are published. Topics in Kafka are multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.

  • Partition: Topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data across multiple brokers. This allows for scalability.

  • Producer: An application that publishes (writes) a stream of records to one or more Kafka topics.

  • Consumer: An application that subscribes to (reads and processes) a stream of records from one or more Kafka topics.

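  • ZooKeeper: A centralized coordination service that Kafka has traditionally used to store cluster metadata and coordinate brokers. MSK manages the ZooKeeper nodes for you. Clusters created in KRaft mode (available on MSK from Apache Kafka 3.7) use Kafka's built-in metadata quorum instead of ZooKeeper.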
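
The relationship between topics, partitions, producers, and consumers can be illustrated with a small in-memory model. This is only a sketch and not how Kafka is implemented: the class name ToyTopic is made up for illustration, and CRC32 stands in for the murmur2 hash that real Kafka clients use for keyed records. It shows two properties from the list above: records with the same key land in the same partition, and reading does not remove records, so many consumers can read the same data.

```python
import zlib
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


class ToyTopic:
    """Hypothetical in-memory stand-in for a Kafka topic split into partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Assumption: CRC32 as a stand-in hash. Real clients use murmur2.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def consume(self, partition: int, offset: int = 0) -> List[Record]:
        """Read records from one partition, starting at the given offset."""
        return list(self.partitions[partition][offset:])


topic = ToyTopic("orders", num_partitions=3)
for i in range(6):
    partition, offset = topic.produce(f"customer-{i % 2}", f"order-{i}")
    print(f"customer-{i % 2} -> partition {partition}, offset {offset}")
```

Because each consumer tracks its own offset, two consumers calling consume() on the same partition see the same records independently.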

Amazon MSK Features

  • Fully Managed: MSK provisions, configures, and manages your Apache Kafka clusters and Apache ZooKeeper nodes. It handles tasks like patching, failure detection, and recovery.

  • High Availability: MSK automatically distributes broker nodes across multiple Availability Zones (AZs) within a region to protect against AZ failures.

  • Scalable: You can easily scale out your cluster by adding more brokers or increase the storage per broker with a few clicks.

  • Secure:

    • Runs in an Amazon VPC, allowing you to use your own security groups to control network access.

    • Supports encryption in transit via TLS and encryption at rest using AWS KMS.

    • Supports multiple authentication and authorization methods, including IAM client authentication, SASL/SCRAM, and mTLS.

  • Highly Compatible: MSK is fully compatible with open-source Apache Kafka. You can use standard Kafka APIs and tools to interact with your cluster. You can migrate existing Kafka applications to MSK with minimal code changes.

MSK Cluster Types

Amazon MSK offers two types of clusters to fit different workload needs.

1. MSK Provisioned

  • You choose the number and type of broker instances.

  • You provision and manage the cluster's capacity, giving you fine-grained control over your cluster configuration.

  • Best for: Applications with predictable or stable throughput where you want to manage capacity yourself to optimize costs.

Provisioned Concepts:

  • Broker Instance Type: You select the EC2 instance family and size for your brokers (e.g., kafka.m5.large).

  • Storage: You attach Amazon EBS volumes to each broker for data storage. You can scale this storage up without any downtime, but you cannot decrease it afterward.

  • Configuration: You can provide a custom Kafka configuration to tune your cluster's performance and behavior.
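
As a rough illustration of the choices above, the sketch below assembles the request body that the boto3 "kafka" client's create_cluster call accepts. The helper name build_provisioned_request, the subnet and security group IDs, and the Kafka version string are placeholders, not values from this cheat sheet. The check for at least two subnets is an assumption of this sketch. The broker count is derived from the subnet count because MSK expects it to be a multiple of the number of Availability Zones used.

```python
from typing import Any, Dict, List


def build_provisioned_request(
    cluster_name: str,
    kafka_version: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
    brokers_per_az: int = 1,
    instance_type: str = "kafka.m5.large",
    volume_size_gib: int = 100,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's kafka.create_cluster (hypothetical helper)."""
    if len(subnet_ids) < 2:
        # Assumption of this sketch: one subnet per AZ, at least two AZs.
        raise ValueError("provide subnets in at least two Availability Zones")
    if brokers_per_az < 1:
        raise ValueError("brokers_per_az must be at least 1")
    return {
        "ClusterName": cluster_name,
        "KafkaVersion": kafka_version,
        # Keeps the broker count a multiple of the number of AZs.
        "NumberOfBrokerNodes": brokers_per_az * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": subnet_ids,
            "SecurityGroups": security_group_ids,
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gib}},
        },
    }


request = build_provisioned_request(
    "demo-cluster",            # placeholder name
    "3.6.0",                   # placeholder version string
    ["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # placeholder IDs
    ["sg-123"],                # placeholder ID
)
print(request["NumberOfBrokerNodes"])
# With AWS credentials configured, the request could be sent with:
#   boto3.client("kafka").create_cluster(**request)
```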

2. MSK Serverless

  • Automatically provisions and scales compute and storage resources, so you don't have to manage capacity.

  • There are no broker instance types or broker counts to choose; you create the cluster and work directly with topics and partitions.

  • Best for: New applications, applications with unpredictable traffic, or workloads with variable throughput where you want to eliminate capacity management.

Serverless Concepts:

  • MSK Serverless automatically manages partitions, so you don't need to manually reassign them when the cluster scales.

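  • Pricing is usage-based: you pay for cluster hours, partition hours, data written and read, and storage consumed, so costs track actual usage rather than provisioned capacity.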
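
For contrast with a provisioned cluster, here is a sketch of the request body for a serverless cluster, shaped for the boto3 "kafka" client's create_cluster_v2 call. The helper name build_serverless_request and all IDs are placeholders. The point to notice is what is absent: no instance type, broker count, or storage size. The sketch enables IAM authentication, on the understanding that MSK Serverless uses IAM access control for clients.

```python
from typing import Any, Dict, List


def build_serverless_request(
    cluster_name: str,
    subnet_ids: List[str],
    security_group_ids: List[str],
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's kafka.create_cluster_v2 (hypothetical helper)."""
    return {
        "ClusterName": cluster_name,
        "Serverless": {
            # Only networking and authentication are specified; capacity is not.
            "VpcConfigs": [
                {"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids}
            ],
            "ClientAuthentication": {"Sasl": {"Iam": {"Enabled": True}}},
        },
    }


request = build_serverless_request(
    "demo-serverless",             # placeholder name
    ["subnet-aaa", "subnet-bbb"],  # placeholder IDs
    ["sg-123"],                    # placeholder ID
)
print(sorted(request["Serverless"].keys()))
# With AWS credentials configured, the request could be sent with:
#   boto3.client("kafka").create_cluster_v2(**request)
```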

Security in Amazon MSK

MSK provides a multi-layered security model.

  • Network Level:

    • VPC: All clusters are deployed within your VPC.

    • Security Groups: Act as a firewall for your brokers, controlling inbound and outbound traffic.

  • Authentication (Who can connect?):

    • IAM Client Authentication: Use IAM roles and users to control who can connect to the cluster. This is a highly secure, AWS-native method.

    • SASL/SCRAM: Use username and password credentials managed via AWS Secrets Manager.

    • Mutual TLS (mTLS): Use client certificates from AWS Certificate Manager (ACM) Private Certificate Authority.

    • Unauthenticated: Allows any client within the VPC to connect (not recommended for production).

  • Authorization (What can they do?):

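    • Use Kafka Access Control Lists (ACLs) to define read/write permissions for specific users on specific topics. Kafka ACLs require an authentication method such as mTLS or SASL/SCRAM to be enabled. With IAM client authentication, permissions are defined in IAM policies instead.
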
  • Encryption:

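    • Encryption in Transit: Uses TLS between clients and brokers, and between brokers themselves. Client-to-broker encryption is configurable (TLS only, TLS and plaintext, or plaintext only), with TLS as the default.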

    • Encryption at Rest: Data stored on EBS volumes is encrypted using AWS KMS keys.
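
The authentication methods above translate into different Kafka client settings. The sketch below maps each method to the standard client properties and to the port MSK brokers listen on for that method inside the VPC. The function name client_properties and the credential and keystore placeholders are made up for illustration. The IAM entries assume a Java client with the aws-msk-iam-auth library on its classpath.

```python
from typing import Dict

# In-VPC broker ports by access method.
PORTS: Dict[str, int] = {"plaintext": 9092, "mtls": 9094, "scram": 9096, "iam": 9098}


def client_properties(
    auth: str,
    username: str = "<user>",
    password: str = "<password>",
    keystore: str = "<path/to/keystore.jks>",
) -> Dict[str, str]:
    """Return Kafka client properties for one auth method (hypothetical helper)."""
    if auth == "iam":
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM",
            "sasl.jaas.config": "software.amazon.msk.auth.iam.IAMLoginModule required;",
            "sasl.client.callback.handler.class":
                "software.amazon.msk.auth.iam.IAMClientCallbackHandler",
        }
    if auth == "scram":
        return {
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-512",
            "sasl.jaas.config": (
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                f'username="{username}" password="{password}";'
            ),
        }
    if auth == "mtls":
        return {
            "security.protocol": "SSL",
            "ssl.keystore.location": keystore,
            "ssl.keystore.password": password,
        }
    if auth == "plaintext":
        return {"security.protocol": "PLAINTEXT"}
    raise ValueError(f"unknown auth method: {auth}")


def to_properties_file(props: Dict[str, str]) -> str:
    """Render the settings in Java .properties syntax."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(f"# connect to <broker-host>:{PORTS['iam']}")
print(to_properties_file(client_properties("iam")))
```

In practice the SCRAM credentials would be read from AWS Secrets Manager rather than passed as literals.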

Monitoring

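  • MSK integrates with Amazon CloudWatch to provide key performance metrics for your clusters. Default-level metrics come at no additional cost; the more detailed levels are billed at CloudWatch rates.

  • You can monitor at different levels: Default, Per-Broker, Per-Topic-Per-Broker, and Per-Topic-Per-Partition.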

  • Open Monitoring with Prometheus: MSK can expose broker metrics in an open format that can be scraped by tools like Prometheus, allowing you to use other monitoring and alerting systems like Grafana.
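
When open monitoring is enabled, each broker exposes a JMX exporter on port 11001 and a node exporter on port 11002. One way to point Prometheus at them is a file-based service discovery file, which the sketch below generates. The function name prometheus_targets and the broker hostnames are placeholders; real hostnames come from the cluster's broker list.

```python
import json
from typing import Any, Dict, List

JMX_EXPORTER_PORT = 11001   # Kafka broker metrics
NODE_EXPORTER_PORT = 11002  # host-level metrics


def prometheus_targets(broker_hosts: List[str]) -> List[Dict[str, Any]]:
    """Build Prometheus file_sd entries for MSK open monitoring (hypothetical helper)."""
    return [
        {
            "labels": {"job": "jmx"},
            "targets": [f"{host}:{JMX_EXPORTER_PORT}" for host in broker_hosts],
        },
        {
            "labels": {"job": "node"},
            "targets": [f"{host}:{NODE_EXPORTER_PORT}" for host in broker_hosts],
        },
    ]


# Placeholder hostnames, not real brokers.
brokers = ["b-1.example.internal", "b-2.example.internal"]
print(json.dumps(prometheus_targets(brokers), indent=2))
```

The printed JSON can be saved as a targets file and referenced from a file_sd_configs entry in the Prometheus configuration.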