Amazon CloudWatch Cheat Sheet
Amazon CloudWatch is a monitoring and observability service for AWS cloud resources and the applications you run on AWS. It enables you to collect and track metrics, collect and monitor log files, set alarms, and automatically react to changes in your AWS resources.
Core Concepts
Metrics
- A metric is the fundamental concept in CloudWatch and represents a time-ordered set of data points.
- Think of a metric as a variable to monitor, and the data points as the values of that variable over time.
- Metrics are uniquely defined by a name, a namespace, and zero or more dimensions.
- Metrics exist only in the Region they are created in.
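A metric's identity is the combination of namespace, name, and dimensions, and metrics are regional, so one way to picture this is as a composite key. The Python sketch below is illustrative only: `metric_key` is a hypothetical helper (not part of any AWS SDK), and the Region and instance IDs are made up.

```python
from typing import FrozenSet, Mapping, Tuple

# Hypothetical composite key: (region, namespace, metric name, dimensions).
MetricKey = Tuple[str, str, str, FrozenSet[Tuple[str, str]]]


def metric_key(region: str, namespace: str, name: str,
               dimensions: Mapping[str, str]) -> MetricKey:
    """Return a hashable identity for a metric.

    A frozenset makes dimension order irrelevant, while any change to a
    dimension name or value yields a different metric.
    """
    return (region, namespace, name, frozenset(dimensions.items()))


a = metric_key("us-east-1", "AWS/EC2", "CPUUtilization", {"InstanceId": "i-aaa"})
b = metric_key("us-east-1", "AWS/EC2", "CPUUtilization", {"InstanceId": "i-bbb"})
c = metric_key("eu-west-1", "AWS/EC2", "CPUUtilization", {"InstanceId": "i-aaa"})
print(a == b, a == c)  # False False: different dimensions, different Region
```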
Namespaces
- A namespace is a container for CloudWatch metrics.
- Metrics from different AWS services are placed in different namespaces, which start with the AWS/ prefix (e.g., AWS/EC2, AWS/EBS).
- When you create a custom metric, you specify your own namespace; to avoid conflicts with AWS service namespaces, it should not begin with AWS/.
Dimensions
- A dimension is a name/value pair that is part of the identity of a metric.
- You can assign up to 30 dimensions to a metric.
- Think of a dimension as a category or attribute for a metric (e.g., InstanceId=i-1234567890abcdef0).
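To see namespaces and dimensions together, here is a minimal sketch that builds the keyword arguments for the boto3 CloudWatch client's `put_metric_data` call without contacting AWS. The helper name `build_put_metric_data_kwargs`, the `MyApp` namespace, the `OrdersPlaced` metric, and the dimension names are hypothetical examples.

```python
from datetime import datetime, timezone


def build_put_metric_data_kwargs(namespace, metric_name, value, dimensions, unit="Count"):
    """Build keyword arguments shaped for boto3's CloudWatch put_metric_data."""
    if namespace.startswith("AWS/"):
        # The AWS/ prefix belongs to AWS service namespaces.
        raise ValueError("custom namespaces should not begin with 'AWS/'")
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                # Each dimension is a Name/Value pair and part of the metric's identity.
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                "Timestamp": datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,
            }
        ],
    }


kwargs = build_put_metric_data_kwargs(
    "MyApp", "OrdersPlaced", 3, {"Environment": "prod", "Service": "checkout"}
)
print(kwargs["MetricData"][0]["Dimensions"])
```

With credentials configured, the payload could be sent with `boto3.client("cloudwatch").put_metric_data(**kwargs)`; keeping the builder separate makes it easy to inspect or test the request shape before anything is published.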
Timestamps & Periods
- Each data point in a metric must be marked with a timestamp.
- A period is the length of time associated with a specific CloudWatch statistic, specified in seconds. The default value is 60 seconds.
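The sketch below groups raw data points into periods before any statistic is computed. It assumes, for illustration, that each period starts at a multiple of the period length since the Unix epoch; `bucket_by_period` is a hypothetical helper and the timestamps are small made-up values.

```python
from collections import defaultdict


def bucket_by_period(points, period=60):
    """Group (timestamp_seconds, value) pairs into period-aligned buckets.

    Assumption: each bucket starts at a multiple of `period` seconds
    since the Unix epoch.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        start = ts - ts % period  # floor the timestamp to its period start
        buckets[start].append(value)
    return dict(buckets)


points = [(120, 10.0), (150, 20.0), (185, 5.0)]
print(bucket_by_period(points))              # {120: [10.0, 20.0], 180: [5.0]}
print(bucket_by_period(points, period=300))  # {0: [10.0, 20.0, 5.0]}
```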
Statistics
- Statistics are aggregations of metric data over specified periods.
- Minimum: The lowest value observed during the period.
- Maximum: The highest value observed during the period.
- Sum: All values submitted during the period, added together.
- Average: The Sum divided by the SampleCount during the period.
- SampleCount: The number of data points used in the calculation.
- Percentiles (pNN.NN): Indicates the relative standing of a value in a dataset. For example, p95 is the 95th percentile: 95% of the data within the period is lower than this value. Useful for understanding the distribution of data.
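As a rough illustration of these definitions, this sketch computes each statistic for the data points of one period. The percentile uses the nearest-rank method, which is an assumption; CloudWatch's own percentile calculation may differ in detail. `period_statistics` is a hypothetical name.

```python
import math


def period_statistics(values, percentile=95.0):
    """Compute CloudWatch-style statistics for the data points of one period.

    The percentile uses the nearest-rank method (an assumption for illustration).
    """
    if not values:
        raise ValueError("a period with no data points has no statistics")
    ordered = sorted(values)
    total = sum(ordered)
    count = len(ordered)
    # Nearest rank: the smallest value with at least `percentile`% of points at or below it.
    rank = max(1, math.ceil(percentile / 100 * count))
    return {
        "Minimum": ordered[0],
        "Maximum": ordered[-1],
        "Sum": total,
        "SampleCount": count,
        "Average": total / count,  # Sum divided by SampleCount
        f"p{percentile:g}": ordered[rank - 1],
    }


print(period_statistics([4, 8, 15, 16, 23, 42]))
# {'Minimum': 4, 'Maximum': 42, 'Sum': 108, 'SampleCount': 6, 'Average': 18.0, 'p95': 42}
```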
Alarms
- An alarm watches a single metric over a specified time period and performs one or more actions based on the value of the metric relative to a threshold over a number of time periods.
- The action can be a notification sent to an Amazon SNS topic, an Auto Scaling action, or an Amazon EC2 action (such as stopping or rebooting an instance).
- Alarm states:
  - OK: The metric is within the defined threshold.
  - ALARM: The metric is outside the defined threshold.
  - INSUFFICIENT_DATA: The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.
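A simplified model of how an alarm moves between states can help when reasoning about thresholds. This sketch assumes every one of the last `evaluation_periods` data points must breach the threshold to enter ALARM, and treats any missing data point (`None`) as INSUFFICIENT_DATA; real alarms let you configure both behaviors. The comparison names mirror CloudWatch's `ComparisonOperator` values, but `alarm_state` itself is hypothetical.

```python
import operator

# Names mirror CloudWatch ComparisonOperator values.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def alarm_state(datapoints, threshold, comparison="GreaterThanThreshold",
                evaluation_periods=3):
    """Simplified alarm evaluation over the latest `evaluation_periods` datapoints.

    None marks a period with no data. Any missing point yields INSUFFICIENT_DATA;
    every evaluated point must breach the threshold to yield ALARM.
    """
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods or any(v is None for v in window):
        return "INSUFFICIENT_DATA"
    breaches = COMPARATORS[comparison]
    if all(breaches(v, threshold) for v in window):
        return "ALARM"
    return "OK"


print(alarm_state([50, 85, 90, 95], threshold=80))  # ALARM
print(alarm_state([85, 70, 90], threshold=80))      # OK
print(alarm_state([85, None, 90], threshold=80))    # INSUFFICIENT_DATA
```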
CloudWatch Logs
- Monitor, store, and access your log files from Amazon EC2 instances, AWS CloudTrail, Route 53, and other sources.
- Log Event: A record of activity recorded by the application or resource being monitored.
- Log Stream: A sequence of log events that share the same source.
- Log Group: A group of log streams that share the same retention, monitoring, and access control settings.
- Metric Filters: Search for and match terms, phrases, or values in your log events, and turn the matches into CloudWatch metrics that you can graph or alarm on.
- Logs Insights: Interactively search and analyze your log data using queries.
- Vended Logs: Logs that AWS services publish natively on your behalf (e.g., VPC Flow Logs).
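To illustrate what a metric filter does, the sketch below counts the log events that contain every given term and emits one count per 60-second period, which is the kind of series a filter publishes as a metric. The matching is deliberately simplified (plain, case-sensitive substring checks), so it is not a faithful implementation of CloudWatch filter pattern syntax; `apply_metric_filter` and the sample events are hypothetical.

```python
from collections import Counter


def apply_metric_filter(log_events, terms, period=60):
    """Turn matching log events into a per-period count, like a simple metric filter.

    Simplification: an event matches only if its message contains every term
    (case-sensitive substring match).
    """
    counts = Counter()
    for timestamp, message in log_events:  # timestamps in epoch seconds
        if all(term in message for term in terms):
            counts[timestamp - timestamp % period] += 1
    return dict(counts)


events = [
    (0, "INFO request ok"),
    (10, "ERROR timeout calling payments"),
    (70, "ERROR timeout calling inventory"),
    (75, "ERROR bad input"),
]
print(apply_metric_filter(events, ["ERROR", "timeout"]))  # {0: 1, 60: 1}
```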
Other CloudWatch Features
Unified CloudWatch Agent
- Collects both system-level metrics and log files from Amazon EC2 instances and on-premises servers (Windows and Linux).
- Collects in-guest metrics that standard EC2 monitoring does not provide (e.g., memory utilization, disk space usage).
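The agent is driven by a JSON configuration file. The sketch below builds a minimal configuration in Python that collects memory and root-volume disk usage. The field names (`metrics_collected`, `mem_used_percent`, `used_percent`, `append_dimensions`) reflect the agent's configuration format as commonly documented, but treat them as assumptions and verify them against the current agent documentation before use.

```python
import json

# Minimal CloudWatch agent configuration sketch: memory and root-volume disk usage.
agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds between samples
    "metrics": {
        "namespace": "CWAgent",
        # Tag each metric with the instance ID, resolved by the agent at runtime.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        },
    },
}

# Serialize to the JSON text the agent would read from its configuration file.
config_text = json.dumps(agent_config, indent=2)
print(config_text)
```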
Container Insights
- Collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices (ECS, EKS, Kubernetes on EC2, and Fargate).
- Provides diagnostic information, such as container restart failures, to help you isolate issues.
Contributor Insights
- Analyzes log data to create time-series data. It helps you understand who or what is impacting your system and application performance by identifying top contributors.
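Conceptually, finding top contributors means grouping records by a key and ranking the groups by count or by the sum of a field. The sketch below does that over in-memory records; it is not the Contributor Insights rule syntax, `top_contributors` is hypothetical, and the `srcAddr`/`bytes` fields are sample data modeled loosely on VPC Flow Log fields.

```python
from collections import Counter


def top_contributors(records, key, value_field=None, n=3):
    """Rank values of `key` by record count, or by the sum of `value_field` if given."""
    totals = Counter()
    for record in records:
        if key in record:
            totals[record[key]] += record[value_field] if value_field else 1
    return totals.most_common(n)  # highest totals first


records = [
    {"srcAddr": "10.0.0.5", "bytes": 500},
    {"srcAddr": "10.0.0.7", "bytes": 200},
    {"srcAddr": "10.0.0.5", "bytes": 900},
    {"srcAddr": "10.0.0.9", "bytes": 100},
    {"srcAddr": "10.0.0.5", "bytes": 300},
]
print(top_contributors(records, "srcAddr", n=2))           # [('10.0.0.5', 3), ('10.0.0.7', 1)]
print(top_contributors(records, "srcAddr", "bytes", n=2))  # [('10.0.0.5', 1700), ('10.0.0.7', 200)]
```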
Metrics Insights
- A SQL-based query capability that lets you aggregate and group your metrics in real time to quickly identify issues.
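Metrics Insights queries use a SQL-like syntax and can be run through the `GetMetricData` API by passing the query as a metric data query's `Expression`. This sketch only builds the request for boto3's `get_metric_data`; the helper name, the query text, the `q1` ID, the three-hour window, and the 300-second period are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# Metrics Insights query: the five EC2 instances with the highest average CPU.
QUERY = (
    'SELECT AVG(CPUUtilization) FROM SCHEMA("AWS/EC2", InstanceId) '
    "GROUP BY InstanceId ORDER BY AVG() DESC LIMIT 5"
)


def build_get_metric_data_kwargs(query, hours=3, period=300):
    """Build keyword arguments for boto3's CloudWatch get_metric_data."""
    end = datetime.now(timezone.utc)
    return {
        "MetricDataQueries": [{"Id": "q1", "Expression": query, "Period": period}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }


kwargs = build_get_metric_data_kwargs(QUERY)
print(kwargs["MetricDataQueries"][0]["Expression"])
```

With credentials configured, `boto3.client("cloudwatch").get_metric_data(**kwargs)` would execute the query.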
Evidently
- A feature for A/B testing and feature flags. You can test new application features by serving them to a subset of your users and monitoring performance before a full launch.
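Serving a feature to a subset of users typically relies on stable bucketing, so that the same user always sees the same variation. The sketch below shows one common technique, hashing the user ID into one of 100 buckets. It is a generic illustration, not Evidently's API or its actual assignment algorithm; `assign_variation` and the `checkout-v2` feature name are hypothetical.

```python
import hashlib


def assign_variation(user_id, feature, rollout_percent):
    """Deterministically place a user in the 'new' or 'control' variation."""
    # Hash feature and user together so each feature gets an independent split.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "new" if bucket < rollout_percent else "control"


users = [f"user-{i}" for i in range(10_000)]
share = sum(assign_variation(u, "checkout-v2", 20) == "new" for u in users) / len(users)
print(f"{share:.1%} of users see the new checkout")  # close to 20%
```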
Real-User Monitoring (RUM)
- Collects client-side performance data from real user sessions of your web application so you can view and analyze it.
Pricing
You are charged for:
- Custom metrics (per metric per month).
- API requests (per 1,000 requests; some APIs, such as GetMetricData, are charged per 1,000 metrics requested).
- Dashboards (per dashboard per month).
- Alarms (per alarm metric per month).
- Logs (per GB ingested, plus per GB stored per month).
- Events (per million custom events).