Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) Cheat Sheet
Amazon OpenSearch Service is a managed service that makes it easy to deploy, operate, and scale OpenSearch clusters in the AWS Cloud. OpenSearch is a distributed, open-source search and analytics suite used for a broad set of use cases like real-time application monitoring, log analytics, and website search.
Core Concepts
- OpenSearch Domain: Synonymous with an OpenSearch/Elasticsearch cluster. It is a collection of resources (data nodes, master nodes, storage) that work together. You specify instance types, instance counts, and storage resources when you create a domain.
- Index: A collection of documents with similar characteristics. It is the top-level unit of data organization in OpenSearch. For example, you can have one index for product data and another for log data.
- Document: The basic unit of information that can be indexed, represented in JSON (JavaScript Object Notation). In an index of products, each document would represent one product.
- Shard: Because OpenSearch is a distributed search engine, an index is typically split into multiple pieces called shards. Each shard is itself a fully functional, independent "index" that can be hosted on any node in the cluster. Sharding lets you split and scale your content volume horizontally.
- Replica: OpenSearch allows you to make one or more copies of your index’s shards, which are called replica shards or replicas. Replication provides high availability if a shard or node fails and also increases search throughput, because searches can run on all replicas in parallel.
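To make the shard and replica relationship concrete, here is a minimal sketch that builds a create-index request body and counts the resulting shard copies. The index name `products`, the sample document, and the helper names are hypothetical; only the `number_of_shards` and `number_of_replicas` settings come from the OpenSearch index API.

```python
import json


def build_index_body(primary_shards: int, replicas: int) -> dict:
    """Return a create-index request body with explicit shard settings."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and a non-negative replica count")
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard is stored once plus once per replica."""
    return primary_shards * (1 + replicas)


# Hypothetical product document: one JSON object per product.
product_doc = {"sku": "A-100", "name": "Example widget", "price": 19.99}

index_body = build_index_body(primary_shards=3, replicas=1)
print(json.dumps(index_body, indent=2))
print(total_shard_copies(3, 1))  # 6
```

In practice the body would be sent as `PUT /products` and the document as `PUT /products/_doc/1` against the domain endpoint.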
Key Features
- Fully Managed: Automates time-consuming tasks such as hardware provisioning, software installation and patching, failure recovery, backups, and monitoring.
- Scalable: You can scale your cluster up or down via the AWS Console, CLI, or API by adding or removing instances, changing instance types, and increasing or decreasing storage.
- Secure: Provides multiple levels of security, including:
  - Network Isolation: Can be launched into a VPC for secure and isolated access.
  - Access Control: Integrates with AWS IAM to control access to the domain APIs and with Amazon Cognito for OpenSearch Dashboards sign-in. Fine-grained access control is available inside the domain.
  - Encryption: Supports encryption at rest (using AWS KMS) and in transit (using TLS).
- Integrated with AWS Services: Natively integrates with other AWS services like Amazon Kinesis for data ingestion, AWS IoT, CloudWatch Logs, and S3.
- OpenSearch Dashboards & Logstash: Comes with a fully integrated version of OpenSearch Dashboards (Kibana for older Elasticsearch versions) for data visualization. It also accepts data from Logstash for ingestion and transformation.
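The domain settings mentioned above (instance type, instance count, storage) map onto parameters of the AWS API. The following is a sketch, not a complete configuration: it only builds the keyword arguments that the boto3 `opensearch` client's `create_domain` and `update_domain_config` calls accept. The domain name, instance type, engine version, and volume size are example values chosen for illustration.

```python
from typing import Any, Dict


def build_create_domain_request(
    domain_name: str,
    instance_type: str,
    instance_count: int,
    volume_size_gib: int,
    engine_version: str = "OpenSearch_2.11",  # example value, adjust as needed
) -> Dict[str, Any]:
    """Build keyword arguments for creating an EBS-backed domain."""
    if instance_count < 1:
        raise ValueError("a domain needs at least one data node")
    return {
        "DomainName": domain_name,
        "EngineVersion": engine_version,
        "ClusterConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp3",
            "VolumeSize": volume_size_gib,  # GiB per data node
        },
    }


def build_scale_request(domain_name: str, instance_count: int) -> Dict[str, Any]:
    """Build keyword arguments for changing the data node count."""
    return {
        "DomainName": domain_name,
        "ClusterConfig": {"InstanceCount": instance_count},
    }


create_request = build_create_domain_request("example-logs", "r6g.large.search", 3, 100)
scale_request = build_scale_request("example-logs", 6)
print(create_request["ClusterConfig"])
print(scale_request)

# With boto3 installed and credentials configured, these would be passed as:
#   client = boto3.client("opensearch")
#   client.create_domain(**create_request)
#   client.update_domain_config(**scale_request)
```

Keeping the request as a plain dictionary makes it easy to review or unit test before any AWS call is made.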
Deployment & Updates
- Blue/Green Deployment: When you initiate a domain update that requires it (e.g., changing instance types or upgrading the software version), OpenSearch Service uses a blue/green deployment process. It creates a new environment for the update and, once the changes are validated, cuts over to the new environment, minimizing downtime. Billing during the update depends on the change: per the AWS documentation, changing the instance type bills both environments for the first hour, after which only the new one is charged.
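While a configuration change is in flight, the domain status returned by the `DescribeDomain` API reports `Processing` as true. A minimal polling sketch follows, assuming the caller supplies a function that returns the `DomainStatus` dictionary; `wait_until_active` and the fake status sequence are hypothetical names used only for this example.

```python
import time
from typing import Callable


def wait_until_active(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> int:
    """Poll until Processing is False; return how many polls were needed."""
    for attempt in range(1, max_polls + 1):
        status = fetch_status()
        if not status.get("Processing", False):
            return attempt
        time.sleep(poll_seconds)
    raise TimeoutError("domain still processing after max_polls checks")


# Simulated status sequence so the sketch runs without AWS access.
fake_statuses = iter([{"Processing": True}, {"Processing": True}, {"Processing": False}])
polls = wait_until_active(lambda: next(fake_statuses), poll_seconds=0)
print(polls)  # 3
```

Against a real domain, `fetch_status` could be `lambda: client.describe_domain(DomainName="example-logs")["DomainStatus"]` with a boto3 `opensearch` client.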
Storage and Snapshots
- Storage Options: You can choose between:
  - Instance Store: Local, temporary block-level storage. Offers the best performance, but data persists only for the life of the instance.
  - Amazon EBS: Persistent block storage volumes. Recommended for most workloads. Both General Purpose (SSD) and Provisioned IOPS (SSD) volumes are available.
- Snapshots: Snapshots are backups of a cluster's data and state.
  - Automated Snapshots: The service takes automated snapshots of each domain free of charge. Domains running OpenSearch or Elasticsearch 5.3 and later get hourly snapshots retained for 14 days; domains on Elasticsearch 5.1 and earlier get daily snapshots.
  - Manual Snapshots: You can take manual snapshots at any time, which are stored in an S3 bucket of your choice and incur standard S3 charges. Manual snapshots are required for migrating data between domains or for long-term backup.
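Taking a manual snapshot involves two REST calls against the domain: registering an S3 repository, then creating a snapshot in it. The sketch below only builds the request paths and body. The repository name, bucket, region, and role ARN are placeholders, and a real request would also need to be signed with AWS Signature Version 4 by a principal allowed to pass the role.

```python
import json
from typing import Tuple


def repository_request(repo_name: str, bucket: str, region: str, role_arn: str) -> Tuple[str, dict]:
    """Return the path and body for registering an S3 snapshot repository."""
    path = f"/_snapshot/{repo_name}"
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "region": region,
            "role_arn": role_arn,  # IAM role the service assumes to write to S3
        },
    }
    return path, body


def snapshot_path(repo_name: str, snapshot_name: str) -> str:
    """Return the path used to create (PUT) a named snapshot."""
    return f"/_snapshot/{repo_name}/{snapshot_name}"


# Placeholder values for illustration only.
repo_path, repo_body = repository_request(
    "manual-backups",
    "example-snapshot-bucket",
    "us-east-1",
    "arn:aws:iam::123456789012:role/ExampleSnapshotRole",
)
print("PUT", repo_path)
print(json.dumps(repo_body, indent=2))
print("PUT", snapshot_path("manual-backups", "snapshot-2024-01-01"))
```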
Data Ingestion
You can load data into your OpenSearch domain using various methods:
- Amazon Kinesis Data Firehose: A fully managed service for delivering real-time streaming data directly to destinations like OpenSearch Service.
- Logstash: An open-source data processing pipeline that can collect data from various sources, transform it, and send it to your domain.
- AWS IoT: Rules can be set up to send data from IoT devices directly to your OpenSearch domain.
- Amazon CloudWatch Logs: You can create a subscription filter on a CloudWatch Logs log group to stream its log events to your domain.
- Lambda Event Handlers: Use Lambda functions to process and load streaming data from sources like S3 or Kinesis Data Streams.
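Whatever the source, custom loaders such as a Lambda handler usually end up sending batches to the `_bulk` API, which expects newline-delimited JSON: one action line followed by one document line per record, with a trailing newline. A minimal sketch of building that body is shown here; the index name, the `id` field convention, and the sample records are assumptions made for the example.

```python
import json
from typing import Iterable


def build_bulk_body(index: str, docs: Iterable[dict], id_field: str = "id") -> str:
    """Serialize documents into a _bulk request body (NDJSON)."""
    lines = []
    for doc in docs:
        action = {"index": {"_index": index}}
        if id_field in doc:
            # Reuse the record's own id so retries overwrite rather than duplicate.
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # Every line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


records = [
    {"id": 1, "message": "service started"},
    {"message": "no explicit id on this record"},
]
bulk_body = build_bulk_body("app-logs", records)
print(bulk_body, end="")
```

The body would be sent as `POST /_bulk` with the header `Content-Type: application/x-ndjson`.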
Security
- IAM Access Control: Use IAM policies to control who can access the OpenSearch Service configuration APIs.
- Fine-Grained Access Control: Within the domain, you can define roles, users, and permissions that determine who can read, write, or manage indices. This provides security at the index, document, and even field level.
- VPC Access: Launching your domain within a VPC allows you to use security groups and network ACLs to secure access to the cluster.
- Amazon Cognito Integration: Use Amazon Cognito to provide user authentication for OpenSearch Dashboards/Kibana, enabling secure login with usernames and passwords.
- Encryption:
  - At Rest: Encrypts data stored in the domain and in automated snapshots.
  - In Transit: Encrypts data as it moves between nodes in the domain using TLS.
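The encryption and fine-grained access control settings above correspond to options on the domain configuration. The sketch below builds those options as keyword arguments for the boto3 `opensearch` client's `create_domain` call. The helper name and the ARNs are hypothetical. It enables encryption at rest, node-to-node encryption, and HTTPS enforcement together because fine-grained access control requires all three.

```python
from typing import Any, Dict, Optional


def build_security_options(
    kms_key_id: Optional[str] = None,
    fine_grained: bool = False,
    master_user_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the security-related keyword arguments for a domain."""
    options: Dict[str, Any] = {
        "EncryptionAtRestOptions": {"Enabled": True},
        "NodeToNodeEncryptionOptions": {"Enabled": True},
        "DomainEndpointOptions": {"EnforceHTTPS": True},
    }
    if kms_key_id:
        # Without a key id, the AWS-managed key for the service is used.
        options["EncryptionAtRestOptions"]["KmsKeyId"] = kms_key_id
    if fine_grained:
        if not master_user_arn:
            raise ValueError("fine-grained access control needs a master user ARN here")
        options["AdvancedSecurityOptions"] = {
            "Enabled": True,
            "InternalUserDatabaseEnabled": False,  # use an IAM principal as master user
            "MasterUserOptions": {"MasterUserARN": master_user_arn},
        }
    return options


# Placeholder ARN for illustration only.
security = build_security_options(
    fine_grained=True,
    master_user_arn="arn:aws:iam::123456789012:role/ExampleAdminRole",
)
print(sorted(security))
```

These options can be merged into the rest of the domain request, for example `client.create_domain(DomainName="example-logs", **security)`.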