Amazon Neptune

Supported Graph Models & Query Languages

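Neptune supports two graph models, each with its own open query language: the property graph model, queried with Gremlin (openCypher is also supported), and the W3C Resource Description Framework (RDF), queried with SPARQL.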

Property Graph

Structure: Composed of vertices (data entities), edges (relationships), and properties (attributes for vertices and edges).
Query Language: Gremlin. Gremlin is a graph traversal language used to explore the graph, moving from vertex to vertex along edges.
Best For: Models that focus on the pathways and relationships between entities, like social networks or logistics tracking.
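
To make the vertex-to-vertex traversal idea concrete, here is a minimal in-memory sketch in plain Python. It does not talk to Neptune; the `PropertyGraph` class, the sample people, and the `knows` label are all hypothetical, and the Gremlin query in the comment is only the approximate equivalent of what the Python code computes.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: vertices and edges both carry properties."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> property dict
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id, properties)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))

    def out(self, vids, label):
        """Follow outgoing edges with the given label from every vertex in vids."""
        return [dst for v in vids for (lbl, dst, _) in self.out_edges[v] if lbl == label]


g = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_vertex(name, kind="person")
g.add_edge("alice", "knows", "bob", since=2019)
g.add_edge("bob", "knows", "carol", since=2021)
g.add_edge("bob", "knows", "dave", since=2022)

# Approximately what the Gremlin traversal
#   g.V('alice').out('knows').out('knows').dedup()
# expresses: start at one vertex, then walk two "knows" hops.
friends_of_friends = sorted(set(g.out(g.out(["alice"], "knows"), "knows")))
print(friends_of_friends)  # ['carol', 'dave']
```

Each `out` call corresponds to one traversal step, which is why path-oriented questions such as "friends of friends" map naturally onto this model.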

RDF (Resource Description Framework)

Structure: Data is represented as a set of three-part statements called "triples": Subject-Predicate-Object. This model is excellent for representing complex information and knowledge domains.
Query Language: SPARQL (SPARQL Protocol and RDF Query Language). SPARQL is a declarative query language used to find patterns within the RDF data.
Best For: Knowledge graphs, data integration, and applications where metadata and complex entity attributes are critical.
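
The triple model and declarative pattern matching can be sketched in a few lines of plain Python. This is an illustration only, not how Neptune or a SPARQL engine is implemented; the `match` helper, the `ex:` prefix, and the sample triples are hypothetical.

```python
def match(triples, pattern):
    """Return one binding dict per triple that matches the pattern.

    Pattern terms starting with '?' are variables; all other terms
    must equal the corresponding part of the triple exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Each statement is (subject, predicate, object).
triples = [
    ("ex:Neptune", "rdf:type", "ex:GraphDatabase"),
    ("ex:Neptune", "ex:supports", "ex:SPARQL"),
    ("ex:Neptune", "ex:supports", "ex:Gremlin"),
    ("ex:Aurora", "rdf:type", "ex:RelationalDatabase"),
]

# Comparable SPARQL: SELECT ?lang WHERE { ex:Neptune ex:supports ?lang }
langs = sorted(b["?lang"] for b in match(triples, ("ex:Neptune", "ex:supports", "?lang")))
print(langs)  # ['ex:Gremlin', 'ex:SPARQL']
```

The query states which pattern to find rather than how to walk the data, which is the sense in which SPARQL is declarative.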

Common Use Cases

Social Networking: To model and query complex relationships between users, such as friends, likes, shares, and follows, with low latency.
Recommendation Engines: To store relationships between customer interests, purchase history, and products to provide personalized and relevant recommendations.
Fraud Detection: To identify patterns of fraud by analyzing relationships between accounts, transactions, and devices in real-time.
Knowledge Graphs: To build and query complex information models, enabling search engines and AI applications to discover and reason about information.
Identity Graphs: To link user profile data from various sources (e.g., web, mobile, CRM) for better ad-targeting, personalization, and analytics.

Architecture

The Neptune architecture is similar to Amazon Aurora, with a design that decouples storage and compute.

DB Cluster: The primary component, which contains your data in a cluster volume and manages one or more DB instances.
Cluster Volume: A single, virtual storage volume that automatically scales as your data grows (up to 128 TiB). Your data is replicated six ways across three Availability Zones for high durability.
Primary (Writer) Instance: Each cluster has a single primary instance that handles all write operations and supports read operations.
Neptune Read Replicas: You can add up to 15 read replicas to a cluster. These replicas share the same underlying storage as the primary instance and are used to scale read throughput. They have minimal replica lag.
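
Because a single primary handles writes while replicas scale reads, client code typically sends the two kinds of traffic to different hosts. The sketch below assumes the cluster exposes a writer (cluster) endpoint and a reader endpoint on Neptune's default port 8182; the host names and the `pick_host` and `query_url` helpers are hypothetical.

```python
NEPTUNE_PORT = 8182  # Neptune's default port


def query_url(host, language):
    """Build the connection URL for a Neptune host and query language."""
    if language == "gremlin":
        return f"wss://{host}:{NEPTUNE_PORT}/gremlin"   # Gremlin over WebSocket
    if language == "sparql":
        return f"https://{host}:{NEPTUNE_PORT}/sparql"  # SPARQL over HTTPS
    raise ValueError(f"unsupported language: {language}")


def pick_host(is_write, writer_host, reader_host):
    """Writes must go to the primary; reads can be served by replicas."""
    return writer_host if is_write else reader_host


# Hypothetical endpoint names, for illustration only.
WRITER = "mygraph.cluster-abc123.us-east-1.neptune.amazonaws.com"
READER = "mygraph.cluster-ro-abc123.us-east-1.neptune.amazonaws.com"

print(query_url(pick_host(True, WRITER, READER), "gremlin"))
print(query_url(pick_host(False, WRITER, READER), "sparql"))
```

Since the primary also supports reads, a client that needs to read its own writes immediately may choose to send those reads to the writer host as well.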

Storage Type	Cluster Volume	Instance (Local Storage)
Data Type	Persistent graph data	Temporary data, logs, cache
Scalability	Automatically scales out as needed	Limited to the DB Instance class

High Availability

6-Way Replication: Data is replicated six times across three Availability Zones to withstand failures without data loss.
Self-Healing Storage: Data blocks and disks are continuously scanned for errors and repaired automatically.
Automatic Failover: If the primary instance fails, Neptune automatically promotes one of the read replicas to become the new primary. You can assign priority tiers to control which replica is promoted first.
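
The effect of priority tiers on failover can be modeled with a small selection function. This is a sketch of the selection rule, not Neptune's actual implementation: it assumes a lower tier number means higher priority, and it breaks ties by instance size, an assumption modeled on Aurora's documented behavior. The `Replica` type and its `memory_gib` field are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    tier: int        # promotion tier; a lower number means higher priority
    memory_gib: int  # stand-in for instance size (assumption for this sketch)


def choose_new_primary(replicas):
    """Pick the replica to promote: lowest tier first, then the largest instance."""
    if not replicas:
        return None
    return min(replicas, key=lambda r: (r.tier, -r.memory_gib))


replicas = [
    Replica("replica-a", tier=1, memory_gib=64),
    Replica("replica-b", tier=0, memory_gib=16),
    Replica("replica-c", tier=0, memory_gib=32),
]
new_primary = choose_new_primary(replicas)
print(new_primary.name)  # replica-c
```

Here `replica-a` is the largest instance but sits in a lower-priority tier, so it is only considered if no tier-0 replica is available.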

Performance

Low Latency: Optimized for graph queries, providing millisecond latency.
Read Scaling: Supports up to 15 read replicas to handle high-volume read traffic.
Query Optimization: Includes optimizers for both Gremlin and SPARQL queries.

Backup and Management

Automated Backups: Always enabled. Backups are stored in Amazon S3.
Point-in-Time Restoration (PITR): You can restore your database to any second within your backup retention period, up to approximately the last five minutes.
Manual Snapshots: You can take manual snapshots of your cluster for long-term archival.
Monitoring and Tooling: Use the Neptune Workbench (hosted Jupyter notebooks) to query and visualize your graph. Subscribe to event notifications for clusters, instances, and snapshots through Amazon SNS.
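
The point-in-time restore window is simple date arithmetic. The sketch below assumes the most recent restorable time trails the present by about five minutes and that the retention period is expressed in whole days; the function names and the sample dates are hypothetical, and the real window is whatever the service reports for the cluster.

```python
from datetime import datetime, timedelta, timezone


def restorable_window(now, retention_days, lag=timedelta(minutes=5)):
    """Return (earliest, latest) restore times under the assumptions above."""
    earliest = now - timedelta(days=retention_days)
    latest = now - lag
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """True if target falls inside the restorable window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest


now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)

# Inside a 7-day retention window.
print(can_restore_to(datetime(2024, 1, 30, 8, 30, 15, tzinfo=timezone.utc), now, 7))  # True
# Too recent: within the last five minutes.
print(can_restore_to(now - timedelta(minutes=1), now, 7))  # False
# Too old: before the retention period began.
print(can_restore_to(datetime(2024, 1, 20, tzinfo=timezone.utc), now, 7))  # False
```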

Security

Encryption at Rest: Data can be encrypted at rest using keys managed in AWS Key Management Service (KMS). Encryption must be enabled when the cluster is created.
Encryption in Transit: Enforces HTTPS with a minimum of TLS v1.2 for all client connections.
Network Isolation: Neptune runs within an Amazon VPC, allowing you to isolate your database and control network access using security groups.
Authentication & Authorization: Access is managed through AWS IAM policies.
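
On the client side, the TLS 1.2 minimum can be mirrored so that a connection never negotiates an older protocol. The following is a minimal sketch using Python's standard `ssl` module; the function name is hypothetical, and passing the context to a particular HTTP or WebSocket client is left to that library's own API.

```python
import ssl


def neptune_tls_context():
    """Create a client-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()            # verifies certificates and host names
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # reject TLS 1.0 and 1.1
    return ctx


ctx = neptune_tls_context()
print(ctx.minimum_version.name)  # TLSv1_2
```

The default context already enables certificate and host name verification, so the only explicit change here is the protocol floor.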

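Limitations

Cross-Region Replicas: The read replicas in a cluster must run in the same Region as the primary instance. Serving reads from another Region requires Neptune Global Database, which replicates the cluster's data to secondary clusters in other Regions.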
Encryption of Existing Clusters: You cannot enable encryption on a Neptune cluster after it has been created. To encrypt existing data, create a new encrypted cluster and migrate the data to it.
Snapshot Sharing: Automatic snapshots cannot be shared directly. You must first create a manual copy of the snapshot and then share the manual copy.
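
The copy-then-share sequence for automatic snapshots can be sketched as a two-step function. It is written against a client object passed in by the caller, which in practice would be `boto3.client("neptune")`; the snapshot identifiers and account ID are hypothetical, and the `RecordingClient` class exists only so that the example runs without AWS access.

```python
def share_automatic_snapshot(client, automatic_snapshot_id, manual_copy_id, account_id):
    """Copy an automatic snapshot to a manual one, then share the copy."""
    # Step 1: an automatic snapshot cannot be shared, so make a manual copy.
    client.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=automatic_snapshot_id,
        TargetDBClusterSnapshotIdentifier=manual_copy_id,
    )
    # In a real run, wait here until the copy's status is "available".
    # Step 2: grant another AWS account permission to restore from the copy.
    client.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier=manual_copy_id,
        AttributeName="restore",
        ValuesToAdd=[account_id],
    )
    return manual_copy_id


class RecordingClient:
    """Stand-in for the AWS client that only records the calls made to it."""

    def __init__(self):
        self.calls = []

    def copy_db_cluster_snapshot(self, **kwargs):
        self.calls.append(("copy_db_cluster_snapshot", kwargs))

    def modify_db_cluster_snapshot_attribute(self, **kwargs):
        self.calls.append(("modify_db_cluster_snapshot_attribute", kwargs))


fake = RecordingClient()
shared_id = share_automatic_snapshot(
    fake, "rds:mygraph-2024-01-31-00-05", "mygraph-shared-copy", "123456789012"
)
print([name for name, _ in fake.calls])
```

Sharing an encrypted snapshot also requires that the other account can use the KMS key, which this sketch does not cover.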