This post introduces the concepts and principles used to design DynamoDB tables, which I have been learning while putting this technology to use in a new project.
This introduction will compare and contrast the core concepts and ideas with those found in relational database management systems (RDBMS).
While using DynamoDB over the last couple of months, one of my big takeaways is that data modeling is a vital part of using it well, and that the modeling process itself looks structurally similar to relational database design. The high-level process still looks like this:
Define how entities relate to each other (e.g. ERDs).
Determine your data access patterns based on business requirements. In the RDBMS world this helps you decide how normalized your relational schema should be; in the DynamoDB world you have different levers, but the access patterns still drive the table design significantly.
Design your primary key and secondary indices based on your data access patterns. Specifics are documented via examples below.
However, some relational database concepts or ideas will hinder your thought process, specifically:
one table for one entity and/or relationship
JOIN across tables (we implicitly "join" within the same DynamoDB table across partitions)
| DynamoDB Concept | RDBMS Analog | Description |
|---|---|---|
| Table | set of related tables | DynamoDB tables are less rigidly defined |
| Partition | table | Multiple entities can be modeled inside the same DynamoDB table |
| Item | record | A set of key-value pairs (attributes) describing one data record |
| Primary Key | primary key | Uniquely identifies each item in a DynamoDB table |
| Attribute | column | Attributes are more flexible and can differ across items |
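To make the table above a little more concrete, here is a minimal sketch of several entities living in one table, written as plain Python dicts rather than real DynamoDB calls. The attribute names `PK` and `SK` and the `CUSTOMER#` / `ORDER#` prefixes are my own illustrative choices, not something DynamoDB requires.

```python
from collections import defaultdict

# Hypothetical items in ONE table. Note that the attributes differ per item:
# a profile has "name"/"email", an order has "total", one order has "coupon".
table = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE", "name": "Ada", "email": "ada@example.com"},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2023-01-15", "total": 120},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2023-03-02", "total": 75, "coupon": "SPRING"},
    {"PK": "CUSTOMER#7", "SK": "PROFILE", "name": "Lin"},
]


def group_by_partition(items):
    """Group items by partition key value, mimicking how they are co-located."""
    groups = defaultdict(list)
    for item in items:
        groups[item["PK"]].append(item)
    return dict(groups)


partitions = group_by_partition(table)

# The implicit "join": a customer's profile and orders share a partition key,
# so both entity types come back from the same group of items.
customer_42 = partitions["CUSTOMER#42"]
profile = [i for i in customer_42 if i["SK"] == "PROFILE"]
orders = [i for i in customer_42 if i["SK"].startswith("ORDER#")]
```

The point of the sketch is only the shape of the data: two entity types, differing attributes, and one shared partition key value doing the work a JOIN would do in a relational schema.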
The API consists of operations on:
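items (you must supply the full primary key); this includes batch operations
queries (retrieve the items that share a partition key, optionally narrowed by a condition on the sort key)
scans (like a full table scan, which you want to avoid)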
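The difference between an item operation and a scan can be sketched with a toy in-memory table. This is not the DynamoDB API, just a stand-in to show how much data each style of operation has to look at; the key values and the `get_item` / `scan` helper names are hypothetical.

```python
# Toy in-memory stand-in for a table; the dict key is the full primary key (PK, SK).
# Key names and values are made up for illustration.
ITEMS = {
    ("CUSTOMER#42", "PROFILE"): {"name": "Ada"},
    ("CUSTOMER#42", "ORDER#1001"): {"total": 120},
    ("CUSTOMER#7", "PROFILE"): {"name": "Lin"},
    ("CUSTOMER#7", "ORDER#1002"): {"total": 75},
}


def get_item(pk, sk):
    """Item operation: requires the full primary key and touches one item."""
    return ITEMS.get((pk, sk))


def scan(predicate):
    """Scan: examines every item in the table, then keeps the matches."""
    examined = 0
    matches = []
    for key, item in ITEMS.items():
        examined += 1
        if predicate(key, item):
            matches.append(item)
    return matches, examined


ada = get_item("CUSTOMER#42", "PROFILE")
big_orders, examined = scan(lambda key, item: item.get("total", 0) > 100)
```

Even though only one order matches, the scan examined all four items. In DynamoDB a filter on a scan is likewise applied after the items have been read, which is why scans get expensive as a table grows.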
DynamoDB Primary Keys
You can have two kinds of primary keys:
Simple primary key (partition key)
Composite primary key (partition key + sort key)
The partition key is used to distribute data across partitions. Items with the same partition key reside in the same partition (some developers may find the analogy of a "shard" useful for building their initial intuition).
Sort keys are used to create ranges of items within a partition.
So far, the data domains we have modeled have been complex enough that composite primary keys are essential to our query access model.
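As a rough mental model of a composite primary key, here is a small in-memory sketch: the partition key picks a bucket, and the sort key keeps the items in that bucket ordered so that a range can be read cheaply. `ToyTable`, `put`, and `query_between` are hypothetical names for this illustration only and say nothing about how DynamoDB is implemented internally.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """In-memory model: partition key -> list of (sort key, item), kept sorted."""

    def __init__(self):
        self._partitions = defaultdict(list)

    def put(self, pk, sk, item):
        rows = self._partitions[pk]
        keys = [k for k, _ in rows]
        idx = bisect.bisect_left(keys, sk)
        if idx < len(rows) and rows[idx][0] == sk:
            rows[idx] = (sk, item)  # same full primary key: overwrite the item
        else:
            rows.insert(idx, (sk, item))  # keep the partition ordered by sort key

    def query_between(self, pk, low, high):
        """Return items in one partition whose sort key is in [low, high]."""
        rows = self._partitions.get(pk, [])
        keys = [k for k, _ in rows]
        start = bisect.bisect_left(keys, low)
        end = bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


t = ToyTable()
t.put("SENSOR#1", "2023-01-01", {"temp": 3})
t.put("SENSOR#1", "2023-01-03", {"temp": 5})
t.put("SENSOR#1", "2023-01-02", {"temp": 4})
t.put("SENSOR#2", "2023-01-02", {"temp": 9})

first_two_days = t.query_between("SENSOR#1", "2023-01-01", "2023-01-02")
```

Here `first_two_days` holds the two `SENSOR#1` readings for January 1 and 2, in sort-key order, and the `SENSOR#2` partition is never touched.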
Retrieving items by a DynamoDB table's primary key is the most efficient way to read data, but at times we still need to support a query pattern that the primary key does not cover. This is where secondary indices come in: they let us avoid inefficient scan operations.
There are two types of secondary indices:
- Global Secondary Index (GSI): can be used to provide partition and sort keys that differ from the table's.
- Local Secondary Index (LSI): used with a composite primary key; the partition key is the same as the table's, but the sort key is different.
Our DynamoDB tables typically use LSIs and a couple use GSIs.
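One way to picture the two index types is as the same items re-organized under different keys. The sketch below builds both kinds of "index" in memory; the `build_index` helper and the `status` / `placed_at` attributes are assumptions for illustration, and real indices are declared on the table rather than built in application code.

```python
from collections import defaultdict

# Hypothetical items; PK/SK form the table's composite primary key.
items = [
    {"PK": "CUSTOMER#42", "SK": "ORDER#1001", "status": "SHIPPED", "placed_at": "2023-03-02"},
    {"PK": "CUSTOMER#42", "SK": "ORDER#1002", "status": "PENDING", "placed_at": "2023-01-15"},
    {"PK": "CUSTOMER#7", "SK": "ORDER#1003", "status": "PENDING", "placed_at": "2023-02-20"},
    {"PK": "CUSTOMER#7", "SK": "PROFILE", "name": "Lin"},
]


def build_index(items, partition_attr, sort_attr):
    """Re-group items under alternative key attributes, sorted by sort_attr.

    Items missing either attribute are left out of the index.
    """
    index = defaultdict(list)
    for item in items:
        if partition_attr in item and sort_attr in item:
            index[item[partition_attr]].append(item)
    for rows in index.values():
        rows.sort(key=lambda it: it[sort_attr])
    return dict(index)


# GSI-like: a different partition key AND a different sort key.
by_status = build_index(items, "status", "placed_at")

# LSI-like: the table's partition key, but a different sort key.
by_customer_date = build_index(items, "PK", "placed_at")

pending = [o["SK"] for o in by_status["PENDING"]]
customer_42_by_date = [o["SK"] for o in by_customer_date["CUSTOMER#42"]]
```

`pending` lists pending orders across all customers ordered by date, which the table's own primary key could only answer with a scan; `customer_42_by_date` stays inside one customer's partition but orders by date instead of order id.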
Data Modeling for DynamoDB
There are a few explicit steps that we borrow from our prior experience of data modeling in the relational world:
identify access patterns
identify a self-describing mechanism for primary keys
identify secondary indices, using the access patterns as input
Defining naming conventions
It doesn't matter if your fields are Capitalized, snake_cased, kebab-cased, camelCased, or PascalCased; just pick something and be consistent. The naming convention should be defined for:
Primary Key names
Just keep it simple and make sure it is applied consistently across the services that a single team works on.
It may seem silly to do this first, but it will save a lot of irritation and rework later.
Think about the nouns in your domain and then the relationships between them. This provides a good starting point for identifying your entities, and it is essentially the same process as identifying entities for relational database design. The one difference is that we typically model many entities together in one DynamoDB table. A table in the relational way of thinking serves a fundamentally different purpose, because a fundamental primitive of relational databases is the JOIN, which is absent in DynamoDB.
Attributes are usually easily identified after finding your entities. This is identical to identifying the columns for relational schema design before normalizing data.
The way you identify access patterns to design a DynamoDB table well (for your current needs) is much like how you would identify them in the RDBMS world: by understanding the business requirements of the software. Second and third systems benefit from clearer access and usage patterns, but they carry a risk: migrating data from one datastore to another is rarely simple, and cutting over 24/7 systems without downtime or data loss is harder still.
When reading user stories or technical stories, it is often possible to infer how the data will need to be queried.
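As a small illustration of that inference step, access patterns can be written down as data next to the stories they came from. Everything here is hypothetical (the stories, the key formats, and the index name `GSI1`); the only point is that listing patterns this way makes it obvious which ones the primary key covers and which need a secondary index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPattern:
    story: str                     # the user story the pattern was inferred from
    partition_key: str             # partition key condition the read needs
    sort_condition: Optional[str]  # sort key condition, if any
    index: str                     # "table" for the primary key, else an index name


patterns = [
    AccessPattern("As a customer, I can view my profile",
                  "PK = CUSTOMER#<id>", "SK = PROFILE", "table"),
    AccessPattern("As a customer, I can list my orders",
                  "PK = CUSTOMER#<id>", "SK begins_with ORDER#", "table"),
    AccessPattern("As support staff, I can list all pending orders",
                  "status = PENDING", None, "GSI1"),
]


def indexes_needed(patterns):
    """Names of secondary indices required by patterns the primary key cannot serve."""
    return sorted({p.index for p in patterns if p.index != "table"})


needed = indexes_needed(patterns)
```

In this made-up example two of the three stories are served by the table's primary key, and only the support-staff story calls for an extra index.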
Defining partition and sort keys that are self-describing, following a scheme derived from our entity model, has let us read and write items consistently and build a higher-level, domain-oriented API on top of the lower-level DynamoDB API.
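A minimal sketch of what self-describing keys could look like, assuming a `#`-separated `ENTITY#value` scheme: the separator, the `PK` / `SK` attribute names, and the `OrderKey` class are illustrative assumptions, not our actual scheme or anything DynamoDB prescribes.

```python
from dataclasses import dataclass

SEPARATOR = "#"  # assumed delimiter; any character absent from the ids would work


def build_key(entity, *parts):
    """Build a self-describing key such as 'ORDER#2023-03-02#1001'."""
    return SEPARATOR.join([entity.upper(), *map(str, parts)])


def parse_key(key):
    """Split a key back into its entity name and its remaining parts."""
    entity, *parts = key.split(SEPARATOR)
    return entity, parts


@dataclass(frozen=True)
class OrderKey:
    """Domain-level view of an order's primary key."""
    customer_id: str
    order_date: str
    order_id: str

    @property
    def pk(self):
        return build_key("customer", self.customer_id)

    @property
    def sk(self):
        return build_key("order", self.order_date, self.order_id)

    def as_key(self):
        """The key as a dict of attribute name to value."""
        return {"PK": self.pk, "SK": self.sk}


order_key = OrderKey("42", "2023-03-02", "1001").as_key()
entity, parts = parse_key(order_key["SK"])
```

Because the entity type is encoded in the key itself, code reading an item can tell what it is looking at from the key alone, and callers work with `OrderKey` rather than hand-assembling strings.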
There is a lot I didn't cover, especially with respect to cost minimization. I will try to revisit that topic, along with recommended practices based on our experience, as we learn more.