
DynamoDb Multi-Attribute Keys #535

@aphex

Description

Announced yesterday: DynamoDB now supports multi-attribute composite PKs and SKs for GSIs. This is a valuable upgrade for Dynamo users. Performance still needs to be tested, but the assumption is that having access to native data types will be a win. A more interesting direct benefit is the ability to add new access patterns without having to backfill data: a GSI can be created across any existing attributes, and Dynamo will handle querying and sparse indexing.

Reference

https://aws.amazon.com/blogs/database/multi-key-support-for-global-secondary-index-in-amazon-dynamodb/
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.DesignPattern.MultiAttributeKeys.html

Summary

This Issue is to discuss how this could be implemented in Electro, keeping the existing concatenation pattern but moving towards the native managed solution.

Below are some initial thoughts and examples. The TL;DR: we introduce a new index type, multi-attribute, which forces the pk and sk properties to be arrays. These arrays are ordered just like the KeySchema in the Dynamo configuration. I also propose introducing a "magic string" to indicate the entity type. Luckily, all ElectroDB items have been writing the entity attribute automatically, so users will have access to it when creating new access patterns.
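
One way to picture the proposed definition shape is as a discriminated union. The sketch below is not ElectroDB's actual typing: `CompositeIndex`, `MultiAttributeIndex`, `ENTITY_TOKEN`, `isMultiAttribute`, and `resolveKeyAttributes` are hypothetical names, used only to illustrate how a `type: 'multi-attribute'` discriminator could force `pk` and `sk` to be ordered arrays and how `$entity` could map to the `__edb_e__` attribute.

```typescript
// Hypothetical types: a sketch of the proposal, not ElectroDB's real typings.
type CompositeIndex = {
  index?: string;
  collection?: string;
  pk: { field: string; composite: string[] };
  sk?: { field: string; composite: string[] };
};

type MultiAttributeIndex = {
  index: string;
  type: 'multi-attribute';
  pk: string[]; // ordered like the HASH entries in the GSI KeySchema
  sk?: string[]; // ordered like the RANGE entries in the GSI KeySchema
};

type IndexDefinition = CompositeIndex | MultiAttributeIndex;

// Proposed "magic string" and the entity attribute named in this issue.
const ENTITY_TOKEN = '$entity';
const ENTITY_ATTRIBUTE = '__edb_e__';

// Type guard: only multi-attribute definitions carry the `type` discriminator.
function isMultiAttribute(def: IndexDefinition): def is MultiAttributeIndex {
  return 'type' in def && def.type === 'multi-attribute';
}

// Replace the `$entity` token with the stored attribute name, preserving order.
function resolveKeyAttributes(keys: string[]): string[] {
  return keys.map((key) => (key === ENTITY_TOKEN ? ENTITY_ATTRIBUTE : key));
}

const byEntityAndName: IndexDefinition = {
  index: 'user_name_born-gsi',
  type: 'multi-attribute',
  pk: ['$entity', 'lastName'],
  sk: ['birthYear'],
};

if (isMultiAttribute(byEntityAndName)) {
  console.log(resolveKeyAttributes(byEntityAndName.pk));
}
```

With this shape, a definition that sets `type: 'multi-attribute'` but passes an object for `pk` would fail to type-check, which is the inference behavior the proposal is after.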

Sample Code

import { Entity } from 'electrodb'

const User = new Entity(
  {
    model: {
      entity: 'user',
      service: 'user-directory',
      version: '1',
    },
    attributes: {
      userId: {
        type: 'string',
      },
      firstName: {
        type: 'string',
        required: true,
      },
      lastName: {
        type: 'string',
        required: true,
      },
      birthYear: {
        type: 'number',
      },
      petsOwned: {
        type: 'number',
      },
    },
    indexes: {
      byUserId: {
        pk: {
          field: 'pk',
          composite: ['userId'],
        },
        sk: {
          field: 'sk',
          composite: [],
        },
      },
      // Indicate this index is a multi-attribute index, so there are no "real" PK or SK attributes.
      // Switching pk to an array should allow TS to infer this is a different kind of index and also
      // enforce that sk must be an array as well.
      // `collection` would not be an available option when using a multi-attribute index.
      // There is also no casing, template, or casting of the index attributes, as they are read
      // directly from Dynamo.
      byNameAndBirthYear: {
        index: 'name_born-gsi',
        type: 'multi-attribute',
        pk: ['lastName'],
        sk: ['birthYear'],
      },
      // In this scenario the user has added the `__edb_e__` attribute to the KeySchema as PK 1.
      // This means we no longer need to filter on the entity name, because the entity is now part
      // of the index PK.
      byEntity_NameAndBirthYear: {
        index: 'user_name_born-gsi',
        type: 'multi-attribute',
        pk: ['$entity', 'lastName'], // some special attribute to indicate the entity name
        sk: ['birthYear'],
      },
      // This is another potential multi-attribute index use case where a user needs to query across
      // the whole entity partition. However this may be such bad practice we do not provide a
      // way to do this OOTB.
      byPetsOwned: {
        index: 'user-pets-gsi',
        type: 'multi-attribute',
        pk: ['$entity'], // some special attribute to indicate the entity name
        sk: ['petsOwned'],
      },
    },
  },
  { table: 'UserTable' }
)

// Query by user's last name and born after 1980
// This index cannot guarantee the results are `User` entities, so we will need to use a filter.
// It is likely best to inform users that creating unique attributes for multi-attribute GSIs would
// be a best practice when not using `$entity`.
// This pattern allows users to add new multi-attribute indexes to their existing data without
// having to backfill data.
User.query.byNameAndBirthYear({ lastName: 'Doe' }).gt({ birthYear: 1980 }).go()
// Results in
// {
//   TableName: 'UserTable',
//   IndexName: 'name_born-gsi',
//   KeyConditionExpression: '#lastName = :lastName AND #birthYear > :birthYear',
//   ExpressionAttributeNames: {
//     '#lastName': 'lastName',
//     '#birthYear': 'birthYear',
//     '#entity': '__edb_e__',
//   },
//   ExpressionAttributeValues: {
//     ':lastName': 'Doe',
//     ':birthYear': 1980,
//     ':entity': 'user',
//   },
//   FilterExpression: '#entity = :entity',
// }

// Similar to the previous example but here we are adding the entity name to the index instead of reaching
// across all entities.
User.query.byEntity_NameAndBirthYear({ lastName: 'Doe' }).gt({ birthYear: 1980 }).go()
// Results in
// {
//   TableName: 'UserTable',
//   IndexName: 'user_name_born-gsi',
//   KeyConditionExpression: '#entity = :entity AND #lastName = :lastName AND #birthYear > :birthYear',
//   ExpressionAttributeNames: {
//     '#lastName': 'lastName',
//     '#birthYear': 'birthYear',
//     '#entity': '__edb_e__',
//   },
//   ExpressionAttributeValues: {
//     ':lastName': 'Doe',
//     ':birthYear': 1980,
//     ':entity': 'user',
//   }
// }

// Query all users that have more than 5 pets
// Note here it is possible there is no need for any PK attributes as it is querying across the whole entity
// partition. Electro could take care of this for you. This is obviously a dangerous pattern as it can easily
// lead to a hot partition.
User.query.byPetsOwned().gt({ petsOwned: 5 }).go()
// Results in
// {
//   TableName: 'UserTable',
//   IndexName: 'user-pets-gsi',
//   KeyConditionExpression: '#entity = :entity AND #petsOwned > :petsOwned',
//   ExpressionAttributeNames: {
//     '#entity': '__edb_e__',
//     '#petsOwned': 'petsOwned',
//   },
//   ExpressionAttributeValues: {
//     ':entity': 'user',
//     ':petsOwned': 5,
//   }
// }
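
The three parameter objects above follow one pattern: every PK attribute gets an equality condition, the SK attribute gets the range condition, and an entity filter is added only when `$entity` is not part of the key. A minimal sketch of that translation follows. It assumes hypothetical names (`buildQueryParams`, `MultiAttributeIndex`, `QueryParams`) and supports only a single `>` comparison on the first SK attribute; it is not ElectroDB's query builder.

```typescript
// Hypothetical sketch; none of these names exist in ElectroDB.
type MultiAttributeIndex = { index: string; pk: string[]; sk: string[] };
type KeyValue = string | number;

interface QueryParams {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, KeyValue>;
  FilterExpression?: string;
}

const ENTITY_TOKEN = '$entity';
const ENTITY_ATTRIBUTE = '__edb_e__';

function buildQueryParams(
  table: string,
  entity: string,
  def: MultiAttributeIndex,
  pkValues: Record<string, KeyValue>,
  skGreaterThan: Record<string, KeyValue>,
): QueryParams {
  const names: Record<string, string> = {};
  const values: Record<string, KeyValue> = {};
  const conditions: string[] = [];

  // Every partition key attribute is matched with equality, in KeySchema order.
  for (const key of def.pk) {
    if (key === ENTITY_TOKEN) {
      names['#entity'] = ENTITY_ATTRIBUTE;
      values[':entity'] = entity;
      conditions.push('#entity = :entity');
      continue;
    }
    const value = pkValues[key];
    if (value === undefined) throw new Error(`Missing partition key value: ${key}`);
    names[`#${key}`] = key;
    values[`:${key}`] = value;
    conditions.push(`#${key} = :${key}`);
  }

  // Simplification: only the first sort key attribute is considered, with `>`.
  const sortKey = def.sk[0];
  if (sortKey !== undefined) {
    const bound = skGreaterThan[sortKey];
    if (bound !== undefined) {
      names[`#${sortKey}`] = sortKey;
      values[`:${sortKey}`] = bound;
      conditions.push(`#${sortKey} > :${sortKey}`);
    }
  }

  const params: QueryParams = {
    TableName: table,
    IndexName: def.index,
    KeyConditionExpression: conditions.join(' AND '),
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };

  // Without `$entity` in the key, the index may hold other entities: filter them out.
  if (def.pk.indexOf(ENTITY_TOKEN) === -1) {
    names['#entity'] = ENTITY_ATTRIBUTE;
    values[':entity'] = entity;
    params.FilterExpression = '#entity = :entity';
  }
  return params;
}

const params = buildQueryParams(
  'UserTable',
  'user',
  { index: 'name_born-gsi', pk: ['lastName'], sk: ['birthYear'] },
  { lastName: 'Doe' },
  { birthYear: 1980 },
);
console.log(params.KeyConditionExpression, '|', params.FilterExpression);
```

The entity value passed in here is the model's `entity` name (`'user'`), on the assumption that this is what is stored in `__edb_e__`.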

These examples assume the following DynamoDB configuration:

{
    GlobalSecondaryIndexes: [
        {
            IndexName: 'name_born-gsi',
            KeySchema: [
                { AttributeName: 'lastName', KeyType: 'HASH' },    // GSI PK 1
                { AttributeName: 'birthYear', KeyType: 'RANGE' },  // GSI SK 1
            ],
            Projection: { ProjectionType: 'ALL' }
        },
        {
            IndexName: 'user_name_born-gsi',
            KeySchema: [
                { AttributeName: '__edb_e__', KeyType: 'HASH' },       // GSI PK 1
                { AttributeName: 'lastName', KeyType: 'HASH' },       // GSI PK 2
                { AttributeName: 'birthYear', KeyType: 'RANGE' },      // GSI SK 1
            ],
            Projection: { ProjectionType: 'ALL' }
        },
        {
            IndexName: 'user-pets-gsi',
            KeySchema: [
                { AttributeName: '__edb_e__', KeyType: 'HASH' },       // GSI PK 1
                { AttributeName: 'petsOwned', KeyType: 'RANGE' },      // GSI SK 1
            ],
            Projection: { ProjectionType: 'ALL' }
        }
    ]
}
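
Because the proposed `pk` and `sk` arrays are ordered like the KeySchema, the schema entries above could in principle be derived mechanically from an index definition, or used to validate that a definition matches the deployed table. The following sketch assumes a hypothetical `toKeySchema` helper; it only illustrates the mapping and is not part of ElectroDB or the AWS SDK.

```typescript
// Hypothetical helper: maps a proposed multi-attribute index definition to KeySchema entries.
type KeySchemaElement = { AttributeName: string; KeyType: 'HASH' | 'RANGE' };
type MultiAttributeIndex = { index: string; pk: string[]; sk: string[] };

const ENTITY_TOKEN = '$entity';
const ENTITY_ATTRIBUTE = '__edb_e__';

function toKeySchema(def: MultiAttributeIndex): KeySchemaElement[] {
  // `$entity` resolves to the entity attribute named in this issue.
  const resolve = (key: string): string => (key === ENTITY_TOKEN ? ENTITY_ATTRIBUTE : key);

  // PK attributes become HASH entries, SK attributes become RANGE entries, order preserved.
  const hash = def.pk.map((key): KeySchemaElement => ({ AttributeName: resolve(key), KeyType: 'HASH' }));
  const range = def.sk.map((key): KeySchemaElement => ({ AttributeName: resolve(key), KeyType: 'RANGE' }));
  return hash.concat(range);
}

const schema = toKeySchema({
  index: 'user_name_born-gsi',
  pk: ['$entity', 'lastName'],
  sk: ['birthYear'],
});
console.log(JSON.stringify(schema, null, 2));
```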

Additional Options

It may also be valuable to introduce a new common attribute, __edb_c__, holding the collection name. This would allow collections to participate in multi-attribute indexes. It is not a requirement, but something to consider moving forward: a similar $collection attribute could be used in a multi-attribute index PK to get multiple entities from a collection. Because collection names are currently baked into SK compositing, existing collections will not be available for this new feature. However, when creating new entities, or new entity versions, we could start providing this option. By using collection, Electro could know about this connection across entities.

However, it would also be possible for a user to simply use shared attributes to create a collection. For example, they could add an attribute named collection to each entity and then add collection as a PK in a multi-attribute index's KeySchema. A query would then return multiple entities. This opens some very interesting options for cross-entity querying, and may put some work back on Electro to properly separate these entities for queries on a multi-attribute index that is not using the $entity template string.
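
As a rough illustration of that separation step, a query on a shared-attribute index could return a mixed list of items that is then grouped by its `__edb_e__` value. `groupByEntity`, the `pet` entity, and the sample items below are hypothetical and only stand in for whatever Electro would do internally.

```typescript
// Hypothetical sketch: split a mixed query result into per-entity buckets.
type Item = Record<string, unknown>;

function groupByEntity(items: Item[]): Record<string, Item[]> {
  const groups: Record<string, Item[]> = {};
  for (const item of items) {
    const entity = item['__edb_e__'];
    // Items without a string entity attribute are kept under "unknown".
    const name = typeof entity === 'string' ? entity : 'unknown';
    const bucket = groups[name] || [];
    bucket.push(item);
    groups[name] = bucket;
  }
  return groups;
}

// Made-up items sharing a user-defined `collection` attribute.
const mixed: Item[] = [
  { __edb_e__: 'user', collection: 'household', lastName: 'Doe' },
  { __edb_e__: 'pet', collection: 'household', name: 'Rex' },
  { __edb_e__: 'user', collection: 'household', lastName: 'Roe' },
];

const grouped = groupByEntity(mixed);
console.log(Object.keys(grouped));
```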
