Susan Potter

Unique identifier types

Fri January 1, 2023

Have you ever found yourself in a situation where you needed to generate unique identifiers (UIDs) for objects, records, or entities within a system? It's a common problem in software development, and there are different schemes to generate UIDs. Each of these schemes has its own set of properties and tradeoffs, so it's important to choose the right one for your needs. In this article, we'll introduce you to some of the most commonly used UID schemes and help you understand the pros and cons of each.

At first, you might think to use an auto-incrementing counter for each new row in your database to generate unique identifiers. But there are some big drawbacks to using auto-incrementing numbers: they're predictable, not guaranteed to be unique, and they can expose sensitive information. Plus, they don't scale well in systems with a lot of concurrency or that generate a lot of unique identifiers.

Specifically, auto-incrementing counters:

Are predictable

Auto-incrementing numbers are predictable, which means that it is possible for someone to guess the unique identifier of a resource. This could potentially be a security risk, as it could allow someone to tamper with or access resources that they should not have access to.

Lack uniqueness

Auto-incrementing numbers are not guaranteed to be unique, especially in a distributed system where multiple devices or processes are generating unique identifiers. If two devices or processes generate an auto-incrementing number at the same time, it is possible for them to generate the same number, leading to a collision.

Expose sensitive information

In some cases, the auto-incrementing number itself may reveal sensitive information, such as the number of resources that have been created. An attacker could use this information to gain insights into the system or to launch an attack.

Limit scalability (single point of failure)

Auto-incrementing numbers may not scale well in systems with high levels of concurrency or that generate a large number of unique identifiers. This is because the process of incrementing the number and storing it in a database can become a bottleneck, leading to performance issues.
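The uniqueness problem described above can be sketched in a few lines. This is a minimal illustration, assuming two hypothetical nodes (`nodeA` and `nodeB`) that each keep their own local counter without coordinating; the `LocalCounter` class is invented for this example and is not taken from any library.

```typescript
// Hypothetical node-local counter: each node starts at 1 and counts up on its own.
class LocalCounter {
  private next = 1;

  nextId(): number {
    return this.next++;
  }
}

// Two independent nodes, with no shared state between them.
const nodeA = new LocalCounter();
const nodeB = new LocalCounter();

const idsFromA = [nodeA.nextId(), nodeA.nextId()];
const idsFromB = [nodeB.nextId(), nodeB.nextId()];

// Every ID issued by node B has already been issued by node A.
const collisions = idsFromA.filter((id) => idsFromB.includes(id));
console.log(collisions); // [ 1, 2 ]
```

Avoiding these collisions requires routing every request through one shared counter, which is exactly the bottleneck described under scalability.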

Overview of non-auto-incrementing identifiers

So what can you use instead? There are lots of options out there like UUIDs, ULIDs, NanoIDs, HashIDs, and SnowflakeIDs. Depending on your needs, these formats can help you generate identifiers that are hard to predict, avoid collisions, and don't expose sensitive information. They're definitely worth considering the next time you need to generate unique identifiers in your system.

UUIDs

UUID stands for universally unique identifier. It is a 128-bit value, usually written as a 36-character string of hexadecimal digits and hyphens, that is designed to be unique across devices without central coordination. UUIDs are often used to uniquely identify resources, such as database records or files. The specification defines several versions; the most common one is version 4, which uses random numbers to generate a UUID. The three covered here are:

UUIDv1

This version of UUID uses the MAC address of the device generating the UUID, along with a timestamp and a randomly generated number, to create a unique identifier. The MAC address provides a unique identifier for the device, and the timestamp and random number ensure that the UUID will be unique even if multiple UUIDs get generated on the same device within the same timestamp.

UUIDv4

This version of UUID uses random numbers to generate a UUID. It is the most common version of UUID, and it is the version that is most often used when you need a random unique identifier.

UUIDv5

UUIDv5 hashes a namespace and a name with SHA-1 to generate a UUID. The same namespace and name always produce the same UUID, so it is unique only as long as the combination of namespace and name is unique.

Using TypeScript as our lingua franca, here is how you can generate each version with the uuid ES module (tested using Deno):

import {
    v1 as uuidv1,
    v4 as uuidv4,
    v5 as uuidv5
} from 'https://cdn.skypack.dev/uuid?dts';

const url = 'https://example.org/ns/foobarqux';

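// One UUID of each version, keyed by version name.
interface UUIDs {
  v1: string;
  v4: string;
  v5: string;
}

// The argument is ignored; it only lets us use genUUIDs with Array.prototype.map.
const genUUIDs = (_: number): UUIDs => ({
  v1: uuidv1(),                // timestamp and node based
  v4: uuidv4(),                // random
  v5: uuidv5(url, uuidv5.URL), // SHA-1 hash of the URL namespace plus the name
});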

console.table([1,2,3,4].map(genUUIDs), ["v1", "v4", "v5"]);
┌───────┬────────────────────────────────────────┬────────────────────────────────────────┬────────────────────────────────────────┐
│ (idx) │ v1                                     │ v4                                     │ v5                                     │
├───────┼────────────────────────────────────────┼────────────────────────────────────────┼────────────────────────────────────────┤
│     0 │ "cb083f70-8fa3-11ed-80f7-1b51dc25a5a9" │ "d25117c0-0766-41cf-b959-607e91be100f" │ "33c54240-b918-5b72-97db-390bd228cb7e" │
│     1 │ "cb086680-8fa3-11ed-80f7-1b51dc25a5a9" │ "12b80e15-71d6-46f9-93ea-1779b19c090b" │ "33c54240-b918-5b72-97db-390bd228cb7e" │
│     2 │ "cb086681-8fa3-11ed-80f7-1b51dc25a5a9" │ "3c014806-1c89-4b73-a71e-6b100ff303d6" │ "33c54240-b918-5b72-97db-390bd228cb7e" │
│     3 │ "cb086682-8fa3-11ed-80f7-1b51dc25a5a9" │ "7d776239-0c45-487e-abf4-596923e7a2c9" │ "33c54240-b918-5b72-97db-390bd228cb7e" │
└───────┴────────────────────────────────────────┴────────────────────────────────────────┴────────────────────────────────────────┘

Here are some notes about the output from the script:

  • v4 output changes every time one gets generated

  • v5 output remains constant across multiple invocations for the same inputs

  • v1 output changes only in the first group of the UUID between invocations within the same run, while the remaining groups stay the same (try running the script multiple times and note what changes between runs but not between invocations within the same run)
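One more property of the output above is that each UUID records its own version in the string. As a minimal sketch, assuming the canonical 36-character form shown in the table, the version is the first hex digit of the third group. The helper name `uuidVersion` is made up for this example and is not part of the uuid module.

```typescript
// Returns the version digit of a UUID written in the canonical 8-4-4-4-12 form.
function uuidVersion(uuid: string): number {
  // Index 14 is the first character of the third hyphen-separated group.
  return parseInt(uuid.charAt(14), 16);
}

// Sample values copied from the first row of the table above.
console.log(uuidVersion("cb083f70-8fa3-11ed-80f7-1b51dc25a5a9")); // 1
console.log(uuidVersion("d25117c0-0766-41cf-b959-607e91be100f")); // 4
console.log(uuidVersion("33c54240-b918-5b72-97db-390bd228cb7e")); // 5
```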

ULIDs

ULID stands for Universally Unique Lexicographically Sortable Identifier. Like a UUID, it is a string of characters designed to be unique, but it is also designed to be sortable, which makes it easier to use in certain contexts, such as when storing data in a database. ULIDs are composed of two parts: a 48-bit millisecond timestamp followed by 80 bits of randomness, encoded together as 26 characters. The leading timestamp provides sortability and the random part provides uniqueness.
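To make the two-part structure concrete, here is a rough sketch of a ULID-style generator written without any library. It assumes the layout from the ULID specification (10 timestamp characters followed by 16 random characters in Crockford's Base32). The function names are invented for this example, and `Math.random` is used only to keep the sketch dependency-free; a real implementation should use a cryptographically secure random source.

```typescript
// Crockford's Base32 alphabet (no I, L, O, or U), in ascending ASCII order.
const CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Encode a millisecond timestamp as 10 characters, most significant digit first.
function encodeTime(ms: number): string {
  let out = "";
  let rest = ms;
  for (let i = 0; i < 10; i++) {
    const digit = rest % 32;
    out = CROCKFORD.charAt(digit) + out;
    rest = (rest - digit) / 32;
  }
  return out;
}

// 16 random characters (80 bits).
// Assumption: Math.random is a placeholder here, not a secure random source.
function encodeRandom(): string {
  let out = "";
  for (let i = 0; i < 16; i++) {
    out += CROCKFORD.charAt(Math.floor(Math.random() * 32));
  }
  return out;
}

function ulidSketch(ms: number = Date.now()): string {
  return encodeTime(ms) + encodeRandom();
}

const earlier = ulidSketch(1672531200000);
const later = ulidSketch(1672531200001);
// Plain string comparison follows timestamp order whenever the timestamps differ.
console.log(earlier, later, earlier < later);
```

Two IDs created in the same millisecond share a prefix, and this sketch makes no promise about their relative order.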

NanoIDs

NanoID is a small, secure, and URL-friendly unique string ID generator. It is similar to UUIDs and ULIDs in that it generates unique strings, but it is optimized for use in URLs and other contexts where space is limited. NanoIDs are composed of random characters drawn from a URL-safe alphabet (letters, digits, "_" and "-" by default), and at the default length of 21 characters they are shorter than UUIDs or ULIDs.
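The idea can be sketched without the library itself. The code below is an illustration rather than NanoID's actual implementation: it assumes a 64-symbol URL-safe alphabet and the default length of 21, and it uses `Math.random` purely to stay dependency-free, whereas the real library draws from a cryptographically secure source. The names `nanoIdSketch` and `entropyBits` are made up for this example.

```typescript
// 64 URL-safe symbols; the ordering is arbitrary in this sketch.
const URL_SAFE =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";

// Assumption: Math.random is a placeholder, not a secure random source.
function nanoIdSketch(size: number = 21, alphabet: string = URL_SAFE): string {
  let id = "";
  for (let i = 0; i < size; i++) {
    id += alphabet.charAt(Math.floor(Math.random() * alphabet.length));
  }
  return id;
}

// Bits of randomness in an ID: size * log2(alphabet length).
function entropyBits(size: number, alphabetLength: number): number {
  return size * Math.log2(alphabetLength);
}

console.log(nanoIdSketch());      // a 21-character URL-safe string
console.log(entropyBits(21, 64)); // 126
console.log(entropyBits(10, 64)); // 60
```

The entropy figure is what you trade away when you shorten the ID or shrink the alphabet: the default settings give about 126 random bits, comparable to the 122 random bits in a UUIDv4, while a 10-character ID has only 60.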

HashIDs

HashIDs is a small library that generates short, unique, non-sequential ids from numbers. It converts a number (e.g. 123) into a unique string (e.g. bVm), and can also convert the string back into the original number. HashIDs are a replacement for regular integers in URLs, to make them more obscure and less prone to tampering.
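The reversible number-to-string idea can be sketched as follows. This is deliberately not the Hashids algorithm: it is a simplified stand-in that encodes a number in base 62 using an alphabet shuffled from a salt, so its output will not match what the real library produces. All names here, including the salt, are hypothetical.

```typescript
const BASE_ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Deterministically shuffle the alphabet from a salt.
// Assumption: a toy seeded shuffle, not the shuffle Hashids uses.
function shuffledAlphabet(salt: string): string {
  const chars = BASE_ALPHABET.split("");
  let seed = 0;
  for (let i = 0; i < salt.length; i++) {
    seed = (seed * 31 + salt.charCodeAt(i)) % 2147483647;
  }
  for (let i = chars.length - 1; i > 0; i--) {
    seed = (seed * 48271 + 1) % 2147483647; // small linear congruential step
    const j = seed % (i + 1);
    const tmp = chars[i];
    chars[i] = chars[j];
    chars[j] = tmp;
  }
  return chars.join("");
}

// Write the number in base 62 using the shuffled alphabet as digits.
function encode(n: number, alphabet: string): string {
  if (!Number.isSafeInteger(n) || n < 0) {
    throw new RangeError("expected a non-negative integer");
  }
  let out = "";
  let rest = n;
  do {
    out = alphabet.charAt(rest % alphabet.length) + out;
    rest = Math.floor(rest / alphabet.length);
  } while (rest > 0);
  return out;
}

// Reverse the encoding by reading the digits back.
function decode(id: string, alphabet: string): number {
  let n = 0;
  for (let i = 0; i < id.length; i++) {
    const digit = alphabet.indexOf(id.charAt(i));
    if (digit < 0) throw new RangeError("unexpected character in id");
    n = n * alphabet.length + digit;
  }
  return n;
}

const alphabet = shuffledAlphabet("my hypothetical salt");
const id = encode(123, alphabet);
console.log(id, decode(id, alphabet)); // the decoded value is 123
```

Because the mapping is reversible, anyone who knows the alphabet can recover the original number. Treat this kind of ID as obfuscation rather than as a security boundary.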

SnowflakeID

Twitter (years ago, before Musk took over) designed and built SnowflakeID to generate globally unique IDs at high scale. It is similar to UUIDs, ULIDs, NanoIDs, and HashIDs in that it overcomes the issues of auto-incrementing counters, but it produces 64-bit integers rather than strings.

SnowflakeID is composed of a timestamp, a worker identifier, and a sequence number. SnowflakeIDs get generated by worker servers that hand out unique IDs to clients in a distributed system. The timestamp component makes IDs sortable by time, the worker identifier separates IDs produced by different workers, and the sequence number differentiates between multiple IDs generated by one worker within the same millisecond.

SnowflakeID has the following properties that make it suitable for generating unique IDs:

Globality

SnowflakeID generates IDs that are globally unique, as the combination of the timestamp, worker ID, and sequence number guarantees uniqueness across all worker machines, provided each worker is assigned a distinct worker ID.

Sorting

SnowflakeIDs are sortable by time, as the timestamp component is the most significant part of the ID. This makes them suitable for use as primary keys in databases where records need to be ordered by creation time.

Scalability

SnowflakeID can generate IDs at high scale, as it can generate up to 4096 IDs per worker machine per millisecond. This makes it suitable for use in distributed systems where large numbers of IDs are generated concurrently.
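The properties above follow from how the three components are packed into one integer. Here is a minimal sketch, assuming the commonly described layout of a 41-bit timestamp, a 10-bit worker ID, and a 12-bit sequence (which is where the 4096 IDs per millisecond come from). The `SnowflakeGenerator` class and the custom epoch are hypothetical, and the sketch does not handle the system clock moving backwards.

```typescript
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

// Hypothetical custom epoch (2023-01-01T00:00:00Z); pick one that suits your system.
const EPOCH_MS = 1672531200000n;

class SnowflakeGenerator {
  private workerId: bigint;
  private now: () => number;
  private lastMs = -1n;
  private sequence = 0n;

  // `now` is injectable so the clock can be controlled in tests.
  constructor(workerId: bigint, now: () => number = () => Date.now()) {
    if (workerId < 0n || workerId >= (1n << WORKER_BITS)) {
      throw new RangeError("workerId must fit in 10 bits");
    }
    this.workerId = workerId;
    this.now = now;
  }

  next(): bigint {
    let ms = BigInt(this.now());
    if (ms === this.lastMs) {
      this.sequence = (this.sequence + 1n) & MAX_SEQUENCE;
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond: spin until the clock advances.
        while (ms <= this.lastMs) ms = BigInt(this.now());
      }
    } else {
      this.sequence = 0n;
    }
    this.lastMs = ms;
    // Timestamp in the high bits, then the worker ID, then the sequence.
    return ((ms - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS)) |
      (this.workerId << SEQUENCE_BITS) |
      this.sequence;
  }
}

// A fixed clock one millisecond after the epoch makes the output predictable.
const generator = new SnowflakeGenerator(1n, () => 1672531200001);
const first = generator.next();
const second = generator.next();
console.log(first, second, second > first); // 4198400n 4198401n true
```

Because the timestamp occupies the most significant bits, comparing two IDs as integers compares their creation times first. The worker ID and sequence only break ties within a millisecond.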

To wrap up …

Each of these schemes has its own set of tradeoffs, and the best choice will depend on the specific requirements of the application. UUIDs are widely used as globally unique identifiers and come in multiple versions (including UUIDv1, UUIDv4, and UUIDv5), with v4 a reasonable default for a general-purpose random UID. ULIDs are lexicographically sortable and encode a timestamp in their structure, making them suitable for use as primary keys in databases that back time-ordered timelines. NanoIDs are short, easy to generate, and customizable in alphabet and length, but you have to determine ahead of time whether the resulting probability of collision is acceptable for your needs. HashIDs are a reversible encoding of existing numbers rather than a source of new identifiers, so they are only as unique as the numbers you feed them. SnowflakeIDs combine a timestamp, a worker identifier, and a sequence number to produce time-sortable IDs at high scale across many machines.

In summary, there are several schemes for generating unique identifiers (UIDs) for objects, records, events, transactions, or entities in modern, highly concurrent, distributed systems. They avoid the drawbacks of the traditional auto-incrementing database counters that many of us have used by default, sometimes without thinking.