Primary Keys

What kind of Primary Key should you be using in your application?

Why Obfuscate?

Using sequential ids can leak information about your applications.

Estimated counts - If all of your entities are auto incrementing a competitor could estimate how many users or how much content you have. This isn’t a huge deal and could be negated by setting an arbitrary initial auto increment value.
Parameter Injection - If all of your urls are formatted /user/{userId} anyone can easily just change the id in the url. Ideally you have authorization in place to prevent sensitive data from being accessed. This is just an extra precaution.
Automated Scraping - Using parameter injection it would be easy to write a script from 0 to some max id and scrape all data from your website without obfuscated ids. Obfuscated ids won’t prevent this if all ids are public somehow but will make them work a little harder.

k-sortable ids

k-sortable ids are ids that can be roughly sorted so they order themselves by the time they were created. This could be a potential problem, since attacked may then be able to guess the next ID. See https://www.intruder.io/research/in-guid-we-trust

Note: All monotonically increasing (auto-increment, k-sortable), and timestamp-based ids share the security issues with Cuid. V4 UUIDs and GUIDs are also insecure because it’s possible to predict future values of many random algorithms, and many of them are biased, leading to increased probability of collision. Likewise, UUID V6-V8 are also insecure because they leak information which could be used to exploit systems or violate user privacy. Here are some example exploits:

https://github.com/paralleldrive/cuid?tab=readme-ov-file

ID Types

Auto Incrementing Sequential Integer

Easy and cheap

May give away information if it’s presented in the frontend application.

See German Tank Problem.

UUID/GUID - Universally Unique IDentifier

UUID

ex: [9EE44A7C-0485-4AF4-8F72-2CFE80043A4A]

A UUID is 128 bits long, and can guarantee uniqueness across space and time.

This specification defines a Uniform Resource Name namespace for UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally Unique IDentifier).

What’s the difference between GUIDs and UUIDs?

GUID is Microsoft’s implementation of the UUID standard.

Cons

UUID can be suboptimal for many uses-cases because:

doesn’t cluster well, produces terrible locality, and no insight as to when it was generated
It isn’t the most character efficient way of encoding 128 bits of randomness
UUID v1/v2 is impractical in many environments, as it requires access to a unique, stable MAC address
UUID v3/v5 requires a unique seed and produces randomly distributed IDs, which can cause fragmentation in many data structures
UUID v4 provides no other information than randomness which can cause fragmentation in many data structures
Sections are hard to parse for a human and are UUID version dependent. If we can’t directly interpret each section, the dashes on the string representation add no value
If we only care about an unique variant/version, we wouldn’t need to encode the rest
Having to support more variants and versions, the parsing and generation is slower than it could be
Version 4, being fully random, produces fragmentation in many data structures, and aren’t sortable in a meaningful way
Version 4 are slow to generate, requiring 122 bits of good quality entropy
It’s just not the most efficient way to encode 128 bits into a string. For example, base64 without padding would require 22 characters, not 36

CUID/CUID2

CUID2

Secure, collision-resistant ids optimized for horizontal scaling and performance. Next generation UUIDs.

Cuid2 is not good for: Sequential ids (see the note on K-sortable ids, below) High performance tight loops, such as render loops (if you don’t need cross-host unique ids or security, consider a simple counter for this use-case, or try Ulid or NanoId).

import { createId } from '@paralleldrive/cuid2';

const ids = [
  createId(), // 'tz4a98xxat96iws9zmbrgj3a'
  createId(), // 'pfh0haxfpzowht3oi213cqos'
  createId(), // 'nc6bzmkmd014706rfda898to'
];

https://github.com/paralleldrive/cuid2

CUID (Deprecated)

ULID - Universally Unique Lexicographically Sortable Identifier`

`[01ARZ3NDEKTSV4RRFFQ69G5FAV]`

Features

128-bit compatibility with UUID
1.21e+24 unique ULIDs per millisecond
Lexicographically sortable!
Canonically encoded as a 26 character string, as opposed to the 36 character UUID
Uses Crockford’s base32 for better efficiency and readability (5 bits per character)
Case insensitive
No special characters (URL safe)
Monotonic sort order (correctly detects and handles the same millisecond)

Components

timestamp
randomness

Resources

Hashids

Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.

It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.

You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.

https://hashids.org/
https://www.stubbornjava.com/posts/obfuscating-and-shortening-sequential-ids-with-hashids
http://carnage.github.io/2015/08/cryptanalysis-of-hashids - very easy to reverse the salt used to generate the shuffled alphabet

Nanoid

Optimus

https://github.com/pjebs/optimus-go

Mongo ObjectID

`[507f1f77bcf86cd799439011]`

Format:

a 4-byte value representing the seconds since the Unix epoch,
a 5-byte random value, and
a 3-byte counter, starting with a random value.

Resources

https://docs.mongodb.com/manual/reference/method/ObjectId/

xid

###[9m4e2mr0ui3e8a215n4g]

Xid is using Mongo Object ID algorithm to generate globally unique ids with a different serialization (base64) to make it shorter when transported as a string

4-byte value representing the seconds since the Unix epoch,
3-byte machine identifier,
2-byte process id, and
3-byte counter, starting with a random value.

Resources

https://github.com/rs/xid

KSUID - K-Sortable Unique IDentifier

`[0ujsszwN8NRY24YaXiTIE2VWDTS]`

KSUID is for K-Sortable Unique IDentifier. It’s a way to generate globally unique IDs similar to RFC 4122 UUIDs, but contain a time component so they can be "roughly" sorted by time of creation. The remainder of the KSUID is randomly generated bytes.

Sortable by Timestamp
No Coordination Required
Lexographically Sortable, Portable Representations

KSUIDs are 20-bytes: a 32-bit unsigned integer UTC timestamp and a 128-bit randomly generated payload. The timestamp uses big-endian encoding, to allow lexicographic sorting. The timestamp epoch is adjusted to March 5th, 2014, providing over 100 years of useful life starting at UNIX epoch + 14e8. The payload uses a cryptographically-strong pseudorandom number generator.

The string representation is fixed at 27-characters encoded using a base62 encoding that also sorts lexicographically.

Resources

https://github.com/segmentio/ksuid

Twitter Snowflake

To generate the roughly-sorted 64 bit ids in an uncoordinated manner, we settled on a composition of: timestamp, worker number and sequence number.

Sequence numbers are per-thread and worker numbers are chosen at startup via zookeeper (though that’s overridable via a config file).

needs machin/DC configuration, needs central server, sortable
https://github.com/twitter-archive/snowflake/tree/snowflake-2010
https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html

Firebase PushIDs

`[-JhLeOlGIEjaIOFHR0xd]`

Push IDs are string identifiers that are generated client-side. They are a combination of a timestamp and some random bits. The timestamp ensures they are ordered chronologically, and the random bits ensure that each ID is unique, even if thousands of people are creating push IDs at the same time.

A push ID contains 120 bits of information. The first 48 bits are a timestamp, which both reduces the chance of collision and allows consecutively created push IDs to sort chronologically. The timestamp is followed by 72 bits of randomness, which ensures that even two people creating push IDs at the exact same millisecond are extremely unlikely to generate identical IDs. One caveat to the randomness is that in order to preserve chronological ordering if a client creates multiple push IDs in the same millisecond, we just ‘increment’ the random bits by one.

To turn our 120 bits of information (timestamp + randomness) into an ID that can be used as a Firebase key, we basically base64 encode it into ASCII characters, but we use a modified base64 alphabet that ensures the IDs will still sort correctly when ordered lexicographically (since Firebase keys are ordered lexicographically).

There’s one caveat to the chronological ordering of push IDs. Since the timestamp is generated client-side, it’s at the mercy of the client’s local clock, which could be incorrect. We make an attempt to compensate for these ‘skewed’ clocks to some degree. When a Firebase client establishes a connection, the Firebase server sends an accurate timestamp to the client so it can use that to correct its incorrect clock when generating push IDs.

Resources

Instagram

Instagram:

We’ve delegated ID creation to each table inside each shard, by using PL/PGSQL, Postgres’ internal programming language, and Postgres’ existing auto-increment functionality.

Each of our IDs consists of:

41 bits for time in milliseconds (gives us 41 years of IDs with a custom epoch)

13 bits that represent the logical shard ID

10 bits that represent an auto-incrementing sequence, modulus 1024. This means we can generate 1024 >IDs, per shard, per millisecond

Brian Wigginton

Why Obfuscate?

k-sortable ids

ID Types

Auto Incrementing Sequential Integer

UUID/GUID - Universally Unique IDentifier

UUID

What’s the difference between GUIDs and UUIDs?

Cons

CUID/CUID2

CUID2

CUID (Deprecated)

ULID - Universally Unique Lexicographically Sortable Identifier`

`[01ARZ3NDEKTSV4RRFFQ69G5FAV]`

Features

Components

Resources

Hashids

Nanoid

Optimus

Mongo ObjectID

`[507f1f77bcf86cd799439011]`

Format:

Resources

xid

Resources

KSUID - K-Sortable Unique IDentifier

`[0ujsszwN8NRY24YaXiTIE2VWDTS]`

Resources

Twitter Snowflake

Firebase PushIDs

`[-JhLeOlGIEjaIOFHR0xd]`

Resources

Instagram

Resources

Related Notes