[SD] Summary: Sequencer

TL;DR

Summarize how to generate a unique ID.

QA

What are the requirements for unique identifiers?

  • Uniqueness: Every ID we generate must be unique.
  • Scalability: The ID generation system should generate at least a billion unique IDs per day.
  • Availability: Events can occur nanoseconds apart, so the system must be able to issue an ID for every one of them.
  • 64-bit numeric ID: We restrict IDs to 64 bits because that ID space is large enough to last for many years.
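
To make the "many years" claim concrete, here is a rough back-of-the-envelope calculation. It assumes exactly one billion IDs per day (the minimum from the scalability requirement) and, as a further assumption, that only 63 of the 64 bits are usable because one is kept as a sign bit.

```python
IDS_PER_DAY = 1_000_000_000   # minimum rate from the scalability requirement
USABLE_BITS = 63              # assumption: one of the 64 bits is kept as a sign bit

total_ids = 2 ** USABLE_BITS
days_until_exhausted = total_ids // IDS_PER_DAY
years_until_exhausted = days_until_exhausted / 365.25

print(f"{total_ids:,} IDs last about {years_until_exhausted:,.0f} years")
```

Even at a much higher daily rate, the ID space is unlikely to be the limiting factor.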

Why don’t we use UUID?

There are two main reasons:

  1. At 128 bits, it is long. A 128-bit primary key makes the index larger and slower.
  2. UUIDs are not guaranteed to be unique. Collisions are possible, although the probability is minimal.

Why is the range handler a good choice?

The range handler is composed of two parts:

  1. Server Layer: Each server generates unique IDs quickly by handing them out from a range held in memory.
  2. Storage Layer: It persists the allocated ranges in a replicated way.

For example, we can persist the maximum allocated ID in the storage layer. When a server restarts, it reads the maximum allocated ID from the storage layer, say 1000, claims a new range in memory such as [1001, 2000], and stores the new maximum (2000) back in the storage layer.

After that, the server assigns unique IDs from its range: the first request gets 1001, the second gets 1002, and so on. When the range runs out, the server claims a new one by repeating the steps above.

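As a result, this design has several advantages:

  1. The IDs are unique, because no two ranges overlap.
  2. It is fast, because IDs are served from memory.
  3. It avoids a single point of failure, because the allocated ranges are persisted in replicated storage.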
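
The range-handler flow described above can be sketched as follows. This is a minimal illustration, not a production design: `InMemoryRangeStore` is a hypothetical stand-in for the replicated storage layer, and the class names, method names, and default range size of 1000 are all assumptions made for the example.

```python
import threading


class InMemoryRangeStore:
    """Hypothetical stand-in for the replicated storage layer.

    It only remembers the maximum ID that has been allocated so far.
    """

    def __init__(self, max_allocated=0):
        self._max_allocated = max_allocated
        self._lock = threading.Lock()

    def reserve(self, size):
        """Atomically claim `size` IDs and return the inclusive (start, end)."""
        with self._lock:
            start = self._max_allocated + 1
            self._max_allocated += size
            return start, self._max_allocated


class RangeServer:
    """Server layer: hands out IDs from a range held in memory."""

    def __init__(self, store, range_size=1000):
        self._store = store
        self._range_size = range_size
        self._lock = threading.Lock()
        self._next = 1
        self._end = 0  # empty range, so the first call claims a new one

    def next_id(self):
        with self._lock:
            if self._next > self._end:
                # Range exhausted (or server just started): claim a new range.
                self._next, self._end = self._store.reserve(self._range_size)
            value = self._next
            self._next += 1
            return value


store = InMemoryRangeStore(max_allocated=1000)
server = RangeServer(store)
first, second = server.next_id(), server.next_id()
print(first, second)
```

Because the store records the new maximum as part of `reserve`, a server that restarts simply claims a fresh range. The unused tail of its previous range is skipped, which costs some IDs but never produces a duplicate.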

What is Twitter Snowflake?

It is a 64-bit numeric ID generation strategy. Each ID generated by Snowflake can be divided into four parts:

  • Sign (1 bit): Always zero.
  • Timestamp in milliseconds (41 bits)
  • Worker number (10 bits)
  • Sequence number (12 bits): For every ID generated on the server, the sequence number is incremented by one. With 12 bits it takes values from 0 to 4,095, so it wraps back to zero rather than reaching 4,096.
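
The bit layout above can be sketched as a small generator. This is an illustrative sketch rather than Twitter's actual implementation: the custom epoch (2020-01-01 UTC), the names `compose`, `decompose`, and `SnowflakeGenerator`, the per-millisecond sequence reset, and the choice to raise an error when the clock moves backwards are all assumptions made for the example. The class is not thread-safe.

```python
import time

WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1      # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095
EPOCH_MS = 1_577_836_800_000             # assumed custom epoch: 2020-01-01 UTC


def compose(timestamp_ms, worker_id, sequence):
    """Pack the three fields into one integer; the sign bit stays zero."""
    return (
        (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS))
        | (worker_id << SEQUENCE_BITS)
        | sequence
    )


def decompose(snowflake_id):
    """Return (timestamp_ms, worker_id, sequence) for a composed ID."""
    sequence = snowflake_id & MAX_SEQUENCE
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER
    timestamp_ms = snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker_id, sequence


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id does not fit in 10 bits")
        self._worker_id = worker_id
        self._clock = clock  # injectable so the example can be tested
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        now = self._clock() - EPOCH_MS
        if now < self._last_ms:
            # An unreliable clock could otherwise cause repeated IDs.
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            self._sequence = (self._sequence + 1) & MAX_SEQUENCE
            if self._sequence == 0:
                # All 4,096 values used in this millisecond: wait for the next.
                while now <= self._last_ms:
                    now = self._clock() - EPOCH_MS
        else:
            self._sequence = 0
        self._last_ms = now
        return compose(now, self._worker_id, self._sequence)


generator = SnowflakeGenerator(worker_id=5)
new_id = generator.next_id()
print(new_id, decompose(new_id))
```

Because the timestamp occupies the most significant bits, IDs from a single worker sort by creation time. A 41-bit millisecond counter covers roughly 69 years from the chosen epoch.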

The drawbacks are:

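  1. It wastes many IDs over time, because IDs tied to timestamps that pass without any requests are never used.
  2. Clocks are not reliable. If a server's clock drifts or is set backwards, IDs may be repeated.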