Extract Time From Mongo ObjectID For Time Range Queries

Injae Lee
2 min readDec 26, 2020

When storing massive amounts of data in data lake or data warehouse solutions with each data entry attached to an ObjectID, the timestamp data embedded in the ObjectID comes in handy for certain time range queries without unnecessary joins. Here are a few example use cases.

  • Summarize 12 month trailing purchase records by user creation/signup time given that user ID’s are ObjectID’s
  • Summarize 12 month trailing activities by user session given that session ID’s are ObjectID’s

In this article, we will look into parsing and using the timestamp part of the ObjectID in Hive and Presto queries. Before we go further, let’s first look into the anatomy of the 12-byte ObjectID and go into the queries.

Anatomy of the ObjectID

Let’s break down the Mongo ObjectID “5fee6600f75eabce97882944” as an illustration. Depending on the MongoDB version, the anatomy is different. We will look at both but the meaning of the first 4 bytes did not change.

Exclusively From MongoDB 3.2

5fee6600 | f75eabce97 | 882944
(1) (2) (3)
  1. The first 4 bytes (1) is the creation time in seconds since the Unix epoch
  2. The next 5 bytes (2) is the random value
  3. The last 3 bytes (3) is the incrementing counter

Inclusively To MongoDB 3.2

5fee6600 | f75eab | ce97 | 882944
(1) (2) (3) (4)
  1. The first 4 bytes (1) and the last 3 bytes (4) are the same as the later versions
  2. The second 3 bytes (2) is the machine identifier
  3. The next 2 bytes (3) is the process id

Extracting the Creation Time of the ObjectID’s

The following operations in their respective platform will result in an epoch time.

Hive

CAST(
CONV(
SUBSTRING(your_object_id, 0, 8),
16, 10
)
AS BIGINT)

Presto

FROM_BASE(
SUBSTR(your_object_id, 1, 8), -- index starts from 1
16
)

References

--

--

Injae Lee

"Am I a good writer? No. But just trying to be helpful since the 80s" - software development, engineering management, trading and now DeFi