Extract Time From Mongo ObjectID For Time Range Queries

2 min readDec 26, 2020

When storing massive amounts of data in data lake or data warehouse solutions with each data entry attached to an ObjectID, the timestamp data embedded in the ObjectID comes in handy for certain time range queries without unnecessary joins. Here are a few example use cases.

Summarize 12 month trailing purchase records by user creation/signup time given that user ID’s are ObjectID’s
Summarize 12 month trailing activities by user session given that session ID’s are ObjectID’s

In this article, we will look into parsing and using the timestamp part of the ObjectID in Hive and Presto queries. Before we go further, let’s first look into the anatomy of the 12-byte ObjectID and go into the queries.

Anatomy of the ObjectID

Let’s break down the Mongo ObjectID “5fee6600f75eabce97882944” as an illustration. Depending on the MongoDB version, the anatomy is different. We will look at both but the meaning of the first 4 bytes did not change.

Exclusively From MongoDB 3.2

5fee6600 | f75eabce97 | 882944
(1)        (2)          (3)

The first 4 bytes (1) is the creation time in seconds since the Unix epoch
The next 5 bytes (2) is the random value
The last 3 bytes (3) is the incrementing counter

Inclusively To MongoDB 3.2

5fee6600 | f75eab | ce97 | 882944
(1)        (2)      (3)    (4)

The first 4 bytes (1) and the last 3 bytes (4) are the same as the later versions
The second 3 bytes (2) is the machine identifier
The next 2 bytes (3) is the process id

Extracting the Creation Time of the ObjectID’s

The following operations in their respective platform will result in an epoch time.

Hive

CAST(
  CONV(
    SUBSTRING(your_object_id, 0, 8),
    16, 10
  ) 
AS BIGINT)

Presto

FROM_BASE(
  SUBSTR(your_object_id, 1, 8), -- index starts from 1
  16
)