001 - Conjunction Event ID

Purpose

This design document will propose a design for Conjunction Event Identifiers.

Context and scope

One of the key responsibilities identified for Monitor Space Hazards is to take a feed of Conjunction Data Messages (CDMs). It will filter down to CDMs that relate to satellites that are the responsibility of the UK licensed Satellite Operators and therefore require the co-ordination of communication about Conjunction Events and associated Orbital Analysis. Therefore Monitor space hazards will group together CDMs and raise a Conjunction Event ID based on the following formula:

Time of Closest Approach (TCA) +/- 1 sec with matching primary and secondary objects

The Conjunction Event ID will be inserted by Monitor Space Hazards into the COMMENT field on the related CDMs and then make this available to the Orbital Analysts. After the Orbital Analysis has taken place off-system it will be posted back to Monitor space hazards with the associated Conjunction Event ID for data matching.

Goals and non-goals

The goal of this design doc is to articulate a design for the structure of the Conjunction Event ID, not the data flow.

Goals

Have a unique conjunction event identifier that can be independently generated from multiple systems (Monitor Space Hazards, Aurora etc) without collision
Have the ID time sortable (so conjunction events naturally sort by most recent)
IDs should fit into 64-bits for smaller indexes and better storage
Generation of the IDs should be easily implemented without involving additional infrastructure or components
A human readable form of ID or unique label is highly desired for interpersonal communication

The actual design

Original version (before April 2023)

Note: though many of the principles within this section still apply, the exact bit layout of the event IDs has changed (see new design)

The proposal is to adopt an approach like that used by Instagram, which is in turn based on the Snowflake ID used at Twitter. Indeed many big tech companies are using a similar approach since it works well for distributed systems.

The proposal is for each ID to consist of:

41 bits for time in milliseconds (which gives over 41 years of IDs with a custom epoch (to be agreed))
10 bits that represent the generating system, or machine ID, (that is a lot of generating systems, (2^10) 1024 to be precise).
13 bits that represent an auto-incrementing sequence, modulus 8192 (this means 8192 IDs can be generated per generating system per millisecond, which means lots of scaling headroom).

For human readable identifiers the generation of short unique ids from the integer generated is recommended. Something like sqids look like good candidate routines, and even has something to avoid generating the most common English swear words.

An example will help explain this, so here goes:

The first 41 bits as calculated from 12 January 2022 12:00:00.000 (UTC) with an epoch of 01 October 2021 00:00:00.000 (UTC) = 1641988800000 - 1633046400000 = 8942400000 (decimal) =1000010101000000100011001000000000 (binary)

The next 10 bits are fixed based on the generating system, e.g. Monitor Space Hazards= 1, Aurora = 2 and so on until 1024, so 0000000001 (binary)

The final 13 bits are an auto-incrementing sequence, and for the sake of argument let’s say this is the 2345 event in that millisecond, i.e. 00100100101001 (binary)

This gives a binary number of 1000010101000000100011001000000000 000000000100100100101001 or 150028576358418729 in decimal.

So far so good for a machine calculation, but how does a human read and disclose this?

Well the sqid for this number is 5p6G1oyZd7ZE as calculated by this demo code with salt “Msh” and encoding “150028576358418729”. We can vary the alphabet used, so it is only one case (upper or lower) and it returns with a hyphen every n-many characters along. Let’s research with users which works best.

Note: There is a requirement for time to synchronise across machines and systems generating event IDs, the NTP and associated services will solve this problem for us.

The new design (after April 2023)

We discovered an edge case where collisions with multiple objects which are close together could end up with the same Event short ID for different events.

Requirements:

The same or very close (<60 seconds) TCA_TIME
The same primary object name
The same secondary object name

The cause is that we desperately wanted to shorten Event URLs and we spent too many bits on TCA time but not enough on object names. We also used object names, not the NORAD_IDs to distinguish different objects.

To solve this we increased the number of bits used in the Event Short IDs to 77 bits and removed the generating system (as Monitor Space Hazards is the only system generating such short IDs). This left us with the following design:

33 bits for time in seconds (giving us 272 years)
22 bits for primary object NORAD ID
22 bits for secondary object NORAD ID

This produces short IDs of the style jbxleqg-ndrxvgb-gorpwg (7-7-6) using lowercase a-z.

APIs

No public API for Conjunction Event IDs is to be exposed.

Data Storage

The Monitor Space Hazards database uses a varchar to store the Conjunction Event ID.

Code and pseudocode

This is untested but based on the Instagram implementation it is envisaged that something like the following PL/PGSQL will work:

CREATE OR REPLACE FUNCTION mys.next_id(OUT result bigint) AS $$
DECLARE
    our_epoch bigint := 1633046400000; #replaced with our epoch
    seq_id bigint;
    now_millis bigint;
    system_id int := 1;
BEGIN
    SELECT nextval('mys.table_id_seq') %% 8192 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    result := (now_millis - our_epoch) << 23;
    result := result | (system_id <<10);
    result := result | (seq_id);
END;
    $$ LANGUAGE PLPGSQL;

And when creating the table:

CREATE TABLE mys.our_table (
    "id" bigint NOT NULL DEFAULT mys.next_id(),
    ...rest of table schema...
  )

Degrees of constraint

The main constraint on this design is the ability to implement in multiple systems and co-ordinate the system ids. Given the ease that this is implemented in PL/PGSQL it is considered to be readily implemented in other databases and languages once the logic of generation is understood.

Alternatives considered

Time in milliseconds + primary object id + secondary object id - this is a valid approach and readily implemented. Gives a far higher chance of collision and doesn’t provide detail of which system generated the identifier.
UUIDs provide a good answer if all that was desired was uniqueness. An example UUID is “23042946-af93-4812-80e7-02a765082c95” (generated using this tool). Although it does a form of hashing to make it human readable, it is long at 128-bits, and doesn’t provide any natural sort order based on time. Which is an advantage for queries such as “give me the last 10 conjunction events”.

Cross-cutting concerns

The design will conform with the cross-cutting concerns, such as observability, privacy and security, decided and documented elsewhere for Monitor Space Hazards.

This page was last reviewed on 22 January 2024. It needs to be reviewed again on 22 January 2025 .

This page was set to be reviewed before 22 January 2025. This might mean the content is out of date.