001 - Conjunction Event ID
Purpose
This design document will propose a design for Conjunction Event Identifiers.
Context and scope
One of the key responsibilities identified for Monitor Space Hazards is to take a feed of Conjunction Data Messages (CDMs). It will filter down to CDMs that relate to satellites that are the responsibility of the UK licensed Satellite Operators and therefore require the co-ordination of communication about Conjunction Events and associated Orbital Analysis. Therefore Monitor space hazards will group together CDMs and raise a Conjunction Event ID based on the following formula:
Time of Closest Approach (TCA) +/- 1 sec with matching primary and secondary objects
The Conjunction Event ID will be inserted by Monitor Space Hazards into the COMMENT field on the related CDMs and then make this available to the Orbital Analysts. After the Orbital Analysis has taken place off-system it will be posted back to Monitor space hazards with the associated Conjunction Event ID for data matching.
Goals and non-goals
The goal of this design doc is to articulate a design for the structure of the Conjunction Event ID, not the data flow.
Goals
- Have a unique conjunction event identifier that can be independently generated from multiple systems (Monitor Space Hazards, Aurora etc) without collision
- Have the ID time sortable (so conjunction events naturally sort by most recent)
- IDs should fit into 64-bits for smaller indexes and better storage
- Generation of the IDs should be easily implemented without involving additional infrastructure or components
- A human readable form of ID or unique label is highly desired for interpersonal communication
The actual design
Original version (before April 2023)
Note: though many of the principles within this section still apply, the exact bit layout of the event IDs has changed (see new design)
The proposal is to adopt an approach like that used by Instagram, which is in turn based on the Snowflake ID used at Twitter. Indeed many big tech companies are using a similar approach since it works well for distributed systems.
The proposal is for each ID to consist of:
- 41 bits for time in milliseconds (which gives over 41 years of IDs with a custom epoch (to be agreed))
- 10 bits that represent the generating system, or machine ID, (that is a lot of generating systems, (2^10) 1024 to be precise).
- 13 bits that represent an auto-incrementing sequence, modulus 8192 (this means 8192 IDs can be generated per generating system per millisecond, which means lots of scaling headroom).
For human readable identifiers the generation of short unique ids from the integer generated is recommended. Something like sqids look like good candidate routines, and even has something to avoid generating the most common English swear words.
An example will help explain this, so here goes:
The first 41 bits as calculated from 12 January 2022 12:00:00.000 (UTC) with an epoch of 01 October 2021 00:00:00.000 (UTC) = 1641988800000 - 1633046400000 = 8942400000 (decimal) =1000010101000000100011001000000000 (binary)
The next 10 bits are fixed based on the generating system, e.g. Monitor Space Hazards= 1, Aurora = 2 and so on until 1024, so 0000000001 (binary)
The final 13 bits are an auto-incrementing sequence, and for the sake of argument let’s say this is the 2345 event in that millisecond, i.e. 00100100101001 (binary)
This gives a binary number of 1000010101000000100011001000000000 000000000100100100101001 or 150028576358418729 in decimal.
So far so good for a machine calculation, but how does a human read and disclose this?
Well the sqid for this number is 5p6G1oyZd7ZE as calculated by this demo code with salt “Msh” and encoding “150028576358418729”. We can vary the alphabet used, so it is only one case (upper or lower) and it returns with a hyphen every n-many characters along. Let’s research with users which works best.
Note: There is a requirement for time to synchronise across machines and systems generating event IDs, the NTP and associated services will solve this problem for us.
The new design (after April 2023)
We discovered an edge case where collisions with multiple objects which are close together could end up with the same Event short ID for different events.
Requirements:
- The same or very close (<60 seconds) TCA_TIME
- The same primary object name
- The same secondary object name
The cause is that we desperately wanted to shorten Event URLs and we spent too many bits on TCA time but not enough on object names. We also used object names, not the NORAD_IDs to distinguish different objects.
To solve this we increased the number of bits used in the Event Short IDs to 77 bits and removed the generating system (as Monitor Space Hazards is the only system generating such short IDs). This left us with the following design:
- 33 bits for time in seconds (giving us 272 years)
- 22 bits for primary object NORAD ID
- 22 bits for secondary object NORAD ID
This produces short IDs of the style jbxleqg-ndrxvgb-gorpwg
(7-7-6) using lowercase a-z.
APIs
No public API for Conjunction Event IDs is to be exposed.
Data Storage
The Monitor Space Hazards database uses a varchar to store the Conjunction Event ID.
Code and pseudocode
This is untested but based on the Instagram implementation it is envisaged that something like the following PL/PGSQL will work:
CREATE OR REPLACE FUNCTION mys.next_id(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1633046400000; #replaced with our epoch
seq_id bigint;
now_millis bigint;
system_id int := 1;
BEGIN
SELECT nextval('mys.table_id_seq') %% 8192 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (system_id <<10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
And when creating the table:
CREATE TABLE mys.our_table (
"id" bigint NOT NULL DEFAULT mys.next_id(),
...rest of table schema...
)
Degrees of constraint
The main constraint on this design is the ability to implement in multiple systems and co-ordinate the system ids. Given the ease that this is implemented in PL/PGSQL it is considered to be readily implemented in other databases and languages once the logic of generation is understood.
Alternatives considered
- Time in milliseconds + primary object id + secondary object id - this is a valid approach and readily implemented. Gives a far higher chance of collision and doesn’t provide detail of which system generated the identifier.
- UUIDs provide a good answer if all that was desired was uniqueness. An example UUID is “23042946-af93-4812-80e7-02a765082c95” (generated using this tool). Although it does a form of hashing to make it human readable, it is long at 128-bits, and doesn’t provide any natural sort order based on time. Which is an advantage for queries such as “give me the last 10 conjunction events”.
Cross-cutting concerns
The design will conform with the cross-cutting concerns, such as observability, privacy and security, decided and documented elsewhere for Monitor Space Hazards.