uuid   [plain text]

UUID scheme

The replication system uses cluster wide Globally Unique Message Unique
Identifies (UUID)s to replicate messages shared by a single instance store,
to improve the efficiency of the replication engine when moving messages
between mailboxes and in order to resolve conflicts when the master and
replica end disagree about the collection of message UIDs in a given
folder. See ./replication_protocol for details of the replication system.

The UUID system shouldn't be _required_ for replication, but it does make
the replication system work quite a lot more effectively and reliably.

UUIDs are 96 bit (12 byte) numbers, represented on the wire as 24 hex digits
(I plan to switch to a 16 digit BASE64 wire representation for efficiency).

UUIDs are stored in network byte ordering (big endian), and the first
byte in the 12 byte array defines the structure of the remaining 11 bytes.
At the moment only two UUID schemas are defined.

UUID schema 0  (first byte == 0)

The only defined value in UUID 0 is the NULL value, where all 12 bytes are
zero. (any other value with a leading zero byte is illegal). This is the
NULL UUID, used when UUID values are unavailable.  If a message uses the
NULL UUID as its value, various optimisatations and sanity checks are
disabled. The same effect is seen if UUIDs are disabled entirely.

UUID schema 1 (first byte == 1)

The big idea here is that the Cyrus master process on each system is the
natural place to allocate blocks of UUIDs to service process.

Service which expect/need UUIDs use an extra option "provide_uuid=1" in the
cyrus.conf file: see sample cyrus.conf for an example.

If the first byte of a UUID is binary value 1, the remaining bytes define a
unique message body/text. The remaining 11 bytes are allocated in blocks of
2^24 UUIDs by the Cyrus master process with the following structure:

struct uuid_info {
    unsigned short schema;                /*  8 bits used */  /* 1 */
    unsigned short machine;               /*  8 bits used */
    unsigned short timestamp_generation;  /*  8 bits used */
    unsigned long  master_start_time;     /* 32 bits used */
    unsigned short child_counter;         /* 16 bits used */
    unsigned long  count;                 /* 24 bits used */

Service applications have no interest in the first 9 bytes in the 12 byte
sequence: they just treat them as an opaque prefix to a 24 bit counter,
which can be converted into a 12 byte sequence or 24 digit hex number.

Schema 1 limits us to 256 machines in a cluster and uses the startup time
of the master process (a 32 bit timestamp) to create a starting value for a
counter. To protect us from system clocks running backwards, the number
corresponding to the master start time is recorded in the file
"/var/imap/master_uuid" and increments slowly as the child_counter
field overflows (65536 child processes). The master process will refuse to
start if the current time is less than the timestamp recorded in the
master_uuid file ("timestamp_generation" can be used as an emergency escape
mechanism if the system clock consistently jumps backwards):

In normal operation it is very unlikely that we will consistently allocate
start 65536 child processes every second, so a buffer of available UUID
ranges accumulates quickly after the master process starts. After a few
seconds of operation we will normally be protected against clock slews.

Sample /var/imap/master_uuid structure:
     schema=1                      # Constant
     machine=1                     # Should be unique within cluster at
                                   # given time to prevent conflicts.
     timestamp_generation=0        # Emergency escape mechanism
     master_start_time=1062919294  # 32 bit timestamp should be good til 2070
                                   # (and we can always overflow into
                                     timestamp_generation or schema then!)