SCHEDULER_README.html [plain text]

<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

<html>

<head>

<title>Postfix Queue Scheduler</title>

<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">

</head>

<body>

<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Queue Scheduler</h1>

<hr>

<h2>What this file is about</h2>

<p> This is the beginning of documentation for the clever queue
manager scheduling algorithm by Patrik Rak. For a long time, this
code was made available under the name "nqmgr(8)" (new queue manager),
as an optional module. As of Postfix 2.1 this is the default queue
manager, which is always called "qmgr(8)". The old queue manager will
for some time will be available under the name of "oqmgr(8)". </p>

<h2>Why the old Postfix queue manager was replaced</h2>

<p> The old Postfix scheduler had several limitations due to
unfortunate choices in its design. </p>

<ol>

    <li> <p> Round-robin selection by destination for mail that is
    delivered via the same message delivery transport. The round-robin
    strategy was chosen with the intention to prevent a single
    (destination) site from using up too many mail delivery resources.
    However, that strategy penalized inbound mail on bi-directional
    gateways.  The poor suffering inbound destination would be
    selected only 1/number-of-destinations of the time, even when
    it had more mail than other destinations, and thus mail could
    be delayed. </p>

    <p> Victor Duchovni found a workaround: use different message
    delivery transports, and thus avoid the starvation problem.
    The Patrik Rak scheduler solves this problem by using FIFO
    selection. </p>

    <li> <p> A second limitation of the old Postfix scheduler was
    that delivery of bulk mail would block all other deliveries,
    causing large delays.  Patrik Rak's scheduler allows mail with
    fewer recipients to slip past bulk mail in an elegant manner.
    </p>

</ol>

<h2>How the queue manager scheduler works </h2>

<p> The following text is from Patrik Rak and should be read together
with the postconf(5) manual that describes each configuration
parameter in detail. </p>

<p> From user's point of view, oqmgr(8) and qmgr(8) are both the same,
except for how next message is chosen when delivery agent becomes
available.  You already know that oqmgr(8) uses round-robin by destination
while qmgr(8) uses simple FIFO, except for some preemptive magic.
The postconf(5) manual documents all the knobs the user
can use to control this preemptive magic - there is nothing else
to the preemption than the quite simple conditions described below.
</p>

<p> As for programmer-level documentation, this will have to be
extracted from all those emails we have exchanged with Wietse [rats!
I hoped that Patrik would do the work for me -- Wietse] But I think
there are no missing bits which we have not mentioned in our
conversations. </p>

<p> However, even from programmer's point of view, there is nothing
more to add to the message scheduling idea itself.  There are few
things which make it look more complicated than it is, but the
algorithm is the same as the user perceives it. The summary of the
differences of the programmer's view from the user's view are: </p>

<ol>

    <li> <p> Simplification of terms for users: The user knows
    about messages and recipients. The program itself works with
    jobs (one message is split among several jobs, one per each
    transport needed to deliver the message) and queue entries
    (each entry may group several recipients for same destination).
    Then there is the peer structure introduced by qmgr(8) which is
    simply per-job analog of the queue structure. </p>

    <li> <p> Dealing with concurrency limits: The actual implementation
    is complicated by the fact that the messages (resp. jobs) may
    not be delivered in the exactly scheduled order because of the
    concurrency limits. It is necessary to skip some "blocker" jobs
    when the concurrency limit is reached and get back to them
    again when the limit permits. </p>

    <li> <p> Dealing with resource limits: The actual implementation is
    complicated by the fact that not all recipients may be read in-core.
    Therefore each message has some recipients in-core and some may
    remain on-file. This means that a) the preemptive algorithm needs
    to work with recipient count estimates instead of exact counts, b)
    there is extra code which needs to manipulate the per-transport
    pool of recipients which may be read in-core at the same time, and
    c) there is extra code which needs to be able to read recipients
    into core in batches and which is triggered at appropriate moments. </p>

    <li> <p> Doing things efficiently: All important things I am
    aware of are done in the minimum time possible (either directly
    or at least when amortized complexity is used), but to choose
    which job is the best candidate for preempting the current job
    requires linear search of up to all transport jobs (the worst
    theoretical case - the reality is much better). As this is done
    every time the next queue entry to be delivered is about to be
    chosen, it seemed reasonable to add cache which minimizes the
    overhead. Maintenance of this candidate cache slightly obfuscates
    things.

</ol>

<p> The points 2 and 3 are those which made the implementation
(look) complicated and were the real coding work, but I believe
that to understand the scheduling algorithm itself (which was the
real thinking work) is fairly easy. </p>

</body>

</html>