DEBUG_README [plain text]

1 - Purpose of this document
============================

This document describes how to debug parts of the Postfix mail
system, either by making the software log a lot of detail to the
syslog daemon, or by running some daemon processes under control
of an interactive debugger.

2 - Verbose logging for specific SMTP connections
=================================================

In /etc/postfix/main.cf, list the remote site name or address in
the "debug_peer_list" parameter. For example, in order to make the
software log a lot of information to the syslog daemon for connections
from or to the loopback interface:

    debug_peer_list = 127.0.0.1

You can specify one or more hosts, domains, addresses or net/masks.
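
For example, a list that combines an address, a network block and
a domain might look like the sketch below. The network and domain
shown are placeholder values only. The debug_peer_level parameter
sets how much the verbose logging level increases for matching
peers; 2 is the documented default.

    debug_peer_list = 127.0.0.1, 192.0.2.0/24, example.com
    debug_peer_level = 2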

2b - Record the SMTP connection with a sniffer
==============================================

This example uses tcpdump. In order to record a conversation you
need to specify a large enough snapshot length with the -s option,
or else tcpdump will truncate each packet and you will miss some or
all of the packet payload.

    tcpdump -w /file/name -s 2000 host hostname and port 25

Run this for a while, and stop it with Ctrl-C when done. To view the
data, use a binary viewer, or use my tcpdumpx utility that is
available from ftp://ftp.porcupine.org/pub/debugging.
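
Reasonably recent tcpdump versions can also print a saved capture
themselves. As a sketch, using the same placeholder file name as
above, the -r option reads packets from the file and -X prints each
packet in hex and ASCII (use -A instead for ASCII only):

    tcpdump -r /file/name -X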

3 - Making Postfix daemon programs more verbose
===============================================

Append one or more -v options to selected daemon definitions in
/etc/postfix/master.cf and type "postfix reload". This will cause
a lot of activity to be logged to the syslog daemon.
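
As an illustration, assuming the same smtp service entry that is
used as an example elsewhere in this document, two levels of verbose
logging for the SMTP server would look like this:

    smtp      inet  n       -       n       -       -       smtpd -v -v

Remove the -v options and type "postfix reload" again to return to
normal logging.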

4 - Manually tracing a Postfix daemon process
=============================================

Some systems allow you to inspect a running process with a system
call tracer. For example:

    # trace -p process-id (SunOS 4)
    # strace -p process-id (Linux and many others)
    # truss -p process-id (Solaris, FreeBSD)
    # ktrace -p process-id (generic 4.4BSD)

Even more informative are traces of system library calls. Examples:

    # ltrace -p process-id (Linux, also ported to FreeBSD and BSD/OS)
    # sotruss -p process-id (Solaris)

See your system documentation for details.

Tracing a running process can give valuable information about what
a process is attempting to do. This is as much information as you
can get without running an interactive debugger program, as described
in a later section.
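
A typical session, sketched here for Linux with strace, first looks
up the process ID and then attaches the tracer. The daemon name
smtpd and the output file name are examples only; the -o option
makes strace write the trace to a file instead of the terminal.

    # ps ax | grep smtpd
    # strace -o /file/name -p process-id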

5 - Automatically tracing a Postfix daemon process
==================================================

Postfix can attach a call tracer whenever a daemon process starts.
Call tracers come in several kinds. 

1) System call tracers such as trace, truss, strace, or ktrace.
   These show the communication between the process and the kernel.

2) Library call tracers such as sotruss and ltrace. These show
   calls of library routines, and give a better idea of what is
   going on within the process.

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so
that it invokes the call tracer of your choice, for example:

    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5

Type "postfix reload" and watch the logfile.
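
On systems that have strace instead of truss, the same approach
should work with only the tracer name changed. This variant is a
sketch and is not taken from the Postfix distribution:

    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (strace -p $process_id 2>&1 | logger -p mail.info) & sleep 5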

6 - Running daemon programs under a debugger
============================================

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so
that it invokes the debugger of your choice.

Two choices are described in detail:

1) If you do not have X Windows installed on the Postfix machine,
   or if you are not familiar with interactive debuggers, then you
   can try to run gdb in non-interactive mode:

   /etc/postfix/main.cf:
   --------------------
    debugger_command =
	PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont;
	echo where) | gdb $daemon_directory/$process_name $process_id
	>$config_directory/$process_name.$process_id.log 2>&1 & sleep 5

   Type "postfix reload" to make the configuration changes effective.

   Whenever a suspect daemon process is started, an output file
   is created, named after the daemon and process ID (for example,
   smtpd.12345.log). When the process crashes, a stack trace (with
   output from the "where" command) is written to its logfile.

2) If you have X Windows installed on the Postfix machine, then
   an interactive debugger such as xxgdb can be convenient.

   /etc/postfix/main.cf:
   --------------------
    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
         xxgdb $daemon_directory/$process_name $process_id & sleep 5

   Be sure that gdb is in the command search path, and export
   XAUTHORITY so that X access control works, for example:

    % setenv XAUTHORITY ~/.Xauthority

   Stop and start the Postfix system.  This is necessary so that
   Postfix runs with the proper XAUTHORITY and DISPLAY settings.

   Whenever the suspect daemon process is started, a debugger window
   pops up and you can watch in detail what happens (when using
   xxgdb) or a file is created (if using gdb in non-interactive
   mode).
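
The setenv command above is C shell syntax. With a Bourne-style
shell, the equivalent setting and the Postfix restart might look
like this (a sketch, assuming that all commands are typed in the
same root shell and that the X authority file is in its default
location):

    # XAUTHORITY=$HOME/.Xauthority; export XAUTHORITY
    # postfix stop
    # postfix start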

7 - Unreasonable behavior
=========================

Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities.

1 - The compiler has messed up.

2 - The hardware has messed up.

In both cases, the program being executed is not the program that
was supposed to be executed, so anything can happen.

There is a third possibility:

3 - Bugs in system software (kernel or libraries).

Hardware-related failures happen erratically, and they usually do
not reproduce after power cycling and rebooting the system.  There's
little I can do about bad hardware.  Be sure to use hardware that
at the very least can detect memory errors. Otherwise, Postfix will
just be a sitting duck waiting to be hit by a bit error. Critical
systems deserve real hardware.

When a compiler messes up, the problem can be reproduced whenever
the resulting program is run. Compiler errors are most likely to
happen in the code optimizer. If a problem is reproducible across
power cycles and system reboots, it can be worthwhile to rebuild
Postfix with optimization disabled, and to see if optimization
makes a difference.

In order to compile Postfix with optimizations turned off:

    % make tidy
    % make makefiles OPT=

This produces a set of Makefiles that do not request compiler
optimization. 

Once the makefiles are set up, build the software:

    % make
    % su
    # make install

Then see if the problem still reproduces. If the problem goes away
when optimization is disabled, the compiler is the likely culprit;
talk to your vendor.