1 - Purpose of this document
============================

This document describes how to debug parts of the Postfix mail system,
either by making the software log a lot of detail to the syslog daemon,
or by running some daemon processes under control of an interactive
debugger.

2 - Verbose logging for specific SMTP connections
=================================================

In /etc/postfix/main.cf, list the remote site name or address in the
"debug_peer_list" parameter. For example, in order to make the software
log a lot of information to the syslog daemon for connections from or
to the loopback interface:

    debug_peer_list = 127.0.0.1

You can specify one or more hosts, domains, addresses or net/masks.

2b - Record the SMTP connection with a sniffer
==============================================

This example uses tcpdump. In order to record a conversation you need
to specify a large enough buffer or else you will miss some or all of
the packet payload.

    tcpdump -w /file/name -s 2000 host hostname and port 25

Run this for a while, stop with Ctrl-C when done. To view the data use
a binary viewer, or use my tcpdumpx utility that is available from
ftp://ftp.porcupine.org/pub/debugging.

3 - Making Postfix daemon programs more verbose
===============================================

Append one or more -v options to selected daemon definitions in
/etc/postfix/master.cf and type "postfix reload". This will cause a lot
of activity to be logged to the syslog daemon.

4 - Manually tracing a Postfix daemon process
=============================================

Some systems allow you to inspect a running process with a system call
tracer. For example:

    # trace -p process-id            (SunOS 4)
    # strace -p process-id           (Linux and many others)
    # truss -p process-id            (Solaris, FreeBSD)
    # ktrace -p process-id           (generic 4.4BSD)

Even more informative are traces of system library calls.
Examples:

    # ltrace -p process-id           (Linux, also ported to FreeBSD and BSD/OS)
    # sotruss -p process-id          (Solaris)

See your system documentation for details.

Tracing a running process can give valuable information about what a
process is attempting to do. This is as much information as you can get
without running an interactive debugger program, as described in a
later section.

5 - Automatically tracing a Postfix daemon process
==================================================

Postfix can attach a call tracer whenever a daemon process starts.
Call tracers come in several kinds.

1) System call tracers such as trace, truss, strace, or ktrace. These
   show the communication between the process and the kernel.

2) Library call tracers such as sotruss and ltrace. These show calls of
   library routines, and give a better idea of what is going on within
   the process.

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so that it
invokes the call tracer of your choice, for example:

    debugger_command =
         PATH=/bin:/usr/bin:/usr/local/bin;
         (truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5

Type "postfix reload" and watch the logfile.

6 - Running daemon programs under a debugger
============================================

Append a -D option to the suspect command in /etc/postfix/master.cf,
for example:

    smtp      inet  n       -       n       -       -       smtpd -D

Edit the debugger_command definition in /etc/postfix/main.cf so that it
invokes the debugger of your choice.
Two choices are described in detail:

1) If you do not have X Windows installed on the Postfix machine, or if
   you are not familiar with interactive debuggers, then you can try to
   run gdb in non-interactive mode:

   /etc/postfix/main.cf:
   --------------------
   debugger_command =
        PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont;
        echo where) | gdb $daemon_directory/$process_name $process_id
        >$config_directory/$process_name.$process_id.log 2>&1 & sleep 5

   Type "postfix reload" to make the configuration changes effective.

   Whenever a suspect daemon process is started, an output file is
   created, named after the daemon and process ID (for example,
   smtpd.12345.log). When the process crashes, a stack trace (with
   output from the "where" command) is written to its logfile.

2) If you have X Windows installed on the Postfix machine, then an
   interactive debugger such as xxgdb can be convenient.

   /etc/postfix/main.cf:
   --------------------
   debugger_command =
        PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
        xxgdb $daemon_directory/$process_name $process_id & sleep 5

   Be sure that gdb is in the command search path, and export
   XAUTHORITY so that X access control works, for example:

        % setenv XAUTHORITY ~/.Xauthority

   Stop and start the Postfix system. This is necessary so that Postfix
   runs with the proper XAUTHORITY and DISPLAY settings.

Whenever the suspect daemon process is started, a debugger window pops
up and you can watch in detail what happens (when using xxgdb) or a
file is created (if using gdb in non-interactive mode).

7 - Unreasonable behavior
=========================

Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given by
its author? There are two possibilities.

 1 - The compiler has messed up.

 2 - The hardware has messed up.

In both cases, the program being executed is not the program that was
supposed to be executed, so anything can happen.
There is a third possibility:

 3 - Bugs in system software (kernel or libraries).

Hardware-related failures happen erratically, and they usually do not
reproduce after power cycling and rebooting the system. There's little
I can do about bad hardware. Be sure to use hardware that at the very
least can detect memory errors. Otherwise, Postfix will just be a
sitting duck waiting to be hit by a bit error. Critical systems deserve
real hardware.

When a compiler messes up, the problem can be reproduced whenever the
resulting program is run. Compiler errors are most likely to happen in
the code optimizer. If a problem is reproducible across power cycles
and system reboots, it can be worthwhile to rebuild Postfix with
optimization disabled, and to see if optimization makes a difference.

In order to compile Postfix with optimizations turned off:

    % make tidy
    % make makefiles OPT=

This produces a set of Makefiles that do not request compiler
optimization.

Once the makefiles are set up, build the software:

    % make
    % su
    # make install

And see if the problem reproduces. If the problem goes away, talk to
your vendor.
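
If you expect to repeat the unoptimized rebuild a few times while
chasing a problem, the unprivileged build steps above could be wrapped
in a small shell function. The following is only a sketch: the function
name rebuild_unoptimized and the DRY_RUN variable are made up for this
illustration and are not part of Postfix. It assumes a POSIX shell and
that it is run from the top of the Postfix source tree.

```shell
#!/bin/sh
# Sketch only. rebuild_unoptimized and DRY_RUN are hypothetical names
# used for illustration; they are not provided by Postfix.

rebuild_unoptimized() {
    # When DRY_RUN is set to a non-empty value, each command is printed
    # instead of executed, so the sequence can be reviewed first.
    run=${DRY_RUN:+echo}

    # Same commands as in the text above; "&&" stops at the first failure.
    $run make tidy &&
    $run make makefiles OPT= &&
    $run make
}
```

For example, "DRY_RUN=1; rebuild_unoptimized" only lists the three
commands. The "make install" step is left out on purpose, because it
has to be run as root as shown above.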