<HTML> <HEAD> <!-- This HTML file has been created by texi2html 1.51 from /mnt/apple/gdb/source/gdb.apple/source/gdb/gdb/doc/stabs.texinfo on 23 November 1999 --> <TITLE>STABS - Encoding the Structure of the Program</TITLE> </HEAD> <BODY> Go to the <A HREF="stabs_1.html">first</A>, <A HREF="stabs_1.html">previous</A>, <A HREF="stabs_3.html">next</A>, <A HREF="stabs_14.html">last</A> section, <A HREF="stabs_toc.html">table of contents</A>. <P><HR><P> <H1><A NAME="SEC7" HREF="stabs_toc.html#TOC7">Encoding the Structure of the Program</A></H1> <P> The elements of the program structure that stabs encode include the name of the main function, the names of the source and include files, the line numbers, procedure names and types, and the beginnings and ends of blocks of code. </P> <H2><A NAME="SEC8" HREF="stabs_toc.html#TOC8">Main Program</A></H2> <P> <A NAME="IDX1"></A> Most languages allow the main program to have any name. The <CODE>N_MAIN</CODE> stab type tells the debugger the name that is used in this program. Only the string field is significant; it is the name of a function which is the main program. Most C compilers do not use this stab (they expect the debugger to assume that the name is <CODE>main</CODE>), but some C compilers emit an <CODE>N_MAIN</CODE> stab for the <CODE>main</CODE> function. I'm not sure how XCOFF handles this. </P> <H2><A NAME="SEC9" HREF="stabs_toc.html#TOC9">Paths and Names of the Source Files</A></H2> <P> <A NAME="IDX2"></A> Before any other stabs occur, there must be a stab specifying the source file. This information is contained in a symbol of stab type <CODE>N_SO</CODE>; the string field contains the name of the file. The value of the symbol is the start address of the portion of the text section corresponding to that file. </P> <P> With the Sun Solaris2 compiler, the desc field contains a source-language code. </P> <P> Some compilers (for example, GCC2 and SunOS4 <TT>`/bin/cc'</TT>) also include the directory in which the source was compiled, in a second <CODE>N_SO</CODE> symbol preceding the one containing the file name. This symbol can be distinguished by the fact that it ends in a slash. Code from the <CODE>cfront</CODE> C++ compiler can have additional <CODE>N_SO</CODE> symbols for nonexistent source files after the <CODE>N_SO</CODE> for the real source file; these are believed to contain no useful information. </P> <P> For example: </P> <PRE> .stabs "/cygint/s1/users/jcm/play/",100,0,0,Ltext0 # 100 is N_SO .stabs "hello.c",100,0,0,Ltext0 .text Ltext0: </PRE> <P> <A NAME="IDX3"></A> Instead of <CODE>N_SO</CODE> symbols, XCOFF uses a <CODE>.file</CODE> assembler directive which assembles to a <CODE>C_FILE</CODE> symbol; explaining this in detail is outside the scope of this document. </P> <P> If it is useful to indicate the end of a source file, this is done with an <CODE>N_SO</CODE> symbol with an empty string for the name. The value is the address of the end of the text section for the file. For some systems, there is no indication of the end of a source file, and you just need to figure it ended when you see an <CODE>N_SO</CODE> for a different source file, or a symbol ending in <CODE>.o</CODE> (which at least some linkers insert to mark the start of a new <CODE>.o</CODE> file). </P> <H2><A NAME="SEC10" HREF="stabs_toc.html#TOC10">Names of Include Files</A></H2> <P> There are several schemes for dealing with include files: the traditional <CODE>N_SOL</CODE> approach, Sun's <CODE>N_BINCL</CODE> approach, and the XCOFF <CODE>C_BINCL</CODE> approach (which despite the similar name has little in common with <CODE>N_BINCL</CODE>). </P> <P> <A NAME="IDX4"></A> An <CODE>N_SOL</CODE> symbol specifies which include file subsequent symbols refer to. The string field is the name of the file and the value is the text address corresponding to the end of the previous include file and the start of this one. To specify the main source file again, use an <CODE>N_SOL</CODE> symbol with the name of the main source file. </P> <P> <A NAME="IDX5"></A> <A NAME="IDX6"></A> <A NAME="IDX7"></A> The <CODE>N_BINCL</CODE> approach works as follows. An <CODE>N_BINCL</CODE> symbol specifies the start of an include file. In an object file, only the string is significant; the linker puts data into some of the other fields. The end of the include file is marked by an <CODE>N_EINCL</CODE> symbol (which has no string field). In an object file, there is no significant data in the <CODE>N_EINCL</CODE> symbol. <CODE>N_BINCL</CODE> and <CODE>N_EINCL</CODE> can be nested. </P> <P> If the linker detects that two source files have identical stabs between an <CODE>N_BINCL</CODE> and <CODE>N_EINCL</CODE> pair (as will generally be the case for a header file), then it only puts out the stabs once. Each additional occurance is replaced by an <CODE>N_EXCL</CODE> symbol. I believe the GNU linker and the Sun (both SunOS4 and Solaris) linker are the only ones which supports this feature. </P> <P> A linker which supports this feature will set the value of a <CODE>N_BINCL</CODE> symbol to the total of all the characters in the stabs strings included in the header file, omitting any file numbers. The value of an <CODE>N_EXCL</CODE> symbol is the same as the value of the <CODE>N_BINCL</CODE> symbol it replaces. This information can be used to match up <CODE>N_EXCL</CODE> and <CODE>N_BINCL</CODE> symbols which have the same filename. The <CODE>N_EINCL</CODE> value, and the values of the other and description fields for all three, appear to always be zero. </P> <P> <A NAME="IDX8"></A> <A NAME="IDX9"></A> For the start of an include file in XCOFF, use the <TT>`.bi'</TT> assembler directive, which generates a <CODE>C_BINCL</CODE> symbol. A <TT>`.ei'</TT> directive, which generates a <CODE>C_EINCL</CODE> symbol, denotes the end of the include file. Both directives are followed by the name of the source file in quotes, which becomes the string for the symbol. The value of each symbol, produced automatically by the assembler and linker, is the offset into the executable of the beginning (inclusive, as you'd expect) or end (inclusive, as you would not expect) of the portion of the COFF line table that corresponds to this include file. <CODE>C_BINCL</CODE> and <CODE>C_EINCL</CODE> do not nest. </P> <H2><A NAME="SEC11" HREF="stabs_toc.html#TOC11">Line Numbers</A></H2> <P> <A NAME="IDX10"></A> An <CODE>N_SLINE</CODE> symbol represents the start of a source line. The desc field contains the line number and the value contains the code address for the start of that source line. On most machines the address is absolute; for stabs in sections (see section <A HREF="stabs_13.html#SEC87">Using Stabs in Their Own Sections</A>), it is relative to the function in which the <CODE>N_SLINE</CODE> symbol occurs. </P> <P> <A NAME="IDX11"></A> <A NAME="IDX12"></A> GNU documents <CODE>N_DSLINE</CODE> and <CODE>N_BSLINE</CODE> symbols for line numbers in the data or bss segments, respectively. They are identical to <CODE>N_SLINE</CODE> but are relocated differently by the linker. They were intended to be used to describe the source location of a variable declaration, but I believe that GCC2 actually puts the line number in the desc field of the stab for the variable itself. GDB has been ignoring these symbols (unless they contain a string field) since at least GDB 3.5. </P> <P> For single source lines that generate discontiguous code, such as flow of control statements, there may be more than one line number entry for the same source line. In this case there is a line number entry at the start of each code range, each with the same line number. </P> <P> XCOFF does not use stabs for line numbers. Instead, it uses COFF line numbers (which are outside the scope of this document). Standard COFF line numbers cannot deal with include files, but in XCOFF this is fixed with the <CODE>C_BINCL</CODE> method of marking include files (see section <A HREF="stabs_2.html#SEC10">Names of Include Files</A>). </P> <H2><A NAME="SEC12" HREF="stabs_toc.html#TOC12">Procedures</A></H2> <P> <A NAME="IDX13"></A> <A NAME="IDX14"></A> <A NAME="IDX15"></A> <A NAME="IDX16"></A> All of the following stabs normally use the <CODE>N_FUN</CODE> symbol type. However, Sun's <CODE>acc</CODE> compiler on SunOS4 uses <CODE>N_GSYM</CODE> and <CODE>N_STSYM</CODE>, which means that the value of the stab for the function is useless and the debugger must get the address of the function from the non-stab symbols instead. On systems where non-stab symbols have leading underscores, the stabs will lack underscores and the debugger needs to know about the leading underscore to match up the stab and the non-stab symbol. BSD Fortran is said to use <CODE>N_FNAME</CODE> with the same restriction; the value of the symbol is not useful (I'm not sure it really does use this, because GDB doesn't handle this and no one has complained). </P> <P> <A NAME="IDX17"></A> A function is represented by an <SAMP>`F'</SAMP> symbol descriptor for a global (extern) function, and <SAMP>`f'</SAMP> for a static (local) function. For a.out, the value of the symbol is the address of the start of the function; it is already relocated. For stabs in ELF, the SunPRO compiler version 2.0.1 and GCC put out an address which gets relocated by the linker. In a future release SunPRO is planning to put out zero, in which case the address can be found from the ELF (non-stab) symbol. Because looking things up in the ELF symbols would probably be slow, I'm not sure how to find which symbol of that name is the right one, and this doesn't provide any way to deal with nested functions, it would probably be better to make the value of the stab an address relative to the start of the file, or just absolute. See section <A HREF="stabs_13.html#SEC89">Having the Linker Relocate Stabs in ELF</A> for more information on linker relocation of stabs in ELF files. For XCOFF, the stab uses the <CODE>C_FUN</CODE> storage class and the value of the stab is meaningless; the address of the function can be found from the csect symbol (XTY_LD/XMC_PR). </P> <P> The type information of the stab represents the return type of the function; thus <SAMP>`foo:f5'</SAMP> means that foo is a function returning type 5. There is no need to try to get the line number of the start of the function from the stab for the function; it is in the next <CODE>N_SLINE</CODE> symbol. </P> <P> Some compilers (such as Sun's Solaris compiler) support an extension for specifying the types of the arguments. I suspect this extension is not used for old (non-prototyped) function definitions in C. If the extension is in use, the type information of the stab for the function is followed by type information for each argument, with each argument preceded by <SAMP>`;'</SAMP>. An argument type of 0 means that additional arguments are being passed, whose types and number may vary (<SAMP>`...'</SAMP> in ANSI C). GDB has tolerated this extension (parsed the syntax, if not necessarily used the information) since at least version 4.8; I don't know whether all versions of dbx tolerate it. The argument types given here are not redundant with the symbols for the formal parameters (see section <A HREF="stabs_4.html#SEC24">Parameters</A>); they are the types of the arguments as they are passed, before any conversions might take place. For example, if a C function which is declared without a prototype takes a <CODE>float</CODE> argument, the value is passed as a <CODE>double</CODE> but then converted to a <CODE>float</CODE>. Debuggers need to use the types given in the arguments when printing values, but when calling the function they need to use the types given in the symbol defining the function. </P> <P> If the return type and types of arguments of a function which is defined in another source file are specified (i.e., a function prototype in ANSI C), traditionally compilers emit no stab; the only way for the debugger to find the information is if the source file where the function is defined was also compiled with debugging symbols. As an extension the Solaris compiler uses symbol descriptor <SAMP>`P'</SAMP> followed by the return type of the function, followed by the arguments, each preceded by <SAMP>`;'</SAMP>, as in a stab with symbol descriptor <SAMP>`f'</SAMP> or <SAMP>`F'</SAMP>. This use of symbol descriptor <SAMP>`P'</SAMP> can be distinguished from its use for register parameters (see section <A HREF="stabs_4.html#SEC25">Passing Parameters in Registers</A>) by the fact that it has symbol type <CODE>N_FUN</CODE>. </P> <P> The AIX documentation also defines symbol descriptor <SAMP>`J'</SAMP> as an internal function. I assume this means a function nested within another function. It also says symbol descriptor <SAMP>`m'</SAMP> is a module in Modula-2 or extended Pascal. </P> <P> Procedures (functions which do not return values) are represented as functions returning the <CODE>void</CODE> type in C. I don't see why this couldn't be used for all languages (inventing a <CODE>void</CODE> type for this purpose if necessary), but the AIX documentation defines <SAMP>`I'</SAMP>, <SAMP>`P'</SAMP>, and <SAMP>`Q'</SAMP> for internal, global, and static procedures, respectively. These symbol descriptors are unusual in that they are not followed by type information. </P> <P> The following example shows a stab for a function <CODE>main</CODE> which returns type number <CODE>1</CODE>. The <CODE>_main</CODE> specified for the value is a reference to an assembler label which is used to fill in the start address of the function. </P> <PRE> .stabs "main:F1",36,0,0,_main # 36 is N_FUN </PRE> <P> The stab representing a procedure is located immediately following the code of the procedure. This stab is in turn directly followed by a group of other stabs describing elements of the procedure. These other stabs describe the procedure's parameters, its block local variables, and its block structure. </P> <P> If functions can appear in different sections, then the debugger may not be able to find the end of a function. Recent versions of GCC will mark the end of a function with an <CODE>N_FUN</CODE> symbol with an empty string for the name. The value is the address of the end of the current function. Without such a symbol, there is no indication of the address of the end of a function, and you must assume that it ended at the starting address of the next function or at the end of the text section for the program. </P> <H2><A NAME="SEC13" HREF="stabs_toc.html#TOC13">Nested Procedures</A></H2> <P> For any of the symbol descriptors representing procedures, after the symbol descriptor and the type information is optionally a scope specifier. This consists of a comma, the name of the procedure, another comma, and the name of the enclosing procedure. The first name is local to the scope specified, and seems to be redundant with the name of the symbol (before the <SAMP>`:'</SAMP>). This feature is used by GCC, and presumably Pascal, Modula-2, etc., compilers, for nested functions. </P> <P> If procedures are nested more than one level deep, only the immediately containing scope is specified. For example, this code: </P> <PRE> int foo (int x) { int bar (int y) { int baz (int z) { return x + y + z; } return baz (x + 2 * y); } return x + bar (3 * x); } </PRE> <P> produces the stabs: </P> <PRE> .stabs "baz:f1,baz,bar",36,0,0,_baz.15 # 36 is N_FUN .stabs "bar:f1,bar,foo",36,0,0,_bar.12 .stabs "foo:F1",36,0,0,_foo </PRE> <H2><A NAME="SEC14" HREF="stabs_toc.html#TOC14">Block Structure</A></H2> <P> <A NAME="IDX18"></A> <A NAME="IDX19"></A> The program's block structure is represented by the <CODE>N_LBRAC</CODE> (left brace) and the <CODE>N_RBRAC</CODE> (right brace) stab types. The variables defined inside a block precede the <CODE>N_LBRAC</CODE> symbol for most compilers, including GCC. Other compilers, such as the Convex, Acorn RISC machine, and Sun <CODE>acc</CODE> compilers, put the variables after the <CODE>N_LBRAC</CODE> symbol. The values of the <CODE>N_LBRAC</CODE> and <CODE>N_RBRAC</CODE> symbols are the start and end addresses of the code of the block, respectively. For most machines, they are relative to the starting address of this source file. For the Gould NP1, they are absolute. For stabs in sections (see section <A HREF="stabs_13.html#SEC87">Using Stabs in Their Own Sections</A>), they are relative to the function in which they occur. </P> <P> The <CODE>N_LBRAC</CODE> and <CODE>N_RBRAC</CODE> stabs that describe the block scope of a procedure are located after the <CODE>N_FUN</CODE> stab that represents the procedure itself. </P> <P> Sun documents the desc field of <CODE>N_LBRAC</CODE> and <CODE>N_RBRAC</CODE> symbols as containing the nesting level of the block. However, dbx seems to not care, and GCC always sets desc to zero. </P> <P> <A NAME="IDX20"></A> <A NAME="IDX21"></A> <A NAME="IDX22"></A> For XCOFF, block scope is indicated with <CODE>C_BLOCK</CODE> symbols. If the name of the symbol is <SAMP>`.bb'</SAMP>, then it is the beginning of the block; if the name of the symbol is <SAMP>`.be'</SAMP>; it is the end of the block. </P> <H2><A NAME="SEC15" HREF="stabs_toc.html#TOC15">Alternate Entry Points</A></H2> <P> <A NAME="IDX23"></A> <A NAME="IDX24"></A> Some languages, like Fortran, have the ability to enter procedures at some place other than the beginning. One can declare an alternate entry point. The <CODE>N_ENTRY</CODE> stab is for this; however, the Sun FORTRAN compiler doesn't use it. According to AIX documentation, only the name of a <CODE>C_ENTRY</CODE> stab is significant; the address of the alternate entry point comes from the corresponding external symbol. A previous revision of this document said that the value of an <CODE>N_ENTRY</CODE> stab was the address of the alternate entry point, but I don't know the source for that information. </P> <P><HR><P> Go to the <A HREF="stabs_1.html">first</A>, <A HREF="stabs_1.html">previous</A>, <A HREF="stabs_3.html">next</A>, <A HREF="stabs_14.html">last</A> section, <A HREF="stabs_toc.html">table of contents</A>. </BODY> </HTML>