
<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
     from /mnt/apple/gdb/source/gdb.apple/source/gdb/gdb/doc/stabs.texinfo on 23 November 1999 -->

<TITLE>STABS - Encoding the Structure of the Program</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="stabs_1.html">first</A>, <A HREF="stabs_1.html">previous</A>, <A HREF="stabs_3.html">next</A>, <A HREF="stabs_14.html">last</A> section, <A HREF="stabs_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC7" HREF="stabs_toc.html#TOC7">Encoding the Structure of the Program</A></H1>

<P>
The elements of the program structure that stabs encode include the name
of the main function, the names of the source and include files, the
line numbers, procedure names and types, and the beginnings and ends of
blocks of code.

</P>



<H2><A NAME="SEC8" HREF="stabs_toc.html#TOC8">Main Program</A></H2>

<P>
<A NAME="IDX1"></A>
Most languages allow the main program to have any name.  The
<CODE>N_MAIN</CODE> stab type tells the debugger the name that is used in this
program.  Only the string field is significant; it is the name of
a function which is the main program.  Most C compilers do not use this
stab (they expect the debugger to assume that the name is <CODE>main</CODE>),
but some C compilers emit an <CODE>N_MAIN</CODE> stab for the <CODE>main</CODE>
function.  I'm not sure how XCOFF handles this.

</P>


<H2><A NAME="SEC9" HREF="stabs_toc.html#TOC9">Paths and Names of the Source Files</A></H2>

<P>
<A NAME="IDX2"></A>
Before any other stabs occur, there must be a stab specifying the source
file.  This information is contained in a symbol of stab type
<CODE>N_SO</CODE>; the string field contains the name of the file.  The
value of the symbol is the start address of the portion of the
text section corresponding to that file.

</P>
<P>
With the Sun Solaris2 compiler, the desc field contains a
source-language code.

</P>
<P>
Some compilers (for example, GCC2 and SunOS4 <TT>`/bin/cc'</TT>) also
include the directory in which the source was compiled, in a second
<CODE>N_SO</CODE> symbol preceding the one containing the file name.  This
symbol can be distinguished by the fact that it ends in a slash.  Code
from the <CODE>cfront</CODE> C++ compiler can have additional <CODE>N_SO</CODE> symbols for
nonexistent source files after the <CODE>N_SO</CODE> for the real source file;
these are believed to contain no useful information.

</P>
<P>
For example:

</P>

<PRE>
.stabs "/cygint/s1/users/jcm/play/",100,0,0,Ltext0     # 100 is N_SO
.stabs "hello.c",100,0,0,Ltext0
        .text
Ltext0:
</PRE>

<P>
<A NAME="IDX3"></A>
Instead of <CODE>N_SO</CODE> symbols, XCOFF uses a <CODE>.file</CODE> assembler
directive which assembles to a <CODE>C_FILE</CODE> symbol; explaining this in
detail is outside the scope of this document.

</P>
<P>
If it is useful to indicate the end of a source file, this is done with
an <CODE>N_SO</CODE> symbol with an empty string for the name.  The value is
the address of the end of the text section for the file.  For some
systems, there is no indication of the end of a source file, and you
must assume that it ended when you see an <CODE>N_SO</CODE> for a different
source file, or a symbol ending in <CODE>.o</CODE> (which at least some
linkers insert to mark the start of a new <CODE>.o</CODE> file).

</P>
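<P>
As a rough illustration of the conventions described in this section, the
following C sketch classifies the string field of an <CODE>N_SO</CODE> stab.
The names <CODE>so_kind</CODE> and <CODE>classify_so</CODE> are invented for
this example and are not part of any debugger or of the stabs format.  The
trailing-slash test is only the heuristic described above, not something
every compiler guarantees.

</P>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for this sketch; not part of the stabs format. */
enum so_kind {
    SO_END_OF_FILE,  /* empty string: marks the end of a source file */
    SO_DIRECTORY,    /* ends in a slash: the compilation directory */
    SO_FILE_NAME     /* anything else: the source file name */
};

/* Classify the string field of an N_SO stab. */
enum so_kind classify_so(const char *str)
{
    size_t len = strlen(str);

    if (len == 0)
        return SO_END_OF_FILE;
    if (str[len - 1] == '/')
        return SO_DIRECTORY;
    return SO_FILE_NAME;
}
```

<P>
With the strings from the example above, <CODE>classify_so</CODE> would
report the first <CODE>N_SO</CODE> as a directory and the second as a file name.

</P>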


<H2><A NAME="SEC10" HREF="stabs_toc.html#TOC10">Names of Include Files</A></H2>

<P>
There are several schemes for dealing with include files: the
traditional <CODE>N_SOL</CODE> approach, Sun's <CODE>N_BINCL</CODE> approach, and the
XCOFF <CODE>C_BINCL</CODE> approach (which despite the similar name has little in
common with <CODE>N_BINCL</CODE>).

</P>
<P>
<A NAME="IDX4"></A>
An <CODE>N_SOL</CODE> symbol specifies which include file subsequent symbols
refer to.  The string field is the name of the file and the value is the
text address corresponding to the end of the previous include file and
the start of this one.  To specify the main source file again, use an
<CODE>N_SOL</CODE> symbol with the name of the main source file.

</P>
<P>
<A NAME="IDX5"></A>
<A NAME="IDX6"></A>
<A NAME="IDX7"></A>
The <CODE>N_BINCL</CODE> approach works as follows.  An <CODE>N_BINCL</CODE> symbol
specifies the start of an include file.  In an object file, only the
string is significant; the linker puts data into some of the other
fields.  The end of the include file is marked by an <CODE>N_EINCL</CODE>
symbol (which has no string field).  In an object file, there is no
significant data in the <CODE>N_EINCL</CODE> symbol.  <CODE>N_BINCL</CODE> and
<CODE>N_EINCL</CODE> can be nested.

</P>
<P>
If the linker detects that two source files have identical stabs between
an <CODE>N_BINCL</CODE> and <CODE>N_EINCL</CODE> pair (as will generally be the case
for a header file), then it puts out the stabs only once.  Each
additional occurrence is replaced by an <CODE>N_EXCL</CODE> symbol.  I believe
the GNU linker and the Sun (both SunOS4 and Solaris) linker are the only
ones which support this feature.

</P>
<P>
A linker which supports this feature will set the value of an
<CODE>N_BINCL</CODE> symbol to the total of all the characters in the stabs
strings included in the header file, omitting any file numbers.  The
value of an <CODE>N_EXCL</CODE> symbol is the same as the value of the
<CODE>N_BINCL</CODE> symbol it replaces.  This information can be used to
match up <CODE>N_EXCL</CODE> and <CODE>N_BINCL</CODE> symbols which have the same
filename.  The <CODE>N_EINCL</CODE> value, and the values of the other and
description fields for all three, appear to always be zero.

</P>
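<P>
The character total can be sketched in C as follows.  This is one plausible
reading of the description above, not a copy of any linker's code: it
assumes that a file number is the run of digits directly after a
<SAMP>`('</SAMP> in a stab string, and it adds every other character.
The function name <CODE>add_stab_string</CODE> is hypothetical.

</P>

```c
#include <ctype.h>

/* Add the characters of one stab string to a running total.  The digits
   that directly follow '(' are skipped; this sketch assumes they are the
   file number, which differs from one including file to the next. */
unsigned long add_stab_string(unsigned long total, const char *str)
{
    while (*str != '\0') {
        total += (unsigned char)*str;
        if (*str == '(') {
            ++str;
            while (isdigit((unsigned char)*str))
                ++str;
        } else {
            ++str;
        }
    }
    return total;
}
```

<P>
Calling this for every stab string between an <CODE>N_BINCL</CODE> and its
<CODE>N_EINCL</CODE>, starting from a total of zero, gives a value that stays
the same when only the file numbers change.

</P>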
<P>
<A NAME="IDX8"></A>
<A NAME="IDX9"></A>
For the start of an include file in XCOFF, use the <TT>`.bi'</TT> assembler
directive, which generates a <CODE>C_BINCL</CODE> symbol.  A <TT>`.ei'</TT>
directive, which generates a <CODE>C_EINCL</CODE> symbol, denotes the end of
the include file.  Both directives are followed by the name of the
source file in quotes, which becomes the string for the symbol.
The value of each symbol, produced automatically by the assembler
and linker, is the offset into the executable of the beginning
(inclusive, as you'd expect) or end (inclusive, as you would not expect)
of the portion of the COFF line table that corresponds to this include
file.  <CODE>C_BINCL</CODE> and <CODE>C_EINCL</CODE> do not nest.

</P>


<H2><A NAME="SEC11" HREF="stabs_toc.html#TOC11">Line Numbers</A></H2>

<P>
<A NAME="IDX10"></A>
An <CODE>N_SLINE</CODE> symbol represents the start of a source line.  The
desc field contains the line number and the value contains the code
address for the start of that source line.  On most machines the address
is absolute; for stabs in sections (see section <A HREF="stabs_13.html#SEC87">Using Stabs in Their Own Sections</A>), it is
relative to the function in which the <CODE>N_SLINE</CODE> symbol occurs.

</P>
<P>
<A NAME="IDX11"></A>
<A NAME="IDX12"></A>
GNU documents <CODE>N_DSLINE</CODE> and <CODE>N_BSLINE</CODE> symbols for line
numbers in the data or bss segments, respectively.  They are identical
to <CODE>N_SLINE</CODE> but are relocated differently by the linker.  They
were intended to be used to describe the source location of a variable
declaration, but I believe that GCC2 actually puts the line number in
the desc field of the stab for the variable itself.  GDB has been
ignoring these symbols (unless they contain a string field) since
at least GDB 3.5.

</P>
<P>
For single source lines that generate discontiguous code, such as
flow-of-control statements, there may be more than one line number entry for
the same source line.  In this case there is a line number entry at the
start of each code range, each with the same line number.

</P>
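<P>
To show how a debugger might use these entries, here is a minimal C sketch
that maps a code address back to a source line.  The structure
<CODE>sline</CODE> and the function <CODE>line_for_pc</CODE> are invented for
this example.  The sketch assumes the entries have already been collected
into an array sorted by ascending address, and that all addresses are
expressed in the same way (absolute, or relative to the same function).

</P>

```c
#include <stddef.h>

/* One N_SLINE entry: desc holds the line number, value the code address. */
struct sline {
    unsigned short desc;
    unsigned long value;
};

/* Return the source line for pc, or 0 if pc precedes every entry.
   Assumes the table is sorted by ascending address. */
unsigned short line_for_pc(const struct sline *tab, size_t n, unsigned long pc)
{
    unsigned short line = 0;
    size_t i;

    /* The last entry at or below pc covers it. */
    for (i = 0; i < n && tab[i].value <= pc; i++)
        line = tab[i].desc;
    return line;
}
```

<P>
Because the lookup goes by address, a source line that appears in several
entries (the discontiguous case) needs no special handling: each code range
simply maps back to the same line number.

</P>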
<P>
XCOFF does not use stabs for line numbers.  Instead, it uses COFF line
numbers (which are outside the scope of this document).  Standard COFF
line numbers cannot deal with include files, but in XCOFF this is fixed
with the <CODE>C_BINCL</CODE> method of marking include files (see section <A HREF="stabs_2.html#SEC10">Names of Include Files</A>).

</P>


<H2><A NAME="SEC12" HREF="stabs_toc.html#TOC12">Procedures</A></H2>

<P>
<A NAME="IDX13"></A>
<A NAME="IDX14"></A>
<A NAME="IDX15"></A>
<A NAME="IDX16"></A>
All of the following stabs normally use the <CODE>N_FUN</CODE> symbol type.
However, Sun's <CODE>acc</CODE> compiler on SunOS4 uses <CODE>N_GSYM</CODE> and
<CODE>N_STSYM</CODE>, which means that the value of the stab for the function
is useless and the debugger must get the address of the function from
the non-stab symbols instead.  On systems where non-stab symbols have
leading underscores, the stabs will lack underscores and the debugger
needs to know about the leading underscore to match up the stab and the
non-stab symbol.  BSD Fortran is said to use <CODE>N_FNAME</CODE> with the
same restriction; the value of the symbol is not useful (I'm not sure it
really does use this, because GDB doesn't handle this and no one has
complained).

</P>
<P>
<A NAME="IDX17"></A>
A function is represented by an <SAMP>`F'</SAMP> symbol descriptor for a global
(extern) function, and <SAMP>`f'</SAMP> for a static (local) function.  For
a.out, the value of the symbol is the address of the start of the
function; it is already relocated.  For stabs in ELF, the SunPRO
compiler version 2.0.1 and GCC put out an address which gets relocated
by the linker.  In a future release SunPRO is planning to put out zero,
in which case the address can be found from the ELF (non-stab) symbol.
Looking things up in the ELF symbols would probably be slow, I'm
not sure how to find which symbol of that name is the right one, and
this doesn't provide any way to deal with nested functions.  For these
reasons it would probably be better to make the value of the stab an
address relative to the start of the file, or just absolute.  See section <A HREF="stabs_13.html#SEC89">Having the Linker Relocate Stabs in ELF</A> for more information on linker relocation of stabs in ELF
files.  For XCOFF, the stab uses the <CODE>C_FUN</CODE> storage class and the
value of the stab is meaningless; the address of the function can be
found from the csect symbol (XTY_LD/XMC_PR).

</P>
<P>
The type information of the stab represents the return type of the
function; thus <SAMP>`foo:f5'</SAMP> means that foo is a function returning type
5.  There is no need to try to get the line number of the start of the
function from the stab for the function; it is in the next
<CODE>N_SLINE</CODE> symbol.

</P>
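<P>
A string of this form can be taken apart with a few lines of C.  The sketch
below is only an illustration: the function name
<CODE>parse_fun_stab</CODE> and the 64-byte name buffer are assumptions of
this example, and it handles only a plain decimal type number, not a type
definition or any text that may follow the type.

</P>

```c
#include <stdio.h>

/* Parse a function stab string such as "foo:f5" into its name, symbol
   descriptor ('F' or 'f') and return type number.  Returns 1 on success
   and 0 otherwise.  Only a plain decimal type number is handled, and
   names longer than 63 characters are rejected. */
int parse_fun_stab(const char *str, char name[64], char *desc, int *type)
{
    if (sscanf(str, "%63[^:]:%c%d", name, desc, type) != 3)
        return 0;
    return *desc == 'F' || *desc == 'f';
}
```

<P>
For <SAMP>`foo:f5'</SAMP> this yields the name <CODE>foo</CODE>, the
descriptor <SAMP>`f'</SAMP>, and the type number 5.

</P>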
<P>
Some compilers (such as Sun's Solaris compiler) support an extension for
specifying the types of the arguments.  I suspect this extension is not
used for old (non-prototyped) function definitions in C.  If the
extension is in use, the type information of the stab for the function
is followed by type information for each argument, with each argument
preceded by <SAMP>`;'</SAMP>.  An argument type of 0 means that additional
arguments are being passed, whose types and number may vary (<SAMP>`...'</SAMP>
in ANSI C).  GDB has tolerated this extension (parsed the syntax, if not
necessarily used the information) since at least version 4.8; I don't
know whether all versions of dbx tolerate it.  The argument types given
here are not redundant with the symbols for the formal parameters
(see section <A HREF="stabs_4.html#SEC24">Parameters</A>); they are the types of the arguments as they are
passed, before any conversions might take place.  For example, if a C
function which is declared without a prototype takes a <CODE>float</CODE>
argument, the value is passed as a <CODE>double</CODE> but then converted to a
<CODE>float</CODE>.  Debuggers need to use the types given in the arguments
when printing values, but when calling the function they need to use the
types given in the symbol defining the function.

</P>
<P>
If the return type and types of arguments of a function which is defined
in another source file are specified (i.e., a function prototype in ANSI
C), traditionally compilers emit no stab; the only way for the debugger
to find the information is if the source file where the function is
defined was also compiled with debugging symbols.  As an extension the
Solaris compiler uses symbol descriptor <SAMP>`P'</SAMP> followed by the return
type of the function, followed by the arguments, each preceded by
<SAMP>`;'</SAMP>, as in a stab with symbol descriptor <SAMP>`f'</SAMP> or <SAMP>`F'</SAMP>.
This use of symbol descriptor <SAMP>`P'</SAMP> can be distinguished from its use
for register parameters (see section <A HREF="stabs_4.html#SEC25">Passing Parameters in Registers</A>) by the fact that it has
symbol type <CODE>N_FUN</CODE>.

</P>
<P>
The AIX documentation also defines symbol descriptor <SAMP>`J'</SAMP> as an
internal function.  I assume this means a function nested within another
function.  It also says symbol descriptor <SAMP>`m'</SAMP> is a module in
Modula-2 or extended Pascal.

</P>
<P>
Procedures (functions which do not return values) are represented as
functions returning the <CODE>void</CODE> type in C.  I don't see why this couldn't
be used for all languages (inventing a <CODE>void</CODE> type for this purpose if
necessary), but the AIX documentation defines <SAMP>`I'</SAMP>, <SAMP>`P'</SAMP>, and
<SAMP>`Q'</SAMP> for internal, global, and static procedures, respectively.
These symbol descriptors are unusual in that they are not followed by
type information.

</P>
<P>
The following example shows a stab for a function <CODE>main</CODE> which
returns type number <CODE>1</CODE>.  The <CODE>_main</CODE> specified for the value
is a reference to an assembler label which is used to fill in the start
address of the function.

</P>

<PRE>
.stabs "main:F1",36,0,0,_main      # 36 is N_FUN
</PRE>

<P>
The stab representing a procedure is located immediately after the
code of the procedure.  This stab is in turn directly followed by a
group of other stabs describing elements of the procedure.  These other
stabs describe the procedure's parameters, its block local variables, and
its block structure.

</P>
<P>
If functions can appear in different sections, then the debugger may not
be able to find the end of a function.  Recent versions of GCC will mark
the end of a function with an <CODE>N_FUN</CODE> symbol with an empty string
for the name.  The value is the address of the end of the current
function.  Without such a symbol, there is no indication of the address
of the end of a function, and you must assume that it ended at the
starting address of the next function or at the end of the text section
for the program.

</P>
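<P>
The fallback just described can be written down as a short C sketch.  It
takes the paragraph above at face value: the array is assumed to hold only
the <CODE>N_FUN</CODE> stabs of one program, in order of occurrence, with
values that are directly comparable addresses.  The names
<CODE>fun_stab</CODE> and <CODE>function_end</CODE> are hypothetical.

</P>

```c
#include <stddef.h>

/* One N_FUN stab.  An empty name marks the end of the preceding function. */
struct fun_stab {
    const char *name;
    unsigned long value;
};

/* Return the end address of the function described by stabs[i], which
   must be a function stab rather than an end marker.  *exact is set to 1
   when the answer comes from an end-of-function marker, and to 0 when it
   is only the guess "start of the next function" or "end of the text
   section". */
unsigned long function_end(const struct fun_stab *stabs, size_t n, size_t i,
                           unsigned long text_end, int *exact)
{
    if (i + 1 < n) {
        *exact = stabs[i + 1].name[0] == '\0';
        return stabs[i + 1].value;
    }
    *exact = 0;
    return text_end;
}
```

<P>
A caller can use the <CODE>exact</CODE> flag to decide how much to trust the
result, since the guess is unreliable when functions are placed in
different sections.

</P>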


<H2><A NAME="SEC13" HREF="stabs_toc.html#TOC13">Nested Procedures</A></H2>

<P>
For any of the symbol descriptors representing procedures, after the
symbol descriptor and the type information is optionally a scope
specifier.  This consists of a comma, the name of the procedure, another
comma, and the name of the enclosing procedure.  The first name is local
to the scope specified, and seems to be redundant with the name of the
symbol (before the <SAMP>`:'</SAMP>).  This feature is used by GCC, and
presumably Pascal, Modula-2, etc., compilers, for nested functions.

</P>
<P>
If procedures are nested more than one level deep, only the immediately
containing scope is specified.  For example, this code:

</P>

<PRE>
int
foo (int x)
{
  int bar (int y)
    {
      int baz (int z)
        {
          return x + y + z;
        }
      return baz (x + 2 * y);
    }
  return x + bar (3 * x);
}
</PRE>

<P>
produces the stabs:

</P>

<PRE>
.stabs "baz:f1,baz,bar",36,0,0,_baz.15         # 36 is N_FUN
.stabs "bar:f1,bar,foo",36,0,0,_bar.12
.stabs "foo:F1",36,0,0,_foo
</PRE>
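<P>
As an illustration, the enclosing procedure can be pulled out of such a
string with the C sketch below.  The function name
<CODE>enclosing_procedure</CODE> is invented for this example, and the
sketch assumes that the type information between the <SAMP>`:'</SAMP> and
the scope specifier contains no comma, as in the stabs shown above.

</P>

```c
#include <string.h>

/* Return a pointer to the enclosing procedure's name inside a stab string
   such as "bar:f1,bar,foo", or NULL when there is no scope specifier.
   Assumes the type information itself contains no comma. */
const char *enclosing_procedure(const char *str)
{
    const char *colon = strchr(str, ':');
    const char *first, *second;

    if (colon == NULL)
        return NULL;
    first = strchr(colon + 1, ',');      /* comma before the local name */
    if (first == NULL)
        return NULL;
    second = strchr(first + 1, ',');     /* comma before the enclosing name */
    if (second == NULL)
        return NULL;
    return second + 1;
}
```

<P>
Applied to the three stabs above, this gives <CODE>bar</CODE> for
<CODE>baz</CODE>, <CODE>foo</CODE> for <CODE>bar</CODE>, and no enclosing
procedure for <CODE>foo</CODE>.  Following the chain one step at a time
recovers deeper nesting.

</P>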



<H2><A NAME="SEC14" HREF="stabs_toc.html#TOC14">Block Structure</A></H2>

<P>
<A NAME="IDX18"></A>
<A NAME="IDX19"></A>
The program's block structure is represented by the <CODE>N_LBRAC</CODE> (left
brace) and the <CODE>N_RBRAC</CODE> (right brace) stab types.  The variables
defined inside a block precede the <CODE>N_LBRAC</CODE> symbol for most
compilers, including GCC.  Other compilers, such as the Convex, Acorn
RISC machine, and Sun <CODE>acc</CODE> compilers, put the variables after the
<CODE>N_LBRAC</CODE> symbol.  The values of the <CODE>N_LBRAC</CODE> and
<CODE>N_RBRAC</CODE> symbols are the start and end addresses of the code of
the block, respectively.  For most machines, they are relative to the
starting address of this source file.  For the Gould NP1, they are
absolute.  For stabs in sections (see section <A HREF="stabs_13.html#SEC87">Using Stabs in Their Own Sections</A>), they are
relative to the function in which they occur.

</P>
<P>
The <CODE>N_LBRAC</CODE> and <CODE>N_RBRAC</CODE> stabs that describe the block
scope of a procedure are located after the <CODE>N_FUN</CODE> stab that
represents the procedure itself.

</P>
<P>
Sun documents the desc field of <CODE>N_LBRAC</CODE> and
<CODE>N_RBRAC</CODE> symbols as containing the nesting level of the block.
However, dbx seems not to care, and GCC always sets desc to
zero.

</P>
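<P>
Since the desc field cannot be relied on, a reader of stabs can recover
the nesting level by counting.  The following C sketch does so over a
sequence of stab type codes; the function name
<CODE>max_block_depth</CODE> is hypothetical, and the sketch assumes the
sequence covers the stabs of a single function.

</P>

```c
#include <stddef.h>

#define N_LBRAC 0xc0  /* start of a lexical block */
#define N_RBRAC 0xe0  /* end of a lexical block */

/* Return the deepest block nesting reached in a sequence of stab types,
   or -1 if the N_LBRAC and N_RBRAC stabs do not pair up.  Stabs of any
   other type are ignored. */
int max_block_depth(const unsigned char *types, size_t n)
{
    int depth = 0, deepest = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (types[i] == N_LBRAC) {
            if (++depth > deepest)
                deepest = depth;
        } else if (types[i] == N_RBRAC) {
            if (--depth < 0)
                return -1;   /* right brace with no open block */
        }
    }
    return depth == 0 ? deepest : -1;
}
```

<P>
The running value of <CODE>depth</CODE> at each <CODE>N_LBRAC</CODE> is the
nesting level that Sun documents for the desc field.

</P>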
<P>
<A NAME="IDX20"></A>
<A NAME="IDX21"></A>
<A NAME="IDX22"></A>
For XCOFF, block scope is indicated with <CODE>C_BLOCK</CODE> symbols.  If the
name of the symbol is <SAMP>`.bb'</SAMP>, then it is the beginning of the block;
if the name of the symbol is <SAMP>`.be'</SAMP>, it is the end of the block.

</P>


<H2><A NAME="SEC15" HREF="stabs_toc.html#TOC15">Alternate Entry Points</A></H2>

<P>
<A NAME="IDX23"></A>
<A NAME="IDX24"></A>
Some languages, like Fortran, have the ability to enter procedures at
some place other than the beginning.  One can declare an alternate entry
point.  The <CODE>N_ENTRY</CODE> stab is for this; however, the Sun FORTRAN
compiler doesn't use it.  According to AIX documentation, only the name
of a <CODE>C_ENTRY</CODE> stab is significant; the address of the alternate
entry point comes from the corresponding external symbol.  A previous
revision of this document said that the value of an <CODE>N_ENTRY</CODE> stab
was the address of the alternate entry point, but I don't know the
source for that information.

</P>
</BODY>
</HTML>