flex.info-3 [plain text]

This is flex.info, produced by makeinfo version 4.5 from flex.texi.

INFO-DIR-SECTION Programming
START-INFO-DIR-ENTRY
* flex: (flex).      Fast lexical analyzer generator (lex replacement).
END-INFO-DIR-ENTRY


   The flex manual is placed under the same licensing conditions as the
rest of flex:

   Copyright (C) 1990, 1997 The Regents of the University of California.
All rights reserved.

   This code is derived from software contributed to Berkeley by Vern
Paxson.

   The United States Government has rights in this work pursuant to
contract no. DE-AC03-76SF00098 between the United States Department of
Energy and the University of California.

   Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1.  Redistributions of source code must retain the above copyright
     notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright
     notice, this list of conditions and the following disclaimer in the
     documentation and/or other materials provided with the
     distribution.
   Neither the name of the University nor the names of its contributors
may be used to endorse or promote products derived from this software
without specific prior written permission.

   THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

File: flex.info,  Node: Debugging Options,  Next: Miscellaneous Options,  Prev: Options for Scanner Speed and Size,  Up: Scanner Options

Debugging Options
=================

`-b, --backup, `%option backup''
     Generate backing-up information to `lex.backup'.  This is a list of
     scanner states which require backing up and the input characters on
     which they do so.  By adding rules one can remove backing-up
     states.  If _all_ backing-up states are eliminated and `-Cf' or
     `-CF' is used, the generated scanner will run faster (see the
     `--perf-report' flag).  Only users who wish to squeeze every last
     cycle out of their scanners need worry about this option.  (*note
     Performance::).

`-d, --debug, `%option debug''
     makes the generated scanner run in "debug" mode.  Whenever a
     pattern is recognized and the global variable `yy_flex_debug' is
     non-zero (which is the default), the scanner will write to
     `stderr' a line of the form:


              -accepting rule at line 53 ("the matched text")

     The line number refers to the location of the rule in the file
     defining the scanner (i.e., the file that was fed to flex).
     Messages are also generated when the scanner backs up, accepts the
     default rule, reaches the end of its input buffer (or encounters a
     NUL; at this point, the two look the same as far as the scanner's
     concerned), or reaches an end-of-file.

`-p, --perf-report, `%option perf-report''
     generates a performance report to `stderr'.  The report consists of
     comments regarding features of the `flex' input file which will
     cause a serious loss of performance in the resulting scanner.  If
     you give the flag twice, you will also get comments regarding
     features that lead to minor performance losses.

     Note that the use of `REJECT', and variable trailing context
     (*note Limitations::) entails a substantial performance penalty;
     use of `yymore()', the `^' operator, and the `--interactive' flag
     entail minor performance penalties.

`-s, --nodefault, `%option nodefault''
     causes the _default rule_ (that unmatched scanner input is echoed
     to `stdout)' to be suppressed.  If the scanner encounters input
     that does not match any of its rules, it aborts with an error.
     This option is useful for finding holes in a scanner's rule set.

`-T, --trace, `%option trace''
     makes `flex' run in "trace" mode.  It will generate a lot of
     messages to `stderr' concerning the form of the input and the
     resultant non-deterministic and deterministic finite automata.
     This option is mostly for use in maintaining `flex'.

`-w, --nowarn, `%option nowarn''
     suppresses warning messages.

`-v, --verbose, `%option verbose''
     specifies that `flex' should write to `stderr' a summary of
     statistics regarding the scanner it generates.  Most of the
     statistics are meaningless to the casual `flex' user, but the
     first line identifies the version of `flex' (same as reported by
     `--version'), and the next line the flags used when generating the
     scanner, including those that are on by default.

`--warn, `%option warn''
     warn about certain things. In particular, if the default rule can
     be matched but no defualt rule has been given, the flex will warn
     you.  We recommend using this option always.



File: flex.info,  Node: Miscellaneous Options,  Prev: Debugging Options,  Up: Scanner Options

Miscellaneous Options
=====================

`-c'
     is a do-nothing option included for POSIX compliance.

     generates

`-h, -?, --help'
     generates a "help" summary of `flex''s options to `stdout' and
     then exits.

`-n'
     is another do-nothing option included only for POSIX compliance.

`-V, --version'
     prints the version number to `stdout' and exits.



File: flex.info,  Node: Performance,  Next: Cxx,  Prev: Scanner Options,  Up: Top

Performance Considerations
**************************

   The main design goal of `flex' is that it generate high-performance
scanners.  It has been optimized for dealing well with large sets of
rules.  Aside from the effects on scanner speed of the table compression
`-C' options outlined above, there are a number of options/actions
which degrade performance.  These are, from most expensive to least:


         REJECT
         arbitrary trailing context
     
         pattern sets that require backing up
         %option yylineno
         %array
     
         %option interactive
         %option always-interactive
     
         @samp{^} beginning-of-line operator
         yymore()

   with the first two all being quite expensive and the last two being
quite cheap.  Note also that `unput()' is implemented as a routine call
that potentially does quite a bit of work, while `yyless()' is a
quite-cheap macro. So if you are just putting back some excess text you
scanned, use `ss()'.

   `REJECT' should be avoided at all costs when performance is
important.  It is a particularly expensive option.

   There is one case when `%option yylineno' can be expensive. That is
when your patterns match long tokens that could _possibly_ contain a
newline character. There is no performance penalty for rules that can
not possibly match newlines, since flex does not need to check them for
newlines.  In general, you should avoid rules such as `[^f]+', which
match very long tokens, including newlines, and may possibly match your
entire file! A better approach is to separate `[^f]+' into two rules:


     %option yylineno
     %%
         [^f\n]+
         \n+

   The above scanner does not incur a performance penalty.

   Getting rid of backing up is messy and often may be an enormous
amount of work for a complicated scanner.  In principal, one begins by
using the `-b' flag to generate a `lex.backup' file.  For example, on
the input:


         %%
         foo        return TOK_KEYWORD;
         foobar     return TOK_KEYWORD;

   the file looks like:


         State #6 is non-accepting -
          associated rule line numbers:
                2       3
          out-transitions: [ o ]
          jam-transitions: EOF [ \001-n  p-\177 ]
     
         State #8 is non-accepting -
          associated rule line numbers:
                3
          out-transitions: [ a ]
          jam-transitions: EOF [ \001-`  b-\177 ]
     
         State #9 is non-accepting -
          associated rule line numbers:
                3
          out-transitions: [ r ]
          jam-transitions: EOF [ \001-q  s-\177 ]
     
         Compressed tables always back up.

   The first few lines tell us that there's a scanner state in which it
can make a transition on an 'o' but not on any other character, and
that in that state the currently scanned text does not match any rule.
The state occurs when trying to match the rules found at lines 2 and 3
in the input file.  If the scanner is in that state and then reads
something other than an 'o', it will have to back up to find a rule
which is matched.  With a bit of headscratching one can see that this
must be the state it's in when it has seen `fo'.  When this has
happened, if anything other than another `o' is seen, the scanner will
have to back up to simply match the `f' (by the default rule).

   The comment regarding State #8 indicates there's a problem when
`foob' has been scanned.  Indeed, on any character other than an `a',
the scanner will have to back up to accept "foo".  Similarly, the
comment for State #9 concerns when `fooba' has been scanned and an `r'
does not follow.

   The final comment reminds us that there's no point going to all the
trouble of removing backing up from the rules unless we're using `-Cf'
or `-CF', since there's no performance gain doing so with compressed
scanners.

   The way to remove the backing up is to add "error" rules:


         %%
         foo         return TOK_KEYWORD;
         foobar      return TOK_KEYWORD;
     
         fooba       |
         foob        |
         fo          {
                     /* false alarm, not really a keyword */
                     return TOK_ID;
                     }

   Eliminating backing up among a list of keywords can also be done
using a "catch-all" rule:


         %%
         foo         return TOK_KEYWORD;
         foobar      return TOK_KEYWORD;
     
         [a-z]+      return TOK_ID;

   This is usually the best solution when appropriate.

   Backing up messages tend to cascade.  With a complicated set of rules
it's not uncommon to get hundreds of messages.  If one can decipher
them, though, it often only takes a dozen or so rules to eliminate the
backing up (though it's easy to make a mistake and have an error rule
accidentally match a valid token.  A possible future `flex' feature
will be to automatically add rules to eliminate backing up).

   It's important to keep in mind that you gain the benefits of
eliminating backing up only if you eliminate _every_ instance of
backing up.  Leaving just one means you gain nothing.

   _Variable_ trailing context (where both the leading and trailing
parts do not have a fixed length) entails almost the same performance
loss as `REJECT' (i.e., substantial).  So when possible a rule like:


         %%
         mouse|rat/(cat|dog)   run();

   is better written:


         %%
         mouse/cat|dog         run();
         rat/cat|dog           run();

   or as


         %%
         mouse|rat/cat         run();
         mouse|rat/dog         run();

   Note that here the special '|' action does _not_ provide any
savings, and can even make things worse (*note Limitations::).

   Another area where the user can increase a scanner's performance (and
one that's easier to implement) arises from the fact that the longer the
tokens matched, the faster the scanner will run.  This is because with
long tokens the processing of most input characters takes place in the
(short) inner scanning loop, and does not often have to go through the
additional work of setting up the scanning environment (e.g., `yytext')
for the action.  Recall the scanner for C comments:


         %x comment
         %%
                 int line_num = 1;
     
         "/*"         BEGIN(comment);
     
         <comment>[^*\n]*
         <comment>"*"+[^*/\n]*
         <comment>\n             ++line_num;
         <comment>"*"+"/"        BEGIN(INITIAL);

   This could be sped up by writing it as:


         %x comment
         %%
                 int line_num = 1;
     
         "/*"         BEGIN(comment);
     
         <comment>[^*\n]*
         <comment>[^*\n]*\n      ++line_num;
         <comment>"*"+[^*/\n]*
         <comment>"*"+[^*/\n]*\n ++line_num;
         <comment>"*"+"/"        BEGIN(INITIAL);

   Now instead of each newline requiring the processing of another
action, recognizing the newlines is distributed over the other rules to
keep the matched text as long as possible.  Note that _adding_ rules
does _not_ slow down the scanner!  The speed of the scanner is
independent of the number of rules or (modulo the considerations given
at the beginning of this section) how complicated the rules are with
regard to operators such as `*' and `|'.

   A final example in speeding up a scanner: suppose you want to scan
through a file containing identifiers and keywords, one per line and
with no other extraneous characters, and recognize all the keywords.  A
natural first approach is:


         %%
         asm      |
         auto     |
         break    |
         ... etc ...
         volatile |
         while    /* it's a keyword */
     
         .|\n     /* it's not a keyword */

   To eliminate the back-tracking, introduce a catch-all rule:


         %%
         asm      |
         auto     |
         break    |
         ... etc ...
         volatile |
         while    /* it's a keyword */
     
         [a-z]+   |
         .|\n     /* it's not a keyword */

   Now, if it's guaranteed that there's exactly one word per line, then
we can reduce the total number of matches by a half by merging in the
recognition of newlines with that of the other tokens:


         %%
         asm\n    |
         auto\n   |
         break\n  |
         ... etc ...
         volatile\n |
         while\n  /* it's a keyword */
     
         [a-z]+\n |
         .|\n     /* it's not a keyword */

   One has to be careful here, as we have now reintroduced backing up
into the scanner.  In particular, while _we_ know that there will never
be any characters in the input stream other than letters or newlines,
`flex' can't figure this out, and it will plan for possibly needing to
back up when it has scanned a token like `auto' and then the next
character is something other than a newline or a letter.  Previously it
would then just match the `auto' rule and be done, but now it has no
`auto' rule, only a `auto\n' rule.  To eliminate the possibility of
backing up, we could either duplicate all rules but without final
newlines, or, since we never expect to encounter such an input and
therefore don't how it's classified, we can introduce one more
catch-all rule, this one which doesn't include a newline:


         %%
         asm\n    |
         auto\n   |
         break\n  |
         ... etc ...
         volatile\n |
         while\n  /* it's a keyword */
     
         [a-z]+\n |
         [a-z]+   |
         .|\n     /* it's not a keyword */

   Compiled with `-Cf', this is about as fast as one can get a `flex'
scanner to go for this particular problem.

   A final note: `flex' is slow when matching `NUL's, particularly when
a token contains multiple `NUL's.  It's best to write rules which match
_short_ amounts of text if it's anticipated that the text will often
include `NUL's.

   Another final note regarding performance: as mentioned in *Note
Matching::, dynamically resizing `yytext' to accommodate huge tokens is
a slow process because it presently requires that the (huge) token be
rescanned from the beginning.  Thus if performance is vital, you should
attempt to match "large" quantities of text but not "huge" quantities,
where the cutoff between the two is at about 8K characters per token.


File: flex.info,  Node: Cxx,  Next: Reentrant,  Prev: Performance,  Up: Top

Generating C++ Scanners
***********************

   *IMPORTANT*: the present form of the scanning class is _experimental_
and may change considerably between major releases.

   `flex' provides two different ways to generate scanners for use with
C++.  The first way is to simply compile a scanner generated by `flex'
using a C++ compiler instead of a C compiler.  You should not encounter
any compilation errors (*note Reporting Bugs::).  You can then use C++
code in your rule actions instead of C code.  Note that the default
input source for your scanner remains `yyin', and default echoing is
still done to `yyout'.  Both of these remain `FILE *' variables and not
C++ _streams_.

   You can also use `flex' to generate a C++ scanner class, using the
`-+' option (or, equivalently, `%option c++)', which is automatically
specified if the name of the `flex' executable ends in a '+', such as
`flex++'.  When using this option, `flex' defaults to generating the
scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
scanner includes the header file `FlexLexer.h', which defines the
interface to two C++ classes.

   The first class, `FlexLexer', provides an abstract base class
defining the general scanner class interface.  It provides the
following member functions:

`const char* YYText()'
     returns the text of the most recently matched token, the
     equivalent of `yytext'.

`int YYLeng()'
     returns the length of the most recently matched token, the
     equivalent of `yyleng'.

`int lineno() const'
     returns the current input line number (see `%option yylineno)', or
     `1' if `%option yylineno' was not used.

`void set_debug( int flag )'
     sets the debugging flag for the scanner, equivalent to assigning to
     `yy_flex_debug' (*note Scanner Options::).  Note that you must
     build the scannerusing `%option debug' to include debugging
     information in it.

`int debug() const'
     returns the current setting of the debugging flag.

   Also provided are member functions equivalent to
`yy_switch_to_buffer()', `yy_create_buffer()' (though the first
argument is an `istream*' object pointer and not a `FILE*)',
`yy_flush_buffer()', `yy_delete_buffer()', and `yyrestart()' (again,
the first argument is a `istream*' object pointer).

   The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
derived from `FlexLexer'.  It defines the following additional member
functions:

`yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
     constructs a `yyFlexLexer' object using the given streams for input
     and output.  If not specified, the streams default to `cin' and
     `cout', respectively.

`virtual int yylex()'
     performs the same role is `yylex()' does for ordinary `flex'
     scanners: it scans the input stream, consuming tokens, until a
     rule's action returns a value.  If you derive a subclass `S' from
     `yyFlexLexer' and want to access the member functions and variables
     of `S' inside `yylex()', then you need to use `%option
     yyclass="S"' to inform `flex' that you will be using that subclass
     instead of `yyFlexLexer'.  In this case, rather than generating
     `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
     generates a dummy `yyFlexLexer::yylex()' that calls
     `yyFlexLexer::LexerError()' if called).

`virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
     reassigns `yyin' to `new_in' (if non-null) and `yyout' to
     `new_out' (if non-null), deleting the previous input buffer if
     `yyin' is reassigned.

`int yylex( istream* new_in, ostream* new_out = 0 )'
     first switches the input streams via `switch_streams( new_in,
     new_out )' and then returns the value of `yylex()'.

   In addition, `yyFlexLexer' defines the following protected virtual
functions which you can redefine in derived classes to tailor the
scanner:

`virtual int LexerInput( char* buf, int max_size )'
     reads up to `max_size' characters into `buf' and returns the
     number of characters read.  To indicate end-of-input, return 0
     characters.  Note that `interactive' scanners (see the `-B' and
     `-I' flags in *Note Scanner Options::) define the macro
     `YY_INTERACTIVE'.  If you redefine `LexerInput()' and need to take
     different actions depending on whether or not the scanner might be
     scanning an interactive input source, you can test for the
     presence of this name via `#ifdef' statements.

`virtual void LexerOutput( const char* buf, int size )'
     writes out `size' characters from the buffer `buf', which, while
     `NUL'-terminated, may also contain internal `NUL's if the
     scanner's rules can match text with `NUL's in them.

`virtual void LexerError( const char* msg )'
     reports a fatal error message.  The default version of this
     function writes the message to the stream `cerr' and exits.

   Note that a `yyFlexLexer' object contains its _entire_ scanning
state.  Thus you can use such objects to create reentrant scanners, but
see also *Note Reentrant::.  You can instantiate multiple instances of
the same `yyFlexLexer' class, and you can also combine multiple C++
scanner classes together in the same program using the `-P' option
discussed above.

   Finally, note that the `%array' feature is not available to C++
scanner classes; you must use `%pointer' (the default).

   Here is an example of a simple C++ scanner:


             // An example of using the flex C++ scanner class.
     
         %{
         int mylineno = 0;
         %}
     
         string  \"[^\n"]+\"
     
         ws      [ \t]+
     
         alpha   [A-Za-z]
         dig     [0-9]
         name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
         num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
         num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
         number  {num1}|{num2}
     
         %%
     
         {ws}    /* skip blanks and tabs */
     
         "/*"    {
                 int c;
     
                 while((c = yyinput()) != 0)
                     {
                     if(c == '\n')
                         ++mylineno;
     
                     else if(c == @samp{*})
                         {
                         if((c = yyinput()) == '/')
                             break;
                         else
                             unput(c);
                         }
                     }
                 }
     
         {number}  cout  "number "  YYText()  '\n';
     
         \n        mylineno++;
     
         {name}    cout  "name "  YYText()  '\n';
     
         {string}  cout  "string "  YYText()  '\n';
     
         %%
     
         int main( int /* argc */, char** /* argv */ )
             {
             @code{flex}Lexer* lexer = new yyFlexLexer;
             while(lexer->yylex() != 0)
                 ;
             return 0;
             }

   If you want to create multiple (different) lexer classes, you use the
`-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
some other `xxFlexLexer'.  You then can include `<FlexLexer.h>' in your
other sources once per lexer class, first renaming `yyFlexLexer' as
follows:


         #undef yyFlexLexer
         #define yyFlexLexer xxFlexLexer
         #include <FlexLexer.h>
     
         #undef yyFlexLexer
         #define yyFlexLexer zzFlexLexer
         #include <FlexLexer.h>

   if, for example, you used `%option prefix="xx"' for one of your
scanners and `%option prefix="zz"' for the other.


File: flex.info,  Node: Reentrant,  Next: Lex and Posix,  Prev: Cxx,  Up: Top

Reentrant C Scanners
********************

   `flex' has the ability to generate a reentrant C scanner. This is
accomplished by specifying `%option reentrant' (`-R') The generated
scanner is both portable, and safe to use in one or more separate
threads of control.  The most common use for reentrant scanners is from
within multi-threaded applications.  Any thread may create and execute
a reentrant `flex' scanner without the need for synchronization with
other threads.

* Menu:

* Reentrant Uses::
* Reentrant Overview::
* Reentrant Example::
* Reentrant Detail::
* Reentrant Functions::


File: flex.info,  Node: Reentrant Uses,  Next: Reentrant Overview,  Prev: Reentrant,  Up: Reentrant

Uses for Reentrant Scanners
===========================

   However, there are other uses for a reentrant scanner.  For example,
you could scan two or more files simultaneously to implement a `diff' at
the token level (i.e., instead of at the character level):


         /* Example of maintaining more than one active scanner. */
     
         do {
             int tok1, tok2;
     
             tok1 = yylex( scanner_1 );
             tok2 = yylex( scanner_2 );
     
             if( tok1 != tok2 )
                 printf("Files are different.");
     
        } while ( tok1 && tok2 );

   Another use for a reentrant scanner is recursion.  (Note that a
recursive scanner can also be created using a non-reentrant scanner and
buffer states. *Note Multiple Input Buffers::.)

   The following crude scanner supports the `eval' command by invoking
another instance of itself.


         /* Example of recursive invocation. */
     
         %option reentrant
     
         %%
         "eval(".+")"  {
                           yyscan_t scanner;
                           YY_BUFFER_STATE buf;
     
                           yylex_init( &scanner );
                           yytext[yyleng-1] = ' ';
     
                           buf = yy_scan_string( yytext + 5, scanner );
                           yylex( scanner );
     
                           yy_delete_buffer(buf,scanner);
                           yylex_destroy( scanner );
                      }
         ...
         %%


File: flex.info,  Node: Reentrant Overview,  Next: Reentrant Example,  Prev: Reentrant Uses,  Up: Reentrant

An Overview of the Reentrant API
================================

   The API for reentrant scanners is different than for non-reentrant
scanners. Here is a quick overview of the API:

     `%option reentrant' must be specified.

   * All functions take one additional argument: `yyscanner'

   * All global variables are replaced by their macro equivalents.  (We
     tell you this because it may be important to you during debugging.)

   * `yylex_init' and `yylex_destroy' must be called before and after
     `yylex', respectively.

   * Accessor methods (get/set functions) provide access to common
     `flex' variables.

   * User-specific data can be stored in `yyextra'.


File: flex.info,  Node: Reentrant Example,  Next: Reentrant Detail,  Prev: Reentrant Overview,  Up: Reentrant

Reentrant Example
=================

   First, an example of a reentrant scanner:

         /* This scanner prints "//" comments. */
         %option reentrant stack
         %x COMMENT
         %%
         "//"                 yy_push_state( COMMENT, yyscanner);
         .|\n
         <COMMENT>\n          yy_pop_state( yyscanner );
         <COMMENT>[^\n]+      fprintf( yyout, "%s\n", yytext);
         %%
         int main ( int argc, char * argv[] )
         {
             yyscan_t scanner;
     
             yylex_init ( &scanner );
             yylex ( scanner );
             yylex_destroy ( scanner );
         return 0;
        }


File: flex.info,  Node: Reentrant Detail,  Next: Reentrant Functions,  Prev: Reentrant Example,  Up: Reentrant

The Reentrant API in Detail
===========================

   Here are the things you need to do or know to use the reentrant C
API of `flex'.

* Menu:

* Specify Reentrant::
* Extra Reentrant Argument::
* Global Replacement::
* Init and Destroy Functions::
* Accessor Methods::
* Extra Data::
* About yyscan_t::


File: flex.info,  Node: Specify Reentrant,  Next: Extra Reentrant Argument,  Prev: Reentrant Detail,  Up: Reentrant Detail

Declaring a Scanner As Reentrant
--------------------------------

   %option reentrant (-reentrant) must be specified.

   Notice that `%option reentrant' is specified in the above example
(*note Reentrant Example::. Had this option not been specified, `flex'
would have happily generated a non-reentrant scanner without
complaining. You may explicitly specify `%option noreentrant', if you
do _not_ want a reentrant scanner, although it is not necessary. The
default is to generate a non-reentrant scanner.


File: flex.info,  Node: Extra Reentrant Argument,  Next: Global Replacement,  Prev: Specify Reentrant,  Up: Reentrant Detail

The Extra Argument
------------------

   All functions take one additional argument: `yyscanner'.

   Notice that the calls to `yy_push_state' and `yy_pop_state' both
have an argument, `yyscanner' , that is not present in a non-reentrant
scanner.  Here are the declarations of `yy_push_state' and
`yy_pop_state' in the generated scanner:


         static void yy_push_state  ( int new_state , yyscan_t yyscanner ) ;
         static void yy_pop_state  ( yyscan_t yyscanner  ) ;

   Notice that the argument `yyscanner' appears in the declaration of
both functions.  In fact, all `flex' functions in a reentrant scanner
have this additional argument.  It is always the last argument in the
argument list, it is always of type `yyscan_t' (which is typedef'd to
`void *') and it is always named `yyscanner'.  As you may have guessed,
`yyscanner' is a pointer to an opaque data structure encapsulating the
current state of the scanner.  For a list of function declarations, see
*Note Reentrant Functions::. Note that preprocessor macros, such as
`BEGIN', `ECHO', and `REJECT', do not take this additional argument.


File: flex.info,  Node: Global Replacement,  Next: Init and Destroy Functions,  Prev: Extra Reentrant Argument,  Up: Reentrant Detail

Global Variables Replaced By Macros
-----------------------------------

   All global variables in traditional flex have been replaced by macro
equivalents.

   Note that in the above example, `yyout' and `yytext' are not plain
variables. These are macros that will expand to their equivalent lvalue.
All of the familiar `flex' globals have been replaced by their macro
equivalents. In particular, `yytext', `yyleng', `yylineno', `yyin',
`yyout', `yyextra', `yylval', and `yylloc' are macros. You may safely
use these macros in actions as if they were plain variables. We only
tell you this so you don't expect to link to these variables
externally. Currently, each macro expands to a member of an internal
struct, e.g.,


     #define yytext (((struct yyguts_t*)yyscanner)->yytext_r)

   One important thing to remember about `yytext' and friends is that
`yytext' is not a global variable in a reentrant scanner, you can not
access it directly from outside an action or from other functions. You
must use an accessor method, e.g., `yyget_text', to accomplish this.
(See below).


File: flex.info,  Node: Init and Destroy Functions,  Next: Accessor Methods,  Prev: Global Replacement,  Up: Reentrant Detail

Init and Destroy Functions
--------------------------

   `yylex_init' and `yylex_destroy' must be called before and after
`yylex', respectively.


         int yylex_init ( yyscan_t * ptr_yy_globals ) ;
         int yylex ( yyscan_t yyscanner ) ;
         int yylex_destroy ( yyscan_t yyscanner ) ;

   The function `yylex_init' must be called before calling any other
function. The argument to `yylex_init' is the address of an
uninitialized pointer to be filled in by `flex'. The contents of
`ptr_yy_globals' need not be initialized, since `flex' will overwrite
it anyway. The value stored in `ptr_yy_globals' should thereafter be
passed to `yylex()' and yylex_destroy().  Flex does not save the
argument passed to `yylex_init', so it is safe to pass the address of a
local pointer to `yylex_init'.  The function `yylex' should be familiar
to you by now. The reentrant version takes one argument, which is the
value returned (via an argument) by `yylex_init'.  Otherwise, it
behaves the same as the non-reentrant version of `yylex'.

   `yylex_init' returns 0 (zero) on success, or non-zero on failure, in
which case, errno is set to one of the following values:

   * ENOMEM Memory allocation error. *Note memory-management::.

   * EINVAL Invalid argument.

   The function `yylex_destroy' should be called to free resources used
by the scanner. After `yylex_destroy' is called, the contents of
`yyscanner' should not be used.  Of course, there is no need to destroy
a scanner if you plan to reuse it.  A `flex' scanner (both reentrant
and non-reentrant) may be restarted by calling `yyrestart'.

   Below is an example of a program that creates a scanner, uses it,
then destroys it when done:


         int main ()
         {
             yyscan_t scanner;
             int tok;
     
             yylex_init(&scanner);
     
             while ((tok=yylex()) > 0)
                 printf("tok=%d  yytext=%s\n", tok, yyget_text(scanner));
     
             yylex_destroy(scanner);
             return 0;
         }


File: flex.info,  Node: Accessor Methods,  Next: Extra Data,  Prev: Init and Destroy Functions,  Up: Reentrant Detail

Accessing Variables with Reentrant Scanners
-------------------------------------------

   Accessor methods (get/set functions) provide access to common `flex'
variables.

   Many scanners that you build will be part of a larger project.
Portions of your project will need access to `flex' values, such as
`yytext'.  In a non-reentrant scanner, these values are global, so
there is no problem accessing them. However, in a reentrant scanner,
there are no global `flex' values. You can not access them directly.
Instead, you must access `flex' values using accessor methods (get/set
functions). Each accessor method is named `yyget_NAME' or `yyset_NAME',
where `NAME' is the name of the `flex' variable you want. For example:


         /* Set the last character of yytext to NULL. */
         void chop ( yyscan_t scanner )
         {
             int len = yyget_leng( scanner );
             yyget_text( scanner )[len - 1] = '\0';
         }

   The above code may be called from within an action like this:


         %%
         .+\n    { chop( yyscanner );}

   You may find that `%option header-file' is particularly useful for
generating prototypes of all the accessor functions. *Note
option-header::.


File: flex.info,  Node: Extra Data,  Next: About yyscan_t,  Prev: Accessor Methods,  Up: Reentrant Detail

Extra Data
----------

   User-specific data can be stored in `yyextra'.

   In a reentrant scanner, it is unwise to use global variables to
communicate with or maintain state between different pieces of your
program.  However, you may need access to external data or invoke
external functions from within the scanner actions.  Likewise, you may
need to pass information to your scanner (e.g., open file descriptors,
or database connections).  In a non-reentrant scanner, the only way to
do this would be through the use of global variables.  `Flex' allows
you to store arbitrary, "extra" data in a scanner.  This data is
accessible through the accessor methods `yyget_extra' and `yyset_extra'
from outside the scanner, and through the shortcut macro `yyextra' from
within the scanner itself. They are defined as follows:


         #define YY_EXTRA_TYPE  void*
         YY_EXTRA_TYPE  yyget_extra ( yyscan_t scanner );
         void           yyset_extra ( YY_EXTRA_TYPE arbitrary_data , yyscan_t scanner);

   By default, `YY_EXTRA_TYPE' is defined as type `void *'.  You will
have to cast `yyextra' and the return value from `yyget_extra' to the
appropriate value each time you access the extra data.  To avoid
casting, you may override the default type by defining `YY_EXTRA_TYPE'
in section 1 of your scanner:


         /* An example of overriding YY_EXTRA_TYPE. */
         %{
         #include <sys/stat.h>
         #include <unistd.h>
         #define YY_EXTRA_TYPE  struct stat*
         %}
         %option reentrant
         %%
     
         __filesize__     printf( "%ld", yyextra->st_size  );
         __lastmod__      printf( "%ld", yyextra->st_mtime );
         %%
         void scan_file( char* filename )
         {
             yyscan_t scanner;
             struct stat buf;
     
             yylex_init ( &scanner );
             yyset_in( fopen(filename,"r"), scanner );
     
             stat( filename, &buf);
             yyset_extra( &buf, scanner );
             yylex ( scanner );
             yylex_destroy( scanner );
        }


File: flex.info,  Node: About yyscan_t,  Prev: Extra Data,  Up: Reentrant Detail

About yyscan_t
--------------

   `yyscan_t' is defined as:


          typedef void* yyscan_t;

   It is initialized by `yylex_init()' to point to an internal
structure. You should never access this value directly. In particular,
you should never attempt to free it (use `yylex_destroy()' instead.)


File: flex.info,  Node: Reentrant Functions,  Prev: Reentrant Detail,  Up: Reentrant

Functions and Macros Available in Reentrant C Scanners
======================================================

   The following Functions are available in a reentrant scanner:


         char *yyget_text ( yyscan_t scanner );
         int yyget_leng ( yyscan_t scanner );
         FILE *yyget_in ( yyscan_t scanner );
         FILE *yyget_out ( yyscan_t scanner );
         int yyget_lineno ( yyscan_t scanner );
         YY_EXTRA_TYPE yyget_extra ( yyscan_t scanner );
         int  yyget_debug ( yyscan_t scanner );
     
         void yyset_debug ( int flag, yyscan_t scanner );
         void yyset_in  ( FILE * in_str , yyscan_t scanner );
         void yyset_out  ( FILE * out_str , yyscan_t scanner );
         void yyset_lineno ( int line_number , yyscan_t scanner );
         void yyset_extra ( YY_EXTRA_TYPE user_defined , yyscan_t scanner );

   There are no "set" functions for yytext and yyleng. This is
intentional.

   The following Macro shortcuts are available in actions in a reentrant
scanner:


         yytext
         yyleng
         yyin
         yyout
         yylineno
         yyextra
         yy_flex_debug

   In a reentrant C scanner, support for yylineno is always present
(i.e., you may access yylineno), but the value is never modified by
`flex' unless `%option yylineno' is enabled. This is to allow the user
to maintain the line count independently of `flex'.

   The following functions and macros are made available when `%option
bison-bridge' (`--bison-bridge') is specified:


         YYSTYPE * yyget_lval ( yyscan_t scanner );
         void yyset_lval ( YYSTYPE * yylvalp , yyscan_t scanner );
         yylval

   The following functions and macros are made available when `%option
bison-locations' (`--bison-locations') is specified:


         YYLTYPE *yyget_lloc ( yyscan_t scanner );
         void yyset_lloc ( YYLTYPE * yyllocp , yyscan_t scanner );
         yylloc

   Support for yylval assumes that `YYSTYPE' is a valid type.  Support
for yylloc assumes that `YYSLYPE' is a valid type.  Typically, these
types are generated by `bison', and are included in section 1 of the
`flex' input.


File: flex.info,  Node: Lex and Posix,  Next: Memory Management,  Prev: Reentrant,  Up: Top

Incompatibilities with Lex and Posix
************************************

   `flex' is a rewrite of the AT&T Unix _lex_ tool (the two
implementations do not share any code, though), with some extensions and
incompatibilities, both of which are of concern to those who wish to
write scanners acceptable to both implementations.  `flex' is fully
compliant with the POSIX `lex' specification, except that when using
`%pointer' (the default), a call to `unput()' destroys the contents of
`yytext', which is counter to the POSIX specification.  In this section
we discuss all of the known areas of incompatibility between `flex',
AT&T `lex', and the POSIX specification.  `flex''s `-l' option turns on
maximum compatibility with the original AT&T `lex' implementation, at
the cost of a major loss in the generated scanner's performance.  We
note below which incompatibilities can be overcome using the `-l'
option.  `flex' is fully compatible with `lex' with the following
exceptions:

   * The undocumented `lex' scanner internal variable `yylineno' is not
     supported unless `-l' or `%option yylineno' is used.

   * `yylineno' should be maintained on a per-buffer basis, rather than
     a per-scanner (single global variable) basis.

   * `yylineno' is not part of the POSIX specification.

   * The `input()' routine is not redefinable, though it may be called
     to read characters following whatever has been matched by a rule.
     If `input()' encounters an end-of-file the normal `yywrap()'
     processing is done.  A "real" end-of-file is returned by `input()'
     as `EOF'.

   * Input is instead controlled by defining the `YY_INPUT()' macro.

   * The `flex' restriction that `input()' cannot be redefined is in
     accordance with the POSIX specification, which simply does not
     specify any way of controlling the scanner's input other than by
     making an initial assignment to `yyin'.

   * The `unput()' routine is not redefinable.  This restriction is in
     accordance with POSIX.

   * `flex' scanners are not as reentrant as `lex' scanners.  In
     particular, if you have an interactive scanner and an interrupt
     handler which long-jumps out of the scanner, and the scanner is
     subsequently called again, you may get the following message:


              fatal @code{flex} scanner internal error--end of buffer missed

     To reenter the scanner, first use:


              yyrestart( yyin );

     Note that this call will throw away any buffered input; usually
     this isn't a problem with an interactive scanner. *Note
     Reentrant::, for `flex''s reentrant API.

   * Also note that `flex' C++ scanner classes _are_ reentrant, so if
     using C++ is an option for you, you should use them instead.
     *Note Cxx::, and *Note Reentrant::  for details.

   * `output()' is not supported.  Output from the ECHO macro is done
     to the file-pointer `yyout' (default `stdout)'.

   * `output()' is not part of the POSIX specification.

   * `lex' does not support exclusive start conditions (%x), though they
     are in the POSIX specification.

   * When definitions are expanded, `flex' encloses them in parentheses.
     With `lex', the following:


              NAME    [A-Z][A-Z0-9]*
              %%
              foo{NAME}?      printf( "Found it\n" );
              %%

     will not match the string `foo' because when the macro is expanded
     the rule is equivalent to `foo[A-Z][A-Z0-9]*?'  and the precedence
     is such that the `?' is associated with `[A-Z0-9]*'.  With `flex',
     the rule will be expanded to `foo([A-Z][A-Z0-9]*)?' and so the
     string `foo' will match.

   * Note that if the definition begins with `^' or ends with `$' then
     it is _not_ expanded with parentheses, to allow these operators to
     appear in definitions without losing their special meanings.  But
     the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
     definition.

   * Using `-l' results in the `lex' behavior of no parentheses around
     the definition.

   * The POSIX specification is that the definition be enclosed in
     parentheses.

   * Some implementations of `lex' allow a rule's action to begin on a
     separate line, if the rule's pattern has trailing whitespace:


              %%
              foo|bar<space here>
                { foobar_action();}

     `flex' does not support this feature.

   * The `lex' `%r' (generate a Ratfor scanner) option is not
     supported.  It is not part of the POSIX specification.

   * After a call to `unput()', _yytext_ is undefined until the next
     token is matched, unless the scanner was built using `%array'.
     This is not the case with `lex' or the POSIX specification.  The
     `-l' option does away with this incompatibility.

   * The precedence of the `{,}' (numeric range) operator is different.
     The AT&T and POSIX specifications of `lex' interpret `abc{1,3}'
     as match one, two, or three occurrences of `abc'", whereas `flex'
     interprets it as "match `ab' followed by one, two, or three
     occurrences of `c'".  The `-l' and `--posix' options do away with
     this incompatibility.

   * The precedence of the `^' operator is different.  `lex' interprets
     `^foo|bar' as "match either 'foo' at the beginning of a line, or
     'bar' anywhere", whereas `flex' interprets it as "match either
     `foo' or `bar' if they come at the beginning of a line".  The
     latter is in agreement with the POSIX specification.

   * The special table-size declarations such as `%a' supported by
     `lex' are not required by `flex' scanners..  `flex' ignores them.

   * The name `FLEX_SCANNER' is `#define''d so scanners may be written
     for use with either `flex' or `lex'.  Scanners also include
     `YY_FLEX_MAJOR_VERSION',  `YY_FLEX_MINOR_VERSION' and
     `YY_FLEX_SUBMINOR_VERSION' indicating which version of `flex'
     generated the scanner. For example, for the 2.5.22 release, these
     defines would be 2,  5 and 22 respectively. If the version of
     `flex' being used is a beta version, then the symbol `FLEX_BETA'
     is defined.

   * The symbols `[[' and `]]' in the code sections of the input may
     conflict with the m4 delimiters. *Note M4 Dependency::.


   The following `flex' features are not included in `lex' or the POSIX
specification:

   * C++ scanners

   * %option

   * start condition scopes

   * start condition stacks

   * interactive/non-interactive scanners

   * yy_scan_string() and friends

   * yyterminate()

   * yy_set_interactive()

   * yy_set_bol()

   * YY_AT_BOL()    <<EOF>>

   * <*>

   * YY_DECL

   * YY_START

   * YY_USER_ACTION

   * YY_USER_INIT

   * #line directives

   * %{}'s around actions

   * reentrant C API

   * multiple actions on a line

   * almost all of the `flex' command-line options

   The feature "multiple actions on a line" refers to the fact that
with `flex' you can put multiple actions on the same line, separated
with semi-colons, while with `lex', the following:


         foo    handle_foo(); ++num_foos_seen;

   is (rather surprisingly) truncated to


         foo    handle_foo();

   `flex' does not truncate the action.  Actions that are not enclosed
in braces are simply terminated at the end of the line.


File: flex.info,  Node: Memory Management,  Next: Serialized Tables,  Prev: Lex and Posix,  Up: Top

Memory Management
*****************

   This chapter describes how flex handles dynamic memory, and how you
can override the default behavior.

* Menu:

* The Default Memory Management::
* Overriding The Default Memory Management::
* A Note About yytext And Memory::


File: flex.info,  Node: The Default Memory Management,  Next: Overriding The Default Memory Management,  Prev: Memory Management,  Up: Memory Management

The Default Memory Management
=============================

   Flex allocates dynamic memory during initialization, and once in a
while from within a call to yylex(). Initialization takes place during
the first call to yylex(). Thereafter, flex may reallocate more memory
if it needs to enlarge a buffer. As of version 2.5.9 Flex will clean up
all memory when you call `yylex_destroy' *Note faq-memory-leak::.

   Flex allocates dynamic memory for four purposes, listed below (1)

16kB for the input buffer.
     Flex allocates memory for the character buffer used to perform
     pattern matching.  Flex must read ahead from the input stream and
     store it in a large character buffer.  This buffer is typically
     the largest chunk of dynamic memory flex consumes. This buffer
     will grow if necessary, doubling the size each time.  Flex frees
     this memory when you call yylex_destroy().  The default size of
     this buffer (16384 bytes) is almost always too large.  The ideal
     size for this buffer is the length of the longest token expected,
     in bytes, plus a little more.  Flex will allocate a few extra
     bytes for housekeeping. Currently, to override the size of the
     input buffer you must `#define YY_BUF_SIZE' to whatever number of
     bytes you want. We don't plan to change this in the near future,
     but we reserve the right to do so if we ever add a more robust
     memory management API.

64kb for the REJECT state. This will only be allocated if you use REJECT.
     The size is the large enough to hold the same number of states as
     characters in the input buffer. If you override the size of the
     input buffer (via `YY_BUF_SIZE'), then you automatically override
     the size of this buffer as well.

100 bytes for the start condition stack.
     Flex allocates memory for the start condition stack. This is the
     stack used for pushing start states, i.e., with yy_push_state().
     It will grow if necessary.  Since the states are simply integers,
     this stack doesn't consume much memory.  This stack is not present
     if `%option stack' is not specified.  You will rarely need to tune
     this buffer. The ideal size for this stack is the maximum depth
     expected.  The memory for this stack is automatically destroyed
     when you call yylex_destroy(). *Note option-stack::.

40 bytes for each YY_BUFFER_STATE.
     Flex allocates memory for each YY_BUFFER_STATE. The buffer state
     itself is about 40 bytes, plus an additional large character
     buffer (described above.)  The initial buffer state is created
     during initialization, and with each call to yy_create_buffer().
     You can't tune the size of this, but you can tune the character
     buffer as described above. Any buffer state that you explicitly
     create by calling yy_create_buffer() is _NOT_ destroyed
     automatically. You must call yy_delete_buffer() to free the
     memory. The exception to this rule is that flex will delete the
     current buffer automatically when you call yylex_destroy(). If you
     delete the current buffer, be sure to set it to NULL.  That way,
     flex will not try to delete the buffer a second time (possibly
     crashing your program!) At the time of this writing, flex does not
     provide a growable stack for the buffer states.  You have to
     manage that yourself.  *Note Multiple Input Buffers::.

84 bytes for the reentrant scanner guts
     Flex allocates about 84 bytes for the reentrant scanner structure
     when you call yylex_init(). It is destroyed when the user calls
     yylex_destroy().


   ---------- Footnotes ----------

   (1) The quantities given here are approximate, and may vary due to
host architecture, compiler configuration, or due to future
enhancements to flex.