gperf_4.html   [plain text]


<HTML>
<HEAD>
<!-- This HTML file has been created by texi2html 1.51
     from gperf.texi on 31 March 2007 -->

<TITLE>Perfect Hash Function Generator - 2  Static search structures and GNU gperf</TITLE>
</HEAD>
<BODY>
Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
<P><HR><P>


<H1><A NAME="SEC6" HREF="gperf_toc.html#TOC6">2  Static search structures and GNU <CODE>gperf</CODE></A></H1>
<P>
<A NAME="IDX2"></A>

</P>
<P>
A <STRONG>static search structure</STRONG> is an Abstract Data Type with certain
fundamental operations, e.g., <EM>initialize</EM>, <EM>insert</EM>,
and <EM>retrieve</EM>.  Conceptually, all insertions occur before any
retrievals.  In practice, <CODE>gperf</CODE> generates a <EM>static</EM> array
containing search set keywords and any associated attributes specified
by the user.  Thus, there is essentially no execution-time cost for the
insertions.  It is a useful data structure for representing <EM>static
search sets</EM>.  Static search sets occur frequently in software system
applications.  Typical static search sets include compiler reserved
words, assembler instruction opcodes, and built-in shell interpreter
commands.  Search set members, called <STRONG>keywords</STRONG>, are inserted into
the structure only once, usually during program initialization, and are
not generally modified at run-time.

</P>
<P>
Numerous static search structure implementations exist, e.g.,
arrays, linked lists, binary search trees, digital search tries, and
hash tables.  Different approaches offer trade-offs between space
utilization and search time efficiency.  For example, an <VAR>n</VAR> element
sorted array is space efficient, though the average-case time
complexity for retrieval operations using binary search is
proportional to log <VAR>n</VAR>.  Conversely, hash table implementations
often locate a table entry in constant time, but typically impose
additional memory overhead and exhibit poor worst case performance.

</P>
<P>
<A NAME="IDX3"></A>
<EM>Minimal perfect hash functions</EM> provide an optimal solution for a
particular class of static search sets.  A minimal perfect hash
function is defined by two properties:

</P>

<UL>
<LI>

It allows keyword recognition in a static search set using at most
<EM>one</EM> probe into the hash table.  This represents the "perfect"
property.
<LI>

The actual memory allocated to store the keywords is precisely large
enough for the keyword set, and <EM>no larger</EM>.  This is the
"minimal" property.
</UL>

<P>
For most applications it is far easier to generate <EM>perfect</EM> hash
functions than <EM>minimal perfect</EM> hash functions.  Moreover,
non-minimal perfect hash functions frequently execute faster than
minimal ones in practice.  This phenomena occurs since searching a
sparse keyword table increases the probability of locating a "null"
entry, thereby reducing string comparisons.  <CODE>gperf</CODE>'s default
behavior generates <EM>near-minimal</EM> perfect hash functions for
keyword sets.  However, <CODE>gperf</CODE> provides many options that permit
user control over the degree of minimality and perfection.

</P>
<P>
Static search sets often exhibit relative stability over time.  For
example, Ada's 63 reserved words have remained constant for nearly a
decade.  It is therefore frequently worthwhile to expend concerted
effort building an optimal search structure <EM>once</EM>, if it
subsequently receives heavy use multiple times.  <CODE>gperf</CODE> removes
the drudgery associated with constructing time- and space-efficient
search structures by hand.  It has proven a useful and practical tool
for serious programming projects.  Output from <CODE>gperf</CODE> is currently
used in several production and research compilers, including GNU C, GNU
C++, GNU Java, GNU Pascal, and GNU Modula 3.  The latter two compilers are
not yet part of the official GNU distribution.  Each compiler utilizes
<CODE>gperf</CODE> to automatically generate static search structures that
efficiently identify their respective reserved keywords.

</P>
<P><HR><P>
Go to the <A HREF="gperf_1.html">first</A>, <A HREF="gperf_3.html">previous</A>, <A HREF="gperf_5.html">next</A>, <A HREF="gperf_10.html">last</A> section, <A HREF="gperf_toc.html">table of contents</A>.
</BODY>
</HTML>