accessmethods.html   [plain text]


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Access Methods</title>
    <link rel="stylesheet" href="gettingStarted.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.62.4" />
    <link rel="home" href="index.html" title="Getting Started with Berkeley DB" />
    <link rel="up" href="introduction.html" title="Chapter 1. Introduction to Berkeley DB " />
    <link rel="previous" href="concepts.html" title="Berkeley DB Concepts" />
    <link rel="next" href="databaseLimits.html" title="Database Limits and Portability" />
  </head>
  <body>
    <div class="navheader">
      <table width="100%" summary="Navigation header">
        <tr>
          <th colspan="3" align="center">Access Methods</th>
        </tr>
        <tr>
          <td width="20%" align="left"><a accesskey="p" href="concepts.html">Prev</a> </td>
          <th width="60%" align="center">Chapter 1. Introduction to Berkeley DB </th>
          <td width="20%" align="right"> <a accesskey="n" href="databaseLimits.html">Next</a></td>
        </tr>
      </table>
      <hr />
    </div>
    <div class="sect1" lang="en" xml:lang="en">
      <div class="titlepage">
        <div>
          <div>
            <h2 class="title" style="clear: both"><a id="accessmethods"></a>Access Methods</h2>
          </div>
        </div>
        <div></div>
      </div>
      <p>
        While this manual will focus primarily on the BTree access method, it is
        still useful to briefly describe all of the access methods that DB
        makes available.
    </p>
      <p>
        Note that an access method can be selected only when the database is
        created. Once selected,  actual API usage is generally
        identical across all access methods. That is, while some 
        exceptions exist, mechanically you interact with the library in the same
        way regardless of which access method you have selected.
    </p>
      <p>
        The access method that you should choose is gated first by what you want
        to use as a key, and then secondly by the performance that you see
        for a given access method. 
    </p>
      <p>
        The following are the available access methods:
    </p>
      <div class="informaltable">
        <table border="1" width="80%">
          <colgroup>
            <col align="left" />
            <col align="left" />
          </colgroup>
          <thead>
            <tr>
              <th align="center">Access Method</th>
              <th align="center">Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="left">BTree</td>
              <td align="left" valign="top">
                <p>
                    Data is stored in a sorted, balanced tree structure. 
                    Both the key and the data for BTree records can be
                    arbitrarily complex. That is, they can contain single values
                    such as an integer or a string, or complex types such as a
                    structure. Also, although not the default
                    behavior, it is possible for two records to
                    use keys that compare as equals. When this occurs, the
                    records are considered to be duplicates of one another.
                </p>
              </td>
            </tr>
            <tr>
              <td align="left">Hash</td>
              <td align="left" valign="top">
                <p>
                    Data is stored in an extended linear hash table.  Like
                    BTree, the key and the data used for Hash records can be of
                    arbitrarily complex data.  Also, like BTree, duplicate
                    records are optionally supported.
                </p>
              </td>
            </tr>
            <tr>
              <td align="left">Queue</td>
              <td align="left" valign="top">
                <p>
                    Data is stored in a queue as fixed-length records. Each
                    record uses a logical record number as its key. This access
                    method is designed for fast inserts at the tail of the
                    queue, and it has a special operation that deletes and
                    returns a record from the head of the queue.
                </p>
                <p>
                    This access method is unusual in that it provides record
                    level locking. This can provide
                    beneficial performance improvements in applications
                    requiring concurrent access to the queue.
                </p>
              </td>
            </tr>
            <tr>
              <td align="left">Recno</td>
              <td align="left" valign="top">
                <p>
                    Data is stored in either fixed or variable-length records.
                    Like Queue, Recno records use logical record numbers as keys.
                </p>
              </td>
            </tr>
          </tbody>
        </table>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="selectAM"></a>Selecting Access Methods</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
            To select an access method, you should first consider what you want
            to use as a key for you database records. If you want to use
            arbitrary data (even strings), then you should use either BTree or
            Hash. If you want to use logical record numbers (essentially
            integers) then you should use Queue or Recno.
        </p>
        <p>
            Once you have made this decision, you must choose between either
            BTree or Hash, or Queue or Recno. This decision is described next.
        </p>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="BTreeVSHash"></a>Choosing between BTree and Hash</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
            For small working datasets that fit entirely in memory, there is no
            difference between BTree and Hash. Both will perform just as well
            as the other. In this situation, you might just as well use BTree,
            if for no other reason than the majority of DB applications use
            BTree.
         </p>
        <p>
            Note that the main concern here is your
            working dataset, not your entire dataset. Many applications maintain
            large amounts of information but only need to access some small
            portion of that data with any frequency. So what you want to
            consider is the data that you will routinely use, not the sum total
            of all the data managed by your application.
         </p>
        <p>
            However, as your working dataset grows to the point
            where you cannot fit it all into memory, then you need to take more
            care when choosing your access method. Specifically, choose:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                    BTree if your keys have some locality of reference. That is,
                    if they sort well and you can expect that a query for a
                    given key will likely be followed by a query for one of its
                    neighbors. 
                </p>
            </li>
            <li>
              <p>
                    Hash if your dataset is extremely large. For any given
                    access method, DB must maintain a certain amount of internal
                    information. However, the amount of information that DB
                    must maintain for BTree is much greater than for Hash. The
                    result is that as your dataset grows, this internal
                    information can dominate the cache to the point where there
                    is relatively little space left for application data. 
                    As a result, BTree can be forced to perform disk I/O much more
                    frequently than would Hash given the same amount of data. 
                </p>
              <p>
                    Moreover, if your dataset becomes so large that DB will
                    almost certainly have to perform disk I/O to satisfy a
                    random request, then Hash will definitely out perform BTree
                    because it has fewer internal records to search through than
                    does BTree.
                </p>
            </li>
          </ul>
        </div>
      </div>
      <div class="sect2" lang="en" xml:lang="en">
        <div class="titlepage">
          <div>
            <div>
              <h3 class="title"><a id="QueueVSRecno"></a>Choosing between Queue and Recno</h3>
            </div>
          </div>
          <div></div>
        </div>
        <p>
            Queue or Recno are used when the application wants to use logical
            record numbers for the primary database key. Logical record numbers
            are essentially integers that uniquely identify the database
            record. They can be either mutable or fixed, where a mutable record
            number is one that might change as database records are stored or
            deleted. Fixed logical record numbers never change regardless of
            what database operations are performed.
        </p>
        <p>
            When deciding between Queue and Recno, choose:
        </p>
        <div class="itemizedlist">
          <ul type="disc">
            <li>
              <p>
                    Queue if your application requires high degrees of
                    concurrency. Queue provides record-level locking (as opposed
                    to the page-level locking that the other access methods
                    use), and this can result in significantly faster throughput
                    for highly concurrent applications.
                </p>
              <p>
                    Note, however, that Queue provides support only for fixed
                    length records. So if the size of the data that you want to
                    store varies widely from record to record, you should
                    probably choose an access method other than Queue.
                </p>
            </li>
            <li>
              <p>
                    Recno if you want mutable record numbers. Queue is only
                    capable of providing fixed record numbers. Also, Recno
                    provides support for databases whose permanent storage is a
                    flat text file. This is useful for applications looking for
                    fast, temporary storage while the data is being read or
                    modified.
                </p>
            </li>
          </ul>
        </div>
      </div>
    </div>
    <div class="navfooter">
      <hr />
      <table width="100%" summary="Navigation footer">
        <tr>
          <td width="40%" align="left"><a accesskey="p" href="concepts.html">Prev</a> </td>
          <td width="20%" align="center">
            <a accesskey="u" href="introduction.html">Up</a>
          </td>
          <td width="40%" align="right"> <a accesskey="n" href="databaseLimits.html">Next</a></td>
        </tr>
        <tr>
          <td width="40%" align="left" valign="top">Berkeley DB Concepts </td>
          <td width="20%" align="center">
            <a accesskey="h" href="index.html">Home</a>
          </td>
          <td width="40%" align="right" valign="top"> Database Limits and Portability</td>
        </tr>
      </table>
    </div>
  </body>
</html>