dbis.html [plain text]

<!--$Id: dbis.so,v 10.11 2006/09/19 16:21:42 bostic Exp $-->
<!--Copyright (c) 1997,2008 Oracle.  All rights reserved.-->
<!--See the file LICENSE for redistribution information.-->
<html>
<head>
<title>Berkeley DB Reference Guide: What is Berkeley DB?</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,Java,C,C++">
</head>
<body bgcolor=white>
<table width="100%"><tr valign=top>
<td><b><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></b></td>
<td align=right><a href="../intro/terrain.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../intro/dbisnot.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p align=center><b>What is Berkeley DB?</b></p>
<p>So far, we've discussed database systems in general terms. It's time
now to consider Berkeley DB in particular and see how it fits into the
framework we have introduced. The key question is, what kinds of
applications should use Berkeley DB?</p>
<p>Berkeley DB is an Open Source embedded database library that provides
scalable, high-performance, transaction-protected data management
services to applications.  Berkeley DB provides a simple function-call API for
data access and management.</p>
<p>By "Open Source," we mean Berkeley DB is distributed under a license that
conforms to the <a href="http://www.opensource.org/osd.html">Open
Source Definition</a>. This license guarantees Berkeley DB is freely available
for use and redistribution in other Open Source applications.  Oracle
Corporation sells commercial licenses allowing the redistribution of
Berkeley DB in proprietary applications.  In all cases the complete source
code for Berkeley DB is freely available for download and use.</p>
<p>Berkeley DB is "embedded" because it links directly into the application.  It
runs in the same address space as the application. As a result, no
inter-process communication, either over the network or between
processes on the same machine, is required for database operations.
Berkeley DB provides a simple function-call API for a number of programming
languages, including C, C++, Java, Perl, Tcl, Python, and PHP. All
database operations happen inside the library. Multiple processes, or
multiple threads in a single process, can all use the database at the
same time as each uses the Berkeley DB library. Low-level services like
locking, transaction logging, shared buffer management, memory
management, and so on are all handled transparently by the library.</p>
<p>The Berkeley DB library is extremely portable. It runs under almost all UNIX
and Linux variants, Windows, and a number of embedded real-time
operating systems. It runs on both 32-bit and 64-bit systems.  It has
been deployed on high-end Internet servers, desktop machines, and on
palmtop computers, set-top boxes, in network switches, and elsewhere.
Once Berkeley DB is linked into the application, the end user generally does
not know that there's a database present at all.</p>
<p>Berkeley DB is scalable in a number of respects. The database library itself
is quite compact (under 300 kilobytes of text space on common
architectures), but it can manage databases up to 256 terabytes in size.
It also supports high concurrency, with thousands of users operating on
the same database at the same time. Berkeley DB is small enough to run in
tightly constrained embedded systems, but can take advantage of
gigabytes of memory and terabytes of disk on high-end server machines.</p>
<p>Berkeley DB generally outperforms relational and object-oriented database
systems in embedded applications for a couple of reasons. First, because
the library runs in the same address space, no inter-process
communication is required for database operations. The cost of
communicating between processes on a single machine, or among machines
on a network, is much higher than the cost of making a function call.
Second, because Berkeley DB uses a simple function-call interface for all
operations, there is no query language to parse, and no execution plan
to produce.</p>
<b>Data Access Services</b>
<p>Berkeley DB applications can choose the storage structure that best suits the
application. Berkeley DB supports hash tables, Btrees, simple
record-number-based storage, and persistent queues. Programmers can
create tables using any of these storage structures, and can mix
operations on different kinds of tables in a single application.</p>
<p>Hash tables are generally good for very large databases that need
predictable search and update times for random-access records.  Hash
tables allow users to ask, "Does this key exist?" or to fetch a record
with a known key.  Hash tables do not allow users to ask for records
with keys that are close to a known key.</p>
<p>Btrees are better for range-based searches, as when the application
needs to find all records with keys between some starting and ending
value.  Btrees also do a better job of exploiting <i>locality
of reference</i>.  If the application is likely to touch keys near each
other at the same time, the Btrees work well. The tree structure keeps
keys that are close together near one another in storage, so fetching
nearby values usually doesn't require a disk access.</p>
<p>Record-number-based storage is natural for applications that need to
store and fetch records, but that do not have a simple way to generate
keys of their own. In a record number table, the record number is the
key for the record. Berkeley DB will generate these record numbers
automatically.</p>
<p>Queues are well-suited for applications that create records, and then
must deal with those records in creation order. A good example is
on-line purchasing systems. Orders can enter the system at any time,
but should generally be filled in the order in which they were placed.</p>
<b>Data management services</b>
<p>Berkeley DB offers important data management services, including concurrency,
transactions, and recovery. All of these services work on all of the
storage structures.</p>
<p>Many users can work on the same database concurrently. Berkeley DB handles
locking transparently, ensuring that two users working on the same
record do not interfere with one another.</p>
<p>The library provides strict ACID transaction semantics, by default.
However, applications are allowed to relax the isolation guarantees
the database system makes.</p>
<p>Multiple operations can be grouped into a single transaction, and can
be committed or rolled back atomically. Berkeley DB uses a technique called
<i>two-phase locking</i> to be sure that concurrent transactions
are isolated from one another, and a technique called
<i>write-ahead logging</i> to guarantee that committed changes
survive application, system, or hardware failures.</p>
<p>When an application starts up, it can ask Berkeley DB to run recovery.
Recovery restores the database to a clean state, with all committed
changes present, even after a crash. The database is guaranteed to be
consistent and all committed changes are guaranteed to be present when
recovery completes.</p>
<p>An application can specify, when it starts up, which data management
services it will use. Some applications need fast, single-user,
non-transactional Btree data storage. In that case, the application can
disable the locking and transaction systems, and will not incur the
overhead of locking or logging. If an application needs to support
multiple concurrent users, but doesn't need transactions, it can turn
on locking without transactions. Applications that need concurrent,
transaction-protected database access can enable all of the
subsystems.</p>
<p>In all these cases, the application uses the same function-call API to
fetch and update records.</p>
<b>Design</b>
<p>Berkeley DB was designed to provide industrial-strength database services to
application developers, without requiring them to become database
experts.  It is a classic C-library style <i>toolkit</i>, providing
a broad base of functionality to application writers.  Berkeley DB was designed
by programmers, for programmers: its modular design surfaces simple,
orthogonal interfaces to core services, and it provides mechanism (for
example, good thread support) without imposing policy (for example, the
use of threads is not required).  Just as importantly, Berkeley DB allows
developers to balance performance against the need for crash recovery
and concurrent use.  An application can use the storage structure that
provides the fastest access to its data and can request only the degree
of logging and locking that it needs.</p>
<p>Because of the tool-based approach and separate interfaces for each
Berkeley DB subsystem, you can support a complete transaction environment for
other system operations. Berkeley DB even allows you to wrap transactions
around the standard UNIX file read and write operations!  Further, Berkeley DB
was designed to interact correctly with the native system's toolset, a
feature no other database package offers.  For example, Berkeley DB supports
hot backups (database backups while the database is in use), using
standard UNIX system utilities, for example, dump, tar, cpio, pax or
even cp.</p>
<p>Finally, because scripting language interfaces are available for Berkeley DB
(notably Tcl and Perl), application writers can build incredibly powerful
database engines with little effort.  You can build transaction-protected
database applications using your favorite scripting languages, an
increasingly important feature in a world using CGI scripts to deliver
HTML.</p>
<table width="100%"><tr><td><br></td><td align=right><a href="../intro/terrain.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../intro/dbisnot.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1>Copyright (c) 1996,2008 Oracle.  All rights reserved.</font>
</body>
</html>