Berkeley DB 2.5.9 Change Log

Interface Additions in Berkeley DB 2.5.9:

  1. A new flag is available in this release for the Berkeley DB cursor key/data retrieval interface: DB_NEXT_DUP. This flag causes the DBcursor->c_get routine to return the next duplicate in a list of duplicates, and DB_NOTFOUND if there are no additional duplicates to return.

  2. A new flag is available in this release for the Berkeley DB key/data retrieval interfaces: DB_GET_BOTH. This flag causes the DB->get and DBcursor->c_get routines to return success only if both the specified key and data items match the entry in the database.

  3. A new flag is available in this release for the Berkeley DB key/data retrieval interfaces: DB_RMW. This flag causes the DB->get and DBcursor->c_get routines to acquire write locks instead of read locks when doing the retrieval. Setting this flag may decrease the likelihood of deadlock during a read-modify-write cycle by immediately acquiring the write lock during the read part of the cycle, so that another thread of control acquiring a read lock for the same item, in its own read-modify-write cycle, will not result in deadlock.

  4. A new flag is available in this release: DB_DUPSORT. This flag causes duplicate records to be maintained in sorted order. By default, the sort order is the same default lexical sort used by the Btree access method. A new field, in the DB_INFO structure passed to the db_open routine, is available in this release as well: dup_compare. This field is a sort function that is optionally used to sort duplicate data items. It is intended to allow applications to maintain duplicates in a non-standard sort order.

  5. A new interface is available in this release, DB->join. This interface takes a set of Berkeley DB cursors as arguments, and returns a specialized Berkeley DB cursor whose get function performs a database join on the records referenced by the set of cursors.

  6. A new field is available in this release, set in the Berkeley DB structure returned by db_open, DB->byteswapped. This field is set if the underlying database was not in the native host byte order, and can be used by the application to determine if its stored data will require host-order cleanups before use.

  7. The dbmclose() interface has been added to the Berkeley DB dbm/ndbm compatibility interface, for application compatibility with the Sun Microsystems Solaris and other dbm/ndbm interfaces.
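
Item 4 above describes the dup_compare field only in prose. The following is a minimal sketch of what such a comparison function might look like, assuming every duplicate data item holds exactly one 32-bit unsigned integer in native byte order. The sketch_dbt structure is a hypothetical stand-in for the data and size members of a DBT, used so the fragment compiles without db.h; the function name and the numeric ordering are illustrative assumptions, and a real function must match the prototype declared in db.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the data/size members of a DBT. */
struct sketch_dbt {
    void *data;     /* pointer to the item's bytes */
    uint32_t size;  /* length of the item in bytes */
};

/*
 * Compare two duplicate data items as native-order 32-bit unsigned
 * integers.  Returns <0, 0 or >0, in the style of a qsort comparator.
 */
int
sketch_dup_compare(const struct sketch_dbt *a, const struct sketch_dbt *b)
{
    uint32_t av, bv;

    /* Assumption of this sketch: every item is exactly 4 bytes long. */
    assert(a->size == sizeof(av) && b->size == sizeof(bv));

    /* memcpy avoids alignment problems with the stored bytes. */
    memcpy(&av, a->data, sizeof(av));
    memcpy(&bv, b->data, sizeof(bv));

    return (av < bv ? -1 : (av > bv ? 1 : 0));
}
```

With this ordering the values 1 and 256 sort as 1, 256. On a little-endian host, a bytewise lexical comparison of the same two items would place 256 first, which is the kind of case a non-standard sort function is meant to address.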

Interface Changes in Berkeley DB 2.5.9:

  1. Previous Berkeley DB releases have been inconsistent with respect to which DBT structure flags may be specified to which Berkeley DB interfaces. For example, calling DB->put with the DB_DBT_MALLOC flag specified makes no sense, and DB has been inconsistent historically as to whether this was treated as an error or simply ignored.

    As of this release, inappropriate flags in the DBT structure will simply be ignored. This is intended to make it easy to, for example, retrieve a key/data pair and then use the data DBT as the key DBT for another database without having to reinitialize the flags in the DBT.

    Previous Berkeley DB releases also required that threaded applications always set the DB_DBT_MALLOC or DB_DBT_USERMEM flags in DBT structures when retrieving key/data items. As of this release, specifying DB_DBT_MALLOC or DB_DBT_USERMEM is only required when using the non-cursor DB interfaces (for example, DB->get). When using cursor interfaces (for example, DBcursor->c_get), the flags are not required, as memory in which the key/data items are returned is allocated and maintained on a per-cursor basis.

  2. Berkeley DB log files are now named log.XXXXXXXXXX instead of log.XXXXX, in order to ensure that applications do not run out of log filename space.

    This change is transparent to applications, but may NOT be transparent to local shell scripts and utilities.

  3. Previous Berkeley DB releases returned statistics for Btree databases that were only valid for the lifetime of the handle with which they were requested; that is, the statistics returned for a particular Berkeley DB handle reflected only the database operations done by that handle and any cursors associated with it.

    The following statistics have been removed from the returned Btree statistical information: bt_freed, bt_pfxsaved, bt_split, bt_rootsplit, bt_fastsplit, bt_added, bt_deleted, bt_get, bt_cache_hit, bt_cache_miss. If any of these are sufficiently useful to application writers that they should be put back into the system, please let us know.

    This change is NOT transparent to applications.

  4. Previous Berkeley DB releases did not support embedded white space in Berkeley DB environment configuration strings. As of this release, configuration NAME/VALUE strings are still separated by one or more whitespace characters (which are discarded), but the VALUE string may contain embedded whitespace characters and is terminated by trailing whitespace characters and a newline character, both of which are also discarded. In addition, empty lines, and lines whose first character is a whitespace or hash (#) character, in the Berkeley DB configuration file are discarded.

    This change is potentially NOT transparent to applications.

  5. The DB_REGION_INIT flag to the db_value_set interface has been enhanced to write a byte to each page in the region. This allows applications to use DB_REGION_INIT to ensure that there is sufficient disk space for the backing region file.
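
The configuration string rules in item 4 above can be sketched as a small line splitter. This is an illustration of the rules as stated there, not the parser used by Berkeley DB; the function name sketch_parse_line is hypothetical, and returning -1 for a NAME with no VALUE is an assumption, since the text above does not say how such a line is treated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Split one configuration line, in place, into NAME and VALUE.
 * Returns 1 if a pair was found, 0 if the line is discarded (empty line,
 * or first character is whitespace or '#'), and -1 if NAME has no VALUE
 * (the -1 case is an assumption of this sketch).
 */
int
sketch_parse_line(char *line, char **namep, char **valuep)
{
    char *p, *end;

    assert(line != NULL && namep != NULL && valuep != NULL);

    /* Discard empty lines and lines starting with whitespace or '#'. */
    if (line[0] == '\0' || line[0] == '#' ||
        isspace((unsigned char)line[0]))
        return (0);

    /* NAME runs up to the first whitespace character. */
    for (p = line; *p != '\0' && !isspace((unsigned char)*p); ++p)
        ;
    if (*p == '\0')
        return (-1);
    *p++ = '\0';

    /* Skip the separating whitespace, which is discarded. */
    while (isspace((unsigned char)*p))
        ++p;
    if (*p == '\0')
        return (-1);

    /* Trim trailing whitespace and the newline; keep embedded blanks. */
    end = p + strlen(p);
    while (end > p && isspace((unsigned char)end[-1]))
        --end;
    *end = '\0';

    *namep = line;
    *valuep = p;
    return (1);
}
```

Given the line "DB_DATA_DIR   /var/my data/db  " followed by a newline, the sketch yields the NAME "DB_DATA_DIR" and the VALUE "/var/my data/db", with the embedded blank preserved and the trailing blanks and newline removed.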

Berkeley DB Environment failures:

There exists a class of errors that Berkeley DB considers fatal to an entire Berkeley DB environment. An example of this type of error is a log write failure due to the disk being out of free space. The only way to recover from these failures is for the application to exit, run recovery of the Berkeley DB environment, and re-enter DB. (It is not strictly necessary that the application exit, although that is the only way to recover system resources, for example, file descriptors and memory, currently allocated by Berkeley DB.)

In previous Berkeley DB releases, the only way an application could determine that a fatal error had occurred was to monitor Berkeley DB function return values, looking for unexpected ones, such as ENOSPC, or EPERM (which has historically been returned by Berkeley DB to indicate a potential underlying database corruption).

As of this release, we have added a new error return value, DB_RUNRECOVERY. This error can be returned by any Berkeley DB interface. If a fatal error occurs, DB_RUNRECOVERY will then be returned from all subsequent DB calls made by any threads or processes participating in the DB environment.

The EPERM error return no longer has a special meaning in Berkeley DB.

Optionally, applications may also specify a fatal-error callback function by setting the db_paniccall field of the DB_ENV structure before initializing the environment with db_appinit (DbEnv::appinit). This callback function will be called with two arguments: the DB_ENV structure associated with the environment and the errno value associated with the underlying error that caused the problem.

Applications can handle fatal errors in one of two ways: by checking for DB_RUNRECOVERY as part of their normal Berkeley DB error return checking, or, in applications that have no cleanup processing of their own, by simply exiting the application when the callback function is called.

We would be very interested in any comments that you'd care to make on this interface change, in particular, any comments on the sufficiency of the interface for your Berkeley DB application.

This change is NOT transparent to applications.
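
The two ways of handling fatal errors described above can be sketched as follows. This is only an outline under stated assumptions: SKETCH_RUNRECOVERY is a placeholder for DB_RUNRECOVERY (whose real value comes from db.h), sketch_env is a hypothetical stand-in for DB_ENV reduced to its db_paniccall field, and the function and enum names are invented for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder for DB_RUNRECOVERY; a real application takes it from db.h. */
#define SKETCH_RUNRECOVERY (-9999)

/* Hypothetical stand-in for DB_ENV, reduced to the one field used here. */
struct sketch_env {
    void (*db_paniccall)(struct sketch_env *, int);
};

/* The errno value recorded by the callback, or 0 if it was never called. */
int sketch_fatal_errno;

/*
 * Fatal-error callback: receives the environment and the errno value of
 * the underlying failure.  An application with no cleanup of its own
 * might simply exit here; this sketch only records the value.
 */
void
sketch_panic(struct sketch_env *env, int errval)
{
    assert(env != NULL);
    sketch_fatal_errno = errval;
}

/* What the caller should do with the return value of a DB call. */
enum sketch_action {
    SKETCH_CONTINUE,        /* call succeeded */
    SKETCH_HANDLE_ERROR,    /* ordinary error, environment still usable */
    SKETCH_RUN_RECOVERY     /* fatal: stop, run recovery, re-enter DB */
};

enum sketch_action
sketch_check(int ret)
{
    if (ret == 0)
        return (SKETCH_CONTINUE);
    if (ret == SKETCH_RUNRECOVERY)
        return (SKETCH_RUN_RECOVERY);
    /* EPERM and ENOSPC no longer carry a special meaning by themselves. */
    return (SKETCH_HANDLE_ERROR);
}
```

An application taking the first approach would route every Berkeley DB return value through a check like sketch_check; one taking the second would install a callback like sketch_panic before initializing the environment.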

Documentation Changes:

  1. The Berkeley DB documentation has been completely reworked. It is no longer available in flat text, UNIX roff or PostScript formats, but is now only available in HTML format. To use the Berkeley DB documentation, point your browser to the Berkeley DB distribution or installation directory db-2.5.9/docs/index.html. This release also includes the beginnings of the Berkeley DB Reference Guide, as well as the manual pages.

B+tree Access Method Bug Fixes:

  1. Cursor delete operations were not necessarily being undone after deadlock, potentially leading to incorrect data.

  2. Deleted, off-page duplicate items could be recovered incorrectly, potentially leading to incorrect data.

  3. Log records could be written outside of a transaction under some circumstances, potentially corrupting the log so that recovery would fail.

  4. Completely emptying large trees could cause corruption of the database root page during the final reverse split.

  5. Failure during page split could leave cursors referencing incorrect data.

  6. Retrieving records based on logical record number could return incorrect data if logically adjacent records had previously been deleted.

  7. The Btree access method is more aggressive in this release about discarding locks within transactions that are not needed for correctness. This change significantly decreases the probability of deadlock for some applications.

Hash Access Method Bug Fixes:

  1. Storing duplicate data items using the DB_CURRENT flag could result in incorrect data.

  2. Cursors and their locks were not always left unchanged on operation failure.

  3. Entering a sufficient number of duplicate data items into the database could result in incorrect cursor positioning and/or a corrupted database.

Recno Access Method Bug Fixes:

  1. The logical record number returned from DB_APPEND calls was stored into library memory instead of into the user-specified memory.

  2. The memory in which the backing source filename was stored could be freed multiple times, potentially leading to application core dump.

General Access Method Bug Fixes:

  1. Setting the database cachesize in the presence of a Berkeley DB environment was incorrectly treated as an error, even if the environment did not initialize a shared memory buffer pool.

  2. Using the DBcursor->c_get interface with the DB_KEYFIRST or DB_KEYLAST flags to insert a new key into the database would fail.

  3. In previous Berkeley DB releases each cursor operation (when not part of a transaction) potentially used a different locker ID, making it possible for cursor operations to lock against themselves. In the 2.5.9 release, the cursor locker ID is maintained for the life of the cursor, instead.

  4. The optional user-specified transaction recovery function was not being called during Berkeley DB recovery or when using the db_printlog utility.

  5. Pages of duplicate data items were incorrectly split and logged/recovered, potentially leading to database corruption.

  6. During recovery, deleted database files could cause recovery to fail.

C++ API Changes and Bug Fixes:

  1. A DbEnv::version method has been added to allow access to major, minor and patch numbers for the current version.

  2. The DbEnv class has been cleaned up so that inappropriate get and set methods have been removed:

    DbEnv::get_data_cnt
    DbEnv::get_data_dir
    DbEnv::get_data_next
    DbEnv::get_flags
    DbEnv::get_home
    DbEnv::get_log_dir
    DbEnv::get_tmp_dir
    DbEnv::set_data_cnt
    DbEnv::set_data_dir
    DbEnv::set_data_next
    DbEnv::set_flags
    DbEnv::set_home
    DbEnv::set_log_dir
    DbEnv::set_tmp_dir

    These methods are unneeded because the constructor with arguments, or the appinit() method, can be used to set this information.

    DbEnv::get_errcall
    DbEnv::get_errfile
    DbEnv::get_error_model
    DbEnv::get_error_stream
    DbEnv::get_errpfx
    DbEnv::get_lg_max
    DbEnv::get_lk_conflicts
    DbEnv::get_lk_detect
    DbEnv::get_lk_max
    DbEnv::get_lk_modes
    DbEnv::get_lorder
    DbEnv::get_mp_mmapsize
    DbEnv::get_mp_size
    DbEnv::get_tx_max
    DbEnv::get_tx_recover
    DbEnv::get_verbose

    These get methods accessed information that was never set by Berkeley DB.

  3. Remaining DbEnv::set_* methods may throw an exception if they are called after the environment has been initialized (either via appinit or the constructor with arguments).

  4. The DbInfo class has been reworked so that inappropriate get methods have been removed. These get methods accessed information that was never set by Berkeley DB.

    DbInfo::get_bt_compare
    DbInfo::get_bt_maxkey
    DbInfo::get_bt_minkey
    DbInfo::get_bt_prefix
    DbInfo::get_cachesize
    DbInfo::get_flags
    DbInfo::get_h_ffactor
    DbInfo::get_h_hash
    DbInfo::get_h_nelem
    DbInfo::get_lorder
    DbInfo::get_malloc
    DbInfo::get_pagesize
    DbInfo::get_re_delim
    DbInfo::get_re_len
    DbInfo::get_re_pad
    DbInfo::get_re_source

  5. Methods to get and set underlying lock identifiers in a DbLock have been removed, as lock identifiers should be completely opaque to the application.

Java API Changes and Bug Fixes:

  1. The DB_SET_RANGE flag did not correctly return data items.

  2. Db.stat() is now declared to return an Object. The object returned is of type DbBtreeStat if the file was created using Db.DB_BTREE. In the future, this will return other types, for example, DbHashStat.

    The DbBtreeStat, DbLockStat, DbMpoolFStat, DbMpoolStat and DbTxnStat classes have been changed to allow direct access to their data members. DbLogStat is a new class.

  3. The DbEnv class has been reworked and all inappropriate get and set methods have been removed:

    DbEnv.get_data_cnt
    DbEnv.get_data_next
    DbEnv.get_flags
    DbEnv.get_home
    DbEnv.get_log_dir
    DbEnv.get_tmp_dir
    DbEnv.set_data_cnt
    DbEnv.set_data_next
    DbEnv.set_flags
    DbEnv.set_home
    DbEnv.set_log_dir
    DbEnv.set_tmp_dir

    These methods are unneeded because the constructor with arguments, or the appinit() method, can be used to set this information.

    DbEnv.get_errcall
    DbEnv.get_errpfx
    DbEnv.get_lg_max
    DbEnv.get_lk_conflicts
    DbEnv.get_lk_detect
    DbEnv.get_lk_max
    DbEnv.get_lk_modes
    DbEnv.get_lorder
    DbEnv.get_mp_mmapsize
    DbEnv.get_mp_size
    DbEnv.get_tx_max
    DbEnv.get_verbose

    These get methods used to access information that was never set by Berkeley DB.

    The DbEnv.get_java_version_string method has been removed, and the Java part of Berkeley DB no longer maintains its own version information.

  4. Remaining DbEnv.set_* methods may throw a DbException if they are called after the environment has been initialized (either via appinit or the constructor with arguments).

  5. The DbInfo class has been reworked so that inappropriate get methods have been removed. These get methods used to access information that was never set by Berkeley DB.

    DbInfo.get_bt_maxkey
    DbInfo.get_bt_minkey
    DbInfo.get_cachesize
    DbInfo.get_flags
    DbInfo.get_h_ffactor
    DbInfo.get_h_hash
    DbInfo.get_h_nelem
    DbInfo.get_lorder
    DbInfo.get_pagesize
    DbInfo.get_re_delim
    DbInfo.get_re_len
    DbInfo.get_re_pad
    DbInfo.get_re_source

  6. Methods to get and set underlying lock identifiers in a DbLock have been removed, as lock identifiers should be completely opaque to the application.

  7. The DbRunRecoveryException class has been added as a subclass of DbException. A DbRunRecoveryException object will be thrown when a fatal error occurs in Berkeley DB, requiring recovery to be performed.

Shared Memory Buffer Pool Subsystem Bug Fixes:

  1. It was possible for threads opening and closing databases in a fairly full buffer cache to free memory that was still in use, resulting in application failure.

  2. If the memp_trickle() interface was unable to find a single buffer to flush in the entire buffer list, it would return with the shared memory region mutex locked.

  3. Opening underlying files of certain sizes in the buffer pool would incorrectly fail.

Locking Subsystem Bug Fixes:

  1. If a locker being forced to wait does not currently hold any locks, the deadlock detector is no longer run.

Logging Subsystem Bug Fixes:

  1. If the first log get call after database recovery used the DB_NEXT flag, it would fail.

  2. If databases were opened multiple times without intervening closes, recovery could fail.

  3. A memory leak in the log_archive interface has been fixed.

Additional Bug Fixes:

  1. Windows/NT: using shared anonymous memory did not work correctly between processes sharing the database.

  2. Utilities: Signal handling and Berkeley DB region exit were incorrect. Among other issues, the db_load utility could exit holding a region mutex.

  3. Dbm/Ndbm: System error values were not always being correctly returned.

System Porting and Build Procedure Changes:

  1. An alpha-release port to VMS has been added to the Berkeley DB distribution. The port has not yet run the Berkeley DB test suite, but there are no known problems.

  2. Berkeley DB now uses the pstat_getdynamic(2) interface on Hewlett-Packard HP/UX systems to detect the presence of multiple processors.

  3. For performance reasons, the Berkeley DB release now uses the Sun Microsystems Solaris pread(2) and pwrite(2) UNIX interfaces, if they are available.

  4. Berkeley DB now compiles with the -D_THREAD_SAFE C preprocessor flag and loads with the libc_r.a C library by default on FreeBSD systems.

  5. For portability reasons, shared memory segments allocated using the UNIX shmget(2) function are now allocated as IPC_PRIVATE. (Apparently, marking them as IPC_PRIVATE does not prevent them from being available to other processes.)

  6. The standard UNIX install for the Berkeley DB library now installs Berkeley DB into its own hierarchy instead of into separate local directories. By default, the install locations are:

    Location                        Contents
    /usr/local/BerkeleyDB/bin       binaries
    /usr/local/BerkeleyDB/include   include files
    /usr/local/BerkeleyDB/lib       libraries
    /usr/local/BerkeleyDB/docs      HTML documentation

  7. For portability reasons, the standard UNIX Berkeley DB library archive is built with the -cr options in this release, instead of the -cq options as done previously.

  8. The standard UNIX Berkeley DB configuration will now automatically detect and use gcc if no compiler named cc is found.

  9. When the C pre-processor DIAGNOSTIC value is #defined, memory is overwritten with a 0xdb pattern instead of a 0xff pattern.
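
Item 3 above refers to the pread(2) and pwrite(2) interfaces, which transfer data at an explicit file offset in a single call and leave the file descriptor's current offset untouched. The following minimal sketch shows page-sized I/O done that way. The names sketch_page_read and sketch_page_write and the 512-byte page size are assumptions made for the illustration and are not taken from the Berkeley DB sources; short transfers are simply treated as failures.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request the pread/pwrite/mkstemp prototypes */
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGESIZE 512 /* hypothetical page size for this sketch */

/*
 * Write one page at its page offset.  No lseek is needed, and the file
 * descriptor's current offset is not changed.  Returns 0 on success.
 */
int
sketch_page_write(int fd, unsigned long pgno, const void *buf)
{
    ssize_t nw;

    assert(buf != NULL);
    nw = pwrite(fd, buf, SKETCH_PAGESIZE, (off_t)pgno * SKETCH_PAGESIZE);
    return (nw == SKETCH_PAGESIZE ? 0 : -1);
}

/*
 * Read one page from its page offset.  Returns 0 on success, -1 on error
 * or if the full page could not be read (for example, past end of file).
 */
int
sketch_page_read(int fd, unsigned long pgno, void *buf)
{
    ssize_t nr;

    assert(buf != NULL);
    nr = pread(fd, buf, SKETCH_PAGESIZE, (off_t)pgno * SKETCH_PAGESIZE);
    return (nr == SKETCH_PAGESIZE ? 0 : -1);
}
```

Because neither call moves the shared file offset, two threads using the same descriptor need no lock around a seek-then-read pair, and each page transfer costs one system call rather than two.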

Additional Changes: db_dump

  1. The db_dump utility has a new option, -N. This option allows db_dump to be run without acquiring any shared region mutexes. This option is intended for debugging use only.

  2. The db_dump utility now allows a Berkeley DB environment directory to be specified (the -h option) at the same time as the "debugging output" option (the -d option).

  3. The db_dump utility now uses the shared memory buffer pool region if a Berkeley DB environment directory is specified, which allows users to see the current state of the database instead of only the database state that has already been flushed to disk.