Server — How the server works

Server — How the server works The term “server” is ambiguous, because it has at least two different meanings: it can refer to a powerful computer which offers services to users on a network, or it can refer to a CPU process designed to receive network requests. In Subversion, however, the server is just a set of libraries that implements repositories and makes them available to other programs. No networking is required. There are two main libraries: the Subversion Filesystem library, and the Subversion Repository library. Filesystem Filesystem Overview Requires: some writable disk space (for now) Berkeley DB library Provides: a repository for storing files concurrent client transactions enforcement of user & group permissions [someday, not yet] This library implements a hierarchical filesystem which supports atomic changes to directory trees, and records a complete history of the changes. In addition to recording changes to file and directory contents, the Subversion Filesystem records changes to file meta-data (see discussion of properties in ). API There are two main files that describe the Subversion filesystem. First, read the section below () for a general overview of how the filesystem works. Once you've done this, read Jim Blandy's own structural overview, which explains how nodes and revisions are organized (among other things) in the filesystem implementation: subversion/libsvn_fs/structure. Finally, read the well-documented API in subversion/include/svn_fs.h. Repository Structure Schema To begin, please be sure that you're already casually familiar with Subversion's ideas of files, directories, and revision histories. If not, see . We can now offer precise, technical descriptions of the terms introduced there. A text string is a string of Unicode characters which is canonically decomposed and ordered, according to the rules described in the Unicode standard. A string of bytes is what you'd expect. A property list is an unordered list of properties. A property is a pair (name, value), where name is a text string, and value is a string of bytes. No two properties in a property list have the same name. A file is a property list and a string of bytes. A node is either a file or a directory. (We define a directory below.) Nodes are distinguished unions — you can always tell whether a node is a file or a directory. A node table is an array mapping some set of positive integers, called node numbers, onto nodes. If a node table maps some number i to some node n, then i is a valid node number in that table, and node iis n. Otherwise, i is an invalid node number in that table. A directory entry is a triple (name, props, node), where name is a text string, props is a property list, and node is a node number. A directory is an unordered list of directory entries, and a property list. A revision is a node number and a property list. A history is an array of revisions, indexed by a contiguous range of non-negative integers containing 0. A repository consists of node table and a history. Now that we've explained the form of the data, we make some restrictions on that form. Every revision has a root directory. Every revision's node number is a valid node number, and the node it refers to is always a directory. We call this the revision's root directory. Revision 0 always contains an empty root directory. This baseline makes it easy to check out whole projects from the repository. Directories contain only valid links. Every directory entry's node is a valid node number. Directory entries can be identified by name. For any directory d, every directory entry in d has a distinct name. There are no cycles of directories. No node is its own child. Directories can have more than one parent. The Unix file system does not allow more than one hard link to a directory, but Subversion does allow the analogous situation. Thus, the directories in a Subversion repository form a directed acyclic graph (DAG), not a tree. However, it would be distracting and unhelpful to replace the familiar term “directory tree” with the unfamiliar term “directory DAG”, so we still call it a “directory tree” here. There are no dead nodes. Every node is a child of some revision's root directory. Bubble-Up Method This section provides a conversational explanation of how the repository actually stores and revisions file trees. It's not critical knowledge for a programmer using the Subversion Filesystem API, but most people probably still want to know what's going on “under the hood” of the repository. Suppose we have a new project, at revision 1, looking like this (using CVS syntax): prompt$ svn checkout myproj U myproj/ U myproj/B U myproj/A U myproj/A/fish U myproj/A/fish/tuna prompt$ Only the file tuna is a regular file, everything else in myproj is a directory. Let's see what this looks like as an abstract data structure in the repository, and how that structure works in various operations (such as update, commit, and branch). In the diagrams that follow, lines represent parent-to-child connections in a directory hierarchy. Boxes are "nodes". A node is either a file or a directory – a letter in the upper left indicates which kind. A file node has a byte-string for its content, whereas directory nodes have a list of dir_entries, each pointing to another node. Parent-child links go both ways (i.e., a child knows who all its parents are), but a node's name is stored only in its parent, because a node with multiple parents may have different names in different parents. At the top of the repository is an array of revision numbers, stretching off to infinity. Since the project is at revision 1, only index 1 points to anything; it points to the root node of revision 1 of the project: ( myproj's revision array ) ______________________________________________________ |___1_______2________3________4________5_________6_____... | | ___|_____ |D | | | | A | /* Two dir_entries, `A' and `B'. */ | \ | | B \ | |__/___\__| / \ | \ | \ ___|___ ___\____ |D | |D | | | | | | | | fish | /* One dir_entry, `fish'. */ |_______| |___\____| \ \ ___\____ |D | | | | tuna | /* One dir_entry, `tuna'. */ |___\____| \ \ ___\____ |F | | | | | /* (Contents of tuna not shown.) */ |________| What happens when we modify tuna and commit? First, we make a new tuna node, containing the latest text. The new node is not connected to anything yet, it's just hanging out there in space: ________ |F | | | | | |________| Next, we create a new revision of its parent directory: ________ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________| We continue up the line, creating a new revision of the next parent directory: ________ |D | | | | fish | |___\____| \ \ ___\____ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________| Now it gets more tricky: we need to create a new revision of the root directory. This new root directory needs an entry to point to the “new” directory A, but directory B hasn't changed at all. Therefore, our new root directory also has an entry that still points to the old directory B node! ______________________________________________________ |___1_______2________3________4________5_________6_____... | | ___|_____ ________ |D | |D | | | | | | A | | A | | \ | | \ | | B \ | | B \ | |__/___\__| |__/___\_| / \ / \ | ___\_____________/ \ | / \ \ ___|__/ ___\____ ___\____ |D | |D | |D | | | | | | | | | | fish | | fish | |_______| |___\____| |___\____| \ \ \ \ ___\____ ___\____ |D | |D | | | | | | tuna | | tuna | |___\____| |___\____| \ \ \ \ ___\____ ___\____ |F | |F | | | | | | | | | |________| |________| Finally, after all our new nodes are written, we finish the “bubble up” process by linking this new tree to the next available revision in the history array. In this case, the new tree becomes revision 2 in the repository. ______________________________________________________ |___1_______2________3________4________5_________6_____... | \ | \__________ ___|_____ __\_____ |D | |D | | | | | | A | | A | | \ | | \ | | B \ | | B \ | |__/___\__| |__/___\_| / \ / \ | ___\_____________/ \ | / \ \ ___|__/ ___\____ ___\____ |D | |D | |D | | | | | | | | | | fish | | fish | |_______| |___\____| |___\____| \ \ \ \ ___\____ ___\____ |D | |D | | | | | | tuna | | tuna | |___\____| |___\____| \ \ \ \ ___\____ ___\____ |F | |F | | | | | | | | | |________| |________| Generalizing on this example, you can now see that each “revision” in the repository history represents a root node of a unique tree (and an atomic commit to the whole filesystem.) There are many trees in the repository, and many of them share nodes. Many nice behaviors come from this model: Easy reads. If a filesystem reader wants to locate revision X of file foo.c, it need only traverse the repository's history, locate revision X's root node, then walk down the tree to foo.c. Writers don't interfere with readers. Writers can continue to create new nodes, bubbling their way up to the top, and concurrent readers cannot see the work in progress. The new tree only becomes visible to readers after the writer makes its final “link” to the repository's history. File structure is versioned. Unlike CVS, the very structure of each tree is being saved from revision to revision. File and directory renames, additions, and deletions are part of the repository's history. Let's demonstrate the last point by renaming the tuna to book. We start by creating a new parent “fish” directory, except that this parent directory has a different dir_entry, one which points the same old file node, but has a different name: ______________________________________________________ |___1_______2________3________4________5_________6_____... | \ | \__________ ___|_____ __\_____ |D | |D | | | | | | A | | A | | \ | | \ | | B \ | | B \ | |__/___\__| |__/___\_| / \ / \ | ___\_____________/ \ | / \ \ ___|__/ ___\____ ___\____ |D | |D | |D | | | | | | | | | | fish | | fish | |_______| |___\____| |___\____| \ \ \ \ ___\____ ___\____ ________ |D | |D | |D | | | | | | | | tuna | | tuna | | book | |___\____| |___\____| |_/______| \ \ / \ \ / ___\____ ___\____ / |F | |F | | | | | | | | | |________| |________| From here, we finish with the bubble-up process. We make new parent directories up to the top, culminating in a new root directory with two dir_entries (one points to the old “B” directory node we've had all along, the other to the new revision of “A”), and finally link the new tree to the history as revision 3: ______________________________________________________ |___1_______2________3________4________5_________6_____... | \ \_________________ | \__________ \ ___|_____ __\_____ __\_____ |D | |D | |D | | | | | | | | A | | A | | A | | \ | | \ | | \ | | B \ | | B \ | | B \ | |__/___\__| |__/___\_| |__/___\_| / ___________________/_____\_________/ \ | / ___\_____________/ \ \ | / / \ \ \ ___|/_/ ___\____ ___\____ _____\__ |D | |D | |D | |D | | | | | | | | | | | | fish | | fish | | fish | |_______| |___\____| |___\____| |___\____| \ \ \ \ \ \ ___\____ ___\____ ___\____ |D | |D | |D | | | | | | | | tuna | | tuna | | book | |___\____| |___\____| |_/______| \ \ / \ \ / ___\____ ___\____ / |F | |F | | | | | | | | | |________| |________| For our last example, we'll demonstrate the way “tags” and “branches” are implemented in the repository. In a nutshell, they're one and the same thing. Because nodes are so easily shared, we simply create a new directory entry that points to an existing directory node. It's an extremely cheap way of copying a tree; we call this new entry a clone, or more colloquially, a “cheap copy”. Let's go back to our original tree, assuming that we're at revision 6 to begin with: ______________________________________________________ ...___6_______7________8________9________10_________11_____... | | ___|_____ |D | | | | A | | \ | | B \ | |__/___\__| / \ | \ | \ ___|___ ___\____ |D | |D | | | | | | | | fish | |_______| |___\____| \ \ ___\____ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________| Let's “tag” directory A. To make the clone, we create a new dir_entry T in our root, pointing to A's node: ______________________________________________________ |___6_______7________8________9________10_________11_____... | \ | \ ___|_____ __\______ |D | |D | | | | | | A | | A | | \ | | | | | B \ | | B | T | |__/___\__| |_/__|__|_| / \ / | | | ___\__/ / / | / \ / / ___|__/ ___\__/_ / |D | |D | | | | | | | | fish | |_______| |___\____| \ \ ___\____ |D | | | | tuna | |___\____| \ \ ___\____ |F | | | | | |________| Now we're all set. In the future, the contents of directories A and B may change quite a lot. However, assuming we never make any changes to directory T, it will always point to a particular pristine revision of directory A at some point in time. Thus, T is a tag. (In theory, we can use some kind of authorization system to prevent anyone from writing to directory T. In practice, a well-laid out repository should encourage “tag directories” to live in one place, so that it's clear to all users that they're not meant to change.) However, if we do decide to allow commits in directory T, and now our repository tree increments to revision 8, then T becomes a branch. Specifically, it's a branch of directory A which shares history with A up to a certain point, and then “broke off” from the main line at revision 8. Diffy Storage You may have been thinking, “Gee, this bubble up method seems nice, but it sure wastes a lot of space. Every commit to the repository creates an entire line of new directory nodes!” Like many other revision control systems, Subversion stores changes as differences. It doesn't make complete copies of nodes; instead, it stores the latest revision as a full text, and previous revisions as a succession of reverse diffs (the word "diff" is used loosely here – for files, it means vdeltas, for directories, it means a format that expresses changes to directories). Implementation For the initial release of Subversion, The filesystem will be implemented as a library on Unix. The filesystem's data will probably be stored in a collection of .db files, using the Berkeley Database library. In the future, ofcourse, contributors are free modify the Subversion filesystem tooperate with more powerful SQL database. (For more information, see Sleepycat Software.) Repository Library A subversion repository is a directory that contains a number of components: a versioned filesystem (typically a collection of .db files) some hook scripts (for executing before or after commits) a locking area (used by Berkeley DB or other processes) a configuration area (for changing global behaviors) The Subversion filesystem is just that: a filesystem. But it's also useful to provide an API that acts at the level of the repository. The repository library (libsvn_repos) does this. In particular, it wraps a few libsvn_fs routines, such as those for beginning and ending commits, so that hook-scripts can run. A pre-commit-hook script might check for a valid log message, and a post-commit-hook script might send an email to a mailing list. Additionally, the repository library provides convenience routines for examining and manipulating the filesystem. For example, a routine to generate a tree-delta by comparing two revisions, routines for constructing new transactions, routines for querying log messages, and routines for exporting and importing filesystem data.