<chapter id="model"> <title>Model — The versioning model used by Subversion</title> <simplesect> <para>This chapter explains the user's view of Subversion — what “objects” you interact with, how they behave, and how they relate to each other.</para> </simplesect> <sect1 id="model.wc-and-repos"> <title>Working Directories and Repositories</title> <para>Suppose you are using Subversion to manage a software project. There are two things you will interact with: your working directory, and the repository.</para> <para>Your <firstterm>working directory</firstterm> is an ordinary directory tree, on your local system, containing your project's sources. You can edit these files and compile your program from them in the usual way. Your working directory is your own private work area: Subversion never changes the files in your working directory, or publishes the changes you make there, until you explicitly tell it to do so.</para> <para>After you've made some changes to the files in your working directory, and verified that they work properly, Subversion provides commands to publish your changes to the other people working with you on your project. If they publish their own changes, Subversion provides commands to incorporate those changes into your working directory.</para> <para>A working directory contains some extra files, created and maintained by Subversion, to help it carry out these commands. In particular, these files help Subversion recognize which files contain unpublished changes, and which files are out-of-date with respect to others' work.</para> <para>While your working directory is for your use alone, the <firstterm>repository</firstterm> is the common public record you share with everyone else working on the project. To publish your changes, you use Subversion to put them in the repository. (What this means, exactly, we explain below.) 
Once your changes are in the repository, others can tell Subversion to incorporate your changes into their working directories. In a collaborative environment like this, each user will typically have their own working directory (or perhaps more than one), and all the working directories will be backed by a single repository, shared amongst all the users.</para>
<para>A Subversion repository holds a single directory tree, and records the history of changes to that tree. The repository retains enough information to recreate any prior state of the tree, compute the differences between any two prior trees, and report the relations between files in the tree: which files are derived from which other files.</para>
<para>A Subversion repository can hold the source code for several projects; usually, each project is a subdirectory in the tree. In this arrangement, a working directory will usually correspond to a particular subtree of the repository.</para>
<para>For example, suppose you have a repository laid out like this:</para>
<programlisting>
/trunk/paint/Makefile
             canvas.c
             brush.c
       write/Makefile
             document.c
             search.c
</programlisting>
<para>In other words, the repository's root directory has a single subdirectory named <filename>trunk</filename>, which itself contains two subdirectories: <filename>paint</filename> and <filename>write</filename>.</para>
<para>To get a working directory, you must <firstterm>check out</firstterm> some subtree of the repository. If you check out <filename>/trunk/write</filename>, you will get a working directory like this:</para>
<programlisting>
write/Makefile
      document.c
      search.c
      .svn/
</programlisting>
<para>This working directory is a copy of the repository's <filename>/trunk/write</filename> directory, with one additional entry, <filename>.svn</filename>, which holds the extra information needed by Subversion, as mentioned above.</para>
<para>Suppose you make changes to <filename>search.c</filename>. 
Since the <filename>.svn</filename> directory remembers the file's modification date and original contents, Subversion can tell that you've changed the file. However, Subversion does not make your changes public until you explicitly tell it to.</para>
<para>To publish your changes, you can use Subversion's ‘<literal>commit</literal>’ command:</para>
<programlisting>
$ pwd
/home/jimb/write
$ ls -a
.svn/   Makefile   document.c   search.c
$ svn commit search.c
$
</programlisting>
<para>Now your changes to <filename>search.c</filename> have been committed to the repository; if another user checks out a working directory of <filename>/trunk/write</filename>, they will see your text.</para>
<para>Suppose you have a collaborator, Felix, who checked out a working directory of <filename>/trunk/write</filename> at the same time you did. When you commit your change to <filename>search.c</filename>, Felix's working directory is left unchanged; Subversion only modifies working directories at the user's request.</para>
<para>To bring his working directory up to date, Felix can use the Subversion ‘<literal>update</literal>’ command. This will incorporate your changes into his working directory, as well as any others that have been committed since he checked it out.</para>
<programlisting>
$ pwd
/home/felix/write
$ ls -a
.svn/   Makefile   document.c   search.c
$ svn update
U search.c
$
</programlisting>
<para>The output from the ‘<literal>svn update</literal>’ command indicates that Subversion updated the contents of <filename>search.c</filename>. 
Note that Felix didn't need to specify which files to update; Subversion uses the information in the <filename>.svn</filename> directory, and further information in the repository, to decide which files need to be brought up to date.</para> <para>We explain below what happens when both you and Felix make changes to the same file.</para> </sect1> <sect1 id="model.txns-and-revnums"> <title>Transactions and Revision Numbers</title> <para>A Subversion ‘<literal>commit</literal>’ operation can publish changes to any number of files and directories as a single atomic transaction. In your working directory, you can change files' contents, create, delete, rename and copy files and directories, and then commit the completed set of changes as a unit.</para> <para>In the repository, each commit is treated as an atomic transaction: either all the commit's changes take place, or none of them take place. Subversion tries to retain this atomicity in the face of program crashes, system crashes, network problems, and other users' actions. We may call a commit a <firstterm>transaction</firstterm> when we want to emphasize its indivisible nature.</para> <para>Each time the repository accepts a transaction, this creates a new state of the tree, called a <firstterm>revision</firstterm>. Each revision is assigned a unique natural number, one greater than the number of the previous revision. The initial revision of a freshly created repository is numbered zero, and consists of an empty root directory.</para> <para>Since each transaction creates a new revision, with its own number, we can also use these numbers to refer to transactions; transaction <replaceable>n</replaceable> is the transaction which created revision <replaceable>n</replaceable>. There is no transaction numbered zero.</para> <para>Unlike those of many other systems, Subversion's revision numbers apply to an entire tree, not individual files. 
Each revision number selects an entire tree.</para>
<para>It's important to note that working directories do not always correspond to any single revision in the repository; they may contain files from several different revisions. For example, suppose you check out a working directory from a repository whose most recent revision is 4:</para>
<programlisting>
write/Makefile:4
      document.c:4
      search.c:4
</programlisting>
<para>At the moment, this working directory corresponds exactly to revision 4 in the repository. However, suppose you make a change to <filename>search.c</filename>, and commit that change. Assuming no other commits have taken place, your commit will create revision 5 of the repository, and your working directory will look like this:</para>
<programlisting>
write/Makefile:4
      document.c:4
      search.c:5
</programlisting>
<para>Suppose that, at this point, Felix commits a change to <filename>document.c</filename>, creating revision 6. If you use ‘<literal>svn update</literal>’ to bring your working directory up to date, then it will look like this:</para>
<programlisting>
write/Makefile:6
      document.c:6
      search.c:6
</programlisting>
<para>Felix's changes to <filename>document.c</filename> will appear in your working copy of that file, and your change will still be present in <filename>search.c</filename>. In this example, the text of <filename>Makefile</filename> is identical in revisions 4, 5, and 6, but Subversion will mark your working copy with revision 6 to indicate that it is still current. 
So, after you do a clean update at the root of your working directory, your working directory will generally correspond exactly to some revision in the repository.</para>
</sect1>
<sect1 id="model.how-wc">
<title>How Working Directories Track the Repository</title>
<para>For each file in a working directory, Subversion records two essential pieces of information:</para>
<itemizedlist mark="bullet">
<listitem><para>what revision of what repository file your working copy is based on (this is called the file's <firstterm>base revision</firstterm>), and</para></listitem>
<listitem><para>a timestamp recording when the local copy was last updated.</para></listitem>
</itemizedlist>
<para>Given this information, by talking to the repository, Subversion can tell which of the following four states a file is in:</para>
<itemizedlist mark="bullet">
<listitem><para><emphasis role="bold">Unchanged, and current</emphasis>. The file is unchanged in the working directory, and no changes to that file have been committed to the repository since its base revision.</para></listitem>
<listitem><para><emphasis role="bold">Locally changed, and current</emphasis>. The file has been changed in the working directory, and no changes to that file have been committed to the repository since its base revision. There are local changes that have not been committed to the repository.</para></listitem>
<listitem><para><emphasis role="bold">Unchanged, and out-of-date</emphasis>. The file has not been changed in the working directory, but it has been changed in the repository. The file should eventually be updated, to make it current with the public revision.</para></listitem>
<listitem><para><emphasis role="bold">Locally changed, and out-of-date</emphasis>. The file has been changed both in the working directory, and in the repository. The file should be updated; Subversion will attempt to merge the public changes with the local changes. 
If it can't complete the merge in a plausible way automatically, Subversion leaves it to the user to resolve the conflict.</para></listitem>
</itemizedlist>
</sect1>
<sect1 id="model.nolock">
<title>Subversion Does Not Lock Files</title>
<para>Subversion does not prevent two users from making changes to the same file at the same time. For example, if both you and Felix have checked out working directories of <filename>/trunk/write</filename>, Subversion will allow both of you to change <filename>write/search.c</filename> in your working directories. Then, the following sequence of events will occur:</para>
<itemizedlist mark="bullet">
<listitem><para>Suppose Felix tries to commit his changes to <filename>search.c</filename> first. His commit will succeed, and his text will appear in the latest revision in the repository.</para></listitem>
<listitem><para>When you attempt to commit your changes to <filename>search.c</filename>, Subversion will reject your commit, and tell you that you must update <filename>search.c</filename> before you can commit it.</para></listitem>
<listitem><para>When you update <filename>search.c</filename>, Subversion will try to merge Felix's changes from the repository with your local changes. By default, Subversion merges as if it were applying a patch: if your local changes do not overlap textually with Felix's, then all is well; otherwise, Subversion leaves it to you to resolve the overlapping changes. In either case, Subversion carefully preserves a copy of the original pre-merge text.</para></listitem>
<listitem><para>Once you have verified that Felix's changes and your changes have been merged correctly, you can commit the new revision of <filename>search.c</filename>, which now contains everyone's changes.</para></listitem>
</itemizedlist>
<para>Some version control systems provide “locks”, which prevent others from changing a file once one person has begun working on it. 
In our experience, merging is preferable to locks, because:</para>
<itemizedlist mark="bullet">
<listitem><para>changes usually do not conflict, so Subversion's behavior does the right thing by default, while locking can interfere with legitimate work;</para></listitem>
<listitem><para>locking can prevent conflicts within a file, but not conflicts between files (say, between a C header file and another file that includes it), so it doesn't really solve the problem; and finally,</para></listitem>
<listitem><para>people often forget that they are holding locks, resulting in unnecessary delays and friction.</para></listitem>
</itemizedlist>
<para>Of course, the merge process needs to be under the users' control. Patch is not appropriate for files with rigid formats, like images or executables. Subversion allows users to customize its merging behavior on a per-file basis. You can direct Subversion to refuse to merge changes to certain files, and simply present you with the two original texts to choose from. (Or, someday, you can direct Subversion to merge using a tool which respects the semantics of the file format.)</para>
</sect1>
<sect1 id="model.props">
<title>Properties</title>
<para>Files generally have interesting attributes beyond their contents: MIME types, executable permissions, EOL styles, and so on. Subversion attempts to preserve these attributes, or at least record them, when doing so would be meaningful. However, different operating systems support very different sets of file attributes: Windows NT supports access control lists, while Linux provides only the simpler traditional Unix permission bits.</para>
<para>In order to interoperate well with clients on many different operating systems, Subversion supports <firstterm>property lists</firstterm>, a simple, general-purpose mechanism which clients can use to store arbitrary out-of-band information about files.</para>
<para>A property list is a set of name/value pairs. 
A property name is an arbitrary text string, expressed as a Unicode UTF-8 string, canonically decomposed and ordered. A property value is an arbitrary string of bytes. Property values may be of any size, but Subversion may not handle very large property values efficiently. No two properties in a given property list may have the same name. Although the word “list” usually denotes an ordered sequence, there is no fixed order to the properties in a property list; the term “property list” is historical.</para>
<para>Each revision number, file, directory, and directory entry in the Subversion repository has its own property list. Subversion puts these property lists to several uses:</para>
<itemizedlist mark="bullet">
<listitem><para>Clients can use properties to store file attributes, as described above.</para></listitem>
<listitem><para>The Subversion server uses properties to hold attributes of its own, and allows clients to read and modify them. For example, someday a hypothetical ‘<literal>svn-acl</literal>’ property might hold an access control list which the Subversion server uses to regulate access to repository files.</para></listitem>
<listitem><para>Users can invent properties of their own, to store arbitrary information for use by scripts, build environments, and so on. Names of user properties should be URIs, to avoid conflicts between organizations.</para></listitem>
</itemizedlist>
<para>Property lists are versioned, just like file contents. You can change properties in your working directory, but those changes are not visible in the repository until you commit your local changes. If you do commit a change to a property value, other users will see your change when they update their working directories.</para>
</sect1>
<sect1 id="model.merging-and-ancestry">
<title>Merging and Ancestry</title>
<para>[WARNING: this section was written in May 2000, at the very beginning of the Subversion project. 
This functionality probably will not exist in Subversion 1.0, but it's planned for post-1.0. The problem should be reasonably solvable by recording merge data in 'properties'.]</para>
<para>Subversion defines merges the same way CVS does: to merge means to take a set of previously committed changes and apply them, as a patch, to a working copy. This change can then be committed, like any other change. (In Subversion's case, the patch may include changes to directory trees, not just file contents.)</para>
<para>As defined thus far, merging is equivalent to hand-editing the working copy into the same state as would result from the patch application. In fact, in CVS there <emphasis>is</emphasis> no difference – it is equivalent to just editing the files, and there is no record of which ancestors these particular changes came from. Unfortunately, this leads to conflicts when users unintentionally merge the same changes again. (Experienced CVS users avoid this problem by using branch- and merge-point tags, but that involves a lot of unwieldy bookkeeping.)</para>
<para>In Subversion, merges are remembered by recording <firstterm>ancestry sets</firstterm>. A revision's ancestry set is the set of all changes "accounted for" in that revision. By maintaining ancestry sets, and consulting them when doing merges, Subversion can detect when it would apply the same patch twice, and spare users much bookkeeping. Ancestry sets are stored as properties.</para>
<para>In the examples below, bear in mind that revision numbers usually refer to changes, rather than the full contents of that revision. For example, "the change A:4" means "the delta that resulted in A:4", not "the full contents of A:4".</para>
<para>The simplest ancestry sets are associated with linear histories. 
For example, here's the history of a file A:</para>
<programlisting><![CDATA[
 _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |
| A:1 |----->| A:2 |----->| A:3 |----->| A:4 |----->| A:5 |
|_____|      |_____|      |_____|      |_____|      |_____|
]]></programlisting>
<para>The ancestry set of A:5 is:</para>
<programlisting>
{ A:1, A:2, A:3, A:4, A:5 }
</programlisting>
<para>That is, it includes the change that brought A from nothing to A:1, the change from A:1 to A:2, and so on to A:5. From now on, ranges like this will be represented with a more compact notation:</para>
<programlisting>
{ A:1-5 }
</programlisting>
<para>Now assume there's a branch B based, or "rooted", at A:2. (This postulates an entirely different revision history, of course, and the global revision numbers in the diagrams will change to reflect it.) Here's what the project looks like with the branch:</para>
<programlisting><![CDATA[
 _____        _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |      |     |
| A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |----->| A:9 |
|_____|      |_____|      |_____|      |_____|      |_____|      |_____|
                \
                 \
                  \  _____        _____        _____
                   \|     |      |     |      |     |
                    | B:3 |----->| B:5 |----->| B:7 |
                    |_____|      |_____|      |_____|
]]></programlisting>
<para>If we produce A:9 by merging the B branch back into the trunk</para>
<programlisting><![CDATA[
 _____        _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |      |     |
| A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |---.->| A:9 |
|_____|      |_____|      |_____|      |_____|      |_____|  /   |_____|
                \                                            |
                 \                                           |
                  \  _____        _____        _____        /
                   \|     |      |     |      |     |      /
                    | B:3 |----->| B:5 |----->| B:7 |--->-'
                    |_____|      |_____|      |_____|
]]></programlisting>
<para>then what will A:9's ancestry set be?</para>
<programlisting>
{ A:1, A:2, A:4, A:6, A:8, A:9, B:3, B:5, B:7 }
</programlisting>
<para>or more compactly:</para>
<programlisting>
{ A:1-9, B:3-7 }
</programlisting>
<para>(It's all right that each file's ranges seem to include non-changes; this is just a notational convenience, and you can think of the non-changes as either not being 
included, or being included but being null deltas as far as that file is concerned).</para>
<para>All changes along the B line are accounted for (changes B:3-7), and so are all changes along the A line, including both the merge and any non-merge-related edits made before the commit.</para>
<para>Although this merge happened to include all the branch changes, that needn't be the case. For example, the next time we merge the B line</para>
<programlisting><![CDATA[
 _____     _____     _____     _____     _____      _____      _____
|     |   |     |   |     |   |     |   |     |    |     |    |     |
| A:1 |-->| A:2 |-->| A:4 |-->| A:6 |-->| A:8 |-.->| A:9 |-.->|A:11 |
|_____|   |_____|   |_____|   |_____|   |_____| |  |_____| |  |_____|
             \                                  /          |
              \                                /           |
               \  _____     _____     _____   / _____      |
                \|     |   |     |   |     | / |     |    /
                 | B:3 |-->| B:5 |-->| B:7 |-->|B:10 |->-'
                 |_____|   |_____|   |_____|   |_____|
]]></programlisting>
<para>Subversion will know that A's ancestry set already contains B:3-7, so only the difference between B:7 and B:10 will be applied. A's new ancestry will be</para>
<programlisting>
{ A:1-11, B:3-10 }
</programlisting>
<para>But why limit ourselves to contiguous ranges? An ancestry set is truly a set – it can be any subset of the changes available:</para>
<programlisting><![CDATA[
 _____        _____        _____        _____        _____        _____
|     |      |     |      |     |      |     |      |     |      |     |
| A:1 |----->| A:2 |----->| A:4 |----->| A:6 |----->| A:8 |--.-->|A:10 |
|_____|      |_____|      |_____|      |_____|      |_____| /    |_____|
               |                                           /
               |                 ______________________.__/
               |                /                      |
               |               /                       |
                \           __/_                      _|__
                 \         {    }                    {    }
                  \  _____        _____        _____        _____
                   \|     |      |     |      |     |      |     |
                    | B:3 |----->| B:5 |----->| B:7 |----->| B:9 |----->
                    |_____|      |_____|      |_____|      |_____|
]]></programlisting>
<para>In this diagram, the change from B:3-5 and the change from B:7-9 are merged into a working copy whose ancestry set (so far) is { A:1-8 } plus any local changes. After committing, A:10's ancestry set is</para>
<programlisting>
{ A:1-10, B:5, B:9 }
</programlisting>
<para>Clearly, saying "Let's merge branch B into A" is a little ambiguous. 
It usually means "Merge all the changes accounted for in B's tip into A", but it <emphasis>might</emphasis> mean "Merge the single change that resulted in B's tip into A".</para> <para>Any merge, when viewed in detail, is an application of a particular set of changes – not necessarily adjacent ones – to a working copy. The user-level interface may allow some of these changes to be specified implicitly. For example, many merges involve a single, contiguous range of changes, with one or both ends of the range easily deducible from context (i.e., branch root to branch tip). These inference rules are not specified here, but it should be clear in most contexts how they work.</para> <para>Because each node knows its ancestors, Subversion never merges the same change twice (unless you force it to). For example, if after the above merge, you tell Subversion to merge all B changes into A, Subversion will notice that two of them have already been merged, and so merge only the other two changes, resulting in a final ancestry set of:</para> <programlisting> { A:1-10, B:3-9 } </programlisting> <!-- Heh, what about this: B:3 adds line 3, with the text "foo". B:5 deletes line 3. B:7 adds line 3, with the text "foo". B:9 deletes line 3. The user first merges B:5 and B:9 into A. If A had that line, it goes away now, nothing more. Next, user merges B:3 and B:7 into A. The second merge must conflict. I'm not sure we need to care about this, I just thought I'd note how even merges that seem like they ought to be easily composable can still suck. :-) --> <para>This description of merging and ancestry applies to both intra- and inter-repository merges. However, inter-repository merging will probably not be implemented until a future release of Subversion.</para> </sect1> </chapter>