The diff API is for programs that compare two sets of files (e.g. two trees, one tree and the index) and present the found difference in various ways. The calling program is responsible for feeding the API pairs of files, one from the "old" set and the corresponding one from "new" set, that are different. The library called through this API is called diffcore, and is responsible for two things.

  • finding total rewrites (-B), renames (-M) and copies (-C), and changes that touch a string (-S), as specified by the caller.

  • outputting the differences in various formats, as specified by the caller.

Calling sequence

  • Prepare struct diff_options to record the set of diff options, and then call diff_setup() to initialize this structure. This sets up the vanilla default.

  • Fill in the options structure to specify desired output format, rename detection, etc. diff_opt_parse() can be used to parse options given from the command line in a way consistent with existing git-diff family of programs.

  • Call diff_setup_done(); this inspects the options set up so far for internal consistency and make necessary tweaking to it (e.g. if textual patch output was asked, recursive behaviour is turned on).

  • As you find different pairs of files, call diff_change() to feed modified files, diff_addremove() to feed created or deleted files, or diff_unmerge() to feed a file whose state is unmerged to the API. These are thin wrappers to a lower-level diff_queue() function that is flexible enough to record any of these kinds of changes.

  • Once you finish feeding the pairs of files, call diffcore_std(). This will tell the diffcore library to go ahead and do its work.

  • Calling diff_flush() will produce the output.

Data structures

  • struct diff_filespec

This is the internal representation for a single file (blob). It records the blob object name (if known — for a work tree file it typically is a NUL SHA-1), filemode and pathname. This is what the diff_addremove(), diff_change() and diff_unmerge() synthesize and feed diff_queue() function with.

  • struct diff_filepair

This records a pair of struct diff_filespec; the filespec for a file in the "old" set (i.e. preimage) is called one, and the filespec for a file in the "new" set (i.e. postimage) is called two. A change that represents file creation has NULL in one, and file deletion has NULL in two.

A filepair starts pointing at one and two that are from the same filename, but diffcore_std() can break pairs and match component filespecs with other filespecs from a different filepair to form new filepair. This is called rename detection.

  • struct diff_queue

This is a collection of filepairs. Notable members are:

queue

An array of pointers to struct diff_filepair. This dynamically grows as you add filepairs;

alloc

The allocated size of the queue array;

nr

The number of elements in the queue array.

  • struct diff_options

This describes the set of options the calling program wants to affect the operation of diffcore library with.

Notable members are:

output_format

The output format used when diff_flush() is run.

context

Number of context lines to generate in patch output.

break_opt, detect_rename, rename-score, rename_limit

Affects the way detection logic for complete rewrites, renames and copies.

abbrev

Number of hexdigits to abbreviate raw format output to.

pickaxe

A constant string (can and typically does contain newlines to look for a block of text, not just a single line) to filter out the filepairs that do not change the number of strings contained in its preimage and postimage of the diff_queue.

flags

This is mostly a collection of boolean options that affects the operation, but some do not have anything to do with the diffcore library.

BINARY, TEXT

Affects the way how a file that is seemingly binary is treated.

FULL_INDEX

Tells the patch output format not to use abbreviated object names on the "index" lines.

FIND_COPIES_HARDER

Tells the diffcore library that the caller is feeding unchanged filepairs to allow copies from unmodified files be detected.

COLOR_DIFF

Output should be colored.

COLOR_DIFF_WORDS

Output is a colored word-diff.

NO_INDEX

Tells diff-files that the input is not tracked files but files in random locations on the filesystem.

ALLOW_EXTERNAL

Tells output routine that it is Ok to call user specified patch output routine. Plumbing disables this to ensure stable output.

QUIET

Do not show any output.

REVERSE_DIFF

Tells the library that the calling program is feeding the filepairs reversed; one is two, and two is one.

EXIT_WITH_STATUS

For communication between the calling program and the options parser; tell the calling program to signal the presence of difference using program exit code.

HAS_CHANGES

Internal; used for optimization to see if there is any change.

SILENT_ON_REMOVE

Affects if diff-files shows removed files.

RECURSIVE, TREE_IN_RECURSIVE

Tells if tree traversal done by tree-diff should recursively descend into a tree object pair that are different in preimage and postimage set.

(JC)