
them to the repository (with the commit command). Expla-
nations of Git use the term “tracked files” to refer to files
that have been added. This confuses novices, since such files
are tracked only in the sense that the status command will
notice that changes to them have not been committed. Con-
trary to initial expectation, if a tracked file is updated, a sub-
sequent commit will save the older version of the file (repre-
senting its state the last time the add command was called),
and not the latest version.
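The behavior just described can be illustrated with a toy model. The following Python sketch is not Git's implementation: the class `ToyRepo` and its methods are hypothetical names chosen for illustration, and the three containers merely stand in for the working directory, the staging area, and the repository.

```python
class ToyRepo:
    """Toy model of add/commit; not how Git is implemented."""

    def __init__(self):
        self.working = {}   # path -> content currently in the working directory
        self.staged = {}    # path -> content captured by the last add
        self.commits = []   # list of snapshots (path -> content)

    def write(self, path, content):
        """Edit a file in the working directory only."""
        self.working[path] = content

    def add(self, path):
        """Copy the file's current content into the staging area."""
        self.staged[path] = self.working[path]

    def commit(self):
        """Record the staged versions on top of the previous snapshot."""
        snapshot = dict(self.commits[-1]) if self.commits else {}
        snapshot.update(self.staged)
        self.commits.append(snapshot)
        self.staged = {}  # assumption: staged versions are cleared by a commit
        return snapshot


repo = ToyRepo()
repo.write("notes.txt", "first draft")
repo.add("notes.txt")
repo.write("notes.txt", "second draft")  # edited after add, never re-added
saved = repo.commit()
print(saved["notes.txt"])  # the version captured when add was called
```

In this sketch the commit records the content captured by `add`, while the later edit remains only in the working directory.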
The situation is made more complicated by the fact that
tracked files may not have corresponding versions in the
staging area. Following a commit, a file that had been previously
added remains tracked, but the version in the staging
area is removed. The term “staged file,” often used interchangeably
with “tracked file,” is thus subtly different: in this
case, we have a file that is tracked but no longer staged.
Files that are not tracked are not included in a commit.
Separately, a file may be marked as “assumed unchanged.”
Such a file behaves for the most part like an untracked file,
but will not even be recognized by the add command; to
make it tracked again this marking has to be removed. Fi-
nally, a set of files (given implicitly by a path-specifier in
a special file) may be designated as “ignored.” This feature
enables the user to prevent files from being committed by
naming them before they even exist, and is used, for exam-
ple, to prevent the committing of non-source files.
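As a rough illustration of ignoring files by pattern, the Python sketch below filters candidate paths against a list of patterns using the standard `fnmatch` module. The pattern list and the helper names `is_ignored` and `committable` are hypothetical, and shell-style matching is only an approximation: Git's actual ignore rules are richer than what is shown here.

```python
from fnmatch import fnmatch

# Hypothetical ignore patterns, written before any matching file exists.
IGNORE_PATTERNS = ["*.o", "*.log", "build/*"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the path matches at least one ignore pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)


def committable(paths):
    """Keep only the paths that are not ignored."""
    return [p for p in paths if not is_ignored(p)]


candidates = ["main.c", "main.o", "build/app", "debug.log", "README"]
print(committable(candidates))
```

Because the patterns name files by shape rather than by identity, a file such as `main.o` is excluded as soon as it appears, without any further action by the user.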
At any one time, the user is working in a particular branch
of development. Switching to another branch enables the
user to put aside one development task and work on another
(for example, to pursue the implementation of a particular
feature, or fix a particular bug). Switching branches is a com-
plex matter, because, although the branches are maintained
separately in the repository, there is only one working di-
rectory and one staging area. As a result, when switching
branches, files may be unexpectedly overwritten. Git fails
with an error if there are any conflicting changes, effectively
preventing the user from switching branches in these cases.
To mitigate this problem, Git provides a way to save versions
of files to yet another storage area, called the “stash,” using
a special command issued prior to switching branches.
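The interaction between branch switching and the stash can be sketched with another toy model. This is a deliberate simplification, not Git's algorithm: the names `ToyWorkspace`, `SwitchError`, `stash_save`, and `stash_pop` are hypothetical, file deletions are not modeled, and the staging area is omitted.

```python
class SwitchError(Exception):
    """Raised when a branch switch would overwrite local changes."""


class ToyWorkspace:
    """Toy model: several branches, but a single working directory."""

    def __init__(self, branches, current):
        self.branches = branches              # name -> {path: content}
        self.current = current
        self.working = dict(branches[current])
        self.stash = []                       # stack of saved local changes

    def local_changes(self):
        """Working-directory files that differ from the current branch."""
        head = self.branches[self.current]
        return {p: c for p, c in self.working.items() if head.get(p) != c}

    def switch(self, target):
        """Switch branches unless a locally changed file differs between them."""
        head, dest = self.branches[self.current], self.branches[target]
        changes = self.local_changes()
        conflicts = sorted(p for p in changes if head.get(p) != dest.get(p))
        if conflicts:
            raise SwitchError(f"local changes would be overwritten: {conflicts}")
        self.working = {**dest, **changes}    # carry over harmless changes
        self.current = target

    def stash_save(self):
        """Set local changes aside and restore the branch's own versions."""
        self.stash.append(self.local_changes())
        self.working = dict(self.branches[self.current])

    def stash_pop(self):
        """Reapply the most recently stashed changes."""
        self.working.update(self.stash.pop())


ws = ToyWorkspace({"main": {"a.txt": "1"}, "feature": {"a.txt": "2"}}, "main")
ws.working["a.txt"] = "edited"
try:
    ws.switch("feature")
except SwitchError as err:
    print("refused:", err)
ws.stash_save()        # put the edit aside
ws.switch("feature")   # now succeeds
ws.switch("main")
ws.stash_pop()         # the edit is back in the working directory
print(ws.working)
```

The single `working` dictionary shared by all branches is what makes the switch hazardous in this sketch; the stash is simply a further storage area that holds the changes while the working directory is reused.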
4. A Conceptual Model of Git
The view of Git embodied by our discussion and model has
been obtained from popular references and discussions, and
from observation (especially of the output from the so-called
“porcelain”⁷ commands such as git status).
Recall that a “conceptual model” to us is a specification
that focuses on concepts, not on implementation details, and
that, in our view, concepts correspond to how users think
about the application.
⁷ As Chacon in [16], Chapter 9.1, puts it, “Git was initially a toolkit for a
VCS [version control system] rather than a full user-friendly VCS, it has
a bunch of verbs [commands] that do low-level work and were designed
to be chained together UNIX style or called from scripts.” These are the
commands that are generally referred to as “plumbing” commands to differentiate
them from the current, more user-friendly commands referred to
as the “porcelain” commands.
Due to the complexity of Git and the lack of a succinct
and clear user manual (and the fact that we ourselves are not
Git devotees), there are doubtless some errors in our model.
But the conceptual model conveyed by an application – in
its documentation and marketing materials, in its user
interface, and even in the culture that surrounds it – has to
be regarded as inseparable from the application itself. So
to the extent that consensus is missing on an application’s
conceptual model, it is arguably the application itself that is
at fault.
4.1 An Overview of Git’s Conceptual Model
Ideally, a description of a conceptual design should be
implementation-independent; it should be easy to under-
stand; it should be precise enough to support objective anal-
ysis; and it should be lightweight, presenting little inertia to
the exploration of different points in the design space.
For these reasons, we have chosen (initially at least) to
use a very standard state-machine model of computation,
in which named actions (performed by the user or some-
times by the application) produce transitions between ab-
stract states. The abstract state space is described by a re-
lational data model, using the variant of extended entity-
relationship diagrams developed for the Alloy modeling lan-
guage [29]. The actions are crudely specified by naming
them and describing their effects on the state informally.
This form of description is fairly conventional; we
might instead have chosen any of the well-known “model based”
specification languages (such as Z [40], B [9], VDM [32],
or Alloy). Our own preference is for a diagrammatic representation
of the state space, but it may not be essential. We
might have used state machine diagrams (such as Statecharts
[25]) instead, but for applications like Git, such a notation
would not have been suitable, because it does not support
richly structured state.
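To make the state-machine style of description concrete, the following Python sketch treats the abstract state as a pair of sets and each named action as a function from one state to the next. It is only an illustration under strong assumptions: the model in this paper is given diagrammatically, not in Python, and the two state components (`tracked`, `staged`) and the action table `ACTIONS` are hypothetical simplifications.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Tuple


class State(NamedTuple):
    """Abstract state: two sets of file names (a drastic simplification)."""
    tracked: FrozenSet[str]
    staged: FrozenSet[str]


def add(state: State, path: str) -> State:
    """After add, the file is both tracked and staged."""
    return State(state.tracked | {path}, state.staged | {path})


def commit(state: State, _unused: str = "") -> State:
    """After commit, files remain tracked but nothing is staged."""
    return State(state.tracked, frozenset())


# Named actions, each producing a transition between abstract states.
ACTIONS: Dict[str, Callable[[State, str], State]] = {
    "add": add,
    "commit": commit,
}


def run(state: State, trace: List[Tuple[str, str]]) -> State:
    """Apply a sequence of (action name, argument) pairs in order."""
    for name, arg in trace:
        state = ACTIONS[name](state, arg)
    return state


initial = State(frozenset(), frozenset())
final = run(initial, [("add", "a.txt"), ("commit", "")])
print(final)  # a.txt is tracked but no longer staged
```

Even this small state has more structure than a flat set of named states, which is why a relational description of the state space is convenient here.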
Concepts correspond to state components. To connect the
abstract state components with the user’s understanding of
them as concepts we use Michael Jackson’s notion of a “des-
ignation” [31]: a necessarily informal statement that acts as a
kind of recognition rule. For example, in a conceptual model
of an application for managing university course registra-
tions, we would likely need a designation for the concept
of “student.” Designations are invariably more challenging
(and more interesting) than they first appear to be; the stu-
dents registered for a course, for example, might include not
only regular enrolled students but also special students, vis-
itors, and even staff and faculty members.
The concepts that form the abstract state of Git are
shown in the relational data model of Fig. 1. Each box
represents a set of objects, and the arcs represent rela-
tions. A large, open-headed arrow denotes a classifica-
tion relationship; thus Tracked File, Untracked File,
Assumed Unchanged File, Ignored File and Ignore