An introduction to Git submodules

Git is a fantastic version control system, but for many people following the common migration from Subversion it can appear to lack something SVN did reasonably well: externals. However, Git does have an equivalent - in fact, a superior implementation: submodules. Unfortunately, as with many Git features, the documentation is as heavy going as it is comprehensive - the official submodule documentation does eventually get to something like an introduction after a few paragraphs, but it's a little wordy:

Git's submodule support allows a repository to contain, as a subdirectory, a checkout of an external project. Submodules maintain their own identity; the submodule support just stores the submodule repository location and commit ID, so other developers who clone the containing project ("superproject") can easily clone all the submodules at the same revision.

Put simply, submodules allow you to include another repository within a parent project, and allow that parent project to record exactly which commit of that repository it uses. It's this second aspect which is incredibly useful, so we'll discuss it in more detail later.
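
As a rough sketch of what this looks like in practice, the script below builds two throwaway repositories in a temporary directory and adds one to the other as a submodule. The repository names (library, supermario), the lib.txt file and the demo identity are assumptions made purely for illustration. The protocol.file.allow setting is only there because recent Git versions refuse local-path submodule URLs by default; a real project would normally use an https or ssh URL instead.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the shared library repository (hypothetical contents).
git init -q library
echo "v1" > library/lib.txt
git -C library add lib.txt
git -C library commit -q -m "Initial library commit"

# The parent project ("superproject").
git init -q supermario
cd supermario

# Add the library as a submodule in the "library" subdirectory.
git -c protocol.file.allow=always submodule --quiet add "$work/library" library
git commit -q -m "Add library as a submodule"

# .gitmodules records where the submodule lives...
cat .gitmodules
# ...and "git submodule status" shows the commit it is pinned to.
git submodule status
```

The two things the quoted documentation mentions, the repository location and the commit ID, end up in different places: the location goes in .gitmodules, while the commit ID is stored in the parent's own history.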

Why use them?

The most common use case for submodules (or indeed externals) I've come across is the classic scenario whereby a software project has a library of "shared" code - core code which shouldn't change much from project to project, but which, if it does change, needs distributing to any project using the library in a controlled manner. Code contained within the library is likely to be of critical importance, so whilst other projects will probably want the latest code when they're next worked on, they need to have some choice as to when they get it.

Before (decent) version control systems you might have achieved this using a symbolic link to point to a physically shared filesystem:

nick@desktop:~/supermario nick$ ls -lah
total 8
drwxr-xr-x   7 nick  staff   238B 30 Sep 16:34 .
drwxr-xr-x+ 62 nick  staff   2.1K 30 Sep 16:24 ..
drwxr-xr-x   2 nick  staff    68B 30 Sep 16:25 config
drwxr-xr-x   2 nick  staff    68B 30 Sep 16:25 data
drwxr-xr-x   2 nick  staff    68B 30 Sep 16:25 game
lrwxr-xr-x   1 nick  staff     9B 30 Sep 16:25 library -> /mnt/shared/code/library/
-rw-r--r--   1 nick  staff   1.0M 30 Sep 16:34 mario.rom

Simple and effective you might think, but hopelessly error prone - it only takes something as simple as a typo from one person and nobody can execute their code, not to mention people accidentally tripping over each other when editing the library code. Subversion's solution, as you might expect, is a lot more 21st Century than this: svn:externals is in fact quite similar to Git's submodule implementation, with one key difference: running svn update on the parent project also fetches the latest changes to the external project. If the external project is configured to point at a branch which has since had commits made to it then the parent project gets all the changes, whether it likes them or not:

nick@desktop:~/supermario nick$ svn up
A  foo
M  config/game.ini
X library
M mario.rom
Updated to revision 220.

Fetching external item into library
A  library/newfeature
M library/baz
M library/bar
Updated to revision 854.

The new functionality or modification to library/baz and library/bar could quite easily break our parent project. Of course, this can be mitigated by ensuring the external always points to a stable branch / tag, but this requires discipline and maintenance - and inevitably introduces a manual overhead.

The Git solution

So, enter Git and how it handles submodules. Earlier I highlighted the fact that a submodule always points to a specific commit, so let's examine this aspect in more detail starting off with a rather crude illustration of commits made over time to the library project. Naturally, each commit ID would in practice be a SHA1 hash, but for simplicity's sake we're just using letters of the alphabet. We're also assuming that there is only one master branch to worry about:

Diagram of commits made to the library project

It's important to remember that submodules are fully fledged git repositories in their own right. Although your library code might not work outside the context of a project, the repository has absolutely no awareness of the fact it's being used as a submodule elsewhere. Let's now look at it being used as a submodule by five imaginary projects. For the sake of space, only Project 1 shows an expanded folder tree:

Diagram of projects using the library as a submodule

This illustrates that the library object stored by each project is in effect nothing more than a commit ID - e.g. Projects 1 and 4 simply associate "library" with a commit ID of "E". Therefore each project can quite safely git pull without fear of accidentally winding on the submodule to a potentially unstable or otherwise incompatible revision - because the submodule is a snapshot of the library at a given point in time when we (hopefully) know it worked with our project. Unless the commit ID stored by the parent project has changed, performing a git pull won't change it - you can almost think of the submodule as if it were any other version controlled file within the parent repository, except that instead of containing text or binary data, it contains a reference to a commit instead.
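
To make the "nothing more than a commit ID" point concrete, here is a minimal sketch using throwaway repositories (the names library and supermario, the lib.txt file and the demo identity are all assumptions for illustration). It inspects how the parent's tree stores the submodule, then shows that a new commit in the library does not move the ID recorded by the parent.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library and parent project, set up as local repositories.
git init -q library
echo "v1" > library/lib.txt
git -C library add lib.txt
git -C library commit -q -m "Initial library commit"
git init -q supermario
cd supermario
# protocol.file.allow is only needed because the submodule URL is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/library" library
git commit -q -m "Add library as a submodule"

# The parent's tree stores the submodule as a "commit" entry (mode 160000),
# not as a directory of files.
entry=$(git ls-tree HEAD library)
echo "$entry"
recorded=$(git rev-parse HEAD:library)

# Meanwhile, the library moves on upstream...
echo "v2" > "$work/library/lib.txt"
git -C "$work/library" commit -q -am "Library change"

# ...but the commit ID stored by the parent has not moved.
still_recorded=$(git rev-parse HEAD:library)
echo "before: $recorded"
echo "after:  $still_recorded"
```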

The fact that the submodule is completely unaware of how it is used, or by whom, is a huge benefit. No longer do you have to get bogged down in tagging / branching the library code when you're about to make an unsafe change in a branch depended on by external projects, nor do you have to even keep track of these projects - the responsibility lies solely with the parent project itself.

Now we've discussed the basics, let's look at three common situations we'll encounter when dealing with submodules.

1) I want to update the submodule to point to a different commit

What if we know we need a bugfix made in a later commit, or changes from another branch (or tag, or single commit for that matter)? Let's explain this first with a bit of background. From now on we'll use actual real projects rather than made-up Mario examples. Let's say we've just cloned the source code for this website (hint: use git clone --recursive if you wish to do this), in which the jaoss library is a submodule. If we change into the submodule's directory and run a couple of commands we'll get something like:

nick@nick-desktop:~/www/pdsite/jaoss$ git branch
* (no branch)
nick@nick-desktop:~/www/pdsite/jaoss$ git status
# Not currently on any branch.
nothing to commit (working directory clean)

This is known as being in a 'detached HEAD' state - your submodule is not tracking any particular branch. However, if you type git log you'll see that you get the history of commits up until the revision pointed at by the parent project - the submodule is still versioned and perfectly aware of history and any changes you make; you're just not on a branch.
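
If you'd like to reproduce this state without cloning a real project, the following sketch does so with throwaway local repositories (library, parent, checkout and the demo identity are hypothetical names chosen for illustration). It clones the parent with --recursive, then checks two things inside the submodule: that HEAD is not a branch, and that it sits on exactly the commit recorded by the parent.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library and a parent project which uses it as a submodule.
git init -q library
echo "v1" > library/lib.txt
git -C library add lib.txt
git -C library commit -q -m "Initial library commit"
git init -q parent
# protocol.file.allow is only needed because the submodule URL is a local path.
git -C parent -c protocol.file.allow=always submodule --quiet add "$work/library" library
git -C parent commit -q -m "Add library as a submodule"

# Clone the parent and its submodules in one step.
git -c protocol.file.allow=always clone -q --recursive parent checkout
cd checkout/library

# "git symbolic-ref" fails when HEAD is detached, i.e. not on any branch.
if git symbolic-ref -q HEAD >/dev/null; then
    head_state=branch
else
    head_state=detached
fi
echo "submodule HEAD is: $head_state"

# HEAD is exactly the commit recorded by the parent project.
checked_out=$(git rev-parse HEAD)
recorded=$(git -C .. rev-parse HEAD:library)
echo "checked out: $checked_out"
echo "recorded:    $recorded"
```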

So, back to our scenario - let's say we know a bug has been fixed upstream in the master branch of the jaoss library, and it's a fix that we want, or we just want to keep up-to-date with the latest commits anyway as we're on a bleeding edge project. One very simple way to do this is to switch to the appropriate branch in your submodule, and then perform a git pull as normal:

nick@nick-desktop:~/www/pdsite/jaoss$ git checkout master
Switched to branch 'master'

nick@nick-desktop:~/www/pdsite/jaoss$ git pull
remote: Counting objects: 21, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 14 (delta 7), reused 11 (delta 4)
Unpacking objects: 100% (14/14), done.
   eaf764b..765ef98  master     -> origin/master
Updating eaf764b..765ef98
 library/cli/cli.php        |    7 ++++++-
 library/cli/cmd/create.php |    1 -
 library/cli/cmd/help.php   |   19 +++++++++++++++++++
 tools/{jaoss => jcli}      |    0
 4 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 library/cli/cmd/help.php
 rename tools/{jaoss => jcli} (100%)

Brilliant - and all as we'd expect, bearing in mind of course this is simply a normal git repository. So how has this changed things for the parent project? Let's change back into the parent directory and have a look:

nick@nick-desktop:~/www/pdsite/jaoss$ cd ../
nick@nick-desktop:~/www/pdsite$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#	modified:   jaoss (new commits)
no changes added to commit (use "git add" and/or "git commit -a")

nick@nick-desktop:~/www/pdsite$ git diff
diff --git a/jaoss b/jaoss
index eaf764b..765ef98 160000
--- a/jaoss
+++ b/jaoss
@@ -1 +1 @@
-Subproject commit eaf764b4072a518c024b7572b18877fd6b098d58
+Subproject commit 765ef98ce3b5873ce089cc910ef4446f1cfeb38b

Perfect - git knows that the submodule now points to a different commit from the one it was expecting, so it shows it as modified in the same way as it would any other file. The helpful addition of "new commits" expands upon why the submodule has changed (the reasons for which can vary). We can now simply add, commit and push this change, and it'll show up in the parent project's history just like any other commit.
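
The whole of this first scenario can be sketched end to end with throwaway repositories. Everything named here (library, parent, lib.txt, the demo identity and the commit messages) is an assumption for illustration rather than the real jaoss setup, and the final git push is left out because the sketch's parent project has no remote.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library and a parent project which uses it as a submodule.
git init -q library
echo "v1" > library/lib.txt
git -C library add lib.txt
git -C library commit -q -m "Initial library commit"
branch=$(git -C library symbolic-ref --short HEAD)
git init -q parent
# protocol.file.allow is only needed because the submodule URL is a local path.
git -C parent -c protocol.file.allow=always submodule --quiet add "$work/library" library
git -C parent commit -q -m "Add library as a submodule"

# A fix lands upstream in the library.
echo "v2" > library/lib.txt
git -C library commit -q -am "Upstream bug fix"

# 1. Inside the submodule: switch to the branch and pull as normal.
cd parent/library
git checkout -q "$branch"
git pull -q --ff-only

# 2. Back in the parent: the submodule shows as modified, so stage and commit it.
cd ..
git status --short
git add library
git commit -q -m "Update library submodule to pick up the bug fix"

recorded=$(git rev-parse HEAD:library)
upstream=$(git -C "$work/library" rev-parse HEAD)
echo "parent now records: $recorded"
```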

2) Someone else has changed the submodule reference

If you're working on a project with a submodule and someone else has updated its commit ID (e.g. followed the above steps), the next time you perform a git pull you'll get something like:

Updating 9d28d23..554b035
 jaoss |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

To update your submodule to point to this new commit ID you simply type (strangely enough) git submodule update, resulting in something like:

Submodule path 'jaoss': checked out '765ef98ce3b5873ce089cc910ef4446f1cfeb38b'
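
Here is a minimal sketch of this second scenario, again with hypothetical throwaway repositories (library, parent, and our own clone called mine). It assumes default Git settings, under which pulling the parent changes the recorded commit ID but leaves the submodule's working copy on the old commit until git submodule update is run.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library, the shared parent project, and our own clone of it.
git init -q library
echo "v1" > library/lib.txt
git -C library add lib.txt
git -C library commit -q -m "Initial library commit"
git init -q parent
# protocol.file.allow is only needed because the submodule URL is a local path.
git -C parent -c protocol.file.allow=always submodule --quiet add "$work/library" library
git -C parent commit -q -m "Add library as a submodule"
git -c protocol.file.allow=always clone -q --recursive parent mine

# "Someone else" commits to the library and moves the parent's pointer.
echo "v2" > library/lib.txt
git -C library commit -q -am "Library fix"
git -C parent/library pull -q --ff-only
git -C parent commit -q -am "Update library submodule"

# Our side: pull the parent, then bring the submodule checkout in line.
cd mine
git -c protocol.file.allow=always pull -q --ff-only
before=$(git -C library rev-parse HEAD)
git -c protocol.file.allow=always submodule --quiet update
after=$(git -C library rev-parse HEAD)
recorded=$(git rev-parse HEAD:library)
echo "submodule before update: $before"
echo "submodule after update:  $after"
```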

3) I want to make changes to the submodule whilst working on a project

Ahh - we've saved the best and most practical use 'til last. In the use case we've discussed - the "shared code" scenario - it's rare that changes will ever actually be made to the library outside the context of a project; i.e. changes to the library code are triggered by the need for them in an external project. Quite naturally, we can do more than just update our submodule to point to a change made externally - we can change the library's code, commit, push, pull, reset - and everything else you'd expect to be able to do, all within the submodule. Let's fix an annoying bug in the jaoss library (a hypothetical one, of course!) which is preventing the parent project from working properly:

# ... do some bugfixing here ...
nick@nick-desktop:~/www/paynedigital$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#   (commit or discard the untracked or modified content in submodules)
#	modified:   jaoss (modified content)
no changes added to commit (use "git add" and/or "git commit -a")

nick@nick-desktop:~/www/paynedigital$ git diff
diff --git a/jaoss b/jaoss
--- a/jaoss
+++ b/jaoss
@@ -1 +1 @@
-Subproject commit 765ef98ce3b5873ce089cc910ef4446f1cfeb38b
+Subproject commit 765ef98ce3b5873ce089cc910ef4446f1cfeb38b-dirty

Again, git knows the nature of the changes, and knows that the jaoss working tree is now 'dirty' - i.e. it has changes within it which are not staged for commit. Let's change into the jaoss directory and see what's what:

nick@nick-desktop:~/www/paynedigital/jaoss$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#	modified:   library/request.php
no changes added to commit (use "git add" and/or "git commit -a")

As we'd expect, the submodule is more explicit as to what exactly has changed (because - you guessed it - it's just a git repository). From here, we can commit the bug fix to the appropriate branch (if you're still in a detached HEAD state, make sure you switch to a branch first!) and push it to a remote repository. At that point, running git status in the parent project will show the same as in the first scenario: the submodule has been modified with new commits, ready for the parent project to git add jaoss and commit the newly updated submodule reference.
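
The sequence described above might look something like the sketch below. It is an illustration under stated assumptions rather than the real jaoss workflow: the upstream library is a local bare repository (library.git) so that we are allowed to push to it, and the names seed, parent, checkout and request.txt, along with the demo identity, are all hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Throwaway identity so the commits succeed on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare "upstream" library we are allowed to push to (hypothetical).
git init -q seed
echo "v1" > seed/request.txt
git -C seed add request.txt
git -C seed commit -q -m "Initial library commit"
branch=$(git -C seed symbolic-ref --short HEAD)
git clone -q --bare seed library.git

# Parent project with the library as a submodule, then a fresh recursive
# clone of it so the submodule starts out in a detached HEAD state.
# protocol.file.allow is only needed because the submodule URL is a local path.
git init -q parent
git -C parent -c protocol.file.allow=always submodule --quiet add "$work/library.git" library
git -C parent commit -q -m "Add library as a submodule"
git -c protocol.file.allow=always clone -q --recursive parent checkout

# Fix the "bug" inside the submodule: get onto a branch first,
# then commit and push as with any other repository.
cd checkout/library
git checkout -q "$branch"
echo "v2" > request.txt
git commit -q -am "Fix request handling bug"
git push -q origin "$branch"

# Back in the parent, record the new submodule commit.
cd ..
git add library
git commit -q -m "Update library submodule with bug fix"

recorded=$(git rev-parse HEAD:library)
pushed=$(git -C "$work/library.git" rev-parse "$branch")
echo "parent now records: $recorded"
```

Pushing the library commit before committing the new reference in the parent matters: otherwise the parent could end up pointing at a commit nobody else can fetch.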


Although a long article, this really is just an introduction. Many of Git's intricacies - the details which make so much of the official documentation so necessarily complex - have been omitted in the interest of trying to focus on how submodules can be used without getting bogged down in low-level technicalities. The scenarios described here are all real life situations I encounter daily, and I simply can't think how any project I work on would function without submodules.


The most tricky part is to define your sub-modules directory structure - and remember that you can't include subfolder of a submodule. In most cases single 'vendor/' folder for all your submodules will do the trick.
@wojtek yep, definitely. I think Symfony used to use a similar structure (their github repo has changed recently though). Personally I use 'deps/' for external submodules, but it all amounts to the same thing :)
