This tutorial demonstrates how to create and populate a Git repository without using Git itself, as a way of explaining how Git stores project files and history. There are some Git commands along the way, but they are optional; you could omit all of them and still obtain a valid repository.
Prerequisites — before going through it, you need to understand the purpose of hashing (which Git heavily relies on) and be reasonably familiar with the Unix command-line environment. This is important for making sense of all the steps shown below.
Before we begin, we need a way to compress and write files in the zlib format. Unfortunately, the most common command-line programs that provide Deflate compression all use different enclosing formats (gzip, zip). But you may be able to access zlib via some scripting languages; here's an example for Perl:
alias deflate='perl -MIO::Compress::Deflate -e '\''IO::Compress::Deflate::deflate "-" => "-";'\'
alias inflate='perl -MIO::Uncompress::Inflate -e '\''IO::Uncompress::Inflate::inflate "-" => "-";'\'
Some subsequent commands will use the deflate alias to
zlib-compress data from standard input to standard output, so make sure you
have one.
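Shell aliases are normally not expanded inside scripts, so if you would rather run the later commands from a script file, functions are one alternative. The following is only a sketch that wraps the same Perl one-liners as the aliases; it assumes a Perl installation that ships the IO::Compress modules.

```shell
# Function equivalents of the deflate/inflate aliases (same Perl one-liners).
deflate() { perl -MIO::Compress::Deflate -e 'IO::Compress::Deflate::deflate "-" => "-";'; }
inflate() { perl -MIO::Uncompress::Inflate -e 'IO::Uncompress::Inflate::inflate "-" => "-";'; }

# Sanity check you can run by hand: data should survive a round trip unchanged.
#   printf 'hello\n' | deflate | inflate
```

If both are defined in the same shell, use either the aliases or the functions, not a mix of the two.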
These commands create the minimum of files and directories Git expects in a repository:
mkdir git-example
cd git-example
mkdir .git
mkdir .git/objects
mkdir .git/refs
cat <<EOF > .git/config
[core]
repositoryformatversion = 0
EOF
echo 'ref: refs/heads/master' > .git/HEAD
If you remove any of them, Git will no longer recognize the directory as a repository.
Let's create the first file in our working tree:
tee file1.txt <<EOF | wc -c
Line 1
Line 2
Line 3
EOF
Take note of the file's size:
21
Here's how to add it to the repository (imitating what git add file1.txt does to the object store; the index is left alone):
printf 'blob 21\x00' > /tmp/blob1
cat file1.txt >> /tmp/blob1
sha1sum /tmp/blob1
6ad36e52f0002937ed2de6a1c15d8a0ae5df056a /tmp/blob1
mkdir .git/objects/6a
deflate < /tmp/blob1 > .git/objects/6a/d36e52f0002937ed2de6a1c15d8a0ae5df056a
rm /tmp/blob1
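To double-check the hash without a temporary file, the header and the contents can be streamed straight into sha1sum. A minimal sketch: blob_id is a made-up helper name, not a Git command, and it assumes sha1sum from GNU coreutils is available (other systems may ship an equivalent tool under a different name).

```shell
# blob_id FILE: print the ID Git would give FILE's contents as a blob.
# (Hypothetical helper for illustration.)
blob_id() {
    size=$(( $(wc -c < "$1") ))   # arithmetic expansion strips any padding from wc
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}
```

Running blob_id file1.txt should print the same ID as the sha1sum step above. If Git is installed, git hash-object file1.txt should agree as well.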
Notes:
Now we create a ‘tree’ object for the project's root directory:
printf '100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | tee /tmp/tree1-data | wc -c
37
printf 'tree 37\x00' > /tmp/tree1
cat /tmp/tree1-data >> /tmp/tree1
sha1sum /tmp/tree1
d20f1946b531ca91c8e08744c48811593092f23f /tmp/tree1
mkdir .git/objects/d2
deflate < /tmp/tree1 > .git/objects/d2/0f1946b531ca91c8e08744c48811593092f23f
rm /tmp/tree1 /tmp/tree1-data
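The tree entry stores the blob's hash as 20 raw bytes rather than 40 hex digits, which is why the printf command above spells it out as \x escapes. Typing those by hand is tedious; here is one way to generate them, as a sketch (hex_to_escapes is a hypothetical helper name).

```shell
# hex_to_escapes HEX: turn "6ad3..." into "\x6a\xd3..." for use in a printf format.
hex_to_escapes() {
    printf '%s\n' "$1" | sed 's/../\\x&/g'
}

# Each tree entry is: mode, space, name, NUL, 20 raw hash bytes.
# For "100644 file1.txt" that is 7 + 9 + 1 + 20 bytes:
echo $(( 7 + 9 + 1 + 20 ))   # 37, matching the wc -c output above
```

For example, hex_to_escapes 6ad36e52f0002937ed2de6a1c15d8a0ae5df056a prints the escape sequence that follows file1.txt\x00 in the tree data.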
Trees store information about a directory, one entry per file or subdirectory:
100644: non-executable file
100755: executable file
40000: subdirectory
120000: symbolic link (the link target is stored in the blob)
Notes:
Time to create the first commit:
date +%s
(timestamp)
1769456599
tee /tmp/commit1-data <<EOF | wc -c
tree d20f1946b531ca91c8e08744c48811593092f23f
author Your Name <your.email@example.com> 1769456599 +0100
committer Your Name <your.email@example.com> 1769456599 +0100

First commit.
EOF
(length of commit data)
182
printf 'commit 182\x00' > /tmp/commit1
cat /tmp/commit1-data >> /tmp/commit1
sha1sum /tmp/commit1
09a07a5a0fcba882f3947a63a1aecd8b529a8437 /tmp/commit1
mkdir .git/objects/09
deflate < /tmp/commit1 > .git/objects/09/a07a5a0fcba882f3947a63a1aecd8b529a8437
rm /tmp/commit1 /tmp/commit1-data
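The commit data is plain text, so it can also be generated with printf, which makes the blank line between the header fields and the message explicit. This is a sketch only; commit_data and its argument order are invented for illustration and are not part of Git.

```shell
# commit_data TREE IDENT TIMESTAMP OFFSET MESSAGE
# Prints the data of a parentless commit (everything after the "commit N\0" header).
commit_data() {
    printf 'tree %s\n' "$1"
    printf 'author %s %s %s\n' "$2" "$3" "$4"
    printf 'committer %s %s %s\n' "$2" "$3" "$4"
    printf '\n%s\n' "$5"        # blank line, then the commit message
}

commit_data d20f1946b531ca91c8e08744c48811593092f23f \
    'Your Name <your.email@example.com>' 1769456599 +0100 'First commit.' | wc -c
```

The byte count printed should be 182, the number used in the 'commit 182' header above.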
A ‘commit’ object contains:
Notes:
Some commands (such as cherry-pick and rebase) will update the committer field but retain the original author.
Create the master branch and point it to the commit we've just created:
mkdir .git/refs/heads
echo 09a07a5a0fcba882f3947a63a1aecd8b529a8437 > .git/refs/heads/master
This is a good time to check the results — see how Git interprets what we've done so far.
git read-tree master
# or: git read-tree 09a07a5
# or: git read-tree d20f194
read-tree copies a tree into the index. You don't need
to do this, but it will make the index match what we've committed and what we
have in our working tree so that git status will report a clean
state. If you want to undo it here, you can rm .git/index.
git status
On branch master
nothing to commit, working tree clean
And let's see the commit:
git log
commit 09a07a5a0fcba882f3947a63a1aecd8b529a8437 (HEAD -> master)
Author: Your Name <your.email@example.com>
Date: Mon Jan 26 20:43:19 2026 +0100
First commit.
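The Date line is just the timestamp and UTC offset from the commit data, rendered for display. The conversion can be reproduced roughly as follows; this sketch assumes GNU date (for -d @SECONDS and %-d), and format_commit_time is a hypothetical helper name.

```shell
# format_commit_time TIMESTAMP OFFSET: render a commit time roughly as git log does.
# OFFSET looks like +0100 or -0530.
format_commit_time() {
    ts=$1 tz=$2
    sign=${tz%????}                 # "+" or "-"
    hh=${tz#?}; hh=${hh%??}         # two hour digits
    mm=${tz#???}                    # two minute digits
    # Strip one leading zero so the shell does not read "08" as octal.
    offset=$(( ${sign}1 * (${hh#0} * 60 + ${mm#0}) * 60 ))
    # Shift the timestamp by the offset, then format it as if it were UTC.
    LC_ALL=C date -u -d "@$(( ts + offset ))" "+%a %b %-d %H:%M:%S %Y $tz"
}

format_commit_time 1769456599 +0100   # Mon Jan 26 20:43:19 2026 +0100
```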
You've seen that a branch is just a reference to a commit. Suppose that
someone insists that we rename the master branch to
main. All we need to do is:
mv .git/refs/heads/master .git/refs/heads/main
We did have our HEAD refer to it, though. So in this case:
rm .git/HEAD
echo 'ref: refs/heads/main' > .git/HEAD
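To see that HEAD still leads to the same commit after the rename, the chain can be followed by hand: HEAD names a ref file, and the ref file holds a commit ID. A small sketch, with resolve_head as a hypothetical helper; it only reads loose ref files and ignores packed refs.

```shell
# resolve_head [GITDIR]: print the commit ID that HEAD currently leads to.
resolve_head() {
    gitdir=${1:-.git}
    head=$(cat "$gitdir/HEAD")
    case $head in
        'ref: '*) cat "$gitdir/${head#ref: }" ;;   # symbolic ref: follow it
        *)        printf '%s\n' "$head" ;;         # otherwise HEAD holds an ID itself
    esac
}
```

In the example repository, resolve_head should print 09a07a5a0fcba882f3947a63a1aecd8b529a8437. If Git is available, git rev-parse HEAD should print the same ID.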
To build on our example, we'll add a new file, this time putting it inside of a subdirectory.
(To avoid repeating every step you've already seen, some of these commands combine several steps in order to write an object, though it is very error-prone and I don't recommend doing it this way even when experimenting.)
mkdir .git/objects/3b
printf 'blob 8\x00foo\nbar\n' | deflate > .git/objects/3b/d1f0e29744a1f32b08d5650e62e2e62afb177c
mkdir .git/objects/3a
printf 'tree 37\x00100644 file2.txt\x00\x3b\xd1\xf0\xe2\x97\x44\xa1\xf3\x2b\x08\xd5\x65\x0e\x62\xe2\xe6\x2a\xfb\x17\x7c' | deflate > .git/objects/3a/48677d945744110502acc9eef0714b6d913ccb
mkdir .git/objects/c3
printf 'tree 68\x0040000 dir1\x00\x3a\x48\x67\x7d\x94\x57\x44\x11\x05\x02\xac\xc9\xee\xf0\x71\x4b\x6d\x91\x3c\xcb100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/c3/55284440779c4ab5c6192b41fe251d49cae038
mkdir .git/objects/16
printf 'commit 241\x00tree c355284440779c4ab5c6192b41fe251d49cae038\nparent 09a07a5a0fcba882f3947a63a1aecd8b529a8437\nauthor Your Name <your.email@example.com> 1769459560 +0100\ncommitter Your Name <your.email@example.com> 1769459560 +0100\n\nAdd dir1 with file2.txt.\n' | deflate > .git/objects/16/47ac5f1eb66df46879bb5121a5e261fab0b2ae
Now we have a new commit that adds dir1/file2.txt. Note that this time there are two trees: one for the root directory (containing dir1 and file1.txt) and one for dir1 (containing file2.txt).
Instead of updating master/main, let's put our commit on a new branch:
echo 1647ac5f1eb66df46879bb5121a5e261fab0b2ae > .git/refs/heads/new-file-and-dir
If you want to see the result, try git log --all --graph.
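The hand-typed lengths and paths in the combined commands above are where mistakes tend to creep in. One way to reduce that risk is to let the shell derive them from the data. The sketch below is an illustration only: object_id and store_object are hypothetical helpers, sha1sum is assumed to be available, and deflate is redefined as a function so the block works on its own.

```shell
# Same Perl one-liner as the deflate alias, as a function so it works in scripts.
deflate() { perl -MIO::Compress::Deflate -e 'IO::Compress::Deflate::deflate "-" => "-";'; }

# object_id TYPE FILE: hash "TYPE SIZE\0" followed by FILE's bytes.
object_id() {
    size=$(( $(wc -c < "$2") ))
    { printf '%s %s\0' "$1" "$size"; cat "$2"; } | sha1sum | cut -d ' ' -f 1
}

# store_object TYPE FILE: write FILE as a loose object under .git/objects, print its ID.
store_object() {
    id=$(object_id "$1" "$2")
    dir=.git/objects/$(printf '%s' "$id" | cut -c 1-2)   # first two hex digits
    mkdir -p "$dir"
    { printf '%s %s\0' "$1" "$(( $(wc -c < "$2") ))"; cat "$2"; } |
        deflate > "$dir/$(printf '%s' "$id" | cut -c 3-)"  # remaining 38 digits
    printf '%s\n' "$id"
}
```

For instance, after printf 'foo\nbar\n' > /tmp/file2-data, running store_object blob /tmp/file2-data should print the blob ID used above (3bd1f0e2...) and write the same object file.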
As a final example, we're going to create another branch (side-by-side with the one we've just created), followed by a merge commit that joins the two branches together.
mkdir .git/objects/e6
printf 'blob 0\x00' | deflate > .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
mkdir .git/objects/b4
printf 'tree 74\x00100644 empty.txt\x00\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/b4/d3cd0a8230ed0c2dc15d26946acc3e12d011f8
mkdir .git/objects/d1
printf 'commit 232\x00tree b4d3cd0a8230ed0c2dc15d26946acc3e12d011f8\nparent 09a07a5a0fcba882f3947a63a1aecd8b529a8437\nauthor Your Name <your.email@example.com> 1769461503 +0100\ncommitter Your Name <your.email@example.com> 1769461503 +0100\n\nAdd empty file.\n' | deflate > .git/objects/d1/17657bc81c10f7d9350d80831a5d0dd66ee9e6
echo d117657bc81c10f7d9350d80831a5d0dd66ee9e6 > .git/refs/heads/add-empty-file
The first commit of this example just adds an empty file. There's one tree, the root, containing empty.txt and file1.txt.
So now we have three commits in total. Two are on separate branches, and
they both have our initial commit as their parent (again, git log --all
--graph if you want to see the result).
mkdir .git/objects/a7
printf 'tree 105\x0040000 dir1\x00\x3a\x48\x67\x7d\x94\x57\x44\x11\x05\x02\xac\xc9\xee\xf0\x71\x4b\x6d\x91\x3c\xcb100644 empty.txt\x00\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/a7/fafdefb748ff4646c1e85d58e1be90b03ff2a8
mkdir .git/objects/a8
printf 'commit 307\x00tree a7fafdefb748ff4646c1e85d58e1be90b03ff2a8\nparent 1647ac5f1eb66df46879bb5121a5e261fab0b2ae\nparent d117657bc81c10f7d9350d80831a5d0dd66ee9e6\nauthor Your Name <your.email@example.com> 1769462126 +0100\ncommitter Your Name <your.email@example.com> 1769462126 +0100\n\nMerge add-empty-file and new-file-and-dir.\n' | deflate > .git/objects/a8/8b6bca831d5fd9644595317e1638b3dd3d18ff
This merge commit combines all the changes we've made so far: it contains
dir1 (with file2.txt, the same we already had),
empty.txt, and file1.txt. Finally, we'll update
main to point to it:
rm .git/refs/heads/main
echo a88b6bca831d5fd9644595317e1638b3dd3d18ff > .git/refs/heads/main
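What makes a commit a merge is nothing more than having several parent lines among its header fields. As an illustration, the parents can be pulled out of commit data with sed; commit_parents is a hypothetical helper, and the sample input is a shortened stand-in for the real merge commit (the author and committer lines are left out).

```shell
# commit_parents: read commit data on stdin, print one parent ID per line.
# Stops at the first blank line, where the header fields end and the message begins.
commit_parents() {
    sed -n -e '/^$/q' -e 's/^parent //p'
}

printf 'tree %s\nparent %s\nparent %s\n\nMerge.\n' \
    a7fafdefb748ff4646c1e85d58e1be90b03ff2a8 \
    1647ac5f1eb66df46879bb5121a5e261fab0b2ae \
    d117657bc81c10f7d9350d80831a5d0dd66ee9e6 | commit_parents
```

This prints the two parent IDs, one per line; a commit without parent lines (like our first one) produces no output.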
Note that in the last couple of examples, we've neglected not just the
index, but also our working tree, which still only contains
file1.txt. We can have Git itself fix this for us:
git read-tree main
git checkout .
To let Git verify everything we've done and show us the final repository state:
git fsck --verbose
git log --all --graph
git branch
git status
This concludes the tutorial. You may be wondering about some of the things Git does that I didn't show. I'll mention a few here:
Pack files (.git/objects/pack/*) store multiple objects inside a single file; these can contain deltas (storing an object as a difference to another one), but those are only used to reconstruct object (blob/tree/commit) data. Pack files merely store the same information that loose objects do, but in a different format. Commands like git diff compute differences between objects regardless of which way they are stored; packfile deltas are irrelevant to this. References (branches and tags) can also be packed by Git (.git/packed-refs).
The index (.git/index) is mainly a ‘staging area’ for commits; its format is more complicated and we've mostly ignored it, in order to keep things simple.
Another object type (tag) stores annotated tags; these are not worth covering here. Lightweight tags (.git/refs/tags/*) are references to commits, just like branches.
Reflogs (.git/logs/*) are like ‘undo lists’ for branches and the HEAD; whenever a branch is updated, an entry is added to its corresponding reflog. Reflogs are local and only for references (hence the name); they are not part of the project version history and are not taken from or shared with remote repositories.
Git itself has plenty of documentation on all these things; there's no point in providing yet another explanation here.
I believe Git is unusual compared to most other software in that its data
format is both relatively simple and relatively exposed; several of its
subcommands (like cat-file, hash-object, and
mktree) display or manipulate these objects directly, while the
more common, high-level subcommands like add and
commit are built on top of them.
Because of that, it really helps to understand that data format before learning about all of the functionality built around it. Trying to just learn all the commands without knowing about what's underneath might keep you confused about Git forever. I hope that actually seeing the data stored within these objects — the basic elements of Git — helps to clear things up and make the rest of Git easier to understand.