Git repository from scratch

This tutorial demonstrates how to create and populate a Git repository without using Git itself, as a way of explaining how Git stores project files and history. There are some Git commands along the way, but they are optional; you could omit all of them and still obtain a valid repository.

Prerequisites — before going through it, you need to understand the purpose of hashing (which Git heavily relies on) and be reasonably familiar with the Unix command-line environment. This is important for making sense of all the steps shown below.

00-prep.txt

Before we begin, we need a way to compress and write files in the zlib format. Unfortunately, the most common command-line programs that provide Deflate compression all use different enclosing formats (gzip, zip). But you may be able to access zlib via some scripting languages; here's an example for Perl:

alias deflate='perl -MIO::Compress::Deflate -e '\''IO::Compress::Deflate::deflate "-" => "-";'\'
alias inflate='perl -MIO::Uncompress::Inflate -e '\''IO::Uncompress::Inflate::inflate "-" => "-";'\'

Some subsequent commands will use the deflate alias to zlib-compress data from standard input to standard output, so make sure you have one.


01-init.txt

These commands create the minimum of files and directories Git expects in a repository:

mkdir git-example
cd git-example
mkdir .git
mkdir .git/objects
mkdir .git/refs
cat <<EOF > .git/config
[core]
    repositoryformatversion = 0
EOF
echo 'ref: refs/heads/master' > .git/HEAD

If you remove HEAD, objects, or refs, Git will no longer recognize the directory as a repository.


02-add-file.txt

Let's create the first file in our working tree:

tee file1.txt <<EOF | wc -c
Line 1
Line 2
Line 3
EOF

Take note of the file's size:

21

Here's how to add it to the repository (imitating git add file1.txt):

printf 'blob 21\x00' > /tmp/blob1
cat file1.txt >> /tmp/blob1
sha1sum /tmp/blob1
6ad36e52f0002937ed2de6a1c15d8a0ae5df056a  /tmp/blob1
mkdir .git/objects/6a
deflate < /tmp/blob1 > .git/objects/6a/d36e52f0002937ed2de6a1c15d8a0ae5df056a
rm /tmp/blob1

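Notes: the object consists of a header (the type, a space, the content length in bytes as a decimal number, and a NUL byte) followed by the file's content. Its name is the SHA-1 hash of this uncompressed data. The first two hex digits of the hash become a directory under .git/objects and the remaining 38 become the file name; the file itself holds the zlib-compressed data.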
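
The hashing step can be wrapped in a small function so that the size never has to be typed by hand. This is only a sketch: object_id is a hypothetical helper name, and it assumes sha1sum and mktemp are available, as on most Linux systems.

```shell
# Hypothetical helper: print the ID Git would give FILE when stored as TYPE.
object_id() {
    obj_type=$1
    file=$2
    # Content length in bytes (some wc implementations pad with spaces).
    size=$(wc -c < "$file" | tr -d ' ')
    # Hash the header ("type size" plus a NUL byte) followed by the content.
    { printf '%s %s\000' "$obj_type" "$size"; cat "$file"; } | sha1sum | cut -d ' ' -f 1
}

# Recreate the tutorial's file in a scratch location and hash it as a blob.
sample=$(mktemp)
printf 'Line 1\nLine 2\nLine 3\n' > "$sample"
object_id blob "$sample"
rm -f "$sample"
```

If the bytes match, the printed ID should be the one computed above. With Git installed, git hash-object file1.txt is an optional cross-check.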


03-add-tree.txt

Now we create a ‘tree’ object for the project's root directory:

printf '100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | tee /tmp/tree1-data | wc -c
37
printf 'tree 37\x00' > /tmp/tree1
cat /tmp/tree1-data >> /tmp/tree1
sha1sum /tmp/tree1
d20f1946b531ca91c8e08744c48811593092f23f  /tmp/tree1
mkdir .git/objects/d2
deflate < /tmp/tree1 > .git/objects/d2/0f1946b531ca91c8e08744c48811593092f23f
rm /tmp/tree1 /tmp/tree1-data

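Trees store information about a directory, one entry per file or subdirectory. Each entry consists of the file mode in octal (as ASCII text), a space, the name, a NUL byte, and the 20-byte SHA-1 of the referenced object in binary form (not hex).

Notes: the 37 bytes of tree data here are 6 (mode) + 1 (space) + 9 (name) + 1 (NUL) + 20 (hash). Apart from its type and content, a tree is stored exactly like a blob: a header, a name taken from the SHA-1 of the uncompressed data, and a zlib-compressed file under .git/objects.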
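
Typing twenty \x escapes by hand is an easy place to slip. A small sketch that derives them from the hex ID and also computes the entry length; hex_to_escapes is a hypothetical helper name, not something Git provides:

```shell
# Hypothetical helper: turn a hex object ID into printf-style \x escapes.
hex_to_escapes() {
    # Prefix every pair of hex digits with a literal "\x".
    printf '%s\n' "$1" | sed 's/../\\x&/g'
}

hex_to_escapes 6ad36e52f0002937ed2de6a1c15d8a0ae5df056a
# prints: \x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a

# Length of one tree entry: mode, space, name, NUL byte, 20-byte binary hash.
mode=100644
name=file1.txt
echo $(( ${#mode} + 1 + ${#name} + 1 + 20 ))
# prints: 37
```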


04-commit-1.txt

Time to create the first commit:

date +%s

(timestamp)

1769456599
tee /tmp/commit1-data <<EOF | wc -c
tree d20f1946b531ca91c8e08744c48811593092f23f
author Your Name <your.email@example.com> 1769456599 +0100
committer Your Name <your.email@example.com> 1769456599 +0100

First commit.
EOF

(length of commit data)

182
printf 'commit 182\x00' > /tmp/commit1
cat /tmp/commit1-data >> /tmp/commit1
sha1sum /tmp/commit1
09a07a5a0fcba882f3947a63a1aecd8b529a8437  /tmp/commit1
mkdir .git/objects/09
deflate < /tmp/commit1 > .git/objects/09/a07a5a0fcba882f3947a63a1aecd8b529a8437
rm /tmp/commit1 /tmp/commit1-data

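A ‘commit’ object contains a reference to the root tree, the author and the committer (each with a name, an email address, a Unix timestamp, and a time zone offset), and, after a blank line, the commit message. Every commit except the first also names one or more parent commits, as later examples show.

Notes: unlike in a tree, the object ID in a commit is written as 40 hex digits, and the whole object is plain text. The length in the header (182 here) again counts only the bytes of the data, not the header itself.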
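
Both the length and the ID follow mechanically from the commit text, so they can be recomputed in one go. A sketch that rebuilds the same data with printf instead of a here-document; the variable names are made up for this example, and sha1sum and mktemp are assumed to be available:

```shell
# Rebuild the commit data from this step and report its length in bytes.
tree=d20f1946b531ca91c8e08744c48811593092f23f
ident='Your Name <your.email@example.com> 1769456599 +0100'
commit_data=$(mktemp)
printf 'tree %s\nauthor %s\ncommitter %s\n\nFirst commit.\n' "$tree" "$ident" "$ident" > "$commit_data"
size=$(wc -c < "$commit_data" | tr -d ' ')
echo "$size"
# prints: 182

# Hash header plus data in a single pipeline; no intermediate object file needed.
{ printf 'commit %s\000' "$size"; cat "$commit_data"; } | sha1sum | cut -d ' ' -f 1
rm -f "$commit_data"
```

If the bytes are identical to the ones written above, the second line of output should be the commit ID shown above. Changing a single character, including the timestamp, gives a different ID.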

Create the master branch and point it to the commit we've just created:

mkdir .git/refs/heads
echo 09a07a5a0fcba882f3947a63a1aecd8b529a8437 > .git/refs/heads/master

05-verify-1.txt

This is a good time to check the results — see how Git interprets what we've done so far.

git read-tree master
# or: git read-tree 09a07a5
# or: git read-tree d20f194

read-tree copies a tree into the index. You don't need to do this, but it will make the index match what we've committed and what we have in our working tree so that git status will report a clean state. If you want to undo it here, you can rm .git/index.

git status
On branch master
nothing to commit, working tree clean

And let's see the commit:

git log
commit 09a07a5a0fcba882f3947a63a1aecd8b529a8437 (HEAD -> master)
Author: Your Name <your.email@example.com>
Date:   Mon Jan 26 20:43:19 2026 +0100

    First commit.

06-rename-branch.txt

You've seen that a branch is just a reference to a commit. Suppose that someone insists that we rename the master branch to main. All we need to do is:

mv .git/refs/heads/master .git/refs/heads/main

HEAD still refers to master, though, so it has to be updated too:

rm .git/HEAD
echo 'ref: refs/heads/main' > .git/HEAD
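
To see that nothing more than two small text files is involved, HEAD can be resolved by hand. A sketch only: resolve_head is a hypothetical helper, and it handles just the loose ref files used in this tutorial, not any other way Git may store references.

```shell
# Hypothetical helper: follow HEAD to a commit ID using plain file reads.
resolve_head() {
    gitdir=${1:-.git}
    head_value=$(cat "$gitdir/HEAD")
    case $head_value in
        'ref: '*)
            # Symbolic ref: the rest of the line is a path inside the Git directory.
            cat "$gitdir/${head_value#"ref: "}"
            ;;
        *)
            # Otherwise HEAD holds a commit ID directly.
            printf '%s\n' "$head_value"
            ;;
    esac
}

# Only meaningful inside the example repository, hence the existence check.
if [ -f .git/HEAD ]; then
    resolve_head
fi
```

With Git installed, git rev-parse HEAD should give the same answer.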

07-new-file-and-dir.txt

To build on our example, we'll add a new file, this time putting it inside of a subdirectory.

(To avoid repeating every step you've already seen, some of these commands combine several steps to write an object in one go. This is very error-prone, and I don't recommend doing it this way even when experimenting.)

mkdir .git/objects/3b
printf 'blob 8\x00foo\nbar\n' | deflate > .git/objects/3b/d1f0e29744a1f32b08d5650e62e2e62afb177c
mkdir .git/objects/3a
printf 'tree 37\x00100644 file2.txt\x00\x3b\xd1\xf0\xe2\x97\x44\xa1\xf3\x2b\x08\xd5\x65\x0e\x62\xe2\xe6\x2a\xfb\x17\x7c' | deflate > .git/objects/3a/48677d945744110502acc9eef0714b6d913ccb
mkdir .git/objects/c3
printf 'tree 68\x0040000 dir1\x00\x3a\x48\x67\x7d\x94\x57\x44\x11\x05\x02\xac\xc9\xee\xf0\x71\x4b\x6d\x91\x3c\xcb100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/c3/55284440779c4ab5c6192b41fe251d49cae038
mkdir .git/objects/16
printf 'commit 241\x00tree c355284440779c4ab5c6192b41fe251d49cae038\nparent 09a07a5a0fcba882f3947a63a1aecd8b529a8437\nauthor Your Name <your.email@example.com> 1769459560 +0100\ncommitter Your Name <your.email@example.com> 1769459560 +0100\n\nAdd dir1 with file2.txt.\n' | deflate > .git/objects/16/47ac5f1eb66df46879bb5121a5e261fab0b2ae

Now we have a new commit that adds dir1/file2.txt. Note that this time…

Instead of updating master/main, let's put our commit on a new branch:

echo 1647ac5f1eb66df46879bb5121a5e261fab0b2ae > .git/refs/heads/new-file-and-dir

If you want to see the result, try git log --all --graph.


08-merge.txt

As a final example, we're going to create another branch (side-by-side with the one we've just created), followed by a merge commit that joins the two branches together.

mkdir .git/objects/e6
printf 'blob 0\x00' | deflate > .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
mkdir .git/objects/b4
printf 'tree 74\x00100644 empty.txt\x00\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/b4/d3cd0a8230ed0c2dc15d26946acc3e12d011f8
mkdir .git/objects/d1
printf 'commit 232\x00tree b4d3cd0a8230ed0c2dc15d26946acc3e12d011f8\nparent 09a07a5a0fcba882f3947a63a1aecd8b529a8437\nauthor Your Name <your.email@example.com> 1769461503 +0100\ncommitter Your Name <your.email@example.com> 1769461503 +0100\n\nAdd empty file.\n' | deflate > .git/objects/d1/17657bc81c10f7d9350d80831a5d0dd66ee9e6
echo d117657bc81c10f7d9350d80831a5d0dd66ee9e6 > .git/refs/heads/add-empty-file

The commit above just adds an empty file. It has a single tree, the root, containing empty.txt and file1.txt.

So now we have three commits in total. Two are on separate branches, and they both have our initial commit as their parent (again, git log --all --graph if you want to see the result).

mkdir .git/objects/a7
printf 'tree 105\x0040000 dir1\x00\x3a\x48\x67\x7d\x94\x57\x44\x11\x05\x02\xac\xc9\xee\xf0\x71\x4b\x6d\x91\x3c\xcb100644 empty.txt\x00\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/a7/fafdefb748ff4646c1e85d58e1be90b03ff2a8
mkdir .git/objects/a8
printf 'commit 307\x00tree a7fafdefb748ff4646c1e85d58e1be90b03ff2a8\nparent 1647ac5f1eb66df46879bb5121a5e261fab0b2ae\nparent d117657bc81c10f7d9350d80831a5d0dd66ee9e6\nauthor Your Name <your.email@example.com> 1769462126 +0100\ncommitter Your Name <your.email@example.com> 1769462126 +0100\n\nMerge add-empty-file and new-file-and-dir.\n' | deflate > .git/objects/a8/8b6bca831d5fd9644595317e1638b3dd3d18ff

This merge commit combines all the changes we've made so far: its tree contains dir1 (with file2.txt, reusing the tree object we already had), empty.txt, and file1.txt. Finally, we'll update main to point to it:

rm .git/refs/heads/main
echo a88b6bca831d5fd9644595317e1638b3dd3d18ff > .git/refs/heads/main
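
The only thing that marks a commit as a merge is that its header lists more than one parent. A small sketch that pulls the parent IDs out of commit data (the text after the "commit N" header and NUL byte); commit_parents is a hypothetical helper name:

```shell
# Hypothetical helper: list the parent IDs in commit data read from standard input.
commit_parents() {
    # Header lines end at the first blank line; the message follows it.
    sed '/^$/q' | sed -n 's/^parent //p'
}

printf 'tree a7fafdefb748ff4646c1e85d58e1be90b03ff2a8\nparent 1647ac5f1eb66df46879bb5121a5e261fab0b2ae\nparent d117657bc81c10f7d9350d80831a5d0dd66ee9e6\nauthor Your Name <your.email@example.com> 1769462126 +0100\ncommitter Your Name <your.email@example.com> 1769462126 +0100\n\nMerge add-empty-file and new-file-and-dir.\n' | commit_parents
# prints:
# 1647ac5f1eb66df46879bb5121a5e261fab0b2ae
# d117657bc81c10f7d9350d80831a5d0dd66ee9e6
```

Stopping at the first blank line matters: a message line that happens to start with "parent " is not mistaken for a header.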

09-verify-2.txt

Note that in the last couple of examples, we've neglected not just the index, but also our working tree, which still only contains file1.txt. We can have Git itself fix this for us:

git read-tree main
git checkout .

Now let Git verify everything we've done and show us the final repository state:

git fsck --verbose
git log --all --graph
git branch
git status

This concludes the tutorial. You may be wondering about some of the things Git does that I didn't show. I'll mention a few here:

Git itself has plenty of documentation on all these things; there's no point in providing yet another explanation here.

I believe Git is unusual compared to most other software in that its data format is both relatively simple and relatively exposed; several of its subcommands (like cat-file, hash-object, and mktree) display or manipulate these objects directly, while the more common, high-level subcommands like add and commit are built on top of them.

Because of that, it really helps to understand that data format before learning about all of the functionality built around it. Trying to just learn all the commands without knowing about what's underneath might keep you confused about Git forever. I hope that actually seeing the data stored within these objects — the basic elements of Git — helps to clear things up and make the rest of Git easier to understand.