Git repository from scratch

This tutorial demonstrates how to create and populate a Git repository without using Git itself, as a way of explaining how Git stores project files and history. There are some Git commands along the way, but they are optional; you could omit all of them and still obtain a valid repository.

Prerequisites: before going through this tutorial, you should understand the purpose of hashing (which Git relies on heavily) and be reasonably familiar with the Unix command-line environment. Both are needed to make sense of the steps shown below.

00-prep.txt

Before we begin, we need a way to compress and write files in the zlib format. Unfortunately, the most common command-line programs that provide Deflate compression all use different enclosing formats (gzip, zip). But you may be able to access zlib via some scripting languages; here's an example for Perl:

alias deflate='perl -MIO::Compress::Deflate -e '\''IO::Compress::Deflate::deflate "-" => "-";'\'
alias inflate='perl -MIO::Uncompress::Inflate -e '\''IO::Uncompress::Inflate::inflate "-" => "-";'\'

Some subsequent commands will use the deflate alias to zlib-compress data from standard input to standard output, so make sure you have one.
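Bash does not expand aliases in scripts by default, so shell functions may be more convenient if you want to replay these steps from a file. The following is only a sketch, assuming python3 is installed; it swaps Perl for Python's standard zlib module, and the function names simply mirror the aliases above.

```shell
# Assumption: python3 is on PATH (zlib ships with its standard library).

# deflate: zlib-compress standard input to standard output.
deflate() {
    python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))'
}

# inflate: the reverse of deflate.
inflate() {
    python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))'
}

# Round trip: this should print the original text again.
printf 'hello, zlib\n' | deflate | inflate
```

Either variant should do for the rest of the tutorial, since both produce zlib-format output.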

01-init.txt

These commands create the minimum of files and directories Git expects in a repository:

mkdir git-example
cd git-example
mkdir .git
mkdir .git/objects
mkdir .git/refs
cat <<EOF > .git/config
[core]
    repositoryformatversion = 0
EOF
echo 'ref: refs/heads/master' > .git/HEAD

Git checks for HEAD, objects, and refs when deciding whether a directory is a repository; if you remove any of those, it will no longer recognize this one.
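If a later step fails with "not a git repository", a quick check of the skeleton can help. This is a minimal sketch; check_skeleton is a name made up here (it is not a Git command), and it only looks for the four entries created above.

```shell
# check_skeleton DIR: report which of the entries created above are missing.
# Returns 0 when all four exist, 1 otherwise. (Hypothetical helper.)
check_skeleton() {
    missing=0
    for dir in objects refs; do
        [ -d "$1/.git/$dir" ] || { echo "missing directory: .git/$dir"; missing=1; }
    done
    for file in config HEAD; do
        [ -f "$1/.git/$file" ] || { echo "missing file: .git/$file"; missing=1; }
    done
    return "$missing"
}

# Demonstration on a deliberately incomplete throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/.git/objects"
check_skeleton "$demo" || echo "incomplete"
rm -rf "$demo"
```

From inside git-example, check_skeleton . should print nothing once all the commands above have been run.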

02-add-file.txt

Let's create the first file in our working tree:

tee file1.txt <<EOF | wc -c
Line 1
Line 2
Line 3
EOF

Take note of the file's size, which wc -c prints at the end (21 bytes here); the blob header below needs it.

Here's how to add it to the repository (imitating git add file1.txt):

printf 'blob 21\x00' > /tmp/blob1
cat file1.txt >> /tmp/blob1
sha1sum /tmp/blob1

6ad36e52f0002937ed2de6a1c15d8a0ae5df056a  /tmp/blob1

mkdir .git/objects/6a
deflate < /tmp/blob1 > .git/objects/6a/d36e52f0002937ed2de6a1c15d8a0ae5df056a
rm /tmp/blob1

Notes:

A ‘blob’ object is used only to store the contents of a file; the file's name and attributes are not part of it.
The name of any Git object is a SHA-1 hash that depends only on that object's type (blob, tree, or commit) and contents; the length in the header is derived from the contents.
If the file changed (even by just one byte), the hash would be different, and so the new contents would be stored as a new object.
On the other hand, whenever two files have the same contents (or more commonly, when a file is unchanged between two commits), Git can re-use the existing object; it is only stored once.
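The points above can be tried out without touching the repository. Here is a small sketch, assuming sha1sum is available; object_id is a helper name invented for this example.

```shell
# object_id TYPE FILE: print the SHA-1 name that FILE's contents would get
# when stored as an object of TYPE. (Hypothetical helper, not a Git command.)
object_id() {
    size=$(wc -c < "$2" | tr -d ' ')
    # Header "TYPE SIZE" plus a NUL byte, followed by the raw contents.
    { printf '%s %s\0' "$1" "$size"; cat "$2"; } | sha1sum | cut -d ' ' -f 1
}

tmp=$(mktemp -d)
printf 'Line 1\nLine 2\nLine 3\n' > "$tmp/a.txt"
cp "$tmp/a.txt" "$tmp/copy-of-a.txt"                # same contents, other name
printf 'Line 1\nLine 2\nLine 4\n' > "$tmp/b.txt"    # one byte differs

object_id blob "$tmp/a.txt"           # same ID as file1.txt above
object_id blob "$tmp/copy-of-a.txt"   # identical: the file name is not hashed
object_id blob "$tmp/b.txt"           # a different ID
rm -rf "$tmp"
```

If Git is installed, git hash-object file1.txt should report the same ID as the first call.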

03-add-tree.txt

Now we create a ‘tree’ object for the project's root directory:

printf '100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | tee /tmp/tree1-data | wc -c

printf 'tree 37\x00' > /tmp/tree1
cat /tmp/tree1-data >> /tmp/tree1
sha1sum /tmp/tree1

d20f1946b531ca91c8e08744c48811593092f23f  /tmp/tree1

mkdir .git/objects/d2
deflate < /tmp/tree1 > .git/objects/d2/0f1946b531ca91c8e08744c48811593092f23f
rm /tmp/tree1 /tmp/tree1-data

Trees store information about a directory, one entry per file or subdirectory. Each entry consists of:

The file mode, followed by a space. It is a heritage from Unix, more or less; in Git, you should only ever see:
- 100644: non-executable file
- 100755: executable file
- 40000: subdirectory
- 120000: symbolic link (the link target is stored in the blob)
- 160000: submodule (the hash refers to a commit in another repository)
The file (or subdirectory) name, terminated by a NUL byte
The SHA-1 hash, in binary form (20 bytes), of the ‘blob’ containing the file's contents, or of the ‘tree’ for a subdirectory

Notes:

Entries must be sorted by name (byte by byte; Git compares a subdirectory's name as if it ended with a slash).
If a file's mode, name, or contents changes, then the tree data changes as a result; the tree in turn would then be stored as a new object with a different name (hash).
If the tree itself was for a subdirectory, then the tree data for its parent directory would change (referring to the new tree's SHA-1), and so on; such changes ripple all the way up to the project's root.
The tree for the project's root directory depends on (and only on) every file and subdirectory, so it represents a snapshot of the entire project at a single point in time.
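Typing twenty \x escapes by hand is easy to get wrong. As a sketch of how the entry layout described above could be generated instead, the helpers below (both names are made up) convert a hex ID to binary and emit one entry. hex_to_bin expects an even number of hex digits, and sha1sum is assumed to be available.

```shell
# hex_to_bin HEX: write the bytes named by a hex string, two digits per byte.
hex_to_bin() {
    hex=$1
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}     # first two hex digits
        hex=${hex#??}               # everything after them
        # Turn the hex pair into an octal escape that printf understands.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# tree_entry MODE NAME HEX_ID: write one tree entry in the binary layout.
tree_entry() {
    printf '%s %s\0' "$1" "$2"
    hex_to_bin "$3"
}

tmp=$(mktemp)
tree_entry 100644 file1.txt 6ad36e52f0002937ed2de6a1c15d8a0ae5df056a > "$tmp"
size=$(wc -c < "$tmp" | tr -d ' ')
echo "$size"                        # 37, the length used in the header above
{ printf 'tree %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d ' ' -f 1
rm -f "$tmp"
```

The last command should print the same tree ID as the sha1sum call above. For a directory with several entries, call tree_entry once per entry, in sorted order.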

04-commit-1.txt

Time to create the first commit:

date +%s

(timestamp)

1769456599

tee /tmp/commit1-data <<EOF | wc -c
tree d20f1946b531ca91c8e08744c48811593092f23f
author Your Name <your.email@example.com> 1769456599 +0100
committer Your Name <your.email@example.com> 1769456599 +0100

First commit.
EOF

(length of commit data)

182

printf 'commit 182\x00' > /tmp/commit1
cat /tmp/commit1-data >> /tmp/commit1
sha1sum /tmp/commit1

09a07a5a0fcba882f3947a63a1aecd8b529a8437  /tmp/commit1

mkdir .git/objects/09
deflate < /tmp/commit1 > .git/objects/09/a07a5a0fcba882f3947a63a1aecd8b529a8437
rm /tmp/commit1 /tmp/commit1-data

A ‘commit’ object contains:

A tree reference for the project's root directory
Any number of parent commit references (none for this initial commit)
The author and committer name, email, and commit timestamp
The commit message (preceded by an extra newline to separate it)

Notes:

Commit data (including the tree hash) is stored as text, unlike trees where hashes are stored in binary form.
As mentioned above, the tree reference essentially describes a single snapshot of the entire project.
The author and committer are usually the same, but some of the more advanced Git commands that recreate commits (like cherry-pick and rebase) will update the committer field but retain the original author.
Normally, a ‘commit’ object refers to a parent commit, which in turn refers to the parent before that, and so on; in this way, commits string together the entire version history of the project.
An initial commit (such as the one we just created) has no parent.
Merge commits have multiple (usually two) parents, since they join different lines of history together; we'll create one of these further on.
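The commit layout listed above can be put together by a small function. This is a sketch only: commit_data is a hypothetical helper, it uses the same identity for author and committer, and sha1sum is assumed to be available.

```shell
# commit_data TREE IDENTITY WHEN MESSAGE [PARENT...]: print commit text.
# IDENTITY is "Name <email>", WHEN is "timestamp timezone".
commit_data() {
    tree=$1 who=$2 when=$3 message=$4
    shift 4
    printf 'tree %s\n' "$tree"
    for parent in "$@"; do              # zero, one, or several parents
        printf 'parent %s\n' "$parent"
    done
    printf 'author %s %s\n' "$who" "$when"
    printf 'committer %s %s\n' "$who" "$when"
    printf '\n%s\n' "$message"          # blank line, then the message
}

tmp=$(mktemp)
commit_data d20f1946b531ca91c8e08744c48811593092f23f \
    'Your Name <your.email@example.com>' '1769456599 +0100' \
    'First commit.' > "$tmp"
size=$(wc -c < "$tmp" | tr -d ' ')
echo "$size"                            # 182
{ printf 'commit %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d ' ' -f 1
rm -f "$tmp"
```

With the same name, email, and timestamp as above, the last command should reproduce the commit ID shown earlier; change any of them and the ID changes too.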

Create the master branch and point it to the commit we've just created:

mkdir .git/refs/heads
echo 09a07a5a0fcba882f3947a63a1aecd8b529a8437 > .git/refs/heads/master
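At this point HEAD names a branch, and the branch file names a commit. A rough sketch of following that chain by hand is shown below; resolve_head is a made-up helper that handles only loose ref files like the one just written, plus the case where HEAD holds a commit ID directly.

```shell
# resolve_head GITDIR: print the commit ID that HEAD currently leads to.
# Covers two simple cases only: a "ref: ..." line pointing at a loose ref
# file, or a raw commit ID stored in HEAD itself. (Hypothetical helper.)
resolve_head() {
    head=$(cat "$1/HEAD") || return 1
    case $head in
        'ref: '*) cat "$1/${head#ref: }" ;;
        *)        printf '%s\n' "$head" ;;
    esac
}

# Demonstration on a throwaway directory shaped like .git.
demo=$(mktemp -d)
mkdir -p "$demo/refs/heads"
echo 'ref: refs/heads/master' > "$demo/HEAD"
echo 09a07a5a0fcba882f3947a63a1aecd8b529a8437 > "$demo/refs/heads/master"
resolve_head "$demo"      # prints 09a07a5a0fcba882f3947a63a1aecd8b529a8437
rm -rf "$demo"
```

Inside git-example, resolve_head .git should print the ID of the commit we just created.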

05-verify-1.txt

This is a good time to check the results and see how Git interprets what we've done so far.

git read-tree master
# or: git read-tree 09a07a5
# or: git read-tree d20f194

read-tree copies a tree into the index. You don't need to do this, but it will make the index match what we've committed and what we have in our working tree so that git status will report a clean state. If you want to undo it here, you can rm .git/index.

git status

On branch master
nothing to commit, working tree clean

And let's see the commit:

git log

commit 09a07a5a0fcba882f3947a63a1aecd8b529a8437 (HEAD -> master)
Author: Your Name <your.email@example.com>
Date:   Mon Jan 26 20:43:19 2026 +0100

    First commit.
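The objects can also be checked from the other direction, by decompressing a stored file and reading its header. The sketch below assumes python3 is installed (the inflate alias from the beginning would do as well); object_header is a name invented for this example.

```shell
# Assumption: python3 is on PATH; this undoes the zlib compression.
inflate() {
    python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.decompress(sys.stdin.buffer.read()))'
}

# object_header FILE: print the "type size" header of a loose object file.
object_header() {
    # The header ends at the first NUL byte; turn NULs into newlines and
    # keep only the first line.
    inflate < "$1" | tr '\0' '\n' | LC_ALL=C sed -n '1p'
}

# Demonstration on a throwaway object file with the same contents as the
# blob stored earlier.
demo=$(mktemp)
printf 'blob 21\0Line 1\nLine 2\nLine 3\n' |
    python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))' > "$demo"
object_header "$demo"     # prints: blob 21
rm -f "$demo"
```

Inside git-example, object_header .git/objects/6a/d36e52f0002937ed2de6a1c15d8a0ae5df056a should print the same line, and git cat-file -t and git cat-file -s report the type and size separately.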

06-rename-branch.txt

You've seen that a branch is just a reference to a commit. Suppose that someone insists that we rename the master branch to main. All we need to do is:

mv .git/refs/heads/master .git/refs/heads/main

HEAD still refers to the old name, though, so it has to be updated as well:

rm .git/HEAD
echo 'ref: refs/heads/main' > .git/HEAD
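The two steps can be combined so that HEAD is only rewritten when it actually pointed at the renamed branch. This is a sketch under simplifying assumptions: rename_branch is a made-up helper, and it deals only with loose ref files like ours.

```shell
# rename_branch GITDIR OLD NEW: rename a loose branch ref, and update HEAD
# only if it was a symbolic reference to OLD. (Hypothetical helper.)
rename_branch() {
    mv "$1/refs/heads/$2" "$1/refs/heads/$3" || return 1
    if [ "$(cat "$1/HEAD")" = "ref: refs/heads/$2" ]; then
        echo "ref: refs/heads/$3" > "$1/HEAD"
    fi
}

# Demonstration on a throwaway directory shaped like .git.
demo=$(mktemp -d)
mkdir -p "$demo/refs/heads"
echo 'ref: refs/heads/master' > "$demo/HEAD"
echo 09a07a5a0fcba882f3947a63a1aecd8b529a8437 > "$demo/refs/heads/master"
rename_branch "$demo" master main
cat "$demo/HEAD"          # prints: ref: refs/heads/main
rm -rf "$demo"
```

Git's own git branch -m master main takes care of this and of other bookkeeping that this tutorial does not cover.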

07-new-file-and-dir.txt

To build on our example, we'll add a new file, this time putting it inside of a subdirectory.

(To avoid repeating every step you've already seen, each of these commands writes an object in one go, with its length and hash worked out beforehand. This is very error-prone, and I don't recommend doing it this way even when experimenting.)

mkdir .git/objects/3b
printf 'blob 8\x00foo\nbar\n' | deflate > .git/objects/3b/d1f0e29744a1f32b08d5650e62e2e62afb177c
mkdir .git/objects/3a
printf 'tree 37\x00100644 file2.txt\x00\x3b\xd1\xf0\xe2\x97\x44\xa1\xf3\x2b\x08\xd5\x65\x0e\x62\xe2\xe6\x2a\xfb\x17\x7c' | deflate > .git/objects/3a/48677d945744110502acc9eef0714b6d913ccb
mkdir .git/objects/c3
printf 'tree 68\x0040000 dir1\x00\x3a\x48\x67\x7d\x94\x57\x44\x11\x05\x02\xac\xc9\xee\xf0\x71\x4b\x6d\x91\x3c\xcb100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/c3/55284440779c4ab5c6192b41fe251d49cae038
mkdir .git/objects/16
printf 'commit 241\x00tree c355284440779c4ab5c6192b41fe251d49cae038\nparent 09a07a5a0fcba882f3947a63a1aecd8b529a8437\nauthor Your Name <your.email@example.com> 1769459560 +0100\ncommitter Your Name <your.email@example.com> 1769459560 +0100\n\nAdd dir1 with file2.txt.\n' | deflate > .git/objects/16/47ac5f1eb66df46879bb5121a5e261fab0b2ae

Now we have a new commit that adds dir1/file2.txt. Note that this time…

…we have two trees, one for the root (containing dir1 and file1.txt) and one for dir1 (containing file2.txt);
…the commit has one parent, pointing to our first commit.

Instead of updating master/main, let's put our commit on a new branch:

echo 1647ac5f1eb66df46879bb5121a5e261fab0b2ae > .git/refs/heads/new-file-and-dir

If you want to see the result, try git log --all --graph.
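A less error-prone way to experiment is to let a function work out the length, the hash, and the path. The following is a sketch, not how Git itself does it: store_object is a hypothetical helper, and it assumes that sha1sum and python3 (for zlib) are available.

```shell
# store_object GITDIR TYPE FILE: write FILE's contents as a loose object of
# TYPE under GITDIR/objects and print the object's ID. (Hypothetical helper.)
store_object() {
    size=$(wc -c < "$3" | tr -d ' ')
    raw=$(mktemp) || return 1
    { printf '%s %s\0' "$2" "$size"; cat "$3"; } > "$raw"
    id=$(sha1sum < "$raw" | cut -d ' ' -f 1)
    rest=${id#??}                        # the ID minus its first two digits
    dir=$1/objects/${id%"$rest"}         # directory named by those two digits
    mkdir -p "$dir"
    python3 -c 'import sys, zlib; sys.stdout.buffer.write(zlib.compress(sys.stdin.buffer.read()))' \
        < "$raw" > "$dir/$rest"
    rm -f "$raw"
    echo "$id"
}

# Demonstration on a throwaway directory, using the contents of file2.txt.
demo=$(mktemp -d)
printf 'foo\nbar\n' > "$demo/file2.txt"
store_object "$demo" blob "$demo/file2.txt"
find "$demo/objects" -type f
rm -rf "$demo"
```

The printed ID should match the blob written by the first printf command above. Trees and commits can be stored the same way once their data is in a file.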

08-merge.txt

As a final example, we're going to create another branch (side-by-side with the one we've just created), followed by a merge commit that joins the two branches together.

mkdir .git/objects/e6
printf 'blob 0\x00' | deflate > .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
mkdir .git/objects/b4
printf 'tree 74\x00100644 empty.txt\x00\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/b4/d3cd0a8230ed0c2dc15d26946acc3e12d011f8
mkdir .git/objects/d1
printf 'commit 232\x00tree b4d3cd0a8230ed0c2dc15d26946acc3e12d011f8\nparent 09a07a5a0fcba882f3947a63a1aecd8b529a8437\nauthor Your Name <your.email@example.com> 1769461503 +0100\ncommitter Your Name <your.email@example.com> 1769461503 +0100\n\nAdd empty file.\n' | deflate > .git/objects/d1/17657bc81c10f7d9350d80831a5d0dd66ee9e6
echo d117657bc81c10f7d9350d80831a5d0dd66ee9e6 > .git/refs/heads/add-empty-file

This commit just adds an empty file. It has one tree, the root, containing empty.txt and file1.txt.

So now we have three commits in total. Two are on separate branches, and they both have our initial commit as their parent (again, git log --all --graph if you want to see the result).

Next comes the merge commit, which lists the tips of both branches as its parents:

mkdir .git/objects/a7
printf 'tree 105\x0040000 dir1\x00\x3a\x48\x67\x7d\x94\x57\x44\x11\x05\x02\xac\xc9\xee\xf0\x71\x4b\x6d\x91\x3c\xcb100644 empty.txt\x00\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91100644 file1.txt\x00\x6a\xd3\x6e\x52\xf0\x00\x29\x37\xed\x2d\xe6\xa1\xc1\x5d\x8a\x0a\xe5\xdf\x05\x6a' | deflate > .git/objects/a7/fafdefb748ff4646c1e85d58e1be90b03ff2a8
mkdir .git/objects/a8
printf 'commit 307\x00tree a7fafdefb748ff4646c1e85d58e1be90b03ff2a8\nparent 1647ac5f1eb66df46879bb5121a5e261fab0b2ae\nparent d117657bc81c10f7d9350d80831a5d0dd66ee9e6\nauthor Your Name <your.email@example.com> 1769462126 +0100\ncommitter Your Name <your.email@example.com> 1769462126 +0100\n\nMerge add-empty-file and new-file-and-dir.\n' | deflate > .git/objects/a8/8b6bca831d5fd9644595317e1638b3dd3d18ff

This merge commit combines all the changes we've made so far: its tree contains dir1 (reusing the tree object we already had, with file2.txt), empty.txt, and file1.txt. Finally, we'll update main to point to it:

rm .git/refs/heads/main
echo a88b6bca831d5fd9644595317e1638b3dd3d18ff > .git/refs/heads/main
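What makes this last commit a merge is nothing more than its second parent line. As an illustration, here is a small sketch that counts the parent lines in commit data; parent_count is a name made up for this example.

```shell
# parent_count: read commit data (without the "commit N" header) on standard
# input and print the number of parent lines: 0 for an initial commit, 1 for
# an ordinary commit, 2 or more for a merge. (Hypothetical helper.)
parent_count() {
    # The header section ends at the first empty line; stop there so that a
    # message line starting with "parent " is not counted.
    sed '/^$/q' | grep -c '^parent ' || true
}

printf 'tree %s\nparent %s\nparent %s\nauthor A <a@example.com> 1769462126 +0100\ncommitter A <a@example.com> 1769462126 +0100\n\nparent lines in the message do not count\n' \
    a7fafdefb748ff4646c1e85d58e1be90b03ff2a8 \
    1647ac5f1eb66df46879bb5121a5e261fab0b2ae \
    d117657bc81c10f7d9350d80831a5d0dd66ee9e6 | parent_count     # prints 2
```

If Git is installed, git cat-file -p main | parent_count should give the same answer for the merge commit we just wrote.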

09-verify-2.txt

Note that in the last couple of examples, we've neglected not just the index, but also our working tree, which still only contains file1.txt. We can have Git itself fix this for us:

git read-tree main
git checkout .

To let Git verify everything we've done and show us the final repository state:

git fsck --verbose
git log --all --graph
git branch
git status
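For a view of the object store that does not involve Git at all, the loose object IDs can be read straight off the file system. This is a sketch; list_loose_objects is a made-up helper.

```shell
# list_loose_objects GITDIR: print the ID of every loose object, one per
# line, by joining each two-character directory name with the file name
# inside it. (Hypothetical helper; it does not look inside pack files.)
list_loose_objects() {
    for path in "$1"/objects/??/*; do
        [ -f "$path" ] || continue       # the pattern matched nothing
        rest=${path##*/}                 # file name: last 38 hex digits
        dir=${path%/*}
        echo "${dir##*/}$rest"           # directory name: first 2 digits
    done
}

# Demonstration on a throwaway directory with two empty placeholder files.
demo=$(mktemp -d)
mkdir -p "$demo/objects/6a" "$demo/objects/e6"
: > "$demo/objects/6a/d36e52f0002937ed2de6a1c15d8a0ae5df056a"
: > "$demo/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391"
list_loose_objects "$demo"
rm -rf "$demo"
```

Run against the finished repository, list_loose_objects .git | wc -l should report 12 (three blobs, five trees, and four commits), provided nothing beyond the steps in this tutorial has been added.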

This concludes the tutorial. You may be wondering about some of the things Git does that I didn't show. I'll mention a few here:

Actually performing a merge between branches involves more than just writing a commit with two parents; first, Git needs to construct the merged tree and file data (automatically where it can; with the user's help if it must due to conflicts). We skipped over this step and just wrote the resulting (already merged) data into the repository. Explaining how to merge files and directories is outside the scope of this tutorial.
Pack files (.git/objects/pack/*) store multiple objects inside a single file; these can contain deltas (storing an object as a difference to another one), but those are only used to reconstruct object (blob/tree/commit) data. Pack files merely store the same information that loose objects do, but in a different format. Commands like git diff compute differences between objects regardless of which way they are stored; packfile deltas are irrelevant to this. References (branches and tags) can also be packed by Git (.git/packed-refs).
The index (.git/index) is mainly a ‘staging area’ for commits; its format is more complicated, and we've mostly ignored it in order to keep things simple.
A fourth type of object (tag) stores annotated tags; these are not worth covering here. Lightweight tags (.git/refs/tags/*) are references to commits, just like branches.
Reflogs (.git/logs/*) are like ‘undo lists’ for branches and the HEAD; whenever a branch is updated, an entry is added to its corresponding reflog. Reflogs are local and only for references (hence the name); they are not part of the project version history and are not taken from or shared with remote repositories.

Git itself has plenty of documentation on all these things; there's no point in providing yet another explanation here.

I believe Git is unusual compared to most other software in that its data format is both relatively simple and relatively exposed; several of its subcommands (like cat-file, hash-object, and mktree) display or manipulate these objects directly, while the more common, high-level subcommands like add and commit are built on top of them.

Because of that, it really helps to understand that data format before learning about all of the functionality built around it. Trying to just learn all the commands without knowing about what's underneath might keep you confused about Git forever. I hope that actually seeing the data stored within these objects — the basic elements of Git — helps to clear things up and make the rest of Git easier to understand.