Step 1: hex encoding
--------------------
The "hex" program converts each pair of ASCII hexadecimal nibbles it
finds into the corresponding binary byte. A hash sign (#) marks the
beginning of a comment that continues until the end of the line.

Input byte values 0-32 are ignored. This includes all ASCII whitespace
characters, which can therefore be freely used together with comments
to make the input more readable.

The program reads and writes one byte at a time. In theory it can
therefore handle input of arbitrary size, but it is very slow.
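
The rules above can be sketched in a few lines of Python. This is only
an illustration, not the actual "hex" program: the function name
hex_decode is hypothetical, and several behaviours are assumptions
because the text does not specify them (the space character, byte 32,
is skipped along with 0-31; upper-case digits are accepted; any other
character, or a dangling nibble at the end, is treated as an error).

```python
HEX_DIGITS = "0123456789abcdef"


def hex_decode(src: bytes) -> bytes:
    """Decode pairs of ASCII hex nibbles, skipping '#' comments."""
    out = bytearray()
    high = None          # pending high nibble, if one has been read
    in_comment = False
    for b in src:
        if in_comment:
            # A comment runs until the end of the line.
            if b == 0x0A:
                in_comment = False
            continue
        if b == ord("#"):
            in_comment = True
            continue
        if b <= 32:
            # Assumption: control characters and the space are ignored.
            continue
        nibble = HEX_DIGITS.find(chr(b).lower())
        if nibble < 0:
            # Assumption: anything else is rejected.
            raise ValueError(f"unexpected input byte {b:#04x}")
        if high is None:
            high = nibble
        else:
            out.append(high * 16 + nibble)
            high = None
    if high is not None:
        # Assumption: an odd number of nibbles is an error.
        raise ValueError("dangling nibble at end of input")
    return bytes(out)
```

Unlike the real program, this sketch takes the whole input at once
instead of reading and writing one byte at a time.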

Step 2: labeled hex (lhex)
--------------------------
The "lhex" program does the same as "hex", except that it also accepts
label definitions and references.

Label definitions are in the form of ":x", where x is a single character
identifying the label. Any byte is valid for x.

Label references are in the form of ".x", where x is the target label.
They are computed as 4 bytes according to the formula "p-q-4", where p
is the position of the label and q is the position of the reference. For
example, the following produces an infinite loop:

:x
e9 .x               # jmp :x       -->  e9 fb ff ff ff   # jmp -5

Internally, lhex works differently from hex. It reads its input into a
buffer of limited size (64k, but easily adjustable in the source) and
passes over it twice: once to process label definitions and count the
number of output bytes, then once more to produce the actual output.
This permits both forward and backward references to labels.
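
The two passes and the "p-q-4" formula can be sketched as follows. This
is a minimal Python illustration under stated assumptions, not the lhex
source: the names tokens and lhex_assemble are hypothetical, the 4-byte
result is written little-endian (as the example output above shows),
there is no 64k input limit, and whitespace and error handling follow
the same assumptions as plain hex decoding with the space character
skipped.

```python
import struct

HEX_DIGITS = "0123456789abcdef"


def tokens(src: bytes):
    """Yield ("byte", value), (":", name) and (".", name) tokens."""
    high = None
    i = 0
    while i < len(src):
        b = src[i]
        i += 1
        if b == ord("#"):
            # Skip the comment up to (not including) the newline.
            while i < len(src) and src[i] != 0x0A:
                i += 1
        elif b <= 32:
            pass  # assumption: control characters and space are ignored
        elif chr(b) in ":.":
            if i >= len(src):
                raise ValueError("label name missing")
            # The label name is the next byte, whatever it is.
            yield (chr(b), src[i])
            i += 1
        else:
            nibble = HEX_DIGITS.find(chr(b).lower())
            if nibble < 0:
                raise ValueError(f"unexpected input byte {b:#04x}")
            if high is None:
                high = nibble
            else:
                yield ("byte", high * 16 + nibble)
                high = None


def lhex_assemble(src: bytes) -> bytes:
    toks = list(tokens(src))

    # Pass 1: record each label's position while counting output bytes.
    labels = {}
    pos = 0
    for kind, value in toks:
        if kind == ":":
            labels[value] = pos
        elif kind == ".":
            pos += 4          # a reference always occupies 4 bytes
        else:
            pos += 1

    # Pass 2: produce the output, resolving references as p - q - 4.
    out = bytearray()
    for kind, value in toks:
        if kind == "byte":
            out.append(value)
        elif kind == ".":
            p = labels[value]  # position of the label
            q = len(out)       # position of the reference itself
            out += struct.pack("<i", p - q - 4)
    return bytes(out)
```

For the infinite-loop example, the label is at position 0 and the
reference at position 1, so the reference evaluates to 0-1-4 = -5. A
forward reference such as "e9 .y 90 :y" works the same way: p is 6 and
q is 1, giving 1, which skips the single 90 byte.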

Note: an earlier version of lhex actually only produced 1-byte relative
addresses, but let you specify a single-digit number to subtract from
the address, e.g. ":x eb .1x" or ":x e9 .4x ff ff ff". I do not
recommend doing lhex this way because sooner or later you will need to
jump distances larger than 256 bytes, and tracking and updating those
addresses, even just in 256-byte increments, is an unnecessary waste of
time.

Step 3: prepend ELF header
--------------------------
The "elfwrap" program prepends an ELF header (its own, actually) to its
input, then adjusts the size fields before outputting the combined
result. Like lhex, elfwrap buffers its input before processing it, since
it needs to know the total size.
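
As a rough illustration of why the total size must be known first, here
is a Python sketch that builds a header and fills in its size fields.
Everything specific in it is an assumption rather than a description of
elfwrap: it constructs a fresh 32-bit x86 ELF header with one loadable
segment instead of reusing its own, and the load address, segment flags
and the function name elfwrap_sketch are chosen purely for the example.

```python
import struct

LOAD_ADDR = 0x08048000            # assumed load address
EHDR_SIZE = 52                    # size of a 32-bit ELF header
PHDR_SIZE = 32                    # size of a 32-bit program header
HEADER_SIZE = EHDR_SIZE + PHDR_SIZE


def elfwrap_sketch(code: bytes) -> bytes:
    """Prepend a minimal ELF32 header whose size fields cover `code`."""
    total = HEADER_SIZE + len(code)   # only known once input is buffered

    # Magic, 32-bit class, little-endian data, version 1, then padding.
    e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
    ehdr = struct.pack(
        "<16sHHIIIIIHHHHHH",
        e_ident,
        2,                        # e_type: executable
        3,                        # e_machine: Intel 80386
        1,                        # e_version
        LOAD_ADDR + HEADER_SIZE,  # e_entry: first byte after the headers
        EHDR_SIZE,                # e_phoff: program header follows directly
        0,                        # e_shoff: no section headers
        0,                        # e_flags
        EHDR_SIZE,                # e_ehsize
        PHDR_SIZE,                # e_phentsize
        1,                        # e_phnum
        0, 0, 0,                  # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<8I",
        1,                        # p_type: loadable segment
        0,                        # p_offset: map the file from its start
        LOAD_ADDR,                # p_vaddr
        LOAD_ADDR,                # p_paddr
        total,                    # p_filesz: the adjusted size field
        total,                    # p_memsz: the adjusted size field
        7,                        # p_flags: read, write, execute (assumed)
        0x1000,                   # p_align
    )
    return ehdr + phdr + code
```

The two size fields are the only values that depend on the input, which
is why the input has to be buffered in full before any output can be
written.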

Only the first three programs (hex, lhex, and elfwrap) have been given
their own ELF header. For cmeta (both cma and its output), elfwrap is
used to prepend the ELF header.

Step 4: character-based metacompiler (cmeta)
--------------------------------------------
Char-meta or "cmeta" is a meta-compiler language which is structurally
very similar to META II. [1]  The main difference is that it is
character-based, with no intrinsic support for strings.

Step 4a: minimal character-based assembly language for cmeta (cma)
------------------------------------------------------------------
Just like META II, cmeta compiles itself to an assembly-like language.
The "cma.txt" file contains a description of this language. The cmeta
assembler "cma" translates the assembly language to machine code.

There are actually two versions of cma. The first one, "cma1", is
written in lhex, and is used to assemble the initial "cmeta1" binary in
the next step. The second version, simply called "cma", is written in
cmeta's language and can be used once cmeta has been built.

Step 4b: hand-compiled cmeta assembly language program (cmeta1)
---------------------------------------------------------------
Since cmeta compiles itself, it cannot be built until a working cmeta
already exists. An initial, manually translated version of the cmeta
program in "cma" assembly language is therefore needed to bootstrap the
cmeta compiler.

Once cmeta is available, it can be used to generate subsequent versions
of cmeta1.cma. A cmeta program called "cmafmt" has also been included,
which automatically inserts newlines between opcodes and labels. It is
not necessary, but it may make the assembly code slightly more readable.

Step 4c: cmeta runtime code
---------------------------
Machine code produced by cma expects an environment in which its input
and an output buffer are already in place. The cmeta runtime ("cmrt")
provides this. It is the initial entry point of a compiler produced by
cmeta, and handles such things as allocating memory, reading input and
writing output.

Combining the result
--------------------
The cmeta program produces assembly from which cma generates machine
code. Next, the "cmrt" runtime code is prepended to this machine code.
Then, the combined code is run through elfwrap to produce a functional
compiler binary.
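
The order of these stages can be written down as a small Python sketch.
It only illustrates the data flow: build_compiler is a hypothetical
name, and the cmeta, cma and elfwrap parameters are stand-ins for
running the real programs, which this sketch does not attempt to
invoke.

```python
from typing import Callable

Stage = Callable[[bytes], bytes]


def build_compiler(source: bytes, cmeta: Stage, cma: Stage,
                   cmrt: bytes, elfwrap: Stage) -> bytes:
    """Chain the stages: cmeta, then cma, then cmrt, then elfwrap."""
    assembly = cmeta(source)          # cmeta produces assembly
    machine_code = cma(assembly)      # cma turns it into machine code
    combined = cmrt + machine_code    # runtime code is prepended
    return elfwrap(combined)          # ELF header is prepended last
```

Each stage consumes only the output of the one before it, so the
runtime code always ends up between the ELF header and the compiled
code.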


[1] "META II -- a syntax-oriented compiler writing language" --
    D. V. Schorre ... http://www.ibm-1401.info/Meta-II-schorre.pdf


Legal information
-----------------
Copyright (c) 2018-2019 Pim Goossens

Except where explicitly specified otherwise, files in this repository
are distributed under two alternative copyright licenses:

* The GNU Lesser General Public License (LGPL), version 3 or any later version
* The GNU General Public License (GPL), version 2 or any later version

In other words, they are dual-licensed. In order to legally redistribute
these files, all terms of at least one of these licenses must be
complied with.

This software is provided with NO WARRANTY of any kind.

See the file COPYING for the exact terms of the LGPL version 3. Note
that it makes references to another license, namely the GNU General
Public License (GPL) version 3. A copy of that license is included for
reference in the file "GPL-3.txt".

See the file "GPL-2.txt" for the GPL version 2.