Step 1: hex encoding
--------------------

The "hex" program converts each pair of ASCII hexadecimal nibbles it
finds into the corresponding binary byte, unless it encounters a hash
sign (#), which marks the beginning of a comment that continues until
the end of the line. Input byte values 0-31 are ignored. This includes
all ASCII whitespace characters, which can therefore be used freely
together with comments to make the input more readable.

It reads and writes one byte at a time. In theory, it handles input of
arbitrary size, but it's very slow.

Step 2: labeled hex (lhex)
--------------------------

The "lhex" program does the same as "hex", except that it also accepts
label definitions and references.

Label definitions are in the form of ":x", where x is a single character
identifying the label. Any byte is valid for x.

Label references are in the form of ".x", where x is the target label.
They are computed as 4 bytes according to the formula "p-q-4", where p
is the position of the label and q is the position of the reference.

For example, the following produces an infinite loop:

    :x e9 .x                # jmp :x
    --> e9 fb ff ff ff      # jmp -5

Technically, lhex works a bit differently than hex does. It reads up to
a maximum amount of input (64k, but easily adjustable in the source) and
passes over it twice -- once to process label definitions and count the
number of output bytes, then once more to produce the actual output.
This permits both forward and backward references to labels.

Note: an earlier version of lhex only produced 1-byte relative
addresses, but let you specify a single-digit number to subtract from
the address, e.g. ":x eb .1x" or ":x e9 .4x ff ff ff". I do not
recommend doing lhex this way, because sooner or later you will need to
jump distances larger than 256 bytes, and tracking and updating those
addresses, even just in 256-byte increments, is an unnecessary waste of
time.
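
The two-pass label resolution described for lhex can be sketched in a few
lines of Python. This is only an illustration, not the actual lhex source
(which is itself written in hex). The function names "tokenize" and
"assemble" are made up for this sketch. It also makes several assumptions
that the text above does not confirm: the space character (byte 32) is
skipped along with bytes 0-31, the two nibbles of a pair are adjacent,
upper-case hex digits are accepted, and anything else is an error. For
simplicity it tokenizes the input once and walks the token list twice,
rather than scanning the raw input twice.

```python
import struct

HEX_DIGITS = "0123456789abcdefABCDEF"


def tokenize(src: bytes):
    """Yield ('byte', value), ('def', label) or ('ref', label) tokens."""
    i, n = 0, len(src)
    while i < n:
        c = src[i]
        if c <= 32:
            # Bytes 0-31 are ignored; skipping space (32) is an assumption.
            i += 1
        elif c == ord("#"):
            # A comment runs until the end of the line.
            while i < n and src[i] != ord("\n"):
                i += 1
        elif c in (ord(":"), ord(".")):
            # The byte after ':' or '.' is the label; any byte is valid.
            if i + 1 >= n:
                raise ValueError("label character missing at end of input")
            yield ("def" if c == ord(":") else "ref", src[i + 1])
            i += 2
        else:
            # Assumption: the two nibbles of a pair are adjacent.
            pair = src[i:i + 2].decode("latin-1")
            if len(pair) != 2 or any(ch not in HEX_DIGITS for ch in pair):
                raise ValueError(f"bad hex pair at offset {i}: {pair!r}")
            yield ("byte", int(pair, 16))
            i += 2


def assemble(src: bytes) -> bytes:
    tokens = list(tokenize(src))

    # Pass 1: record label positions while counting output bytes.
    labels = {}
    pos = 0
    for kind, value in tokens:
        if kind == "def":
            labels[value] = pos
        elif kind == "ref":
            pos += 4          # every reference occupies 4 output bytes
        else:
            pos += 1

    # Pass 2: produce the output, resolving references as p - q - 4.
    out = bytearray()
    for kind, value in tokens:
        if kind == "byte":
            out.append(value)
        elif kind == "ref":
            if value not in labels:
                raise ValueError(f"undefined label {chr(value)!r}")
            p, q = labels[value], len(out)
            # Little-endian, as in the "e9 fb ff ff ff" example.
            out += struct.pack("<i", p - q - 4)
    return bytes(out)


if __name__ == "__main__":
    print(assemble(b":x e9 .x  # jmp :x\n").hex())  # e9fbffffff
```

Because every label position is known after the first pass, a forward
reference such as "e9 .y 90 :y" resolves the same way as a backward one.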
Step 3: prepend ELF header
--------------------------

The "elfwrap" program prepends an ELF header (its own, actually) to its
input, then adjusts the size fields before outputting the combined
result. Like lhex, elfwrap buffers its input before processing it, since
it needs to know the total size.

Only the first three programs (hex, lhex, and elfwrap) have been given
their own ELF header. For cmeta (both cma and its output), elfwrap is
used to prepend the ELF header.

Step 4: character-based metacompiler (cmeta)
--------------------------------------------

Char-meta or "cmeta" is a meta-compiler language which is structurally
very similar to META II. [1] The main difference is that it is
character-based, with no intrinsic support for strings.

Step 4a: minimal character-based assembly language for cmeta (cma)
------------------------------------------------------------------

Just like META II, cmeta compiles itself to an assembly-like language.
The "cma.txt" file contains a description of this language. The cmeta
assembler "cma" translates the assembly language to machine code.

There are actually two versions of cma. The first one, "cma1", is
written in lhex, and is used to assemble the initial "cmeta1" binary in
the next step. The second version, simply called "cma", is written in
cmeta's language and can be used once cmeta has been built.

Step 4b: hand-compiled cmeta assembly language program (cmeta1)
---------------------------------------------------------------

Without the cmeta program itself, an initial, manually translated
version of the cmeta program in "cma" assembly language is needed to
bootstrap the cmeta compiler. Once cmeta is available, it can be used to
generate subsequent versions of cmeta1.cma.

A cmeta program called "cmafmt" has also been included, which
automatically inserts newlines between opcodes and labels. It is not
necessary, but it may make the assembly code slightly more readable.
Step 4c: cmeta runtime code
---------------------------

Machine code produced by cma expects an environment in which its input
and an output buffer are already in place. The cmeta runtime ("cmrt")
provides this. It is the initial entry point of a compiler produced by
cmeta, and handles such things as allocating memory, reading input and
writing output.

Combining the result
--------------------

The cmeta program produces assembly, from which cma generates machine
code. Next, the "cmrt" runtime code is prepended to this machine code.
Then, the combined code is run through elfwrap to produce a functional
compiler binary.


[1] "META II -- a syntax-oriented compiler writing language" -- D. V. Schorre
    ... http://www.ibm-1401.info/Meta-II-schorre.pdf

Legal information
-----------------

Copyright (c) 2018-2019 Pim Goossens

Except where explicitly specified otherwise, files in this repository
are distributed under two alternative copyright licenses:

* The GNU Lesser General Public License (LGPL), version 3 or any later
  version
* The GNU General Public License (GPL), version 2 or any later version

In other words, they are dual-licensed. In order to legally redistribute
these files, all terms of at least one of these licenses must be
complied with.

This software is provided with NO WARRANTY of any kind.

See the file COPYING for the exact terms of the LGPL version 3. Note
that it makes references to another license, namely the GNU General
Public License (GPL) version 3. A copy of that license is included for
reference in the file "GPL-3.txt".

See the file "GPL-2.txt" for the GPL version 2.