(Back to main page)

Compiler and bootstrapping projects — overview

This is a series of compilers and metacompilers initially built from scratch. That is, the first one in the series was built without using any existing compilers other than a small program that converts a sequence of hexadecimal digits to binary bytes.

The graph below shows the top-level relationships between the sub-projects. An arrow indicates that the compiler (or one of the compilers in that project) compiles the (first) one from the project it points to. For example, cmeta is used to build the first of the mcss compiler series.

cmeta blc mcss back prox asterix
A graph showing X-compiles-Y relationships between the projects
(.dot source)

Most of these compilers were built solely as stepping stones to bootstrap their follow-up project (and perhaps to gain some experience and learn from). Although they all have readmes with some basic information, only Asterix has any significant documentation describing the language itself (with examples) and how to use it.


Project descriptions

This section briefly describes the individual compilers and some of the accompanying tools, tests, and example programs.

cmeta

Inspired by: META II and bcompiler (1, 2 imports to GitHub)

The first metacompiler in this series, cmeta is bootstrapped from a pair of hexadecimal-to-binary converters, hex and lhex (hex with single-character labels), which are included in its repository to start things off.

It only reads or writes a single character at a time and does not backtrack, so it's very limited and generally only works with languages consisting of single-character tokens. An example of such a language is blc.

Links:

mcss

This is a series of follow-up compilers (metacompiler stepping stones) bootstrapped using cmeta. Each one adds some new capabilities or removes a limitation such as:

One of the examples in this repository (also used for testing) is a compiler for the original META II language. This enables META II programs to be compiled to native machine code.

Links:

prox

Prox was built using am1, the last stepping-stone compiler in the mcss repository. It's quite ugly. The language is untyped (like all the other compilers in this list); there's not much in the way of testing, verification, error reporting, or documentation; the first version, prox1, does not support named variables and all code written in it has to choose which specific registers to use for its work.

On the other hand, prox represents an important milestone: it is the first language in this series that combines (somewhat C-like) block-structured general-purpose elements with specialized constructs for writing compilers in a style similar to the mcss compilers (in turn inspired by META II).

This makes it very capable of compiling itself, and this time the runtime code does not all have to be written in lhex machine code.

The second version, prox2, adds support for named function arguments and local variables. It is self-describing (compiler, assembler, and runtime code) in about 1400-1500 lines of code.

Prox2 is the compiler first used to bootstrap Asterix.

Links:

Asterix

Asterix is a BNF-like, low- to medium-level programming language that has been bootstrapped using prox2, and is based on the experience gained from building prox2 and its predecessors (which are described on this page).

See the Asterix project page for more information and download links.


Old examples and side projects

The compilers listed below are mostly experiments, tests, and demonstrations of the capabilities (and limitations) of the languages and compilers used to create them (cmeta and mcss).

blc

(Not to be confused with John Tromp's “Binary Lambda Calculus” which is unrelated to this)

A toy language somewhere between lambda calculus and Lisp. It was built using cmeta and inherits several of its limitations, such as a 1-byte limit on variable names.

As an example, this is what the map function looks like in blc:

(: m
   (y 
     (\ m
        (\ f
           (\ p
              (? p
                 ((c (f (a p))) ((m f) (d p)))
                 ()))))))

Links:

back

“Not Forth”. This is a basic Reverse Polish Notation (or postfix) language that uses two separate stacks for data and control flow, like PostScript.

Through the use of an am1-to-back compiler (am1 is the ‘metacompiler with arithmetic’ from mcss), it can actually compile itself and potentially other am1 programs.

Example: this is the copy function implemented in terms of the language's primitive operators:

: copy   {dup {dup index exch} exch repeat pop}

Many of the operators it has are similar to those in PostScript, but not all of them.

Links:

(Back to main page)