Super-small Brainfuck compiler in sh

The following Unix shell script is a compiler for the esoteric programming language Brainfuck. It uses tr, sed, and a C compiler, which is presumed to be available on your system under the name cc (tested with GCC and Clang). The entire program is 182 bytes.

#!/bin/sh
tr -Cd '][.,<>+-'|sed 's/\./putchar(*p);/g;s/,/*p=getchar();/g;s/[+-]/&&*p;/g;s/[<>]/&&p;/g;s/\[/while(*p){/g;y/]<>/}-+/;s/^/main(){int a[30000];int *p=a;/;s/$/}/'|cc -xc -

The compiler reads Brainfuck code from standard input and writes an executable file to disk called a.out (or whatever your C compiler’s default output file name is). It performs almost no error checking or reporting. Usage is something like ./bfc < hello.bf && ./a.out.
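To see how the translation works, it can help to run the tr and sed stages on their own. The following is a minimal sketch that wraps them in a function called bf2c; that name is purely illustrative and not part of the compiler. The pipeline itself is copied unchanged from the script, minus the final cc stage.

```shell
#!/bin/sh
# bf2c is a hypothetical name used only for this illustration.
# It is the compiler's own tr|sed pipeline with the final "|cc -xc -" removed,
# so the generated C goes to standard output instead of to the C compiler.
bf2c() {
  # tr deletes every byte that is not one of the eight Brainfuck commands
  # (newlines included), so sed sees the whole program as a single line.
  tr -Cd '][.,<>+-' |
  sed 's/\./putchar(*p);/g;s/,/*p=getchar();/g;s/[+-]/&&*p;/g;s/[<>]/&&p;/g;s/\[/while(*p){/g;y/]<>/}-+/;s/^/main(){int a[30000];int *p=a;/;s/$/}/'
}

# Set the current cell to 1, then add it into the next cell.
printf '%s' '+[->+<]' | bf2c
# prints: main(){int a[30000];int *p=a;++*p;while(*p){--*p;++p;++*p;--p;}}
```

The doubling trick is what keeps the sed script short. Each + or - is replaced by itself twice followed by *p; (giving ++*p; or --*p;), and each < or > is likewise doubled and followed by p; (giving <<p; or >>p;). The y command then turns every < into -, every > into +, and every ] into }, which yields --p;, ++p;, and the closing brace of each while loop in a single pass.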

Comparison to other compilers

Urban Müller invented Brainfuck so that he could write a compiler (for the Amiga) that was smaller than a binary kilobyte (1,024 bytes). He apparently managed eventually to produce a compiler in 240 bytes of object code, and later one in fewer than 200.

The smallest Brainfuck interpreter appears to be 160 bytes of C source code. Its data array is only 99 cells instead of the usual 30,000, though, so as a ‘true’ implementation of Brainfuck it would be 163 bytes, since 30000 is three characters longer than 99.

I’ve seen reports of compilers that are as small as 140 or so bytes of object code, though apparently the developers of these compilers decided to keep their tricks to themselves: I haven’t yet found any actual source code or object files for a compiler that is smaller than mine. I believe my compiler is the second- or third-smallest Brainfuck compiler in the world, and the smallest compiler for which source code is available. Given that all other compilers of similar size seem to produce machine-specific code directly, this might well also be the world’s smallest portable Brainfuck compiler.

My compiler can be reduced in size to 172 bytes by leaving off the shebang line (which means it has to be invoked with sh ./bfc instead of merely by ./bfc). I prefer to leave the shebang in as I think a byte-length comparison between my compiler, which is provided (and executed) as shell script source code, and others which are measured by size as a compiled object file, is slightly fairer when they are both invoked the same way. (The measurements of object-file size also presumably include the object file headers and other necessary padding comparable to the shebang.)

It’s also possible to save 9 bytes by removing the |cc -xc - from the end, causing the compiler to generate C code on standard output rather than an executable file. That is technically a ‘Brainfuck compiler’ since it compiles Brainfuck into C, but I feel such a thing is likewise cheating.
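The byte savings quoted above can be checked with a few lines of shell. This is only a sketch: it takes the 182-byte total as given rather than measuring the script file itself, and the variable names are purely illustrative.

```shell
#!/bin/sh
# The two removable pieces, copied verbatim from the script.
shebang='#!/bin/sh'      # needs one extra byte for its trailing newline
cc_stage='|cc -xc -'

# ${#var} counts characters, which equals bytes for these ASCII strings.
shebang_bytes=$(( ${#shebang} + 1 ))
cc_bytes=${#cc_stage}

echo "shebang line: $shebang_bytes bytes"   # prints: shebang line: 10 bytes
echo "cc stage: $cc_bytes bytes"            # prints: cc stage: 9 bytes
echo "without either: $(( 182 - $shebang_bytes - $cc_bytes )) bytes"   # prints: without either: 163 bytes
```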

Other approaches

It might be possible to generate an object file with machine code directly using the same technique with sed. I initially rejected this as I thought the number of backslash-escape characters needed for the binary bytes would be prohibitive … but if you edited the compiler with a hex editor and embedded the bytes directly it might be possible. I have not done this (yet) as I don’t wish to dig into the specifics of machine code instructions and (in particular) object file header formats.