Brainfuck interpreter

10 May 2024

10 min

In this article, I will explain how I implemented a tiny brainfuck interpreter, which implementation choices I made and why.

But, what is brainfuck?

Brainfuck is an esoteric programming language, designed to have the smallest possible compiler. This language is quite easy to learn but can be pretty challenging to use as it has only 8 instructions and a data pointer that points to an 8-bit, 30.000 (at least) bytes long array.

Instructions

As said, the language relies on 8 instructions, which are :

+ : increment the value of the memory cell pointed by data pointer by one
- : decrement the value of the memory cell pointed by data pointer by one
> : increment the data pointer by one (point the neighbor cell to the right)
< : decrement the data pointer by one (point the neighbor cell to the left)
. : output the value of the memory cell pointed by data pointer
, : accept one-byte value as input and store its value in the memory cell pointed by data pointer
[ : if the value of the memory cell pointed by data pointer is zero jump to the corresponding ]. Else read the next instruction
] : if the value of the memory cell pointed by data pointer is non-zero jump to the corresponding [. Else read the next instruction

Every character other than the ones above is considered a comment and is ignored by the interpreter.

Here’s an example program that prints “Hello World!” on the screen

+++++++[>++++++++++<-]>++.--<+++[>++++++++++<-]>+.+++++++..+++.
<++++++++[>----------<-]>+.
<+++++[>++++++++++<-]>+++++.<++[>++++++++++<-]>++++.
+++.------.--------.<+++++++[>----------<-]>+++.

you can find the commented code on my GitHub.

So, how does the interpreter work?

For this project, I chose C for its memory manipulation capabilities and the simplicity of the language. The interpreter’s work can be divided into three steps:

First read
Optimization
Execution

First read

During this phase, the interpreter determines the size of the program, hence it allocates the right amount of memory and creates an array to store the program’s instructions. From now on, this array will be called instruction pointer.

Optimization

As the instruction pointer is parsed, the following optimizations are made:

Jump prediction

This optimization aims to find the position of the ] and [ instructions, so that the interpreter won’t need to find their positions at runtime (probably repeating this process many times as the various loops iterations progress).

To achieve this, a stack and an array are used. When a [ is found, its position is pushed into the stack and, when a ] is found, the last position (which is the position of the last open bracket) is popped from the stack. Now that the positions of the two matching brackets are known, the position of the ] is stored in the array at the index corresponding to the position of the [ and vice versa. We will call the aforementioned array prediction array.

The following example shows the steps of this process :

Let’s say we have this BF program

+++++[>++++++++++<-]>-.-. this program prints the number 10

The first [ is encountered at the index 5, so the stack’s content now is

stack [5]
index  0

When the matching ] is encountered at the index 19, the last element of the stack is popped. Now, we can store the number 19 in the prediction array at the index 5 and the number 5 at the index 19. This is what the content of prediction array should look like after the process

instruction pointer   +  +  +  +  +  [   _ _ _  -   ]   _ _ _
prediction array     [ ][ ][ ][ ][ ][19] _ _ _ [ ] [5]  _ _ _
arrays indexes        0  1  2  3  4   5  _ _ _ 18   19  _ _ _

This way, when a [ is parsed, the interpreter can immediately tell where its matching ] is by looking at the prediction array at the same index where it found the [ (keep in mind that the interpreter is parsing the program from the instruction pointer).

Repetition counter

This optimization calculates the amount of times an instruction repeats itself.

Let’s take a look at this example

+++++[>+++++<-]

This program is basically a loop that repeats a +5 addition five times. Without any optimization, the program would just repeat five times the + instruction. This isn’t very smart, as we can directly do the +5 addition instead. So, this optimization counts the repetitions in order to make just one operation instead of repeating the same instruction multiple times.

Note that this optimization is performed on all of these target instructions: +, -, <, >. The only difference between the +, - instructions and >, < instructions is that the former is implemented through a regular addition, the latter using pointer arithmetic. That’s because +, - increments (or decrements) the byte pointed by data pointer, and <, > increments (or decrements) the pointer itself.

As these instructions are found, the number of their repetition is stored in prediction array, at the index corresponding to the first occurrence.

This way, when the program is executed, both instruction pointer and prediction array will be read and the interpreter will know where to jump in the BF program (based on the number of repetitions) and how many times an instruction is repeated.

Here’s an example of what the content of prediction array should be, based on the example above

instruction pointer:  +  +  +  +  +   [  >  +  <  -  ]
prediction array:    [5][ ][ ][ ][ ][10][1][1][1][1][5]
arrays indexes:       0  1  2  3  4   5  6  7  8  9  10

Now that the optimizations are done, it’s time for the actual execution of the program.

Execution

This is probably the most straightforward step as the interpreter just needs to execute the instructions of the instruction pointer according to the optimization data stored in prediction array. In order to do that, the interpreter needs to allocate the space for the data array and initialize every byte of the array to zero. The allocation is pretty easy, as brainfuck just needs a 30.000 (or more) bytes long array. By default I chose, to make the array 65535 bytes long and add the possibility for the user to set a custom array size by passing the -s <size> command to the interpreter. The data array will act as the memory of the BF program, which can now be executed.

To implement the execution code block I used a switch-case statement, where every ‘case’ statement is responsible for the execution of a specific BF instruction.

For a better understanding, I suggest you check out my GitHub repository as the code is heavily commented and goes into implementation details :)