10 May 2024
10 min
In this article, I will explain how I implemented a tiny brainfuck interpreter, which implementation choices I made and why.
Brainfuck is an esoteric programming language, designed to have the smallest possible compiler. This language is quite easy to learn but can be pretty challenging to use as it has only 8 instructions and a data pointer that points to an 8-bit, 30.000 (at least) bytes long array.
As said, the language relies on 8 instructions, which are :
Every character other than the ones above is considered a comment and is ignored by the interpreter.
Here’s an example program that prints “Hello World!” on the screen
+++++++[>++++++++++<-]>++.--<+++[>++++++++++<-]>+.+++++++..+++.
<++++++++[>----------<-]>+.
<+++++[>++++++++++<-]>+++++.<++[>++++++++++<-]>++++.
+++.------.--------.<+++++++[>----------<-]>+++.
you can find the commented code on my GitHub.
For this project, I chose C for its memory manipulation capabilities and the simplicity of the language. The interpreter’s work can be divided into three steps:
During this phase, the interpreter determines the size of the program, hence it allocates the right amount of memory and creates an array to store the program’s instructions. From now on, this array will be called instruction pointer.
As the instruction pointer is parsed, the following optimizations are made:
This optimization aims to find the position of the ] and [ instructions, so that the interpreter won’t need to find their positions at runtime (probably repeating this process many times as the various loops iterations progress).
To achieve this, a stack and an array are used. When a [ is found, its position is pushed into the stack and, when a ] is found, the last position (which is the position of the last open bracket) is popped from the stack. Now that the positions of the two matching brackets are known, the position of the ] is stored in the array at the index corresponding to the position of the [ and vice versa. We will call the aforementioned array prediction array.
The following example shows the steps of this process :
Let’s say we have this BF program
+++++[>++++++++++<-]>-.-. this program prints the number 10
The first [ is encountered at the index 5, so the stack’s content now is
stack [5]
index 0
When the matching ] is encountered at the index 19, the last element of the stack is popped. Now, we can store the number 19 in the prediction array at the index 5 and the number 5 at the index 19. This is what the content of prediction array should look like after the process
instruction pointer + + + + + [ _ _ _ - ] _ _ _
prediction array [ ][ ][ ][ ][ ][19] _ _ _ [ ] [5] _ _ _
arrays indexes 0 1 2 3 4 5 _ _ _ 18 19 _ _ _
This way, when a [ is parsed, the interpreter can immediately tell where its matching ] is by looking at the prediction array at the same index where it found the [ (keep in mind that the interpreter is parsing the program from the instruction pointer).
This optimization calculates the amount of times an instruction repeats itself.
Let’s take a look at this example
+++++[>+++++<-]
This program is basically a loop that repeats a +5 addition five times. Without any optimization, the program would just repeat five times the + instruction. This isn’t very smart, as we can directly do the +5 addition instead. So, this optimization counts the repetitions in order to make just one operation instead of repeating the same instruction multiple times.
Note that this optimization is performed on all of these target instructions: +, -, <, >. The only difference between the +, - instructions and >, < instructions is that the former is implemented through a regular addition, the latter using pointer arithmetic. That’s because +, - increments (or decrements) the byte pointed by data pointer, and <, > increments (or decrements) the pointer itself.
As these instructions are found, the number of their repetition is stored in prediction array, at the index corresponding to the first occurrence.
This way, when the program is executed, both instruction pointer and prediction array will be read and the interpreter will know where to jump in the BF program (based on the number of repetitions) and how many times an instruction is repeated.
Here’s an example of what the content of prediction array should be, based on the example above
instruction pointer: + + + + + [ > + < - ]
prediction array: [5][ ][ ][ ][ ][10][1][1][1][1][5]
arrays indexes: 0 1 2 3 4 5 6 7 8 9 10
Now that the optimizations are done, it’s time for the actual execution of the program.
This is probably the most straightforward step as the interpreter just needs to execute the instructions of the instruction pointer according to the optimization data stored in prediction array. In order to do that, the interpreter needs to allocate the space for the data array and initialize every byte of the array to zero. The allocation is pretty easy, as brainfuck just needs a 30.000 (or more) bytes long array. By default I chose, to make the array 65535 bytes long and add the possibility for the user to set a custom array size by passing the -s <size> command to the interpreter. The data array will act as the memory of the BF program, which can now be executed.
To implement the execution code block I used a switch-case statement, where every ‘case’ statement is responsible for the execution of a specific BF instruction.
For a better understanding, I suggest you check out my GitHub repository as the code is heavily commented and goes into implementation details :)