A Step-By-Step Guide to Constructing a Programming Language

-

Constructing a programming language from scratch in a number of hours

Trees: The Hierarchical Heart of Computer Science. Source: Jeremy Bishop on Unsplash

The world is stuffed with programming languages with every kind of various use cases. Most of those languages, though, are built for very general purposes — sometimes, we will want to design a language to suit a really specific use case (e.g. Facebook designed React to make it easier to develop their web applications, and Apple recently developed Pkl, a language designed to make configurations easier. There are lots of more examples of this in various fields). As such, knowing easy methods to construct a programming language is a useful skill to have in your belt.

In this text, we are going to construct an interpreted programming language from scratch and learn slightly bit about each the lambda calculus and programming languages as a complete along the way in which. The language we construct here might be fairly esoteric, but the method should hopefully provide you with an idea of easy methods to design your personal use-case-specific languages (and teach you useful details about how programming languages function under the hood).

Since we’re constructing an interpreted language¹, our overarching flow will look something like this:

The broad flow chart for our language

Mainly, we start with some concrete syntax (code) written in our goal language (the language that we try to write down), pass it to some parser that converts it to an abstract syntax tree (a tree representation of the code that’s easy to work with), after which pass that to an interpreter that “runs” the abstract syntax tree to present us a end result. Note that the parser and interpreter are written in some already existing host language — C’s original parser and compiler, as an example, were written in assembly.

** Note: My use of “parser” here encapsulates all the parsing process. Normally, lexing is finished prior to “parsing”, but on this case parsing is just the means of taking concrete syntax to abstract syntax, whatever which will seem like.

For example, consider the next specification for a straightforward language for basic arithmetic:

EXPR = number
| EXPR + EXPR
| EXPR - EXPR
| EXPR * EXPR
| EXPR / EXPR
| (EXPR)

The above, by the way in which, is an EBNF for a context-free grammar². I won’t delve too deeply into what this implies here, but all programming languages written in a form like this are parse-able in polynomial time via the CYK algorithm. For this EBNF, something like (4 + 4) * 3 is a legitimate program, but something like def f(x): return 5; f(5) shouldn’t be.

Suppose we at the moment are given the concrete syntax (4 + 4) * 3 . After parsing, we should always get an abstract syntax tree (AST) like this:

AST for our concrete syntax

Then our interpreter will start at the basis and recursively go down the tree until we get our answer, namely 24.

Note quickly that this grammar is ambiguous — as an example, how should the expression 4 + 4 * 3 parse? It could either parse because the above (that’s, (4 + 4) * 3), or it could also parse as 4 + (4 * 3) — neither parsing is inherently more “correct” in the way in which that we’ve got specified the language, as each are valid parse trees. In cases like this, the parser could have to arbitrarily make a call about easy methods to parse our language.

By the flow chart, our first step should logically be to design our concrete syntax. What you select to make your syntax is totally as much as you. I made a decision to create EmojiLang, a (horrible) language that ensures an especially colourful screen while you’re typing. The grammar is below:

grammar EmojiLang;

program: '🏃‍♂️🏃‍♂️🏃‍♂️' expr '🛑🛑🛑' EOF;

expr: '(' (ID
| atom
| ifCond
| anonFunctionDefn
| funApplication
| varAssignment
| READ_FLOAT
| READ_STRING
| printExpr
| sequentialOp
| baseComputation) ')';

atom: NUMBER | BOOLEAN | STRING;
ifCond: '🤔' cond=expr '❓' ontrue=expr ':' onfalse=expr;
anonFunctionDefn: '🧑‍🏭' arg=ID '⚒️' body=expr;
funApplication: '🏭' fun=expr arg=expr;
varAssignment: '📦' var=ID '🔜' val=expr;
printExpr: '🖨️' expr;
sequentialOp: '📋' first=expr second=expr;
baseComputation: left=expr op=('➕' | '➖' | '✖️' | '➗' | '🟰' | '≤') right=expr;

ID: [a-zA-Z_][a-zA-Z0-9_]*;
NUMBER: [0-9]+ ('.' [0-9]+)?;
BOOLEAN: '👍' | '👎';
STRING: '"' ~[rn"]* '"';
READ_FLOAT: '🔢';
READ_STRING: '🔤';

WHITESPACE: [ trn]+ -> skip;
COMMENT: '😴' .*? '⏰' -> skip;
LINE_COMMENT: '🥱' ~[rn]* -> skip;

** Note: the above specification is written to be utilized in a tool called ANTLR, we’ll get back to this very soon.

This language is, after all, ridiculous, but it surely is interesting for a few reasons. Firstly, all of our expressions are required to be parenthesized. This makes it extremely annoying to code, but in addition makes our grammar non-ambiguous. Secondly, notice how we will only define anonymous functions — there isn’t any equivalent syntax for something like def in Python. Finally, all functions on this language (apart from the bottom computations) have exactly one argument. We’ll explore the implications of the last two design decisions after we mess around with our language in a bit.

We are able to, after all, write our own parser. Luckily though, there are tools that may parse arbitrary context-free grammars. For this tutorial, we are going to use ANTLR (you may download it here). ANTLR is a really nice and easy-to-use tool that takes grammar specifications just like the above and creates parsers in quite a lot of goal languages (on this tutorial, we might be using Python).

Using it’s fairly easy, just follow the next steps:

  1. Download the ANTLR Java binaries from here
  2. Create a .g4 file (just like the above) in your grammar. Note that ANTLR can’t actually handle emojis thoroughly, so when you plan to make use of emojis in your language, run the next python script to demojify your grammar:
import emoji
import sys

def demojify_file(input_file, output_file):
with open(input_file, "r", encoding="utf-8") as f:
input_text = f.read()

input_text = emoji.demojize(input_text)

with open(output_file, "w", encoding="utf-8") as f:
f.write(input_text)

if __name__ == "__main__":
input_file = sys.argv[1]
output_file = sys.argv[2]
demojify_file(input_file, output_file)

3. Run java -Xmx500M -cp org.antlr.v4.Tool -Dlanguage=Python3 to generate your parser

You may then import the generated parsing files and use them as follows:

from antlr4 import *
from EmojiLangLexer import EmojiLangLexer
from EmojiLangParser import EmojiLangParser
from EmojiLangListener import EmojiLangListener
import emoji

if __name__ == "__main__":
input_file = sys.argv[1]
with open(input_file, "r", encoding="utf-8") as f:
input_text = f.read()

input_text = emoji.demojize(input_text)
input_stream = InputStream(input_text)
lexer = EmojiLangLexer(input_stream)
token_stream = CommonTokenStream(lexer)
parser = EmojiLangParser(token_stream)
tree = parser.program()

if parser.getNumberOfSyntaxErrors() > 0:
exit(1)

You almost certainly won’t must do the demojizing step during which case you should utilize antlr4’s FileStream as an alternative of the InputStream , but it surely really doesn’t matter. Now, we’ve got a really nice abstract syntax tree that’s easy to work with, and we will move on to the hard part — interpreting³

Because we’re working with trees, our interpreter will naturally be a recursive entity. We do have some decisions though on how exactly we implement a few of its features. For this tutorial, we are going to construct an interpreter that uses environments that bind variables to addresses after which a mutable store that maps addresses to values. This concept is fairly common, though not ubiquitous, and it allows us to keep up proper scope and support variable mutation. For ease of implementation, we will even make our interpreter return a typical value structure.

Values, Stores, and the Environment

First, let’s define what our interpreter can output. We’ve three obvious base cases in our EBNF (namely booleans, strings, and numbers) so let’s create value objects for those:

class Value:
def __init__(self, value):
self.value = value

def __str__(self) -> str:
return str(self.value)

class NumValue(Value):
def __init__(self, value: float):
super().__init__(value)

class StringValue(Value):
def __init__(self, value: str):
super().__init__(value)

class BoolValue(Value):
def __init__(self, value: bool):
super().__init__(value)

To store mappings of variables to values, we will even create an environment and a store:

class EnvLookupException(Exception):
pass

class Environment:
def __init__(self):
self.vars = {}

def set_var(self, name, addr: int):
self.vars[name] = addr

def get_var(self, name):
if name not in self.vars:
raise EnvLookupException(f"Variable {name} not present in environment")
return self.vars[name]

def copy(self):
new_env = Environment()
new_env.vars = self.vars.copy()
return new_env

class Store:
def __init__(self):
self.store = []

def alloc(self, value: Value):
self.store.append(value)
return len(self.store) - 1

def get(self, addr: int):
if addr >= len(self.store):
raise EnvLookupException(f"Address {addr} not present in store")
return self.store[addr]

def set(self, addr: int, value: Value):
if addr >= len(self.store):
raise EnvLookupException(f"Address {addr} not present in store")
self.store[addr] = value

Effectively, our surroundings will store variable → address bindings, and our store will hold address → value bindings. The shop is probably not crucial with our current feature set, but when we allowed for mutation for pass-by-reference variables, it might be useful⁴.

Ideally, we’d also wish to pass functions as variables, so we’d like yet another value to represent them. To do that, we create a closure, which accommodates the function’s parameter, body, and the environment that it was created in:

class ClosureValue(Value):
# Body needs to be an antlr4 tree
def __init__(self, param: str, body: object, env: 'Environment'):
super().__init__(None)
self.param = param
self.body = body
self.env = env

def __str__(self) -> str:
return f""

Chances are you’ll reasonably ask about why we’d like the environment stored within the function. Suppose as an alternative that we had a function value like this:

class FunctionValue(Value):
# Body needs to be an antlr4 tree
def __init__(self, param: str, body: object):
super().__init__(None)
self.param = param
self.body = body

def __str__(self) -> str:
return f""

Now, suppose that we had code akin to the next in our language:

# ----------------
# ENV MUST PERSIST
# ----------------
def f(x):
def g(y):
return x + y
return g(x)

print((f(4))(5)) # 9

# ----------------
# ENV MUST CLEAR
# ----------------
def f2(x):
return x + y

def g2(y):
return f(5)

print(f(4)) # Should crash

To make sure that y remains to be in scope for g in the highest case, we’d must implement dynamic scoping (scope where variables are added to the environment as this system runs without being cleared) without closures, meaning that the underside code would actually run and print 9. For the underside code to properly crash though, we will’t implement dynamic scope. Thus we would like the functions to effectively remember what environment they were created in — hence why we add environments to our closure class⁵.

The Interpreter

Now, we’re ready to write down our actual interpreter. In ANTLR, our interpreter will extend the EmojiLangListener class. We will even have to create a top-level environment and provides the interpreter a store:

class EmojiLangException(Exception):
pass

TOP_ENV = Environment()

class Interpreter(EmojiLangListener):
def __init__(self):
self.store = Store()

Now, we’d like to create an interp method that handles all of our expression cases. It’s going to find yourself looking something as follows:

def interp(self, prog, env: Environment) -> Value:
if prog.ID():
return self.interp_id(prog.ID())
elif prog.atom():
return self.interp_atom(prog.atom())
elif prog.anonFunctionDefn():
return self.interp_function_defn(prog.anonFunctionDefn())
elif prog.READ_FLOAT():
return self.interp_read_float()
elif prog.READ_STRING():
return self.interp_read_string()
elif prog.printExpr():
return self.interp_print_expr()
elif prog.ifCond():
return self.interp_if(prog.ifCond(), env)
elif prog.sequentialOp():
return self.interp_sequential_op(prog.sequentialOp(), env)
elif prog.baseComputation():
return self.interp_base_computation(prog.baseComputation(), env)
elif prog.varAssignment():
return self.interp_var_assignment(prog.varAssignment(), env)
elif prog.funApplication():
return self.interp_fun_application(prog.funApplication(), env)

Our base cases (IDs, atoms, function definitions, reading, and printing) are fairly easy, so we will just write them in:

def interp(self, prog, env: Environment) -> Value:
if prog.ID():
return self.store.get(env.get_var(prog.ID().getText()))
elif prog.atom():
return self.interp_atom(prog.atom())
elif prog.anonFunctionDefn():
return ClosureValue(prog.anonFunctionDefn().arg.text, prog.anonFunctionDefn().body, env)
elif prog.READ_FLOAT():
try:
return NumValue(float(input("> ")))
except ValueError:
raise EmojiLangException("Expected float input")
elif prog.READ_STRING():
return StringValue(input("> "))
elif prog.printExpr():
value = self.interp(prog.printExpr().expr(), env)
if isinstance(value, StringValue):
# print without quotes
print(str(value)[1:-1])
else:
print(value)
return value
# ...

def interp_atom(self, atm):
if atm.NUMBER():
return NumValue(float(atm.NUMBER().getText()))
elif atm.BOOLEAN():
return BoolValue(atm.BOOLEAN().getText() == ":thumbs_up:")
elif atm.STRING():
return StringValue(atm.STRING().getText())
# THIS SHOULD NEVER HAPPEN
raise EmojiLangException(f"Unknown atom {atm.getText()}")

Our if condition can be fairly easy. We just have to interpret the condition after which return the results of interpreting the true case if it’s true and the false case if its false. The code is thus just

def interp_if(self, if_cond, env: Environment):
cond = self.interp(if_cond.cond, env)
if not isinstance(cond, BoolValue):
raise EmojiLangException(f"Expected boolean when evaluating if condition, got {cond}")
return self.interp(if_cond.ontrue if cond.value else if_cond.onfalse, env)

Sequential operations are similarly easy, they only have to interpret the primary expression after which the second. We are able to thus replace the code in that block as follows:

def interp(self, prog, env: Environment) -> Value:
# ...
elif prog.sequentialOp():
self.interp(prog.sequentialOp().first, env)
return self.interp(prog.sequentialOp().second, env)
# ...

Next are the bottom computations. It is a decent amount of code since we’d like to handle a variety of operations, but it surely’s not super complex:

def interp_base_computation(self, base_computation, env: Environment):
left, right = self.interp(base_computation.left, env), self.interp(base_computation.right, env)
if base_computation.op.text == ":plus:":
if isinstance(left, NumValue) and isinstance(right, NumValue):
return NumValue(left.value + right.value)
elif isinstance(left, StringValue) and isinstance(right, StringValue):
return StringValue(left.value + right.value)
raise EmojiLangException(f"Cannot add {left} and {right}")
if base_computation.op.text == ":heavy_equals_sign:":
if type(left) != type(right):
return BoolValue(False)
if isinstance(left, ClosureValue):
raise EmojiLangException("Cannot compare functions")
return BoolValue(left.value == right.value)

# NUMBERS ONLY COMPUTATIONS
if not isinstance(left, NumValue) or not isinstance(right, NumValue):
raise EmojiLangException(f"Expected numbers when evaluating base computation, got {left} and {right}")
if base_computation.op.text == ":minus:":
return NumValue(left.value - right.value)
elif base_computation.op.text == ":multiply:":
return NumValue(left.value * right.value)
elif base_computation.op.text == ":divide:":
if right.value == 0:
raise EmojiLangException("Division by zero")
return NumValue(left.value / right.value)
elif base_computation.op.text == "≤":
return BoolValue(left.value <= right.value)

Perhaps this will also be cleaned up a bit with the usage of e.g. dictionaries, but that’s not super necessary. Next we’ve got variable assignments, that are also not too complicated. We’ve a pair decisions here on what exactly we would like it to do — namely, should it allow recent variables to return into existence or only modify existing ones? I’ll select the latter case, which makes our code seem like this:

def interp_var_assignment(self, var_assign, env: Environment):
value = self.interp(var_assign.val, env)
addr = env.get_var(var_assign.var.text)
self.store.store[addr] = value
return value

Finally, we’ve got function applications. Here, we’ve got to do 4 steps. First, we interpret the function we’re calling to get our closure. Then, we interpret our argument. Then, we bind the argument into a duplicate of the closure’s environment. Finally, we interpret the closure’s body in the brand new environment. The code finally ends up looking as follows:

def interp_fun_application(self, fun_app, env: Environment):
closure = self.interp(fun_app.fun, env)
if not isinstance(closure, ClosureValue):
raise EmojiLangException(f"Expected function when evaluating function application, got {closure}")
arg = self.interp(fun_app.arg, env)
new_env = closure.env.copy()
new_env.set_var(closure.param, self.store.alloc(arg))
return self.interp(closure.body, new_env)

Now, our interp is fully functional! We’d like only modify our essential program to run it now on a program:

if __name__ == "__main__":
input_file = sys.argv[1]
with open(input_file, "r", encoding="utf-8") as f:
input_text = f.read()

# Preprocess input to switch emojis with demojized names
input_text = emoji.demojize(input_text)

input_stream = InputStream(input_text)
lexer = EmojiLangLexer(input_stream)
token_stream = CommonTokenStream(lexer)
parser = EmojiLangParser(token_stream)
tree = parser.program()

if parser.getNumberOfSyntaxErrors() > 0:
exit(1)

interpreter = Interpreter()
interpreter.interp(tree.expr(), TOP_ENV)

We at the moment are finally ready to start out writing programs in our language. Here’s a straightforward hello world program within the emoji language (EML):

🏃‍♂️🏃‍♂️🏃‍♂️
(🖨️ ("Hello World!"))
🛑🛑🛑

And to run it, we just do

> python emoji_lang.py helloworld.eml
Hello World!

(the above, after all, assumes that this system is present in a file called helloworld.eml )

Currying

Back in the primary section, I noted that our programming language is interesting because functions can only take one argument. How then will we create an effect much like multivariate functions? Consider, as an example, the next python code:

def f(x, y):
return x + y

print(f(3, 4))

f here has arity 2 — that’s, it takes two arguments. We are able to, nonetheless, write equivalent code that only uses functions of arity one (except addition) as follows:

def f(x):
def g(y):
return x + y
return g

print((f(3))(4))

The above concept of turning higher arity functions into unary functions is named currying. It really works for functions of any arity — for a function f of arity n, we will simply perform currying n-1 times. In emoji language, this enables us to write down a program like this:

🏃‍♂️🏃‍♂️🏃‍♂️
(📋
(🖨️ ("Enter two numbers to compute their sum."))
(🖨️
(🏭
(🏭
(🧑‍🏭 x ⚒️
(🧑‍🏭 y ⚒️
((x) ➕ (y))
)
)
(🔢))
(🔢))
)
)
🛑🛑🛑

the Python translation of which is

print("Enter two numbers to compute their sum.")
print(((lambda x: (lambda y: x + y)))(float(input()))(float(input())))

Or, more legibly,

print("Enter two numbers to compute their sum.")

def f(x):
def g(y):
return x + y
return g

x = float(input())
y = float(input())

print(f(x)(y))

Notice also how the primary python iteration used no named functions. It seems that we don’t really want them, though after all they’re useful for readability. Then we get

> python emoji_lang.py currying.eml
Enter two numbers to compute their sum
> 4
> 5
9.0

Recursion

How will we do a loop or recursion on this language though? We’ve no syntax for for or while , and without names for functions, it could appear to be recursion is not possible.

We are able to, nonetheless, do a neat little trick. Since functions are values, we will pass functions to themselves at any time when we call them, thus allowing them to call themselves.

Take, as an example, the next python code:

n = int(input())
while n > 0:
print(n)
n -= 1

We are able to convert this to a recursive version using something like this⁶:

def while_loop(condition, body):
"""
A recursive implementation of some time loop.

Arguments
-------------
- condition: Some function of zero arguments that returns a boolean value
- body: Some function of zero arguments to run while the condition returns true
"""
if condition():
body()
while_loop(condition, body)
else:
return False

class Box:
def __init__(self, n):
self.n = n

def set_n(self, n):
self.n = n

def get_n(self):
return self.n

n = Box(int(input()))

def body():
print(n.get_n())
n.set_n(n.get_n() - 1)

while_loop(lambda: n.get_n() > 0, body)

But we do have an issue here — namely, notice how while_loop calls itself. We are able to’t try this in our language with only anonymous functions, so how will we fix that? The reply is that we will do something like this:

def while_loop(self, condition, body):
if condition():
body()
self(self, condition, body)
else:
return False

# ...
# (define n as a box)
# ...

def body():
print(n.get_n())
n.set_n(n.get_n() - 1)

def call_while(loop_func, condition, body):
loop_func(loop_func, condition, body)

call_while(while_loop, lambda: n.get_n() > 0, body)

In effect, we make while_loop take itself as a parameter. Then, we will call self inside while_loop , allowing while_loop to call itself recursively.

We still, after all, have to lambda-fy and curry this for our language, so we’d like to make code akin to the next:

(((lambda while_loop: 
lambda n:
while_loop(while_loop)
(lambda bogus: n.get_n() > 0)
(lambda bogus: print(n.get_n()) or n.set_n(n.get_n() - 1)))
(lambda self:
lambda cond:
lambda body:
(body("BOGUS") or self(self)(cond)(body)) if cond("BOGUS") else False))
(Box(int(input()))))

It is a little bit of a doozy, but it surely does work. In emoji lang, this then becomes

🏃‍♂️🏃‍♂️🏃‍♂️
(🏭
(🏭
(🧑‍🏭 while ⚒️
(🧑‍🏭 n ⚒️
(🏭 (🏭 (🏭 (while) (while))
(🧑‍🏭 bogus ⚒️ (🤔 ((n) ≤ (0)) ❓ (👎) : (👍))))
(🧑‍🏭 bogus ⚒️ (📋
(🖨️ (n))
(📦 n 🔜 ((n) ➖ (1)))
)))
))
😴
Below is our while function. Note that it takes
itself as an argument - this enables for recursion
(there are other ways to do that, but recursion via self
passing is fairly easy)

ARGS:
1. self(itself)
2. condition_func (function of zero arguments that returns a boolean)
3. body (function of zero arguments that returns nothing to run while true)

RETURNS:
false when finished

(🧑‍🏭 self ⚒️
(🧑‍🏭 condition_func ⚒️
(🧑‍🏭 body ⚒️
(
🤔 (🏭 (condition_func) ("BOGUS")) ❓
(📋
(🏭 (body) ("BOGUS"))
(🏭 (🏭 (🏭 (self) (self))
(condition_func))
(body))) :
(👎)
))))
)
(🔢))
🛑🛑🛑

Then we get

> python emoji_lang.py while_loop.eml
> 4
4.0
3.0
2.0
1.0

Bonus: The Y Combinator

It’s somewhat annoying to pass in while to itself every time we would like to call it, so what if we could create a version of while that already had itself curried? It seems we will with something called the Y Combinator. The Y combinator is a function that appears as follows:

Y = lambda g: (lambda f: g(lambda arg: f(f)(arg))) (lambda f: g(lambda arg: f(f)(arg)))

It’s completely absurd, but it surely allows us to effectively “store” a function in itself. I won’t go into an excessive amount of detail about it, but when you’d wish to learn more I counsel this excellent article

With the combinator though, we will change our code to the next:

🏃‍♂️🏃‍♂️🏃‍♂️
(🏭
(🏭
(🏭
(🧑‍🏭 y_combinator ⚒️
(🧑‍🏭 while ⚒️
(🧑‍🏭 n ⚒️
(📋
🥱y-combinate our while
(📦 while 🔜 (🏭 (y_combinator) (while)))
(🏭 (🏭 (while)
(🧑‍🏭 bogus ⚒️ (🤔 ((n) ≤ (0)) ❓ (👎) : (👍))))
(🧑‍🏭 bogus ⚒️ (📋
(🖨️ (n))
(📦 n 🔜 ((n) ➖ (1)))
))
)
)
)
)
)
😴
Our y combinator function - this enables for recursion without e.g. self passing
by effectively currying the function and passing it to itself.

(🧑‍🏭 fn_nr ⚒️
(🏭
(🧑‍🏭 cc ⚒️
(🏭 (fn_nr)
(🧑‍🏭 x ⚒️ (🏭 (🏭 (cc) (cc)) (x)))
)
)
(🧑‍🏭 cc ⚒️
(🏭 (fn_nr)
(🧑‍🏭 x ⚒️ (🏭 (🏭 (cc) (cc)) (x)))
)
)
)
)
)
(🧑‍🏭 while ⚒️
(🧑‍🏭 condition_func ⚒️
(🧑‍🏭 body ⚒️
(
🤔 (🏭 (condition_func) ("BOGUS")) ❓
(📋
(🏭 (body) ("BOGUS"))
(🏭 (🏭 (while)
(condition_func))
(body))) :
(👎)
))))
)
(🔢))
🛑🛑🛑

Now, notice how our call to while after it has been y-combinated⁷ only involves passing the condition and the body — we don’t have to pass itself. and we still get

> python emoji_lang.py y_comb_while.eml
> 4
4.0
3.0
2.0
1.0

Congratulations! You’ve now hopefully built your personal programming language and coded some fun things in it. Though something like EmojiLang obviously doesn’t have much real use, you may imagine some cool use cases for constructing your personal languages, e.g. by creating a brilliant case-specific scripting language to hurry up common tasks.

In case you’d like some challenges, try the next exercises:

Exercise 1: Construct a straightforward parser and interpreter for the next language without using ANTLR. Be certain that parenthesis all the time take precedence, and that operations otherwise have equal precedence (e.g. 4 + 4 * 3 should evaluate to 24 , not 16 )

EXPR = number
| EXPR + EXPR
| EXPR - EXPR
| EXPR * EXPR
| EXPR / EXPR
| (EXPR)

Exercise 2: Modify your above code so as to add operator precedence

Exercise 3 (Tricky): We don’t have to make all of our functions anonymous. Try implementing an interpreter for the next language (you should utilize ANTLR, but you’ll have to write down your personal .g4 file):

program = (FUNDEF | EXPR)* // a number of function definitions or expressions

// NOTE: implies that something is a string
// also, be happy to disregard whitespace or add semicolons or parenthesize
// expressions/fundefs when you please

EXPR = number
| functionApplication
| computation

FUNDEF = 'def' '(' * '):' EXPR

functionApplication = '(' EXPR* ')' // e.g. f(1, 2+2, g(3))
computation = EXPR + EXPR
| EXPR - EXPR
| EXPR * EXPR
| EXPR / EXPR
| (EXPR)

Exercise 4 (Easy →Very Very Hard): .g4 files for a ton of real languages could be found here. Implement an interpreter for any one in every of them.

Please contact mchak@calpoly.edu for any inquiries

P.S. Thanks to Brian Jones, my CS 430 professor, for teaching me a variety of these items.

All images by creator unless otherwise stated.

ASK DUKE

What are your thoughts on this topic?
Let us know in the comments below.

0 0 votes
Article Rating
guest
0 Comments
Inline Feedbacks
View all comments

Share this article

Recent posts

0
Would love your thoughts, please comment.x
()
x