Find out how HackerEarth can boost your tech recruiting

Learn more

Building your own Lisp Parser Part I

Guest Author

Author

5 mins read

February 1, 2017

Last updated on

February 23, 2024

While writing a full-blown compiler for a programming language is a difficult and frustrating task, writing a smaller and more specific parser can be surprisingly easy if you know a small trick.

On the other hand, parsing problems pops up at several places in modern-day programming. So, learning this useful trick can be rewarding.

Source: http://mox.ingenierotraductor.com/2015/12/translation-is-like.html

Prerequisites

You need to know the basics of Python, more specifically, you should know the concepts of recursion and flow of control.

Objectives

After reading and understanding this post, you will be able to create simple calculators, interactive interpreters, parsers, very limited and small programming languages, etc. In general, you should be able to take input, tokenize it, perform whatever actions you want to on the tokens, and output the result of the process.

At the end of this post, you will have created a simple, Lisp-like prefix calculator. Following is a demonstration of how it’s going to look:

> ( + 3 2 )
= 5
> ( / 2 0 )
DivisionByZero
> ( – -3 2 )
= -5
> -2
= -2
> ( + ( * 3 2 ) 5 )
= 11

Step 1: Writing the Grammar

The first step to writing a parser is to write a clear grammar for its syntax. The grammar determines what is and what is not right. Once you have written the grammar, translating it to Python code is a trivial chore. The grammar will serve as our first pseudocode.

For our tiny calculator, we know that the input can come in two forms: a Number (-2, .5, +8, 8.5, 9.) or a more complicated Expression begins with a (, followed by an operator, etc.).

For writing a grammar, we need to identify different elements of the syntax. So far, we have Expression, Number, and Operator. The next important thing to do is to structure the elements (known as terms) into a hierarchical form. This is shown below:

Expression:
Number

( Operator Expression Expression )

Number:
a floating-point number ([-+][0-9][*\.][0-9]*)

Operators:

–

You will notice that Operator and Expression have no parent; they are independent terms.

A grammar is read from the bottom up and different choices appear on distinct lines. Our grammar says that:

an Operator is one of +, -, *, /.
a Number is a floating-point number which matches the RegEx [-+][0-9]*[\.][0-9]*
an Expression is either a Number or a ( followed by an Operator, followed by two other Expressions, and finally ends in a ). Note that the definition of an Expression is recursive.

Step 2: Translating the Grammar into Pseudocode

Pseudocode is fake code resembling English which is supposed to be an intermediate code that can easily be converted into real code. Although writing pseudocode is optional, it is really helpful.

The trick here is to put each term from our grammar into a separate function. Whenever we need to apply the grammar of a certain term, we only have to call the function. Following is the pseudocode implementing the grammar above:

https://gist.github.com/HackerEarthBlog/f0a5a4304326936142da39b0d853f944

This is our rough pseudo-code that should be good enough for our purpose. In the next step, we will write the real code.

3. Writing the Code

It is said very profoundly about Python that reading and writing Python feels like doing pseudocode. The same applies here, but there is one small caveat— Python doesn’t provide any function for “unreading” or putting a character back in the input buffer.

For this, I have created a small class which extends the file object to include this feature. To keep things simple, I have avoided inheritance and my class is not compatible with the file object provided by Python. Treat it like a black-box if you don’t want to understand it.

https://gist.github.com/HackerEarthBlog/6465f93e1ca155ded5e8b0c8294f16ba

Here is the buffer.py file which handles buffered input:

https://gist.github.com/HackerEarthBlog/5330e5f11f96a22608b45affa61fa858

Explanation

expression():

expression() is our top-level function and maps the Expression grammar term. We first ignore all the whitespace. After that, it takes a single non-whitespace character as input and checks it against several possibilities.

If the input string starts with +, -, ., or a digit, it is a number. We put the character back and input the entire number.

If the input string starts with (, a complete expression is to follow. We input the operator, two more expressions which will serve as the operands, and finally the closing parenthesis. We then calculate the result and return it.

number():

The number function maps the Number grammar term and is very simple—just a wrapper around getword. We input a whole word and if it converts to a float, we return it, otherwise the function returns an error message.

operator():

The operator function inputs a single character and tests it for equality against several known operators. Like the above two functions, it also maps a grammar term, i.e., Operator. In case the given operator is not valid, an error message is returned.

calc():

The calc function is actually not necessary but makes the code substantially better. In an ideal program, each function should do only one logical task. calc removes some burden from expression.

UngetableInput

Although Python 3 supports buffered input through stdin.buffer, Python 2 has no such facility. Plus, Python 3’s stdin.buffer would still require us to create some wrapper of our own.

The UngetableInput class wraps Python’s basic input to go through a buffer. We take input into the buffer and put a character back into the buffer when ungetc is called. Unless the buffer is empty, all input comes from the buffer.

Homework

This code works and leaves a lot of cleaning as homework for the reader. 🙂 Following is a list of things you can do to improve and extend the rudimentary calculator:

Improve buffer.py to handle input whitespace more accurately. Hint: You might want to use a string as the buffer.
Implement a function to get a single character while skipping all whitespace and replace the whitespace skipping loop with it.
Add the ability to create variables. Following the Lisp syntax, it should look something like the following:

( define var_name 839457.892 )

What’s Coming Next?

One of the most important parts of our program is the input buffer we created. Unfortunately, it’s not general purpose and can break when used in something more complicated than our tiny calculator program. In the next article, we will examine a bigger module which does this chore better.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

By Role

By need

Product

Content

Find out how HackerEarth can boost your tech recruiting

Building your own Lisp Parser Part I

Prerequisites

Objectives

Step 1: Writing the Grammar

Step 2: Translating the Grammar into Pseudocode

3. Writing the Code

Explanation

expression():

number():

operator():

calc():

UngetableInput

Homework

What’s Coming Next?

Get advanced recruiting insights delivered every month

Get advanced recruiting insights delivered every month

Get insightful articles from the world of tech recruiting straight to your inbox

Related reads

The complete guide to hiring a Full-Stack Developer using HackerEarth Assessments

Best Interview Questions For Assessing Tech Culture Fit in 2024

Best Hiring Platforms in 2024: Guide for All Recruiters

Best Assessment Software in 2024 for Tech Recruiting

Top Video Interview Softwares for Tech and Non-Tech Recruiting in 2024: A Comprehensive Review

8 Top Tech Skills to Hire For in 2024

Get advanced recruiting insights delivered every month

Top Products

Engage global developers through innovation

AI-driven advanced coding assessments

Real-time code editor for effective coding interviews

Tailored learning paths for continuous assessments