Building Your Own Lisp Parser Part II

This post builds on the Lisp parser we created in Building Your Own Lisp Parser Part I, which is a prerequisite for what follows. Here we look at how the buffered input class from that post (UngettableInput) can be improved.

The Problems

Although the UngettableInput class works for our Lisp parser, it is highly specific and could break when used in other programs. In this section, I will talk about the problems with the UngettableInput class.

1. There's no method for emptying the input buffer.

Implementing a method to empty the buffer would have been trivial, but the Lisp parser we created earlier did not need one, so we left it out. An ideal input class should provide one, so we add it in this post.

2. Getting input of a particular kind is not possible.

UngettableInput only knows about characters and words; it cannot read a number. As a result, if the user types (+1 2), our program won't work. The reason is that there is no whitespace between “(” and “+”, or before the closing “)”.

The Solutions

This section lists possible solutions to the problems presented above. The code comes in the next section.

1. There's no method for emptying the buffer

The solution is really simple: assign an empty string to the buffer. But in the code section, you will notice a different implementation.

Although both work, a good programmer should think outside the box. What happens to the previous buffer content if we just assign an empty string to the buffer? We lose it. What if the caller needs it? The fix is simple, but it takes a good programmer to recognize that there is a problem at all. We have taken this into account in this post and implemented a method that empties the buffer and also returns its previous content.
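
As a rough illustration of the idea (not the article's actual code), a method that empties the buffer and hands back the old content could look like the sketch below. The method name flush, the buffer attribute and the unget signature are assumptions made for this example.

```python
class BufferedInput:
    """Minimal stand-in: only the buffer and the methods needed here."""

    def __init__(self):
        self.buffer = ""  # pending characters that have not been consumed

    def unget(self, s):
        # Push text back so that it is read again first.
        self.buffer = s + self.buffer

    def flush(self):
        # Empty the buffer, but return the old content to the caller
        # instead of silently discarding it. (Hypothetical method name.)
        old, self.buffer = self.buffer, ""
        return old


inp = BufferedInput()
inp.unget("(+ 1 2)")
leftover = inp.flush()
print(repr(leftover))    # the text that was pending
print(repr(inp.buffer))  # the buffer is now empty
```

A caller that does not care about the old content can simply ignore the return value.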

2. Getting input of a particular kind is not possible

In other words, there's only getc (for a single character) and getword (for a group of characters without any whitespace in between). We should implement methods to get at least an integer and a float. You can later try to write methods that parse other numeric types.

The Code

Try figuring out how to solve these problems before looking at the following BufferedInput class. You could even write some pseudocode.

Explanation

The code itself is simple, but there is at least one edge case we have taken care of in an ugly way (see the Homework section). A method-by-method explanation follows.

getc()

There isn't any change in the logic, but we now use a helper method, fill_buffer(). fill_buffer() reads more characters from stdin if the buffer is empty.
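
A minimal sketch of how getc() and fill_buffer() might fit together is shown below. It assumes that the buffer is a plain str, that input arrives one line at a time via sys.stdin.readline(), and that unget() prepends text to the buffer; the article's class may differ in these details.

```python
import sys


class BufferedInput:
    def __init__(self):
        self.buffer = ""  # characters read but not yet consumed

    def fill_buffer(self):
        # Read another line from stdin only when nothing is pending.
        if not self.buffer:
            self.buffer = sys.stdin.readline()

    def getc(self):
        # Return the next character, or "" at end of input.
        self.fill_buffer()
        c, self.buffer = self.buffer[:1], self.buffer[1:]
        return c

    def unget(self, s):
        # Put s back at the front so the next getc() sees it again.
        self.buffer = s + self.buffer
```

Because fill_buffer() only touches stdin when the buffer is empty, anything pushed back with unget() is always served first.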

getword()

getword() repeatedly gets a single character and appends it to “w” until the character returned by getc() is whitespace. Note that the while loop should break only after encountering at least one non-whitespace character.

In other words, getword() gets a contiguous group of characters delimited by whitespace.
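
The loop described above could be sketched as follows. The small getc()/unget() core is included only so that the example runs on its own; its details, and the choice to consume the whitespace character that ends a word, are assumptions rather than the article's code.

```python
import sys


class BufferedInput:
    def __init__(self):
        self.buffer = ""

    def fill_buffer(self):
        if not self.buffer:
            self.buffer = sys.stdin.readline()

    def getc(self):
        self.fill_buffer()
        c, self.buffer = self.buffer[:1], self.buffer[1:]
        return c

    def unget(self, s):
        self.buffer = s + self.buffer

    def getword(self):
        w = ""
        while True:
            c = self.getc()
            if c == "":        # end of input: return what we have
                break
            if c.isspace():
                if w:          # whitespace after the word ends it
                    break      # (the delimiter itself is consumed)
                continue       # leading whitespace is skipped
            w += c
        return w


inp = BufferedInput()
inp.unget("  (define x\n")  # preload the buffer instead of typing
print(inp.getword())  # (define
print(inp.getword())  # x
```

Note how "(define" comes back as a single word, which is exactly the limitation that motivates getint() and getfloat().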

getint()

getint() first skips whitespace. Then a character is read, which serves as the sign (+ or -). If that character is neither + nor -, we unget it.

Then we get the first digit of the number. The reason for doing this is that we want to make sure that what follows is a valid number.

Once that is confirmed, we keep reading characters and appending them to num as long as each character read is a valid digit (in the range 0 to 9). The resulting num, which at this point is a str of digits, is converted to an int using the built-in int function.
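
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the article's implementation: the getc()/unget() core, the is_digit() helper, and raising ValueError when no digit follows are all choices made for this example.

```python
import sys


class BufferedInput:
    def __init__(self):
        self.buffer = ""

    def fill_buffer(self):
        if not self.buffer:
            self.buffer = sys.stdin.readline()

    def getc(self):
        self.fill_buffer()
        c, self.buffer = self.buffer[:1], self.buffer[1:]
        return c

    def unget(self, s):
        self.buffer = s + self.buffer

    @staticmethod
    def is_digit(c):
        # Hypothetical helper: true only for a single character 0-9.
        return len(c) == 1 and "0" <= c <= "9"

    def getint(self):
        # Step 1: skip leading whitespace.
        c = self.getc()
        while c != "" and c.isspace():
            c = self.getc()
        # Step 2: optional sign.
        sign = ""
        if c in ("+", "-"):
            sign = c
            c = self.getc()
        # Step 3: the first digit must be there, otherwise put back
        # what was read after the whitespace and give up.
        if not self.is_digit(c):
            self.unget(sign + c)
            raise ValueError("expected an integer")
        # Step 4: collect digits, then unget the first non-digit.
        num = ""
        while self.is_digit(c):
            num += c
            c = self.getc()
        self.unget(c)
        return int(sign + num)


inp = BufferedInput()
inp.unget("(+1 2)")                # preload the buffer instead of typing
print(inp.getc(), inp.getc())      # ( +
print(inp.getint(), inp.getint())  # 1 2
print(inp.getc())                  # )
```

With the operator read by getc() and the operands by getint(), the input (+1 2) no longer depends on spaces around the parentheses.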

getfloat()

getfloat() works in three steps, corresponding to the integer before the decimal point, the decimal point itself, and the digits after the decimal point. All of these are concatenated and converted to a float using the built-in float function.

Note that our implementation of getfloat correctly handles numbers of the form x.y, x. and x (for example 3.14, 3. and 3).
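
A possible shape for the three steps is sketched below. The _digits() helper, the getc()/unget() core and the ValueError on a missing integer part are assumptions made for this example; the article's class may organize this differently.

```python
import sys


class BufferedInput:
    def __init__(self):
        self.buffer = ""

    def fill_buffer(self):
        if not self.buffer:
            self.buffer = sys.stdin.readline()

    def getc(self):
        self.fill_buffer()
        c, self.buffer = self.buffer[:1], self.buffer[1:]
        return c

    def unget(self, s):
        self.buffer = s + self.buffer

    def _digits(self):
        # Hypothetical helper: read a (possibly empty) run of digits 0-9
        # and unget the first character that is not a digit.
        run = ""
        c = self.getc()
        while len(c) == 1 and "0" <= c <= "9":
            run += c
            c = self.getc()
        self.unget(c)
        return run

    def getfloat(self):
        # Skip leading whitespace, then take an optional sign.
        c = self.getc()
        while c != "" and c.isspace():
            c = self.getc()
        sign = c if c in ("+", "-") else ""
        if not sign:
            self.unget(c)
        # Step 1: the integer part before the decimal point.
        whole = self._digits()
        if not whole:
            raise ValueError("expected a digit before the decimal point")
        # Step 2: the decimal point itself (optional).
        point = self.getc()
        frac = ""
        if point == ".":
            # Step 3: the digits after the point (may be empty, as in "3.").
            frac = self._digits()
        else:
            self.unget(point)
            point = ""
        return float(sign + whole + point + frac)


inp = BufferedInput()
inp.unget("3.14 2. 7)")  # preload the buffer instead of typing
print(inp.getfloat(), inp.getfloat(), inp.getfloat())  # 3.14 2.0 7.0
```

In this sketch an input such as .5 is rejected, because step 1 insists on at least one digit before the decimal point.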

Please look at the rest of the code, and if any part of it is unclear, let us know in the comments below.

Homework

Now our code is general enough to be called a general purpose buffered input library! Or is it not? No, it's not. There are obvious flaws. Fortunately, they are easy to fix and will be homework for the reader.

1. Our class works only with stdin. Modify it to handle files as well.
2. Sometimes we need to check what the next character in the buffer is, and then unget it depending on what we are trying to do. A better idea is to have a peek method which returns the next character in the buffer without actually reading it.
3. Modify getfloat to handle numbers of the form .x (such as .5, .1, etc.).

Challenge

The getint and getfloat methods collect digits into a string, append to it, and only then generate the number that is returned. This is fine, but it is slow. As a challenge, implement versions of getint and getfloat that build up the number as the user types in the expression. The answer will change as more characters come in. That will be way cooler!

About the Author

Guest Author