This is a command-line spreadsheet evaluator that uses postfix notation. Cells contain postfix expressions, so instead of writing 2 + 2, you write 2 2 +.
Write a program that evaluates a CSV (comma-separated values) file as a spreadsheet, with the following requirements:
- Cells may be postfix expressions
- Tokens in cells are separated by whitespace
- The four basic arithmetic operations, + - * /, are allowed
- Cells are referred to in {LETTER}{NUMBER} notation. For example, A4 refers to column 1, row 4. In my implementation, column letters are not case sensitive. Empty cells default to 0, provided they exist in the CSV.
To invoke the program, run (using Python 3):

```
python spreadsheet_improved.py input.csv
```
The above command writes the results to stdout. Optionally, output can be written to a file using:

```
python spreadsheet_improved.py input.csv -o output.csv
```
A sample input file could be:

```
c3 b1 +, 3 c1 *, 3, +
a1 1 +, 1 8 /, 2 3 -, sadf
12, b2, 45, d3, , e3 1 *
```
With an expected output of:

```
54.0, 9.0, 3.0, #ERR
55.0, 0.125, -1.0, #ERR
12.0, 0.125, 45.0, #ERR, 0, 0.0
```
I constrain the input to 26 columns, since I restrict column references to the letters a through z and do not allow compound addressing (e.g. AA1). Column references are not case sensitive.
Floating-point numbers and positive integers are valid numeric inputs. Integers are internally converted to float for all operations. Negative literals are not allowed; to get a negative number, use an expression like "0 7 -".
I included some unit tests that ensure certain specifications are met. I do not aim for 100% code coverage; instead I check bounds, important error states, and that basic operations are correct. Simply run:

```
python test.py
```
This section describes how the code works. Refer to spreadsheet_improved.py to follow along.
I wrote a very simple tokenizer that converts a string into a token. The tokenizer's job is to recognize numbers, operators, and references to cells. Internally, a token is a class with two fields, value and kind: value holds the token's value, and kind is the type of token.
```python
from enum import Enum

class TokenKind(Enum):
    NUMBER = 1
    CELL = 2
    OPERATION = 3
    ERROR = 4

class Token:
    def __init__(self, value, kind):
        self.value = value
        self.kind = kind

    def toString(self):
        """ Quick function to convert token to printable string """
        if self.kind == TokenKind.ERROR:
            return "#ERR"
        return str(self.value)
```

The kind field is specified by the TokenKind enumeration. NUMBER and OPERATION describe a number or an arithmetic operation, CELL identifies a token as a cell reference in {LETTER}{NUMBER} format, and ERROR marks an invalid token. The heart of the tokenizer is a function called stringToToken, which, given a string, returns a single token.
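For example, here is what stringToToken produces for a few inputs (shown as an illustrative interactive session, given the definitions above and the function body below):

```python
>>> tok = stringToToken("a7")
>>> tok.kind, tok.value
(<TokenKind.CELL: 2>, (0, 6))
>>> stringToToken("2.5").value
2.5
>>> stringToToken("sadf").toString()
'#ERR'
```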
stringToToken classifies tokens using a finite state machine. Python's regular expression engine is a state machine and is well suited to tokenizing strings. The regular expression that does the heavy lifting is:
```python
import re

scanner = re.compile(r'''
    (^([A-Za-z])([0-9]+)$) | # Match a cell
    (^[+*/\-]$)            | # Match an op
    (^[0-9]+[.]*[0-9]*$)     # Match a number
    ''', re.VERBOSE)
```

The above regex matches either a cell reference, an operation, or a number, and returns the result as a match with five groups. By inspecting the values of these groups, I determine the type of token; this is done just below the regex and looks like:
```python
res = re.match(scanner, input)
# Discover the token type
if res is not None:  # Found a valid token
    cell, col, row, op, num = res.groups()
    if cell:  # A cell reference, e.g. A7
        rownum = int(row)
        # Rows are 1-based; reject a row number of 0
        if rownum < 1:
            return Token(None, TokenKind.ERROR)
        colnum = ord(col.lower()) - ord('a')  # Convert column letter to number
        return Token((colnum, rownum - 1), TokenKind.CELL)
    elif op:
        return Token(input, TokenKind.OPERATION)
    elif num:
        number = float(input)  # Convert all #s to floating point
        return Token(number, TokenKind.NUMBER)
# Could not convert, return error
return Token(None, TokenKind.ERROR)
```

Internally, the program operates recursively. A method in Sheet called evalCell does the actual evaluation of a cell. If it encounters a cell reference in an expression such as 1 b2 +, evalCell recursively evaluates b2. In other words, evaluation of expressions is depth first.
Each cell is marked with a state that records its stage of evaluation. If evalCell encounters a cell that has begun evaluation but not finished, a circular definition exists. This is the same procedure as marking visited nodes in a depth-first search to avoid going in circles. Cycles are detected and result in the cell evaluating to #ERR.
For example, an input with a loop:

input.csv:
```
b1,a1
b2 1 +,1
```

will result in:

```
#ERR, #ERR
2.0, 1.0
```
Cells also record their final numeric value, so if a cell is referenced multiple times it does not need to recompute the result.
There is a recursion limit in Python, and the code detects any attempt to exceed it, returning an #ERR value in the cell.
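Putting these pieces together, here is a minimal sketch of the scheme (hypothetical names throughout, not the actual code in spreadsheet_improved.py; sheet maps coordinates to cell objects holding tokens, a state, and a cached value, and eval_postfix stands for the left-to-right postfix evaluator sketched further below):

```python
from enum import Enum

class CellState(Enum):
    UNEVALUATED = 1
    IN_PROGRESS = 2  # started but not finished: seeing this again means a cycle
    DONE = 3

class CycleError(Exception):
    pass

def eval_cell(sheet, coord):
    """Depth-first evaluation with cycle detection and memoization."""
    cell = sheet[coord]
    if cell.state == CellState.DONE:
        return cell.value  # referenced before: reuse the cached result
    if cell.state == CellState.IN_PROGRESS:
        raise CycleError(coord)  # circular definition
    cell.state = CellState.IN_PROGRESS

    def resolve(ref):
        value = eval_cell(sheet, ref)
        if value is None:
            raise ValueError("reference to an #ERR cell")
        return value

    try:
        cell.value = eval_postfix(cell.tokens, resolve)
    except (CycleError, RecursionError, ValueError, ZeroDivisionError):
        cell.value = None  # printed as #ERR
    cell.state = CellState.DONE
    return cell.value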
Another way to solve this is a topological sort to determine the order of cell calculation, in which case recursion would not be necessary. Any CELL token encountered could have its value looked up immediately, because the topological sort would ensure all of its dependencies were already calculated.
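For what it's worth, a sketch of that alternative using Kahn's algorithm (a hypothetical helper, not part of spreadsheet_improved.py; deps maps each cell coordinate to the set of cells its expression references, all assumed to be valid coordinates):

```python
from collections import deque

def topo_order(deps):
    """Return cell coordinates in an order where dependencies come first."""
    indegree = {c: len(refs) for c, refs in deps.items()}
    dependents = {c: set() for c in deps}
    for cell, refs in deps.items():
        for ref in refs:
            dependents[ref].add(cell)
    ready = deque(c for c, d in indegree.items() if d == 0)
    order = []
    while ready:
        cell = ready.popleft()
        order.append(cell)
        for dep in dependents[cell]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                ready.append(dep)
    return order  # any cell missing from `order` sits on a cycle: #ERR
```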
This section assumes knowledge of the common algorithms for evaluating postfix notation; I do not give a complete tutorial. There are two ways to evaluate a postfix expression (see Wikipedia for the two different algorithms). evalCell uses left to right, meaning it encounters the operands before the operator. This means I only need to recursively evaluate CELL tokens when they are first encountered in the token stream, and not when they are popped off the stack by an operator.
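As an illustration, a sketch of such a left-to-right loop (again with hypothetical names; resolve_cell is the recursive callback used by the eval_cell sketch above):

```python
def eval_postfix(tokens, resolve_cell):
    """Left-to-right postfix evaluation. Operands arrive before their
    operator, so CELL tokens are resolved the moment they are read."""
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b}
    if not tokens:
        return 0.0  # empty cells default to 0
    stack = []
    for tok in tokens:
        if tok.kind == TokenKind.NUMBER:
            stack.append(tok.value)
        elif tok.kind == TokenKind.CELL:
            stack.append(resolve_cell(tok.value))  # recurse here, not at pop time
        elif tok.kind == TokenKind.OPERATION:
            if len(stack) < 2:
                raise ValueError("operator without two operands")
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok.value](a, b))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```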
It is also possible to use a right-to-left algorithm, which suggests a way to make evalCell non-recursive: when an unevaluated cell token is encountered, remove it from the input token stream and push the contents of its cell onto the front of the stream. For example, if evaluating the expression 1 x + where x = 1 2 *, then when x is read from the token stream, just push 1 2 * onto the front of the token stream, resulting in:
`1 1 2 *`
Evaluation will proceed as expected. Difficulties arise in preventing infinite loops, which is easy enough to fix. Storing the values of already-computed cells is less easy, and I have not implemented it.
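A sketch of that splicing idea (hypothetical names; sheet_tokens maps a cell coordinate to its token list, and, as noted, this version does no loop prevention or caching):

```python
from collections import deque

def eval_splice(tokens, sheet_tokens):
    """Non-recursive variant: a CELL token is replaced in the input
    stream by the tokens of the cell it references."""
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b}
    stream = deque(tokens)
    stack = []
    while stream:
        tok = stream.popleft()
        if tok.kind == TokenKind.CELL:
            # Splice the referenced cell's tokens onto the front of the stream
            stream.extendleft(reversed(sheet_tokens[tok.value]))
        elif tok.kind == TokenKind.NUMBER:
            stack.append(tok.value)
        else:  # TokenKind.OPERATION
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok.value](a, b))
    return stack.pop()
```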
I include a sample input.csv.
I developed this program on Windows; it should be fine on Linux and macOS.