Recently I worked on a project for my Optimizing Compilers course. The purpose of this project was to implement Loop-invariant Code Motion and any other compiler optimizations that we choose. The project is competitive because one's mark is based on how one's compiler improves the mean execution time on a small set of static, pre-determined test cases. Given that the test cases do not change, it is natural to specialize one's optimizations to the code being tested. Realistically, this might not be the best approach as code tends to change and compiler optimizations are not always transparent.
Optimizations
So far I have implemented the following optimizations. This post will focus on the last optimization, symbolic interpretation (labeled EVAL).
- CP: Copy propagation
- CF: Constant folding (with local constant propagation)
- LICM: Loop-invariant code motion
- DCE: Dead code elimination (with unreachable code elimination, block merging, and local constant de-duplication)
- CSE: Common subexpression elimination
- EVAL: Symbolic interpretation (based on abstract interpretation)
These optimizations were arranged into a pipeline, where dashed edges are followed when a pass changes something and solid edges are followed when no changes are made.
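A minimal sketch of what such a driver might look like, assuming each pass is simply a function reporting whether it changed the IR. The restart-from-the-top policy here is one plausible reading of the pipeline, not necessarily exactly how mine is wired:

#include <functional>
#include <vector>

// Hypothetical pass driver: whenever a pass rewrites the IR (a "dashed
// edge"), restart the pipeline; when nothing changes (a "solid edge"),
// fall through to the next pass. Terminates at a fixed point.
bool run_pipeline(const std::vector<std::function<bool()>> &passes) {
    bool changed_any = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto &pass : passes) {
            if (pass()) {           // pass reports a change
                changed = true;     // take the dashed edge: start over
                changed_any = true;
                break;
            }
        }
    }
    return changed_any;
}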
SimpleSUIF
This project uses Stanford's SimpleSUIF compiler infrastructure. SimpleSUIF's intermediate representation (IR) is a linked list of instructions, including such things as basic arithmetic, bitwise operators, memory/constant load/store, and calling/branching operations. The IR is register based, with three register classes: machine, pseudo, and temporary. For our purposes, machine registers are never used. Temporary registers represent single-definition and single-use registers, where both the definition and use (if any) must reside in the same basic block. Temporary registers often hold loaded constants. Pseudo registers behave like general purpose registers. Finally, all registers are typed.
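As a rough illustration, the register classes could be modelled like this; these are my own simplified types, not SimpleSUIF's actual API:

// My own simplified model of SimpleSUIF-style registers.
enum class RegClass {
    Machine,    // never used for our purposes
    Pseudo,     // behaves like a general purpose register
    Temporary   // single definition, single use, same basic block
};

struct Reg {
    RegClass cls;
    int      num;   // register number within its class
    int      type;  // all registers are typed (e.g. int vs. float)
};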
One quirk of how we use SimpleSUIF is that there is no apparent way to access the IR for an arbitrary function within the same compilation unit. As such, interprocedural optimizations such as function inlining and compile-time execution are not possible. This was unfortunate as there was one particular test case that would have benefitted from interprocedural optimization.
Test case
Below is one of the functions in the test case of interest. Two lines are struck out (marked with comments below) because the dead code elimination pass regards them as useless.
float f1(float b, float c) {
    int i;
    float j, k;
    j = c;              /* struck out by DCE */
    for (i = 0; i < 2; i++) {
        k = b * i;
        j += k;         /* struck out by DCE */
    }
    return k;
}
Looking closely at this example, it is clear that only the initialization of i to 0, the last iteration of the loop, and the value of b matter to the output of f1: the final iteration leaves k = b * 1, so f1 simply returns b. However, this is difficult to tell from the perspective of the IR without running through the program. With more information (e.g. about loop induction variables or loop dependencies), we might be able to make smarter decisions, but only in some very restricted cases. Unfortunately, it's not clear how one should go about "executing" this program in the absence of a particular value for b. This is where symbolic interpretation comes in.
Symbolic interpretation
Symbolic interpretation is similar to local value numbering in that we operate on concrete and symbolic values. For simplicity, I restricted this optimization pass to a subset of the provably pure functions. Because information about other functions was absent, I considered a pure function to be any function that does not:
- Load from or store to a memory location.
- Call any function. Note: this constraint can be relaxed for recursive calls; the test cases I focused on did not include recursive function calls, but the method extends easily to that case.
- Copy from one memory location to another memory location.
Thus, a function is considered pure if it depends only on constants, local variables, and function arguments, and performs no operation that could generate a side effect.
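Here is a sketch of such a purity check, written against a hypothetical simplified opcode set rather than SimpleSUIF's real one:

#include <vector>

// Illustrative opcode set; the real SimpleSUIF opcodes differ.
enum class Op { Ldc, Cpy, Add, Mul, Load, Store, MemCpy, Call, Branch, Ret };

struct Instr { Op op; /* operands elided */ };

// A function qualifies for symbolic interpretation only if no
// instruction can touch memory or call another function.
bool is_pure(const std::vector<Instr> &body) {
    for (const Instr &ins : body) {
        switch (ins.op) {
            case Op::Load:      // loads from a memory location
            case Op::Store:     // stores to a memory location
            case Op::MemCpy:    // memory-to-memory copy
            case Op::Call:      // unknown callee, unknown effects
                return false;
            default:
                break;          // register-to-register work is fine
        }
    }
    return true;
}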
The following control-flow graph (some edges are omitted because I am lazy with SVG) is an interactive symbolic executor of the SimpleSUIF-like IR representing the above function. Below I describe how each step of the evaluator is performed.
The symbolic interpreter behaves much like a combination of constant folding and constant propagation, except that when an operation is applied to an expression containing a symbol, a new symbol is generated.
For example, if one performs an ldc operation to load the constant 0 into register t6, then we can assign to t6 the value 0. If a copy (cpy) operation is performed, then the value of the right-hand register becomes the new value of the left-hand register. For example, cpy r3 = t6 assigns to r3 the value 0.
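In sketch form, with a deliberately simplified value representation (this is illustrative, not my actual implementation), the interpreter's state and these two cases look like:

#include <map>
#include <string>
#include <variant>

// A register's value is either a compile-time constant or an ID naming
// a symbolic expression.
using SymId = int;
using Value = std::variant<double, SymId>;

// The environment maps register names to their current values.
std::map<std::string, Value> env;

// ldc t6 = 0  --  bind the constant to the destination register.
void exec_ldc(const std::string &dst, double constant) {
    env[dst] = constant;
}

// cpy r3 = t6  --  the destination takes on the source's current value.
void exec_cpy(const std::string &dst, const std::string &src) {
    env[dst] = env.at(src);
}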
Sometimes a register is used before it is defined. For example, r1 in mul t8 = r1, r3 is never defined in the above code because r1 represents one of the arguments to the function. In this case, r1 is given a new symbolic value that is distinct from every other symbolic value. In the above simulator, the symbolic value assigned to r1 is named r1. Being able to identify the "origin" of a symbolic value will be useful for code generation.
When a symbolic value participates in an expression, as in mul t8 = r1, r3, a new and unique symbolic value is generated that represents the expression. If any of the components of the expression are constants (known at compile time) then we want to store those constants as part of the symbolic expression. For example, in the first iteration of the loop, t8 is assigned the symbolic expression r1 * 0. In the second iteration of the loop, t8 is assigned the symbolic expression r1 * 1.
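Continuing the sketch above, a register read before any definition (such as r1) receives a fresh leaf symbol that remembers its register of origin, and a binary operation folds when both operands are constants and otherwise records a new expression node:

#include <vector>

// Symbolic expressions: either a leaf (an unknown input, tagged with
// the register it came from) or a binary operation over two values.
struct SymExpr {
    enum Kind { Leaf, BinOp } kind;
    std::string origin;   // register name, for leaves (e.g. "r1")
    char        op;       // '+', '*', ... for binary nodes
    Value       lhs, rhs; // operands: constants or other symbols
};
std::vector<SymExpr> sym_table;   // a SymId indexes into this table

// Read a register, inventing a distinct symbol on use-before-def.
Value read_reg(const std::string &r) {
    auto it = env.find(r);
    if (it != env.end()) return it->second;
    sym_table.push_back({SymExpr::Leaf, r, '\0', {}, {}});
    return env[r] = SymId(sym_table.size() - 1);
}

// mul t8 = r1, r3  --  fold if fully constant, otherwise build the
// symbolic expression (e.g. "r1 * 0" on the loop's first iteration).
void exec_mul(const std::string &dst, const std::string &a,
              const std::string &b) {
    Value x = read_reg(a), y = read_reg(b);
    if (std::holds_alternative<double>(x) &&
        std::holds_alternative<double>(y)) {
        env[dst] = std::get<double>(x) * std::get<double>(y);
    } else {
        sym_table.push_back({SymExpr::BinOp, "", '*', x, y});
        env[dst] = SymId(sym_table.size() - 1);
    }
}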
Something not touched on in this example is a branch that depends on a symbolic value. In that case, we cannot follow the branch because we don't know which direction it will take at runtime. This pass therefore only handles functions in which the direction of every branch can be statically determined.
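Continuing the sketch, the branch handler simply gives up when the condition is symbolic (the naming is illustrative):

// Conditional branch: we can only follow it when the condition holds a
// concrete value; a symbolic condition aborts the whole interpretation
// and leaves the function untouched by this pass.
bool exec_branch(const std::string &cond, bool &taken) {
    Value v = read_reg(cond);
    if (!std::holds_alternative<double>(v))
        return false;                      // direction unknown statically
    taken = (std::get<double>(v) != 0.0);  // direction known: keep going
    return true;
}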
Code Generation
The focus of symbolic evaluation has been to end up with some symbolic or constant expression for each register. In fact, for this optimization, only the returned register (r5) ends up being useful. If the returned register contains a constant value then the function is necessarily constant, and so the function's code can be replaced with an ldc followed by a ret.
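For example, had the interpreter found that r5 held the constant 5, the entire body could be replaced with something like:

ldc t1 = 5
ret t1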
In the case that the returned register is a symbolic expression, we can walk the expression tree and output for each subexpression the instructions needed to compute that subexpression. The leaves of the expression tree will be symbolic register values (named according to their register) or constants.
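Here is a sketch of that walk, reusing the structures from the interpreter sketch above and emitting textual instructions purely for illustration:

// Emit instructions computing a value; return the register holding the
// result. Constants become an ldc into a fresh temporary, leaf symbols
// are referenced by their register of origin, and binary nodes emit
// their operands first and then the operation itself.
std::string fresh_temp() {
    static int n = 0;
    return "t" + std::to_string(++n);
}

std::string emit(const Value &v, std::vector<std::string> &out) {
    if (std::holds_alternative<double>(v)) {
        std::string t = fresh_temp();
        out.push_back("ldc " + t + " = " +
                      std::to_string(std::get<double>(v)));
        return t;
    }
    const SymExpr &e = sym_table[std::get<SymId>(v)];
    if (e.kind == SymExpr::Leaf)
        return e.origin;                   // e.g. "r1": use it directly
    std::string a = emit(e.lhs, out);
    std::string b = emit(e.rhs, out);
    std::string t = fresh_temp();
    const char *mnem = (e.op == '*') ? "mul" : "add";  // extend as needed
    out.push_back(std::string(mnem) + " " + t + " = " + a + ", " + b);
    return t;
}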
Using the above expression tree walking strategy, the symbolic expression of r5 can be converted to the following sequence of instructions:
ldc t1 = 1
mul t2 = r1, t1
ret t2
Here we have generated new registers to hold temporaries, but left symbolic registers alone. This new sequence of instructions takes the place of the old, larger sequence.