In the late 1970s and early 1980s, when the 8088-based IBM PC ran at 4.77 MHz and could address no more than 1 MB of memory in real mode – and when even minicomputers of the day, such as the 16-bit PDP-11/xx range, ran at only a few megahertz, with a largely orthogonal instruction-set architecture and memory-mapped I/O in place of dedicated I/O instructions – those involved in compiler design and implementation, such as myself, working mainly with early versions of Unix, sought to reduce both the compilation time and the execution time of programs in order to combat the limitations of the hardware of the time.
Successful compiler optimizers were required to produce faster executable code, moderate compilation time, and effective use of resources such as memory. Understanding compiler optimization was essential, particularly when changes to a programming construct or to an instruction set were required – the effect of those changes on the compilation process had to be taken into consideration.
Instruction execution on any given processor is performed at the machine level, using machine language or machine code. However, most code is written in a higher-level language such as LISP, C, or C++. Therefore, the important role of optimizing code is normally done by compilers, which translate high-level languages into machine code. Given that system architectures vary dramatically, how code executes on any one processor class also varies greatly. For example, the Pentium processor has two separate integer execution units, called the U-pipe and the V-pipe. The U-pipe can execute any instruction, while the V-pipe can execute only a subset. Code generated with this pairing constraint in mind can achieve improved performance over code that is not optimized in this manner.
Of particular interest to the author were the "back end" compiler optimization techniques, such as the peephole methods, and the exploration of the relationship between compiler optimization and a given architecture and design.
Most compilers perform lexical, syntactic, and semantic analysis, code optimization, and code generation, in this order. Code optimization includes common sub-expression elimination, constant folding, code motion, and strength reduction.
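As an illustration of the first of these, constant folding can be sketched as a small recursive pass over an expression tree. The tuple-based tree shape and the `fold` name below are illustrative only, not taken from any particular compiler, and only `+` and `*` are handled.

```python
# A minimal sketch of constant folding on a tiny expression tree.
# An expression is either an int, a variable name (str), or a tuple
# (op, lhs, rhs). Only "+" and "*" are handled in this sketch.

def fold(expr):
    """Recursively evaluate constant sub-expressions in advance."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        lhs, rhs = fold(lhs), fold(rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            # Both operands are known at compile time: compute now.
            return {"+": lhs + rhs, "*": lhs * rhs}[op]
        return (op, lhs, rhs)
    return expr

# The sub-expression 2 * 3 is folded to 6; "x" is unknown, so the
# addition must remain in the generated code.
print(fold(("+", ("*", 2, 3), "x")))
```

Strength reduction and code motion operate in the same spirit: replace work done at run time with cheaper work, or with work done once at compile time.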
Code generation includes register allocation, code selection, and perhaps some peephole optimization. Most optimizers operate on a high-level intermediate code to avoid machine dependencies, whereas most peephole optimizers are machine-specific and ad hoc, so they are normally confined to correcting a small set of patterns that cannot be more easily corrected before code generation.
Generally, peephole optimization refers to the pattern matching and conditional replacement performed on a small set of intermediate instructions. The circular dependence between the code generation phases implies that local optima are rarely global optima. Some strategies that may be adopted are:
- Accept the local optimal
- Develop intermediate goals whose achievement suggests global optimality
- Retain the choices so that the decisions can be made later
- Optimize the object code by replacing awkward or overly constrained segments of code with improved ones.
As mentioned, peephole optimization is performed over a small set of instructions in a segment of generated code. The set is called a "peephole" or a "window". The peephole optimizer works by recognizing sequences of instructions that have a null outcome (they do nothing), or that can be replaced by a more efficient sequence of instructions. The most common techniques used in peephole optimization include:
- Evaluating constant sub-expressions in advance.
- Replacing slow operations with faster equivalents.
- Deleting useless operations.
- Replacing several operations with one equivalent.
- Using algebraic laws to simplify instructions.
- Using special-purpose instructions designed for special operand cases.
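The window mechanism itself can be sketched as a pass that slides a two-instruction peephole over the code and deletes a load that immediately re-reads a value just stored. The `(op, register, address)` tuple encoding here is invented for illustration, not a real intermediate representation.

```python
# A toy sliding-window peephole pass. A "store" immediately followed by a
# "load" of the same register from the same address makes the load useless,
# since the value is already in the register.

def peephole_window(code):
    out = []
    i = 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "store" and code[i + 1][0] == "load"
                and code[i][1:] == code[i + 1][1:]):
            out.append(code[i])   # keep the store, drop the redundant load
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

before = [("store", "R0", "a"), ("load", "R0", "a"), ("add", "R0", "b")]
print(peephole_window(before))   # the redundant load of "a" is removed
```

A real optimizer would re-run such a pass until no window matches, since one replacement can expose another.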
Peephole optimization examples
Treatment of single instructions with a constant operand
- A * 2 = A + A
- A * 1 = A
- A * 0 = 0
- A / 1 = A
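These four identities can be sketched as a rewrite on a single three-address instruction. The `(op, dest, src, const)` tuple format and the `simplify` name are invented for illustration.

```python
# Single-instruction rewrites for constant operands, mirroring the
# identities listed above. Instructions are (op, dest, src, const).

def simplify(inst):
    op, dst, a, k = inst
    if op == "mul" and k == 2:
        return ("add", dst, a, a)      # A * 2  ->  A + A
    if op == "mul" and k == 1:
        return ("mov", dst, a, None)   # A * 1  ->  A
    if op == "mul" and k == 0:
        return ("mov", dst, 0, None)   # A * 0  ->  0
    if op == "div" and k == 1:
        return ("mov", dst, a, None)   # A / 1  ->  A
    return inst                        # no pattern matched: leave as-is

print(simplify(("mul", "t1", "a", 2)))
```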
Replacing the Multiply operation with a Shift
- A := A * 4;
Can be replaced by a 2-bit left shift (signed/unsigned)
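The identity behind this strength reduction is easy to spot-check. In Python the left shift is arithmetic, so the equivalence also holds for negative operands; note that in C, left-shifting a negative signed value is undefined behaviour, so a real compiler applies this rewrite only where it is safe.

```python
# Spot-check that A * 4 equals A shifted left by two bits,
# including for negative and zero operands.
for a in (7, -7, 0, 123456):
    assert a * 4 == a << 2
print("a * 4 == a << 2 holds for the sampled values")
```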
Folding jumps to jumps
A jump to an unconditional jump can copy the target address:
JNE Label1
Label1: JMP Label2
Can be replaced by
JNE Label2
As a result, Label1 may become dead (unreferenced)
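This folding can be sketched over a list of `(op, target)` instruction tuples; the encoding, the `labels` index map, and the `fold_jumps` name are invented here. A chain of jumps would need the pass repeated until no change occurs; this sketch folds one level.

```python
# Fold a (conditional or unconditional) jump whose target label holds an
# unconditional JMP: retarget the first jump straight to the final label.

def fold_jumps(code, labels):
    """labels maps a label name to the index of its instruction."""
    out = []
    for op, target in code:
        if op in ("jmp", "jne") and target in labels:
            dest_op, dest_target = code[labels[target]]
            if dest_op == "jmp":
                target = dest_target   # skip the intermediate jump
        out.append((op, target))
    return out

code = [("jne", "L1"), ("ret", None), ("jmp", "L2")]   # "L1" is at index 2
print(fold_jumps(code, {"L1": 2}))
```

After the pass, nothing references `L1` through the first jump, which is how the label can become dead.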
Jump to Return
A jump to a Return can be replaced by a Return:
JMP Label1
Label1: RET
Can be replaced by
RET
Label1 may become dead code
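Using the same invented `(op, target)` encoding and label map as above, this replacement can be sketched as follows.

```python
# Replace a JMP whose target instruction is RET with a RET itself,
# saving the jump and possibly leaving the label unreferenced.

def fold_jump_to_ret(code, labels):
    """labels maps a label name to the index of its instruction."""
    out = []
    for op, target in code:
        if (op == "jmp" and target in labels
                and code[labels[target]][0] == "ret"):
            out.append(("ret", None))   # return directly instead of jumping
        else:
            out.append((op, target))
    return out

code = [("jmp", "L1"), ("nop", None), ("ret", None)]   # "L1" is at index 2
print(fold_jump_to_ret(code, {"L1": 2}))
# -> [('ret', None), ('nop', None), ('ret', None)]
```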