Optimizing Code

This chapter discusses the impact that various switches and pragmas have on the optimization of code generated by the Rational Ada compiler. Specifically, this chapter discusses:
- Coding Strategies in Rational Ada
- Setting Optimization Factors
- Debugging Optimized Code
- The Optimization Process in Detail
Coding Strategies in Rational Ada

The Rational Ada compiler can generate code with various goals:
- Minimum code size: obtaining executable code that occupies the least amount of space.
- Minimum execution time: obtaining executable code that runs as quickly as possible.
- Minimum compilation time: allowing the compiler to work as quickly as possible.
- Maximum debugging ability: generating code that is completely debuggable at the source level.
Typically, code that is easy to debug also compiles fastest. Compilation time usually varies inversely with the size and execution time of the generated code: the longer the compiler works, the smaller and faster the result.
Typically, optimizing for execution time requires larger code size (conversely, minimizing code size results in somewhat slower execution), and optimizing for either execution time or code space reduces debugging ability.
Any given optimization can be an exception to these guidelines.
Optimization Factors in Rational Ada
Apex allows you to specify, using pragmas or switches, several factors that affect the optimization of code generated by the Rational Ada compiler:
- Optimization level, which trades compilation speed and debugging ability for execution speed.
- Optimization objective, which trades execution speed for code size.
These factors affect each other, as described in Setting Optimization Factors.
Optimizing for Specific Effects
In many cases, you should generate code to meet more than one goal or to move gradually in the direction of a single goal.
For example, optimization level 0 provides the greatest debugging ability and the slowest execution speed, whereas level 2 provides the opposite.
Table 1 shows the Apex settings to use for various goals; you can mix settings to achieve the best balance for your project.
Table 1: Settings for Specific Code-Generation Goals¹

| For the best: | Opt. level | Opt. objective |
|---------------|------------|----------------|
| Code size | 2 | Space |
| Execution speed | 2 | Time |
| Compilation speed | 0 | Space |
| Debugging ability | 0 | Space |

¹ Some settings for some factors make the settings of other factors irrelevant. Refer to the detailed descriptions.
Setting Optimization Factors

Setting the Optimization Level
The optimization level affects the amount of execution optimization performed during compilation. To specify this value, set the OPTIMIZATION_LEVEL context switch.
Optimization levels 0 through 2 are valid:
- Level 0 obtains the best compilation speed. No optimizations are done and full debugging is available.
- Level 1 obtains the best execution speed possible while still providing most debugging capability. Some code can be optimized away and thus affect debugging.
Level 1 also includes some inlining support. The compiler generates information that allows the debugger to determine when inlined code is entered and reconstruct a "fake" callstack entry so the resulting debugger session appears as if the inlining never occurred.
- Level 2 obtains the best execution speed at the expense of some debugging operations.
You can use this factor in the following way:
- During initial coding and debugging, use level 0.
- For testing and integration, release the more stable code at level 1 to allow most debugging but still provide quick execution time.
- For the production release, use level 2 for best execution speed if it will not be necessary to debug the installed software.
Table 2 describes the optimizations performed at levels 1 and 2. At level 1, the optimizations are performed only so far as to have minimum impact on debugging; level 2 removes this restriction.
All optimizations are performed subject to the constraints of the Ada LRM, Section 11.6.
Setting the Optimization Objective
The optimization objective specifies whether execution speed (Time) or code size (Space) is the primary goal of the compiler. It is set by:
- Setting the OPTIMIZATION_OBJECTIVE context switch. If the switch is not used, the default is Time. The setting applies in all cases unless overridden by pragma Optimize.
- Specifying pragma Optimize for a given compilation unit (LRM Annex B), which overrides the context switch for that unit.
When the optimization objective is Space, automatic inlining does not occur (automatic inlining happens only at optimization level 2 in any event). At present, the Space objective has no other effect.
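For example, this sketch (with a hypothetical procedure name) uses pragma Optimize to favor code size for a single compilation unit regardless of the OPTIMIZATION_OBJECTIVE context switch:

```ada
--  Hypothetical unit; pragma Optimize applies to the enclosing body
--  and overrides the OPTIMIZATION_OBJECTIVE context switch for it.
procedure Compress_Tables is
   pragma Optimize (Space);  --  favor code size over execution speed
begin
   null;  --  real work elided
end Compress_Tables;
```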
Inlining Pragmas
Apex Ada supports the inlining behavior specified in the Ada LRM.
The standard pragma Inline instructs the compiler to honor requests for inlining at optimization levels 1 and 2 when possible (it is not possible to inline recursive code, for example) and beneficial.
Even in the absence of an inlining pragma, at optimization level 2 the compiler may perform inlining of code local to the compilation unit if it is beneficial to do so.
Apex Ada supports two additional inlining pragmas:
- Pragma Inline_Never completely prohibits inlining of a specified subprogram.
- Pragma Inline_Only forces the compiler to inline a specified subprogram, even at optimization level 0. If inlining is not possible, the compiler generates an error. Both pragmas appear in the sketch below.
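The following sketch (hypothetical package and subprogram names) shows all three pragmas together:

```ada
package Counters is
   procedure Bump;
   pragma Inline (Bump);         --  inlined at levels 1 and 2 when
                                 --  possible and beneficial
   procedure Trace;
   pragma Inline_Never (Trace);  --  never inlined, preserving a stable
                                 --  breakpoint target
   procedure Add (Amount : in Integer);
   pragma Inline_Only (Add);     --  always inlined, even at level 0;
                                 --  an error if inlining is impossible
end Counters;
```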
Debugging Optimized Code

You can do source-level debugging on code optimized at any level. And, of course, you can always do machine-level debugging regardless of how the code was compiled.
Generally, however, the higher the optimization level, the less information is retained for debugging. More specifically:
At optimization level 0:
No optimization is performed. You should have full debugging capabilities.
At optimization level 1:
This is the default. At this level, the compiler performs optimizations that maximize performance without sacrificing debuggability. No optimizations are performed that move code from its "natural" order; code hoisting and pipelining, for example, are excluded. In addition, no code or objects are eliminated.
At optimization level 2:
At this level, debuggability is sacrificed for performance. Code is eliminated. If you try to set a breakpoint at a source line which has been optimized away, the debugger will give you a message saying that there is no code associated with that source line.
Objects can be optimized away. If you try to display such an object, the debugger will tell you that it has been optimized away.
Code can be re-ordered. There are several optimizations that can cause this: code hoisting, loop unrolling, and pipelining are some examples. In this case, the debugger still generally knows which source line is associated with every instruction. However, when you're stepping through re-ordered or pipelined code, you may execute statements out of order or execute the same statement more than once!
Inlining is performed. If you step into inlined code, the debugger will still be able to place breakpoints and do other debugging operations associated with code locations, but will not be able to display objects declared in inlined routines. The only exception to this is that the debugger is able to display the parameters of C++ inlined member functions. For Ada, the debugger cannot display any objects declared in an inlined subprogram.
The Optimization Process in Detail

This section provides an overview of Apex Ada optimizations and describes methods by which optimizations can be controlled at a very fine level of granularity.
Optimization Process Overview
The Apex compiler executes a front-end process, then an optimizer, and finally a code generator. Optimizations take place in each of these steps.
The front-end process parses and analyzes Ada code to produce an extended Diana tree representation, called the persistent intermediate representation (PIR). The front-end optimizes the PIR, focusing on Ada-specific optimizations and on optimizations that are best done on tree representations of the program.
The front-end then "walks" the PIR tree to generate a linear representation of the program using the so-called intermediate language (IL). The IL is essentially an abstract assembly code.
The IL is transformed by the optimizer to a more efficient IL. The input and output use the same IL instructions, but the output is much more efficient.
Finally the code generator phase processes the IL to produce actual target machine code. During this process, a number of target-specific optimizations are applied.
The following sections describe the optimizations that take place during each processing step.
Front-End Optimizations
The Apex front-end performs optimizations on a tree-based representation of the program called MILO.
ILX (inline expansion)
At optimization level 1, only routines to which a pragma Inline applies are candidates for inlining. The optimizer does not inline a routine if it determines that there would be no significant benefit. For example, if a routine is quite large, the reduction in call overhead from inlining would likely be outweighed by the negative effects of code expansion on instruction caches.
At optimization level 2, the optimizer treats all routines whose bodies are available at compilation time as candidates for inlining, even if no pragma Inline applies to the routine spec. The optimization-objective switch controls how aggressively the compiler inlines:
- OPTIMIZATION_OBJECTIVE = space
The compiler inlines a call only if the resulting code will be smaller than an out-of-line call. Typical candidates are simple selectors of elements from a private data type.
- OPTIMIZATION_OBJECTIVE = time
The compiler inlines routines based on how many calls are made to the routine (determinable only for local routines) and on the size of the routine's code sequence versus the code sequence for calling it. For example, local routines that are called only once are inlined regardless of size.
The Apex inlining approach means users can design their Ada code with proper abstraction layers without fear of large calling penalties. A coder need only worry about algorithmic efficiency and good, maintainable design, not about how to structure routines to eliminate calls.
Apex provides two additional inlining pragmas that apply at optimization levels 0, 1, and 2: pragma Inline_Never and pragma Inline_Only.
GTE (goto elimination)
The GTE optimization eliminates jumps to the next location and chains jumps to other jumps, returns, and raises. While the middle-pass optimizer also performs this optimization on the linear IL, it is essential to do it at the MILO level as well, to clean up the MILO trees so that other MILO optimizations can be triggered.
For example:

```ada
goto L;      ...  <<L>> return True;   --  before GTE
return True; ...  <<L>> return True;   --  after GTE
```

If there are no other references to L, then the statement with label <<L>> will be eliminated by DCE.
DCE (dead code elimination)
DCE eliminates code that is unreachable, i.e., basic blocks to which there is no control path.
```ada
return True;
X := ...;  --  not reachable and hence eliminated
```
UCE (unreferenced code elimination)
UCE eliminates routines, objects, constants, symbols, and so on that are never referenced. The major impact is the elimination of routines from generic instantiations that are never called. This optimization works only at the compilation-unit level, so routines that are externally visible cannot be eliminated (that cannot be done until link time).
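As a sketch (hypothetical generic and instantiation), consider a generic stack of which a client uses only one operation:

```ada
generic
   type Item is private;
package Stacks is
   procedure Push (X : in Item);
   procedure Pop  (X : out Item);
end Stacks;

with Stacks;
procedure Use_Stack is
   package Int_Stacks is new Stacks (Integer);
begin
   Int_Stacks.Push (42);
   --  Int_Stacks.Pop is never called and is not externally visible,
   --  so UCE can eliminate its instantiated body from this unit.
end Use_Stack;
```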
UCR (useless code removal)
UCR eliminates code that has no side effects. The middle pass very frequently generates code for its side effects only; when it turns out that there are no side effects (very frequently the case), this useless code can be eliminated. For example,
```ada
procedure Eval (X : Integer) is
begin
   null;
end Eval;
pragma Inline (Eval);

declare
   X : Integer := ...;
begin
   Eval (X + 1);
end;
```
The evaluation of X+1 can be eliminated if it can be shown not to overflow since this value is never used.
TRE (tail recursion elimination)
Tail recursion is transformed into iteration; this is done as part of inline expansion. For example, the recursive function below is transformed into the iterative equivalent that follows it.
```ada
function F (X : Natural) return Boolean is
begin
   if X = 0 then
      return True;
   elsif X = 1 then
      return False;
   else
      return F (X - 2);
   end if;
end F;
```
```ada
function F (X : Natural) return Boolean is
   Y : Natural := X;
begin
   loop
      if Y = 0 then
         return True;
      elsif Y = 1 then
         return False;
      else
         Y := Y - 2;
      end if;
   end loop;
end F;
```
EVO (evaluation reordering)
EVO reorders the evaluation of expressions to reduce the number of temporaries required. It also reorders the evaluation of the actual parameters of a call to reduce the number of temporaries and to increase the likelihood that an actual parameter can be evaluated into the register used for parameter passing.
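A sketch with hypothetical names: EVO may evaluate the more complex actual parameter first so that the simpler one can be computed directly into its parameter-passing register:

```ada
procedure Evaluation_Example (A, B : in Integer; Result : out Integer) is
   function F (X, Y : Integer) return Integer is
   begin
      return X + Y;
   end F;
   function G (X : Integer) return Integer is
   begin
      return X * 2;
   end G;
begin
   --  EVO may evaluate G (B) before A + 1, so that A + 1 needs no
   --  temporary and lands directly in its parameter register.
   Result := F (A + 1, G (B));
end Evaluation_Example;
```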
BND (object binder)
BND attempts to bind objects that do not have overlapping lifetimes to the same location in the stack frame to reduce the size of the stack frame. This phase does not attempt to optimize the binding of (symbolic) register objects; that is handled by the middle-pass and code generator optimizations.
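A minimal sketch (hypothetical names) of two objects whose lifetimes do not overlap, allowing BND to bind them to the same stack location:

```ada
procedure Binder_Example is
   type Buffer is array (1 .. 1024) of Integer;
begin
   declare
      A : Buffer := (others => 0);  --  A lives only in this block
   begin
      A (1) := 1;
   end;
   declare
      B : Buffer := (others => 0);  --  B's lifetime does not overlap
   begin                            --  A's, so BND can reuse A's slot
      B (1) := 2;
   end;
end Binder_Example;
```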
Middle-Pass Optimizations
The middle-pass optimizer transforms IL code into optimized IL code using a number of well-known optimization algorithms. The code is optimized by applying the algorithms in some order, often reapplying one or more of the algorithms after some other algorithm is run.
Optimization Algorithms
There are three "canned" algorithm sequences, known simply as optimization levels 0 to 2 (level 0 is no optimization).
The optimization algorithms are referenced by single letters, with no particular meaning attached to the choice of letter. The algorithms are presented below in groups, with some explanation of the transformations they provide. The following sections describe the middle-pass optimizations that are available:
- Propagation Optimization Algorithms
- Address Conversion and Coalescing Optimization Algorithms
- Address Simplification Optimization Algorithms
- Common Sub-Expression Elimination Optimization Algorithms
- Code Hoisting Optimization Algorithms
- Reduction Optimization Algorithms
- Merge Bit Optimization Algorithms
- Loop Unrolling Optimization Algorithm
- Range Check Elimination Optimization Algorithm
- Useless Computations and Variables Optimization Algorithms
- Control Flow Optimization Algorithm
- Memory Optimization Algorithms
- Normalization Optimization Algorithms
- Register Alignment Optimization Algorithm
- Miscellaneous Optimization Algorithms
- Delineate Sections of Transformations
Propagation Optimization Algorithms
Propagation moves an item forward from computation to use, often allowing the computation to be folded or coalesced into the using instruction and eliminating the original computation. Address propagation often moves the computation of an address offset into the operand of a subsequent instruction in the form "op offset(reg)".
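For example, in the notation used later in this section, the sequence add 8,r1 ; op 0(r1) can often be propagated into op 8(r1), eliminating the separate add (the register name here is illustrative, and the transformation assumes no later instruction depends on the incremented r1).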
Address Conversion and Coalescing Optimization Algorithms
e: Convert add's of addresses into addr's
i: Coalesce address increments
z: Convert simple addr's into add's of addresses
Address conversion and coalescing often allows address propagation to proceed in situations that might not otherwise permit propagation. In particular, note that `e' and `z' are inverses; `e' is used to collapse simple address computations into complex address operands preparatory to an address propagation, then `z' is used to convert the results back to simple address computations that are better suited to efficient code generation and code scheduling.
Address Simplification Optimization Algorithms
X: Simplify addresses completely
x: Simplify addresses
Address simplification is used to break complex addresses into simple expressions that permit better propagation. Often propagation will reverse the result of these transformations after propagation of other computations has taken place. For example,

```
add const,reg; op offset_const(reg)
```

will be converted by transformation "x" into

```
add const,reg; add offset_const,reg; op 0(reg)
```

which can then be coalesced and folded back by transformations "i" and "e" into

```
op offset_const2(reg)
```

where offset_const2 is the sum of const and offset_const.
Common Sub-Expression Elimination Optimization Algorithms
C: Global CSE
V: Local CSE
c: Global CSE within loops, no increments
v: Local CSE, but not constants
Common sub-expression elimination (CSE) finds expressions that are repeated and moves the computation ahead of all uses so that it is only computed once. Often repeated expressions occur implicitly within Ada code, such as index computations for different arrays within a loop.
Local CSE restricts the search, and the resulting moves, to basic blocks, i.e., code sequences with no branches and no entry labels. Global CSE searches across basic blocks and thus involves complex flow analysis.
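A sketch with hypothetical declarations: the row-offset expression appears implicitly in both index computations and is computed only once:

```ada
procedure Clear_Cell (I, J : in Positive) is
   Row_Size : constant := 100;
   A, B : array (1 .. 10_000) of Integer;
begin
   --  I * Row_Size + J is a common sub-expression; CSE computes it
   --  once and reuses the result for both array indexings.
   A (I * Row_Size + J) := 0;
   B (I * Row_Size + J) := 0;
end Clear_Cell;
```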
Code Hoisting Optimization Algorithms
H: Code hoisting
h: Hoisting CSE from if-statements
Code hoisting moves computations out of a loop (where they would be executed on each iteration) into the prefix code for the loop. It is also possible to hoist code that occurs in both branches of an if statement into the prefix code before the if. While that transformation alone does not speed up the code, it does reduce code size and also permits other transformations, such as code coalescing, to work on the hoisted code.
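A sketch (hypothetical names) of a loop-invariant computation that hoisting moves into the loop prefix:

```ada
procedure Hoist_Example is
   type Buffer is array (1 .. 100) of Integer;
   Buf    : Buffer;
   Scale  : Integer := 3;
   Offset : Integer := 7;
begin
   --  Scale * Offset is invariant across iterations; hoisting
   --  computes it once before the loop.
   for I in Buf'Range loop
      Buf (I) := Scale * Offset;
   end loop;
end Hoist_Example;
```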
Reduction Optimization Algorithms
S: Reducing induction expressions
d: Reducing moves
m: Reducing moves and renumbering regs
Strength reduction, a transformation associated with code hoisting, replaces an expensive operation within a loop with a less expensive one. For example,
```ada
for I in 1 .. 10 loop
   A (I) := I * 4;
end loop;
```
can be strength reduced (to change a multiply to an add within the loop) to
```ada
tmp := 4;
for I in 1 .. 10 loop
   A (I) := tmp;
   tmp := tmp + 4;
end loop;
```
Merge Bit Optimization Algorithms
B: Merge bit operations
b: Limited merge bit operations
Bit-operation merging combines operations on successive bit fields into a single operation on a larger field. This permits Boolean bit arrays that are, for example, being ANDed bit by bit to be converted to 8-bit or possibly 64-bit integer AND instructions, and byte-by-byte operations to be merged into a single 32-bit or 64-bit integer instruction.
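A sketch with a packed Boolean array (hypothetical declarations); the logically bit-by-bit "and" can be merged into a few word-wide integer AND instructions:

```ada
procedure Merge_Example is
   type Bit_Set is array (0 .. 63) of Boolean;
   pragma Pack (Bit_Set);
   A : Bit_Set := (others => True);
   B : Bit_Set := (0 | 1 => True, others => False);
   C : Bit_Set;
begin
   --  Logically 64 single-bit ANDs; merging can reduce this to one
   --  64-bit (or a couple of 32-bit) integer AND instructions.
   C := A and B;
end Merge_Example;
```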
Loop Unrolling Optimization Algorithm
L: Unroll loops
Loop unrolling replicates the code within a loop several times (four to eight copies seems to be the best trade-off) so that more than one computational iteration is accomplished per iteration of the loop. While unrolling does reduce the number of iterator computations and end tests, the more important benefit is that it gives the code generator's scheduling optimizer more instructions to schedule. Having instructions from multiple iterations often allows the scheduler to delay use of a computation long enough that high-performance pipelines are not stalled waiting for results. Loop unrolling is limited to relatively small loops, so the resulting code expansion does not greatly impact instruction caches.
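A conceptual sketch (hypothetical array and accumulator); the compiler unrolls on the IL rather than the source, but the effect is equivalent:

```ada
procedure Unroll_Example is
   A   : array (1 .. 8) of Integer := (others => 1);
   Sum : Integer := 0;
begin
   --  Original loop:
   for I in 1 .. 8 loop
      Sum := Sum + A (I);
   end loop;

   --  Unrolled by four: one iterator update and end test now cover
   --  four element additions.
   for I in 1 .. 2 loop
      Sum := Sum + A (4 * I - 3);
      Sum := Sum + A (4 * I - 2);
      Sum := Sum + A (4 * I - 1);
      Sum := Sum + A (4 * I);
   end loop;
end Unroll_Example;
```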
Range Check Elimination Optimization Algorithm
R: Eliminate range checks
Range check elimination computes maximum limits for computation results and uses those limits to eliminate limit checks whenever the result cannot possibly exceed a (sub)type limit. The computation result limits are computed using (sub)type limits for operands, propagated constants, and instruction properties.
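A sketch (hypothetical subtype and array) in which the index check is provably unnecessary:

```ada
procedure Check_Example is
   subtype Index is Integer range 1 .. 100;
   A : array (Index) of Integer;
begin
   for I in Index loop
      A (I) := 0;  --  I provably stays within Index, so the range
   end loop;       --  check on the array index can be eliminated
end Check_Example;
```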
Useless Computations and Variables Optimization Algorithms
D: Delete dead variables
U: Remove useless computations including range instructions
u: Remove useless computations
Often, other transformations result in pieces of code that cannot possibly have any external effect on the program. These transformations eliminate those useless computations.
Control Flow Optimization Algorithm
f: Simplify control flow
This operation manipulates the control flow to reduce the number of branches and tests.
Memory Optimization Algorithms
T: Extracting expensive constants from instructions
Y: Extracting memory operations from instructions
y: Extracting memory loads from instructions
These operations eliminate redundant loads and stores from memory. Whenever an instruction operand refers to memory, it is transformed into a load or store register instruction followed by the original instruction using the register for its operand. Multiple register loads and stores are then collapsed into one load and one store where possible. Depending on the results from other transformations, it is possible that the loads and stores remaining after this transformation may then be reintegrated into the using instructions as operands referencing memory.
Normalization Optimization Algorithms
N: Normalization
n: Normalization
Normalization reorders the IL code within basic blocks into sequences that can be more easily processed. Normalization is important for many propagation, merging, and coalescing transformations.
Register Alignment Optimization Algorithm
G: Finding register alignment
Register alignment controls code generation with respect to register sizes, including sizes used when spilling registers.
Miscellaneous Optimization Algorithms
F: Counting registers
g: Generating debug aliases
r: Mark debug registers
These are miscellaneous transformations used to set up for subsequent optimizations, or to set up extra information used by the debugger to properly associate registers with symbols in the original Ada code.
Delineate Sections of Transformations
{: Optional loop transformation start
}: Optional loop transformation end
Braces delineate sections of transformations that are taken only if there are loops in the section of code being optimized.
Optimizations Performed at Each Level
For each optimization level, a predetermined sequence of transformations is executed. These sequences are currently stable, but future experience could prompt changes, possibly for a particular platform. The current sequences for optimization levels 0 through 2 are shown in the following table; the preceding sections serve as the reference key to the algorithm letters.

| Level | Transformation sequence |
|-------|-------------------------|
| 0 | (no optimization) |
| 1 | NmPDFVPCPHAbyXMDmG |
| 2 | ndPDmFncPADIRyxcPXpdzcPV{HPSrPDuHPSrPHPS}rMDLCHPAUDIiBYxcPXzCPVmnHCPEUDMDeEmVUDTGDgz |
Code Generator Optimizations
The Apex code generators also optimize code. These optimizations are typically target-specific. This section discusses the general optimization areas handled by the code generators.
Register Usage Optimization
Code generators assign physical registers to the pseudo-registers specified within the IL. Complex graph-coloring algorithms and target-specific heuristics are used to minimize the number of register spills and register moves.
A register spill occurs when a variable allocated to a register must be temporarily saved to the stack so the register can be used for some other variable. Register spills can significantly slow execution, so the code generator makes significant efforts to minimize them.
Register moves occur when a variable allocated to one register has to be moved to another register for a procedure call. A register move does not have as negative an impact as a register spill, but instructions are wasted if the value could have been computed directly into the appropriate result register. Consequently, code generators also attempt to minimize register moves.
Instruction Scheduling
All present-day computer architectures (whether RISC or not) have heavily pipelined computational units in order to achieve their high speeds. Depending on the architecture, any or all of a number of scheduling issues may arise.
Existing Apex code generators use heuristics that address these scheduling issues where they apply to the particular target architecture.
Advanced Instruction Sets
Most architectures are constantly evolving. Apex code generators frequently have architecture version switches to enable use of more advanced instructions. When those switches are enabled, the code generator takes a different path for some Ada code and generates machine code that uses the advanced instructions.
However, these alternative-architecture switches do not come without a price. Users often want one executable that runs on different architecture versions. Most modern operating systems provide emulation for missing hardware instructions, but such emulation is often orders of magnitude slower than code generated without the advanced instructions. The result is that if the advanced instructions are enabled and the resulting executable is run on an older architecture version, the application can suffer significant degradation in execution speed.
Advanced instruction sets, in combination, also increase the number of code-execution scenarios and elevate the importance of the operating-system version (version 10.2 may properly emulate an instruction, while 10.1 may have bugs). For this reason, architecture variants are very carefully reviewed for potential benefit before variant switches are supported; often several variants are grouped into one switch version to minimize the testing cases.
Peephole Optimization
During actual target code generation, it is often possible to collapse some adjacent IL instructions into a single target instruction. This is called peephole optimization because the transformation operates on only a small "peephole" view of the instructions. This optimization may also be run on the target instructions after generation, rather than on the IL during code generation; the principles are the same.
An example is generation of the S8ADDL instruction for DEC Alpha targets, which shifts one register left by 3 bits (a multiply by 8) and then adds it to another register. This operation is often used for computing array offsets. If the peephole optimizer sees an IL multiply by 8 followed by an IL add of the result to another register, the code generator collapses those two IL instructions into the single Alpha S8ADDL instruction.