How Quantify handles basic blocks

To determine the time spent in a function, Quantify analyzes the basic code blocks of each function. A basic block is a sequence of instructions that are always executed together in succession. Basic blocks typically start at the beginning of functions and other code blocks and terminate at conditional jumps to other basic blocks.

Quantify uses information about your machine's hardware to compute the expected number of machine cycles each original basic block will require to execute.

On RISC architecture machines, most instructions take a single machine cycle. Instructions such as load and store instructions can take longer and can stall, depending on the instruction stream that follows each instruction. Quantify uses this machine-specific information to estimate the number of instruction cycles each basic block will take, including the expected number of stall cycles.

Quantify inserts code that adds the expected cycle count to a basic block cycle accumulator each time the basic block is entered. These accumulators have 64-bit precision, providing accurate counts even in programs or blocks that execute for a very long time.
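
The accumulator scheme described above can be pictured with a small C sketch. This is only an illustration of the idea, not Quantify's actual instrumentation: the names enter_block, block_accumulator, and expected_cycles are hypothetical, and the per-block cycle costs are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one 64-bit accumulator per basic block. */
enum { NUM_BLOCKS = 3 };

/* Made-up expected cycle counts per block, for illustration only. */
static const uint32_t expected_cycles[NUM_BLOCKS] = { 4, 2, 7 };

static uint64_t block_accumulator[NUM_BLOCKS];

/* Stand-in for the code inserted at each block entry: add the block's
   precomputed cost to that block's accumulator. */
void enter_block(int block)
{
    assert(block >= 0 && block < NUM_BLOCKS);
    block_accumulator[block] += expected_cycles[block];
}

/* Sum the accumulated cycles over all blocks. */
uint64_t total_cycles(void)
{
    uint64_t total = 0;
    for (int i = 0; i < NUM_BLOCKS; i++)
        total += block_accumulator[i];
    return total;
}

/* Example run: block 0 once, block 1 on each of 1000 loop iterations,
   block 2 once on exit. */
uint64_t run_example(void)
{
    enter_block(0);
    for (int i = 0; i < 1000; i++)
        enter_block(1);
    enter_block(2);
    return total_cycles();
}
```

With the made-up costs above, a single call to run_example() on fresh accumulators yields 4 + 1000 * 2 + 7 = 2011 cycles. Because the cost of each block is computed ahead of time, the inserted code only has to perform one addition per block entry.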

The counts Quantify reports reflect the time the original program would have taken without Quantify. The reported times are exclusive of any Quantify run-time overhead.

How Quantify identifies basic blocks

To understand how Quantify identifies basic blocks, consider this example program:

The block numbers indicate the extent of the basic blocks in the function test when compiled using the -g debugging option.

Here is the basic block flow structure of the test function:

Here is an example of the cycle counts for the program compiled using cc -g on a SPARCstation ELC:

To execute the first call to test(0, 1, 3, 8) in main, the program enters blocks 1, 3, 5, 6, and 7, which takes 51 cycles. The table below compares the cycle counts for nonoptimized and optimized code along this path. The optimized version is noticeably faster, primarily because all calculations occur in registers, which avoids loading values from memory and storing them back.

Block	Nonoptimized	Optimized
1	26	4
3	3	2
5	8	1
6	6	1
7	8	3
Total cycles	51	11
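
The totals in the table are simply the sums of the per-block counts along the executed path. The following C sketch reproduces that arithmetic from the table values; the type and function names (block_cost, path_total) are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

enum build { NONOPTIMIZED, OPTIMIZED, NUM_BUILDS };

/* One table row: block number and its cycle count for each build. */
struct block_cost {
    int block;
    int cycles[NUM_BUILDS];
};

/* Blocks entered by the first call to test(0, 1, 3, 8), from the table. */
static const struct block_cost path[] = {
    { 1, { 26, 4 } },
    { 3, {  3, 2 } },
    { 5, {  8, 1 } },
    { 6, {  6, 1 } },
    { 7, {  8, 3 } },
};

/* Add up the cycle counts of every block on the path for one build. */
int path_total(enum build b)
{
    assert(b == NONOPTIMIZED || b == OPTIMIZED);
    int total = 0;
    for (size_t i = 0; i < sizeof path / sizeof path[0]; i++)
        total += path[i].cycles[b];
    return total;
}
```

Here path_total(NONOPTIMIZED) evaluates to 51 and path_total(OPTIMIZED) to 11, matching the Total cycles row.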

Many optimizing compilers rearrange the execution order of machine instructions to take advantage of the RISC processor's ability to overlap operations. These instruction scheduling optimizations can have a significant impact on performance.

Since Quantify bases its analysis on the optimized instruction sequences produced by the compiler, Quantify's reports reflect any benefits of instruction scheduling performed by the compiler.

Note:

Quantify's analysis does not reflect the additional performance improvements possible on superscalar architectures that use multiple pipelines and other hardware features such as dynamic branch prediction. On such machines, Quantify's estimates are pessimistic: they predict a slower run time than might actually be achieved.

How Quantify reports multiple basic blocks

In the Annotated Source window, a plus sign (+) marks a source line that contains the start, and possibly the continuation, of more than one basic block. This occurs in expressions such as:

if ((a > 0) && (b > 0)) {c++;}

The two clauses of the conditional expression and the increment clause are compiled as three separate basic blocks, but all these blocks are associated with the same line number. When Quantify displays the data, the number in the margin reflects the sum of the data recorded for all the basic blocks associated with that line.
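
The per-line total can be sketched in C as a sum over every basic block that maps to the same line number. The record layout, the function name line_total, the line numbers, and the cycle values below are all assumptions made for this illustration; they do not describe Quantify's internal data format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the source line a basic block belongs to and
   the cycles recorded for that block. */
struct block_record {
    int line;
    uint64_t cycles;
};

/* Made-up data: three blocks on line 12 (two tests and the increment)
   and a single block on line 13. */
static const struct block_record example_blocks[] = {
    { 12, 3000 },
    { 12, 1800 },
    { 12,  600 },
    { 13,  400 },
};

/* Return the number shown in the margin for a line: the sum of the
   cycles of all basic blocks associated with that line. */
uint64_t line_total(const struct block_record *blocks, size_t n, int line)
{
    assert(blocks != NULL);
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].line == line)
            total += blocks[i].cycles;
    }
    return total;
}
```

For the made-up data, line_total(example_blocks, 4, 12) is 3000 + 1800 + 600 = 5400, while line 13 reports only its single block.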

If you select View > Multi-block lines > Show multi-block lines, the individual times for basic blocks, as ordered in the object file, are shown on comment lines inserted immediately after the initial multiple basic block line. In some cases, the compiler might order the basic blocks differently from the order of the source code.

Annotations and compiler differences

Quantify reports counts for basic blocks in the Annotated Source window using the line number information emitted by the compiler for debugging purposes. Different compilers emit different line information in addition to different machine instruction sequences for a source file. Quantify's annotated source reports can reflect some of these internal differences.

For example, consider this code fragment:

if ((a > 0) && (b > 0)) {c++;}

When compiling without debugging information, most compilers emit three basic blocks for this code fragment, corresponding to the two test expressions and the variable increment statement. When compiling with debugging information, however, some compilers emit four basic blocks. The extra basic block corresponds to an "empty" else clause:

if ((a > 0) && (b > 0)) {c++;} else {}

At run time, if the conjunction succeeds, c is incremented and the code jumps to the following statement. If either condition fails, however, the "empty" code block is executed, and it in turn jumps to the next statement. This jump costs some machine cycles, and Quantify records those cycles in a separate basic block.
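
One way to picture the four basic blocks is to lower the statement by hand into explicit jumps. The C sketch below is a hypothetical rendering, not the output of any particular compiler; the block names and the entries counters are invented for the illustration, and the counters record block entries rather than cycles.

```c
#include <assert.h>

/* Hypothetical names for the four basic blocks. */
enum { TEST_A, TEST_B, INCREMENT, EMPTY_ELSE, NUM_BLOCKS };

/* Number of times each block has been entered. */
static unsigned long entries[NUM_BLOCKS];

/* Hand-lowered form of: if ((a > 0) && (b > 0)) {c++;} else {} */
int lowered_if(int a, int b, int c)
{
    entries[TEST_A]++;              /* block 1: first test */
    if (!(a > 0))
        goto empty_else;

    entries[TEST_B]++;              /* block 2: second test */
    if (!(b > 0))
        goto empty_else;

    entries[INCREMENT]++;           /* block 3: increment */
    c++;
    goto next_statement;

empty_else:
    entries[EMPTY_ELSE]++;          /* block 4: only falls through */

next_statement:
    return c;
}

/* Read back the entry count for one block. */
unsigned long block_entries(int block)
{
    assert(block >= 0 && block < NUM_BLOCKS);
    return entries[block];
}
```

Calling lowered_if(1, 1, 0) enters the two tests and the increment block and returns 1. Calling lowered_if(0, 1, 5) enters only the first test and the empty else block and returns 5 unchanged, which is the case where the extra block picks up a count of its own.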

Annotations for if-then expressions

For if-then expressions written on several lines, the data from the "empty" basic block can produce annotations such as:

The implied else clause can result in positive counts for the last line of the then clause.

For a description of the annotations in the Annotated Source window, read What annotations mean.

Annotations for switch expressions

Similar annotations can occur in switch statements, which are often rewritten by the compiler as if-then-else statements. Consider the following annotation fragment:

In this case, the compiler rewrote the switch statement as follows:

The counts on line 4 are not caused by the break expression but by the implicit else clause added by the compiler and associated with line 4. Compilers often do this because they assume that an optimizer eliminates the superfluous branches in a later pass if debugging information is not needed.

With multi-block lines shown, Quantify would display:

Quantify indicates that the implicit goto exit_switch statement, which corresponds to the original break statement, was never executed. However, the added implicit else basic block was executed. Since the sum of the multiple basic blocks under line 4 was not zero, Quantify reports the total and marks the line as being executed.

Most compilers emit many small basic blocks when compiling for debugging. The increase in the number of small basic blocks often results in a degradation in speed when Quantify is recording data in these functions, since it must record the time for each basic block separately. You can control this trade-off. For information, read Changing the granularity of collected data.

On Solaris systems, if you are recording data on register window traps, the counts for the first and last lines of the function can look quite large. Quantify assigns the register window trap times to the prevailing basic block at the time of the trap. This is typically the first or last basic block of the function.

C++ templates and annotated source

When using C++ templates, it is common to include the template declaration in a header file and define each type variant (specialization) in one or more source files. For debugging purposes, the compiler indicates that the source code for each specialization is actually found in the header file. This means that several different specializations share the same source code in the header file.

Quantify reports data for each called specialization separately, reporting each specialization as a separate function with a demangled name that indicates the specialization's data types. Quantify displays the annotated source for a function in the header file with the collected data for that function.

Note:

The same display technique applies to static C functions defined in header files and multiple function definitions on a single line.