Operators that appear in the Access Plan Explorer for access plans on DB2 for Linux, UNIX, and Windows

These operators can appear in the Access Plan Explorer when you are analyzing the access plan for an SQL statement that runs on DB2® for Linux, UNIX, and Windows.
Delete

This operator represents a necessary operation. To improve access plan costs, concentrate on other nodes (such as scans and joins) that define the set of rows to be deleted.

Performance suggestion: If you are deleting all rows from a table, consider using the DROP TABLE statement or the LOAD REPLACE command.

User-defined index scan
The scanning uses the multiple start/stop conditions from the user-supplied range producer function. This operation is performed to narrow down the set of qualifying rows before the optimizer accesses the base table (based on predicates).
Performance suggestion:
  • Over time, database updates can cause an index to become fragmented, resulting in more index pages than necessary. This can be corrected by dropping and re-creating the index, or reorganizing the index.
  • If statistics are not current, update them by using the RUNSTATS command.
Fetch
This operator represents the fetching of columns from a table by using a specific row identifier (RID).
Filter data
This operator represents the application of residual predicates, so that the data is filtered based on the criteria that is supplied by the predicates.
Generate table rows
This operator represents a built-in function that generates a table of rows, by using no input from tables, indexes, or operators.
Group by
This operator represents the grouping of rows according to common values of designated columns or functions. This operation is required to produce a group of values, or to evaluate set functions. If no GROUP BY columns are specified, this operator can still be used if there are aggregate functions in the SELECT list. Their presence indicates that the optimizer treats the entire table as a single group when the performing the aggregation.
Performance suggestion:
  • To improve access plan costs, concentrate on other nodes (such as scans and joins) that define the set of rows to be grouped.
  • To improve the performance of a SELECT statement that contains a single aggregate function but no GROUP BY clause, try the following:
    • For a MIN(C) aggregate function, create an ascending index on C.
    • For a MAX(C) aggregate function, create a descending index on C.
Hash join
This operator represents a hash join for which the qualified rows from tables are hashed to allow direct joining without pre-ordering the content of the tables.

A join is necessary whenever more than one table is referenced in a FROM clause. A hash join is possible whenever a join predicate equates columns from two different tables. The join predicates need to be exactly the same data type. Hash joins can also arise from a rewritten subquery, as is the case with nested loop joins.

A hash join does not require that the input tables are ordered. The join is performed by scanning the inner table of the hash join and generating a lookup table by hashing the join column values. It then reads the outer table, hashing the join column values, and checking in the lookup table generated for the inner table.

Performance suggestion:

  • Use local predicates (that is, predicates that reference one table) to reduce the number of rows to be joined.
  • Increase the size of the sort heap to make it large enough to hold the hash lookup table in memory.
  • If statistics are not current, update them by using the RUNSTATS command.
Insert

This operator represents a necessary operation. To improve access plan costs, concentrate on other nodes (such as scans and joins) that define the set of rows to be deleted.

Index intersection
This operator represents the ANDing of the results of multiple index scans by using Dynamic Bitmap techniques. The operation allows ANDed predicates to be applied to multiple indexes, in order to reduce underlying table accesses to a minimum.
This operation is performed to:
  • Narrow down the set of rows before accessing the base table
  • AND together predicates applied to multiple indexes
  • AND together the results of semijoins, used in star joins.
Performance suggestions:
  • Over time, database updates can cause an index to become fragmented, resulting in more index pages than necessary. This can be corrected by dropping and recreating the index, or reorganizing the index.
  • If statistics are not current, update them using the RUNSTATS command .
  • In general, index scans are most effective when only a few rows qualify. To estimate the number of qualifying rows, the optimizer uses the statistics that are available for the columns referenced in predicates. If some values occur more frequently than others, it is important to request distribution statistics by using the WITH DISTRIBUTION clause for the RUNSTATS command. By using the non-uniform distribution statistics, the optimizer can distinguish among frequently and infrequently occurring values.
  • This operation can best exploit single column indexes, as start and stop keys are critical to it.
Index scan
This operator represents the scanning of an index to produce a reduced stream of row IDs. The scanning can use optional start and stop conditions, or might apply to indexable predicates that reference columns of the index.

This operation is performed to narrow down the set of qualifying row IDs before accessing the base table (based on predicates).

Performance suggestions:
  • Over time, database updates can cause an index to become fragmented, resulting in more index pages than necessary. This can be corrected by dropping and recreating the index, or reorganizing the index.
  • When two or more tables are being accessed, access to the inner table via an index can be made more efficient by providing an index on the join column of the outer table.
  • If statistics are not current, update them using the RUNSTATS command.
  • In general, index scans are most effective when only a few row IDs qualify. To estimate the number of qualifying row IDs, the optimizer uses the statistics that are available for the columns referenced in predicates. If some values occur more frequently than others, it is important to request distribution statistics by using the WITH DISTRIBUTION clause for the RUNSTATS command. By using the non-uniform distribution statistics, the optimizer can distinguish among frequently and infrequently occurring values.
Merge scan join
A merge join for which the qualified rows from both outer and inner tables must be in join-predicate order. A merge join is also called a merge scan join or a sorted merge join .

A join is necessary whenever there is more than one table referenced in a FROM clause. A merge join is possible whenever there is a join predicate that equates columns from two different tables. It can also arise from a rewritten subquery.

A merge join requires ordered input on joining columns, since the tables are typically scanned only once. This ordered input is obtained by accessing an index or a sorted table.

Performance suggestions:
  • Use local predicates (that is, predicates that reference one table) to reduce the number of rows to be joined.
  • If statistics are not current, update them using the RUNSTATS command.
Nested loop join
This operator represents a nested loop join that scans (usually with an index scan) the inner table once for each row of the outer table.

A join is necessary whenever there is more than one table referenced in a FROM clause. A nested loop join does not require a join predicate, but generally performs better with one.

A nested loop join is performed either:
  • By scanning through the inner table for each accessed row of the outer table.
  • By performing an index lookup on the inner table for each accessed row of the outer table.
Performance suggestions:
  • A nested loop join is likely to be more efficient if there is an index on the join-predicate columns of the inner table.

    Another (less important) way to make the join more efficient is to create an index on the join columns of the outer table so that the outer table is ordered.

  • If statistics are not current, update them using the RUNSTATS command.
Return
This operator represents the return of data from a query. This is the final operator in the access plan and shows the total accumulated values and costs for the access plan.
Row identifier (RID) scan
This operator represents a scan of a list of row identifiers (RIDs) obtained from one or more indexes.
This operation is considered by the optimizer when:
  • Predicates are connected by OR keywords, or there is an IN predicate. A technique called index ORing can be used, which combines results from multiple index accesses on the same table.
  • It is beneficial to use list prefetch for a single index access, since sorting the row identifiers before accessing the base rows makes the I/O more efficient.
Remote data fetch
This operator represents the retrieval of data from a remote data source in a federated system.
Sort
This operator represents the sorting of the rows in a table into the order of one or more of its columns, optionally eliminating duplicate entries.

Sorting is required when no index exists that satisfies the requested ordering, or when sorting would be less expensive than an index scan. Sorting is usually performed as a final operation once the required rows are fetched, or to sort data prior to a join or a group by.

If the number of rows is high or if the sorted data cannot be piped, the operation requires the costly generation of temporary tables.

Table scan
This operator represents a table scan (relation scan) that retrieves rows by reading all the required data directly from the data pages.
This type of scan is chosen by the optimizer over an index scan when:
  • The range of values scanned occurs frequently (that is, most of the table must be accessed).
  • The table is small.
  • Index clustering is low.
  • An index on the table does not exist.
Temporary table
This operator represents the storing of data in a temporary table, so that the data can be read during another operation.
Table queue
This operator represents a table queue that is used to pass table data from one database agent to another when there are multiple database agents processing a query. Multiple database agents are used to process a query when parallelism is involved.
Union

This operator represents a necessary operation. To improve access plan costs, concentrate on other nodes (such as scans and joins) that define the set of rows to be deleted.

Duplicate elimination
This operator represents the elimination of rows having duplicate values for specified columns.
Update

This operator represents a necessary operation. To improve access plan costs, concentrate on other nodes (such as scans and joins) that define the set of rows to be deleted.

Index scan over XML data
This operator represents a range scan of any associated index over the corresponding XML data before the base table is accessed. The operation narrows the set of qualifying row IDs and XML node IDs.
Performance suggestions:
  • Over time, database updates might cause an index to become fragmented, resulting in more index pages than necessary. This can be corrected by dropping and recreating the index, or reorganizing the index.
  • When two or more tables are being accessed, access to the inner table by means of an index can be made more efficient by providing an index on the join column of the outer table.
  • If statistics are not current, update them using the RUNSTATS command.
XML document scan
This operator represents the navigation of XML fragments for evaluating XPath expressions and extracting any needed document fragments.
XML index ANDing
This operator represents the index over XML data ANDing of the results of multiple index scans, used for the evaluation of complex predicates from a single query.
In order for this operation to be used, the following conditions must be met:
  • Only equality predicates are used.
  • There are no wildcards in the index lookup path.
  • All predicates are used on the same XML column.

If any of these conditions are not met the Index intersection operation will be used instead.

Performance suggestions:
  • Over time, database updates can cause an index to become fragmented, resulting in more index pages than necessary. This can be corrected by dropping and recreating the index, or by reorganizing it.
  • If statistics are not current, update them using the RUNSTATS command.
  • In general, index scans are most effective when only a few rows qualify. To estimate the number of qualifying rows, the optimizer uses the statistics that are available for the columns referenced in predicates. If some values occur more frequently than others, it is important to request distribution statistics by using the WITH DISTRIBUTION clause with the RUNSTATS command. By using the non-uniform distribution statistics, the optimizer can distinguish among frequently and infrequently occurring values.

Feedback