These operators can appear in the Access Plan Explorer
when you are analyzing the access plan for an SQL statement that runs
on DB2® for Linux, UNIX,
and Windows.
- Delete
This operator represents a
necessary operation. To improve access plan costs, concentrate on
other nodes (such as scans and joins) that define the set of rows
to be deleted.
Performance suggestion: If you are deleting
all rows from a table, consider using the DROP TABLE statement or
the LOAD REPLACE command.
- User-defined index scan
- The scanning uses the multiple
start/stop conditions from the user-supplied range producer function.
This operation is performed to narrow down the set of qualifying rows
before the optimizer accesses the base table (based on predicates).
Performance suggestion:- Over
time, database updates
can cause an index to become fragmented, resulting in more index pages
than necessary. This can be corrected by dropping and re-creating
the index, or reorganizing the index.
- If statistics are not
current, update them by using the RUNSTATS command.
- Fetch
- This operator represents the fetching of columns from a table
by using a specific row identifier (RID).
- Filter data
- This operator represents the application of residual predicates,
so that the data is filtered based on the criteria that is supplied
by the predicates.
- Generate table rows
- This operator represents a built-in function that generates a
table of rows, by using no input from tables, indexes, or operators.
- Group by
- This operator represents the grouping of rows according to common
values of designated columns or functions. This operation is required
to produce a group of values, or to evaluate set functions. If no
GROUP BY columns are specified, this operator can still be used if
there are aggregate functions in the SELECT list. Their presence indicates
that the optimizer treats the entire table as a single group when
the performing the aggregation.
- Performance suggestion:
- To improve access plan costs,
concentrate on other nodes (such as scans and joins) that define the
set of rows to be grouped.
- To improve the performance
of a SELECT statement that contains a single aggregate function but
no GROUP BY clause, try the following:
- For a MIN(C) aggregate
function, create an ascending index on
C.
- For a MAX(C) aggregate function, create a descending index
on
C.
- Hash join
- This operator represents a hash join for which the qualified rows
from tables are hashed to allow direct joining without pre-ordering
the content of the tables.
A join is necessary whenever
more than one table is referenced in a FROM clause. A hash join is
possible whenever a join predicate equates columns from two different
tables. The join predicates need to be exactly the same data type.
Hash joins can also arise from a rewritten subquery, as is the case
with nested loop joins.
A
hash join does not require that the input tables are ordered. The
join is performed by scanning the inner table of the hash join and
generating a lookup table by hashing the join column values. It then
reads the outer table, hashing the join column values, and checking
in the lookup table generated for the inner table.
Performance suggestion:
- Use
local predicates (that
is, predicates that reference one table) to reduce the number of rows
to be joined.
- Increase
the size of the
sort heap to make it large enough to hold the hash lookup table in
memory.
- If statistics
are not
current, update them by using the RUNSTATS command.
- Insert
This operator represents a
necessary operation. To improve access plan costs, concentrate on
other nodes (such as scans and joins) that define the set of rows
to be deleted.
- Index intersection
- This operator represents the ANDing of the results of multiple
index scans by using Dynamic Bitmap techniques. The operation
allows ANDed predicates to be applied to multiple indexes, in order
to reduce underlying table accesses to a minimum.
This operation is performed
to:
- Narrow down the set of rows before accessing the base
table
- AND together predicates applied to multiple indexes
- AND together the results of semijoins, used in star joins.
Performance
suggestions: - Over time, database updates can cause an
index to become fragmented,
resulting in more index pages than necessary. This can be corrected
by dropping and recreating the index, or reorganizing the index.
- If statistics are not current, update them using the RUNSTATS
command .
- In general, index scans are most effective when
only a few rows
qualify. To estimate the number of qualifying rows, the optimizer
uses the statistics that are available for the columns referenced
in predicates. If some values occur more frequently than others, it
is important to request distribution statistics by using the WITH
DISTRIBUTION clause for the RUNSTATS command. By
using the non-uniform distribution statistics, the optimizer can distinguish
among frequently and infrequently occurring values.
- This
operation can best exploit single column indexes, as start
and stop keys are critical to it.
- Index scan
- This operator represents the scanning of an index to produce a
reduced stream of row IDs. The scanning can use optional
start and stop conditions, or might apply to indexable predicates
that reference columns of the index.
This operation is performed
to narrow down the set of qualifying row IDs before accessing the
base table (based on predicates).
Performance suggestions: - Over time, database updates can cause an index to become fragmented,
resulting in more index pages than necessary. This can be corrected
by dropping and recreating the index, or reorganizing the index.
- When two or more tables are being accessed, access to the inner
table via an index can be made more efficient by providing an index
on the join column of the outer table.
- If statistics are
not current, update them using the RUNSTATS command.
- In general, index scans are most effective when only a few row
IDs qualify. To estimate the number of qualifying row IDs, the optimizer
uses the statistics that are available for the columns referenced
in predicates. If some values occur more frequently than others, it
is important to request distribution statistics by using the WITH
DISTRIBUTION clause for the RUNSTATS command. By
using the non-uniform distribution statistics, the optimizer can distinguish
among frequently and infrequently occurring values.
- Merge scan join
- A merge join for which the
qualified rows from both outer and inner tables must be in join-predicate
order. A merge join is also called a merge scan join or a sorted
merge join .
A
join is necessary whenever there is more than one table referenced
in a FROM clause. A merge join is possible whenever there is a join
predicate that equates columns from two different tables. It can also
arise from a rewritten subquery.
A merge join requires ordered
input on joining columns, since the tables are typically scanned only
once. This ordered input is obtained by accessing an index or a sorted
table.
Performance
suggestions: - Use local predicates (that is, predicates
that reference one table)
to reduce the number of rows to be joined.
- If statistics
are not current, update them using the RUNSTATS command.
- Nested loop join
- This operator represents a nested loop join that scans (usually
with an index scan) the inner table once for each row of the outer
table.
A join is necessary whenever
there is more than one table referenced in a FROM clause. A nested
loop join does not require a join predicate, but generally performs
better with one.
A
nested loop join is performed either:
- By scanning through
the inner table for each accessed row of the
outer table.
- By performing an index lookup on the inner table
for each accessed
row of the outer table.
- Return
- This operator represents the return of data from a query. This
is the final operator in the access plan and shows the total accumulated
values and costs for the access plan.
- Row identifier (RID) scan
- This operator represents a scan of a list of row identifiers (RIDs)
obtained from one or more indexes.
This operation
is considered by the optimizer when:
- Predicates are connected
by OR keywords, or there is an IN predicate.
A technique called index ORing can be used, which combines results
from multiple index accesses on the same table.
- It is beneficial
to use list prefetch for a single index access,
since sorting the row identifiers before accessing the base rows makes
the I/O more efficient.
- Remote data fetch
- This operator represents the retrieval of data from a remote data
source in a federated system.
- Sort
- This operator represents the sorting of the rows in a table into
the order of one or more of its columns, optionally eliminating duplicate
entries.
Sorting is required when no
index exists that satisfies the requested ordering, or when sorting
would be less expensive than an index scan. Sorting is usually performed
as a final operation once the required rows are fetched, or to sort
data prior to a join or a group by.
If the number of rows is high
or if the sorted data cannot be piped, the operation requires the
costly generation of temporary tables.
- Table scan
- This operator represents a table scan (relation scan) that retrieves
rows by reading all the required data directly from the data pages.
This type of scan is chosen
by the optimizer over an index scan when:
- The range of values
scanned occurs frequently (that is, most of
the table must be accessed).
- The table is small.
- Index
clustering is low.
- An index on the table does not exist.
- Temporary table
- This operator represents the storing of data in a temporary table,
so that the data can be read during another operation.
- Table queue
- This operator represents a table queue that is used to pass table
data from one database agent to another when there are multiple database
agents processing a query. Multiple database agents are used to process
a query when parallelism is involved.
- Union
This operator represents a
necessary operation. To improve access plan costs, concentrate on
other nodes (such as scans and joins) that define the set of rows
to be deleted.
- Duplicate elimination
- This operator represents the elimination of rows having duplicate
values for specified columns.
- Update
This operator represents a
necessary operation. To improve access plan costs, concentrate on
other nodes (such as scans and joins) that define the set of rows
to be deleted.
- Index scan over XML data
- This operator represents a range scan of any associated index
over the corresponding XML data before the base table is accessed.
The operation narrows the set of qualifying row IDs and XML node IDs.
Performance
suggestions: - Over time, database updates might cause
an index to become fragmented,
resulting in more index pages than necessary. This can be corrected
by dropping and recreating the index, or reorganizing the index.
- When two or more tables are being accessed, access to the inner
table by means of an index can be made more efficient by providing
an index on the join column of the outer table.
- If statistics
are not current, update them using the RUNSTATS command.
- XML document scan
- This operator represents the navigation of XML fragments for evaluating
XPath expressions and extracting any needed document fragments.
- XML index ANDing
- This operator represents the index over XML data ANDing of the
results of multiple index scans, used for the evaluation of complex
predicates from a single query.
In order for this operation
to be used, the following conditions must be met:
- Only equality
predicates are used.
- There are no wildcards in the index lookup
path.
- All predicates are used on the same XML column.
If
any of these conditions are not met the Index intersection operation
will be used instead.
Performance
suggestions: - Over time, database updates can cause an
index to become fragmented,
resulting in more index pages than necessary. This can be corrected
by dropping and recreating the index, or by reorganizing it.
- If
statistics are not current, update them using the RUNSTATS command.
- In general, index scans are most effective when only a few rows
qualify. To estimate the number of qualifying rows, the optimizer
uses the statistics that are available for the columns referenced
in predicates. If some values occur more frequently than others,
it is important to request distribution statistics by using the WITH
DISTRIBUTION clause with the RUNSTATS command.
By using the non-uniform distribution statistics, the optimizer can
distinguish among frequently and infrequently occurring values.