The SELECT function combines, filters, and transforms complex
message and database data.
Syntax
Notes:
- You no longer require the enclosing parentheses in SELECT expressions.
This does not prevent you using parentheses but, if they are present, they
are merely normal, expression-scoping, parentheses.
- For the COUNT parameter only, you can specify the value of the following Expression as
a single star (*).
Usage
The
SELECT function is the usual and most efficient way of transforming messages.
You can use SELECT to:
- Comprehensively reformat messages
- Access database tables
- Make an output array that is a subset of an input array
- Make an output array that contains only the values of an input array
- Count the number of entries in an array
- Select the minimum or maximum value from a number of entries in an array
- Sum the values in an array
Introduction to SELECT
The SELECT
function considers a message tree (or sub-tree) to consist of a number of
rows and columns, rather like a database table. A
FieldReference in
a FROM clause identifies a field in a message tree. The identified field
is regarded in the following ways:
- The identified field is regarded as a row in a table.
- The field’s siblings are regarded as other rows of the same table.
- The field’s children are regarded as the table’s columns.
Note: The FieldReference in a FROM clause can also
be a table reference that refers directly to a real database table.
The
return value of the SELECT function is typically another message tree that
contains rows whose structure and content is determined by the SelectClause.
The number of rows in the result is the sum of all the rows pointed to by
all the field references and table references in the FROM clause, filtered
by the WHERE clause; only those fields for which the WHERE clause evaluates
to TRUE are included.
The return value of the SELECT function can also
be scalar (see ITEM selections).
You can specify
the
SelectClause in several ways; see:
Simple selections
To
understand the SELECT function in more detail, first consider the following
simple case:
- The SelectClause consists of a number of expressions,
each with an AS Path clause.
- The FROM clause contains a single FieldReference and
an AS CorrelationName clause.
The SELECT function creates a local, reference, correlation variable,
whose name is given by the AS
CorrelationName clause, and
then steps, in turn, through each row of the list of rows derived from the
FROM clause. For each row:
- The correlation variable is set to point to the current row.
- The WHERE clause (if present) is evaluated. If it evaluates to FALSE or
unknown (null), nothing is added to the result tree and processing proceeds
to the next row of the input. Otherwise processing proceeds to the next step.
- A new member is added to the result list.
- The SELECT clause expressions are evaluated and assigned to fields named
as dictated by the AS Path clause. These fields are child
fields of the new member of the result list.
Typically, both the SelectClause and the WHERE
clause expressions use the correlation variable to access column values (that
is, fields in the input message tree) and thus to build a new message tree
containing data from the input message. The correlation variable is referred
to by the name specified in the AS CorrelationName clause
or, if an AS clause is not specified, by the final name in the FROM FieldReference (that
is, the name after the last dot).
Note that:
- Despite the analogy with a table, you are not restricted to accessing
or creating messages with a flat, table-like, structure; you can access and
build trees with arbitrarily deep folder structures.
- You are not restricted to a column being a single value; a column can
be a repeating list value or a structure.
These concepts are best understood by reference to the examples.
If
the field reference is actually a TableReference, the operation
is very similar. In this case, the input is a real database table and is thus
restricted to the flat structures supported by databases. The result tree
is still not so restricted, however.
If the FROM clause contains more
than one field reference, the rightmost reference steps through each of its
rows for each row in the next-to-rightmost reference, and so on. The total
number of rows in the result is thus the product of the number of rows in
each table. Such selects are known as joins and commonly use a WHERE
clause that excludes most of these rows from the result. Joins are commonly
used to add database data to messages.
The AS
Path clause
is optional. If it is unspecified, the broker generates a default name according
to the following rules:
- If the SelectClause expression is a reference to a
field or a cast of a reference to a field, the name of the field is used.
- Otherwise the broker uses the default names Column1, Column2, and so on.
Examples
The following example
performs a SELECT on the table Parts in the schema Shop in the database DSN1.
Because no WHERE clause exists, all rows are selected. Because the select
clause expressions (for example, P.PartNumber) contain no AS clauses, the
fields in the result adopt the same names:
SET PartsTable.Part[] = SELECT
P.PartNumber,
P.Description,
P.Price
FROM Database.DSN1.Shop.Parts AS P;
If the target of the SET
statement (PartsTable) is a variable of type ROW, after the statement is executed
PartsTable will have, as children of its root element, a field called Part
for each row in the table. Each of the Part fields will have child fields
called PartNumber, Description, and Price. The child fields will have values
dictated by the contents of the table. (PartsTable could also be a reference
into a message tree).
The next example performs a similar SELECT. This
case differs from the last in that the SELECT is performed on the message
tree produced by the first example (rather than on a real database table).
The result is assigned into a subfolder of OutputRoot:
SET OutputRoot.XMLNS.Data.TableData.Part[] = SELECT
P.PartNumber,
P.Description,
P.Price
FROM PartsTable.Part[] AS P;
INSERT selections
The
INSERT clause is an alternative to the AS clause. It assigns the result of
the SelectClause expression (which must be a row) to the
current new row itself, rather than to a child of it. The effect of this is
to merge the row result of the expression into the row being generated by
the SELECT. This differs from the AS clause, in that the AS clause always
generates at least one child element before adding a result, whereas INSERT
generates none. INSERT is useful when inserting data from other SELECT operations,
because it allows the data to be merged without extra folders.
ITEM selections
The
SelectClause can
consist of the keyword ITEM and a single expression. The effect of this is
to make the results nameless. That is, the result is a list of values of the
type returned by the expression, rather than a row. This option has several
uses:
- In conjunction with a scalar expression and the THE function, it can be
used to create a SELECT query that returns a single scalar value (for example,
the price of a particular item from a table).
- In conjunction with a CASE expression and ROW constructors, it can be
used to create a SELECT query that creates or handles messages in which the
structure of some rows (that is, repeats in the message) is different to others.
This is useful for handling messages that have a repeating structure but in
which the repeats do not all have the same structure.
- In conjunction with a ROW constructor, it can be used to create a SELECT
query that collapses levels of repetition in the input message.
Column function selections
The SelectClause can
consist of one of the functions COUNT, MAX, MIN, and SUM operating on an expression.
These functions are known as column functions. They return a single scalar
value (not a list) giving the count, maximum, minimum, or sum of the values
that Expression evaluated to in stepping through the rows
of the FROM clause. If Expression evaluates to NULL for
a particular row, the value is ignored, so that the function returns the count,
maximum, minimum, or sum of the remaining rows.
For the COUNT function
only, Expression can consist of a single star (*). This
form counts the rows regardless of null values.
To make the result
a useful reflection of the input message, Expression typically
includes the correlation variable.
Typically, Expression evaluates
to the same data type for each row. In these cases, the result of the MAX,
MIN, and SUM functions are of the same data type as the operands. The returned
values are not required to be all of the same type however, and if they are
not, the normal rules of arithmetic apply. For example, if a field in a repeated
message structure contains integer values for some rows and float values for
others, the sum follows the normal rules for addition. The sum is of type
float because the operation is equivalent to adding a number of integer and
float values.
The result of the COUNT function is always an integer.
Differences between message and database
selections
FROM expressions in which a correlation variable represents
a row in a message behave slightly differently from those in which the correlation
variable represents a row in a real database table.
In the message case,
a path involving a star (*) has the normal meaning; it ignores the field's
name and finds the first field that matches the other criteria (if any).
In
the database case a star (*) has, for historical reasons, the special meaning
of all fields. This special meaning requires advance knowledge of the definition
of the database table and is supported only when querying the default database
(that is, the database pointed to by the node's
data source attribute).
For example, the following queries return column name and value pairs only
when querying the default database:
SELECT * FROM Database.Datasource.SchemaName.Table As A
SELECT A.* FROM Database.Datasource.SchemaName.Table As A
SELECT A FROM Database.Datasource.SchemaName.Table AS A
Specifying the SELECT expressions
- SelectClause
- SelectClause expressions can use any of the broker’s
operators and functions in any combination. They can refer to the tables’
columns, message fields, correlation names declared by containing SELECTs,
and to any other declared variables or constants that are in scope.
- AS Path
- An AS Path expression is a relative path (that is,
there is no correlation name) but is otherwise unrestricted in any way. For
example, it can contain:
- Indexes (for example, A.B.C[i])
- Field-type specifiers (for example, A.B.(XML.Attribute)C )
- Multipart paths (for example, A.B.C )
- Name expressions (for example, A.B.{var})
Any expressions in these paths can also use any of the broker’s operators
and functions in any combination. The expressions can refer to the tables’
columns, message fields, correlation names declared by containing SELECTs,
and any declared variables or constants.
- FROM clause
- FROM clause expressions can contain multiple database references, multiple
message references, or a mixture of the two. You can join tables with tables,
messages with messages, or tables with messages.
FROM clause FieldReferences
can contain expressions of any kind (for example, Database.{DataSource}.{Schema}.Table1).
You
can calculate a field, data source, schema, or table name at run time.
- WHERE clause
The WHERE
clause expression can use any of the broker’s operators and functions in any
combination. It can refer to table columns, message fields, and any declared
variables or constants.
However, be aware that the broker
treats the WHERE clause expression by examining the expression and deciding
whether the whole expression can be evaluated by the database. If it can,
it is given to the database. In order to be evaluated by the database, it
must use only those functions and operators supported by the database.
The WHERE clause can, however, refer to message fields, correlation
names declared by containing SELECT functions, and to any other
declared variables or constants within scope.
If the whole
expression cannot be evaluated by the database, the broker looks for top-level
AND operators and examines each sub-expression separately. It then attempts
to give the database those sub-expressions that it can evaluate, leaving the
broker to evaluate the rest. You need to be aware of this situation for two
reasons:
- Apparently trivial changes to WHERE clause expressions can have large
effects on performance. You can determine how much of the expression was given
to the database by examining a user trace.
- Some databases’ functions exhibit subtle differences of behavior from
those of the broker.
Relation to the THE function
You
can use the function THE (which returns the first element of a list) in conjunction
with SELECT to produce a non-list result. This is useful, for example, when
a SELECT query is required to return no more than one item. It is particularly
useful in conjunction with ITEM (see ITEM selections).
Differences from the SQL standard
ESQL
SELECT differs from database SQL SELECT in the following ways:
- ESQL can produce tree-structured result data
- ESQL can accept arrays in SELECT clauses
- ESQL has the THE function and the ITEM and INSERT parameters
- ESQL has no SELECT ALL function in this release
- ESQL has no ORDER BY function in this release
- ESQL has no SELECT DISTINCT function in this release
- ESQL has no GROUP BY or HAVING parameters in this release
- ESQL has no AVG column function in this release
Restrictions
The
following restrictions apply to the current release:
- When a SELECT command operates on more than one database table, all the
tables must be in the same database instance. (That is, the TableReferences
must not specify different data source names.)
- If the FROM clause refers to both messages and tables, the tables must
precede the messages in the list.
- Using dynamic DSN, SCHEMA and TABLE names with 'SELECT *' statements
is not supported. If you use a schema, table or datasource name as a variable
(dynamic variables) in 'SELECT *' queries, the variables are not resolved
to the correct set of schema or table names.