In this lesson you will learn how to customize an existing
Data Manager build for extracting data, transforming the data, and
loading the data into the data warehouse.
Prerequisites: - You have completed lesson 1.1.
- You have created a new empty database for the ETL catalog. To
create a database, see the product documentation of the database you
are using to host the data warehouse.
In lesson 1.1, you modified the Rational® ClearQuest® data configuration file by
adding new attributes to a data table. These attributes now need to
be put into the build stream so that they can be exposed in reports.
In
this lesson, you will modify the Data Manager build so that new attributes
enter the data collection process. You will learn to customize an
existing build to extract new attributes, and transform and deliver
these attributes to the data warehouse.
Prepare the ETL catalog- Start IBM® Cognos® Data Manager.
- Select Create a new catalog, specify the
name to be RationalInsightETL, specify the
connection information of the new empty database you created before
you began this lesson, and click Finish.
- When the new catalog is created and opens, click , and select the OOTBDMCatalog.ctg file. The default
location of this file is the [rational_insight_installation_directory]/datamgr/insight_catalog/ directory.
Extract new attributes from the data source- Expand the folder . Click CQ_ENT_Defect.
Concept: In Data Manager, the ETL is specified
in fact or dimension builds. A fact build delivers fact data and can
also deliver dimension data, so you can construct a private data mart
from within a single build. A dimension build delivers data to describe
a single business dimension, such as a product or a customer. Rational Insight uses fact
builds to load data into the consolidated operational data store.
Fact and dimension builds are used to build the data marts or star
schema. The builds are organized into JobStreams as a series of steps
that can be executed in sequence or in parallel in Data Manager.
- Double-click the RI_Stg data source icon.
Click the Query tab.
Concept: By using
the data source element of the build, you can select a subset of the
database required for the build.
In lesson 1.1, you modified the ClearQuest defect entity
to expose the related component ID and status attributes. In the next
few steps, you will modify the data source query to include the new
attributes.
- Expand the ClearQuest node to list the
database schemas (modeled as resource groups in through XML Data Configuration).
Expand the CQEnterprise schema to list all
data tables defined in the CQEnterprise resource group from the data
configuration file from lession 1.1. Expand the ENTDefect table
to show all of the defined columns.
You will now modify the existing
SQL to retrieve the component_dbid and status columns. Make sure you
add a comma to separate the column names.
- Drag the ENTDefect columns Status and Component_dbid from the
panel at the left to the SQL Query panel at the right. Place the new
columns after the existing "href" entry. Make sure that the columns
are separated by a comma.
- To test the new SQL statement, click at the lower left of the window. In the window to specify
variables, type the schema name (CQEnterprise) for the $RESOURCEGROUP
entry. Leave the $MODIFIED_SINCE field empty so that the null value
is used by the data service to return all records.
- Click the Result columns tab and click Refresh.
In the window to specify variables, type the schema name (CQEnterprise)
for the $RESOURCEGROUP entry. Leave the $MODIFIED_SINCE field empty
so that the null value is used by the data service to return all records.
The columns defined in this tab are the resulting columns from the
SQL defined in step 7. These columns will be used in downstream objects.
Click OK.
- Right-click DataStream and select Properties.
In the DataStream items tab, double-click the
new columns in the left panel to add them into the data stream, and
click OK.
Concept: A
DataStream object gathers together all the data sources from which
a build or reference structure acquires data. This provides a method
to combine different data sources into one object. An example of this
can be seen in the RequisitePro® builds.
You have now added one new column to a catalog and updated the
transformation model to reflect this change.
Next, you will
transform the component_dbid attribute to the surrogate key of the
component in the data warehouse.
Transform the attributes- Right-click the Transformation Model icon
and click Mapping.
Concept: The transformation model is central to a fact build.
You use it to manipulate the acquired source data in a number of ways,
such as merging data from different sources and aggregating data.
In this lesson, you will load the new attributes without applying
any business rules.
- Double-click the Status data stream.
Concept: You will create a new attribute in
the transformation model and map the Data Stream item to the attribute.
- Click Add, click Dimension,
specify the name to be Lookup_Component, and
select the (no reference) child node. Click
the button to the right of this node.
- Select the node and click OK.
- Return to the mapping window. In the left panel, select the Component_dbid data
stream item and in the right panel, select the Lookup_Component=>EXTERNAL_KEY1 attribute,
and click Map.
Concept: In the next few steps, you are building the relationship
between the defect and component entity in the data warehouse. The
EXTERNAL_KEY1 specifies the natural key (dbid for Rational ClearQuest) for the component. The lookup
will return the surrogate key if the component exists in the data
warehouse. If the record does not exist, the build will insert the
default -1 entry representing "info not available."
- Click the Add button and click Attribute.
Specify the name to be COMPONENT_ID, and click OK twice.
- Right-click the Transformation Model icon
and select Show Build Elements. Double-click Lookup_Component,
click the Unmatched Members tab, select Accept
unmatched member identifiers, and click OK.
- In the Transformation Model tab, right-click
the COMPONENT_ID attribute and click Convert
to derivation.
- In the Transformation Model tab, double-click
the COMPONENT_ID attribute, click the Calculation tab,
and type the following code:
If NOT Unmatched( Lookup_Component )
then RETURN Lookup_Component.COMPONENT_ID ;
else RETURN -1;
Click OK.Concept: For specifying a positive match for
this type of condition, a double-negative construction is used in
the script.
Deliver the attributes to target table- Double-click the "{$ODS}"."REQUEST" table
delivery.
- In the Table Properties tab, specify the
column names to be STATUS and COMPONENT_ID for the Status and COMPONENT_ID
elements. Select the checkbox for Automatically add columns
to table. Click OK.
Concept: Because you did not
modify the defect table in the data warehouse to include columns to
store the new attributes, checking the Automatically add
columns to table option results in the table getting modified
if the columns do not exist. You can also use Rational Data Architect to modify the table
before executing the ETL job or to update the data model from the
data warehouse after the ETL job has modified the table.
- Save the catalog.
You have now modified an existing build.
In this lesson, you have:
When you load a new entity, you
need to create a new build and modify an existing build to add more
attributes to an existing entity. In this lesson, you modified an
existing build. In the next lesson (Lesson 1.3), you will create a
new build. The ETL process requires one build for each entity.