< Previous | Next >

Lesson 1.1: Customize the source data used in ETL jobs

In this lesson you will learn how to modify XML data configuration files. The XML data configuration file contains information about the source and structure of the data used for ETL and is used by the XML ODBC driver at run time. In this lesson, you will add new attributes and resources to an XML data configuration file.
Prerequisite: Before you begin the lesson, back up the clearquest.xdc. The default location is the [rational_insight_installation_directory]/dataconfig/configs/configurations directory.

For this lesson, you will use the clearquest.xdc file as a starting point and modify this file to meet your lesson objectives. You will add a component entity and make it a property of the defect record.

In this lesson, you will:
  • Modify the connection information of the resource group to match the data services server deployed in your environment
  • Modify the defect data table to add new attributes and value mappings
  • Add a new resource and data table for the component
  • Add multiple projects to a resource category
Configure the server connections for the resource group
  1. Start XML Data Configuration.
  2. Click File > Open Configuration > clearquest.xdc. Typically, the file is located in the [rational_insight_installation_directory]/dataconfig/configs/configurations directory.
  3. Expand the XML Data Configuration node to list the resource groups. Double-click CQEnterprise. Typically, CQEnterprise is located in XML Data Configuration > Resource > ClearQuest Enterprise > CQEnterprise.
    Navigation tree for the XML data configuration file
  4. In the resource group editor, change the Data Services URL to match the URL of the data services server deployed in your environment, and change the authentication type and the user name and password to match the authentication settings in your data services server.
    Concept: A resource group should define a base URL for a set of XML documents. The recommended scope for the resource groups by products are:
    • Rational® ClearQuest®: URL to a user database
    • Rational RequisitePro®: URL to a RequisitePro project
    • Rational ClearCase®: URL to a ClearCase VOB
    • Microsoft® Project: URL to a project file
    • Rational QualityManager: Root URL to the RQM REST service
    • Rational TestManager: URL to a TestManager project
  5. Save the configuration.
You have now modified the connection information of the clearquest.xdc file to match data services server deployed in your environment.
Add columns to a data table
  1. Expand the CQEnterprise > ENTDefect node to list the data tables.
    Data tables under the ENTDefect node
    Concept: The ENTDefect node represents the XML document (resource) for all defect records returned by joining the URL defined in the parent resource group to the relative path assigned to the resource.
  2. Double-click the ENTDefect data table.
  3. Click the Columns tab, select the row for the Priority column, and click the Value mapping link at the top right corner of the editor.
    Concept: With a value map, you can transform the source instance values to a new value during the extraction process. This is a convenient method for standardizing on a common definition for data from multiple sources. For example, if your company uses High, Medium, and Low to define priority and some groups within the company use P1, P2, and P3, you can define a value map to standardize this definition.
  4. In the Value Mapping window, specify the following values:
    Key Value
    P1 High
    P2 Medium
    P3 Low
    Save the data table.
  5. To open the associated data table template, in the Columns tab, click the Modify link at the top right corner. Select the Priority row definition, select the BusinessPriority column in the XML schema section, and click Update XPath.
    Update the XPath
    Concept: The XML Data Configuration automatically validates the schema returned from the data service with existing column mappings. Any invalid mappings are highlighted. In this situation, the customized schema has the same information as the current Priority column but in a different attribute. In this step, you re-mapped the invalid out-of-the box /Priority XPath to the valid /BusinessPriority XPath.
  6. In the Column Mapping section, expand the Defect root node under the XML schema section to list the attributes under Defect.

    In the next few steps, you will create new table column mappings from the XML source. The tree structure on the left represents the child nodes contained within the extractable node associated with the data table.

  7. Click Status and click Create. Make sure that the name of the table column is Status and the SQL Type is Varchar. Save the table template.
  8. In the Mapping Columns section, expand the Defect root node under the XML schema section to list the attributes under Defect. Select the node Defect > Component.
  9. Click dbid and click Create. Make sure that the name of the table column is Component_dbid and the SQL Type is Integer. Save the table template.
    Concept: In the customized Enterprise schema for this tutorial, you need to give enough information for the ETL process to create a relationship from the Defect record to the Component record. You will do this by exposing the Component dbid attribute (the unique key for ClearQuest records).
  10. Return to the editor of the ENTDefect data table. When prompted to update, click OK.
  11. Select the two new columns Status and Component_dbid, and click Load.
    Concept: A data table definition can contain a subset of the column mappings in the associated data table template. With the Load/Unload feature, you can control the visibility of the columns to the ODBC clients.
  12. Save the data table.
You have now modified the defect data table by adding new attributes and value mappings.

Next, you will define a new data table for exposing the component data from the XML source. The first step is to define a new resource to represent the XML document containing the component information.

Create a data table
  1. Right-click the CQEnterprise resource group, select New > Resource, and select Locate using data service.
    Concept: If you do not know the URL of the resource, you can use the Locate using data service to define the resource.
  2. Specify the name to be ENTComponent and click Next.
  3. Expand the Database > RecordQueries > Queries > Query node and in the Element Attributes section, select the href checkbox and click Next.

    You are now using the wizard to retrieve the schema for the XML document returned from the resource group level. The PersonalQueries and PublicQueries folders contain the relevant queries defined in the ClearQuest user database. The RecordQueries folder contains dynamically generated queries of the ClearQuest data service that will return all records of a particular type. With the wizard page, you can narrow your focus to all of the possible records without having to see the actual data.

  4. Select a URL that ends with /Record+Queries/Component. All instances of the node that was selected through the previous wizard page are displayed. Click Finish.
  5. Right-click the newly created resource, and select New > Data Mapping Table and specify the name to be ENTComponent. Click Next.
  6. Specify the extractable node to be /Query/results/Component. Click Next.
  7. Select Create a new template. Type a name and description for the data table and click Next.

    In the next step, you will select those nodes in the XML tree that you want as table columns.

  8. Select the following nodes:
    • dbid as SQL type integer
    • Component as SQL type VARCHAR
    Click Finish.
You have now defined the necessary data tables in the CQEnterprise resource group. At the moment, it is only for a single ClearQuest database. However, you can extract data from multiple resource groups under a single resource category. To be able to do this, you need to create resource groups for each ClearQuest database under the resource category.

In the next few steps, you will create a new resource group under the ClearQuest Enterprise resource category. To do this, you will use the CQEnterprise resource group as the starting point.

Create a resource group for other projects
  1. Drag the CQEnterprise resource group and drop it in the ClearQuest Enterprise resource category.
  2. Click Copy.
  3. Select the duplicated resource group and modify it by changing the URL to match the data services URL for the ClearQuest user database for another project. In the first section of this tutorial, you learnt how to do so.
    Concept: In the ETL, you can load multiple resource groups with identical configurations in a batch mode by organizing the resource groups in categories.
  4. Select Link to existing resource group, select CQEnterprise, click OK, and save the changes.
    Concept: A resource group can be linked to an existing resource group. The new resource group you created has its own name and connection information but it shares all resources and data mapping tables defined in the existing resource group CQEnterprise. This means only one copy of the resources and data mapping tables are maintained. If you change CQEnterprise, all linked resource groups will automatically use the updated resources and data mapping tables.

Lesson checkpoint

You modified the XML data configuration file of a product source. You added new attributes to the data table and added a resource group to a resource category.
< Previous | Next >

Feedback