Debugging and Optimizing Data Loads

This chapter describes how to debug data loads by finding the problem, fixing the problem, and loading the failed records. In addition, this chapter describes how to optimize data loads. This chapter contains the following sections:

Debugging a Data Load

If you try to load a data source into Hyperion Essbase OLAP Server, but it does not load correctly, check the following:

Are you connected to the appropriate application and database?
Did you try to load the correct data source?

If the answer to those questions is yes, then there is probably something wrong. When you have trouble loading a data source, look at the error log file generated for that data load. It lists the errors that occurred when Hyperion Essbase tried to load the data source. The error log file is located on the client machine in \ESSBASE\CLIENT\DATALOAD.ERR.

If there is no error log file, check the following:

Did the person running the data load set one up? See Setting the Error Log File for information on setting up an error log. By default, Hyperion Essbase creates an error log when you load data using a rules file.
Are you sure that the data source and Hyperion Essbase server are available? See the Verifying that the Server Is Available and Verifying that the Data Source Is Available sections in this chapter for more information.
Did the server crash during the data load? If so, you probably received a time-out error on the client. If the server crashed, see Recovering from a Server Crash.

If the error log file exists but is empty, Hyperion Essbase does not think that an error occurred during loading. Check the following:

Does the rules file contain selection/rejection criteria that rejected every record in the data source? See Selecting Records if you do not know how to set up selection/rejection criteria.
Is the rules file correct? Does the rules file validate properly? See Problems Validating a Rules File.

Not All Errors Are in the Log File

When Hyperion Essbase cannot load a record, it writes the record to the error log file, DATALOAD.ERR,on the client. There is a limit to the number of records that an error log can contain. The default limit is 1000 records, but you can set the limit to be lower than 1000 by setting DATAERRORLIMIT in the ESSBASE.CFG file. See the online Technical Reference in the DOCS directory for more information.

When Hyperion Essbase writes the maximum allowed number of records in the error log file, it does not log any other errors it encounters. The data load, however, continues. Any subsequent errors are lost.

Data Loaded Incorrectly

If the data source loads correctly, but the data in the database is wrong, check the following:

Are you sure that you loaded the correct data source? If so, check the data source again to make sure it contains the correct values.
Are there any blank fields in the data source? You must insert #MI or #MISSING into a data field that has no value. Otherwise, the data source may not load correctly. To replace a blank field with #MI or #MISSING, see Replacing an Empty Field with Text.
Is the data source formatted correctly? Are all ranges set up properly?
Are there any implicitly shared members you were unaware of? Implicit shares happen when a parent and child share the same data value. This occurs if a parent has only one child or only one child rolls up into the parent. See Building Shared Members Using a Rules File.
Did you add incoming data to existing data instead of replacing it using a rules file? See Changing Data Values.
Have you selected or rejected any records that you did not intend to select or reject using a rules file? See Setting up a Rules File to Manipulate Records.
If the sign is reversed (for example, a minus sign instead of a plus sign), did you perform any sign flips on UDAs (user-defined attributes) in your rules file using a rules file? See Flipping Field Signs.
Did you clear data combinations that you did not intend to clear using a rules file? See Clearing Existing Data Values.
Did you scale the incoming values incorrectly using a rules file? See Scaling Data Values.

Are all member and alias names less than 79 characters long?

Note:

You can check data by exporting it, running a report on it, or by using a spreadsheet. To do exports and reports, see Developing Report Scripts and Performing Interactive and Batch Operations Using ESSCMD. To use a spreadsheet, see the Hyperion Essbase Spreadsheet Add-in User's Guide for your particular spreadsheet.

After you fix the problem with the database or the rules file, you can load just the records that failed by loading the error log. See Loading the Error Log File.

Verifying that the Server Is Available

Try to access the server without using Hyperion Essbase to help identify if the problem is with Hyperion Essbase and not with your server or network. Check the following:

Is the server machine running? Try to connect to it without using Hyperion Essbase. If you cannot, check with your system administrator.
Is the Hyperion Essbase server running? Check with your Hyperion Essbase administrator.
Can the client machine connect to the server machine? Try to connect to the server machine from the client machine without using Hyperion Essbase.

Verifying that the Data Source Is Available

If Hyperion Essbase cannot open the data source to load, check the following:

Is the data source already open? This can happen if a user is editing the data source. Hyperion Essbase can only load data sources that are not locked by another user or application.
Does the data source have the correct file extension? All text files must have a file extension of.TXT. All rules files must have a file extension of.RUL.
Is the data source name or path name correct? Check for misspellings.
Is the data source in the specified location? Check to make sure no one has moved or deleted the data source.
If you are using an SQL data source, is the connection information (such as the user name, password, or database name) correct?
If you are using an SQL data source, can you connect to the SQL data source without using Hyperion Essbase?

Loading the Error Log File

If your Isolation Level transaction setting is Committed, you must re-start the data load from the beginning. If your Isolation Level is Uncommitted, you can load just the records that failed by loading the error log. Hyperion Essbase copies each unloaded record to the error log file during loading. Just reloading these records is much faster than loading each data source again, including loading those records that succeeded during the first load.

For more information on Isolation Level settings, see About Isolation Levels.

Make sure you fixed the problem that caused the errors. Then load the error log:

If you are loading from the server, rename the DATALOAD.ERR file to DATALOAD.TXT. Hyperion Essbase can only load text files that end in.TXT on the server. If you are loading from the client, the file can have any name valid on the local operating system.

Note: If you are reloading the dimension build error file, it is called DIMBUILD.ERR.
Load the DATALOAD.TXT file using the same rules file you used for the original data sources. If you do not know how to load a data source, see Introducing Data Loading.

Recovering from a Server Crash

If the server crashes while you are loading data, Hyperion Essbase sends you a time-out error. If you are overwriting the values in the database with the data source, reload the data sources after the server is running again.

If the Isolation Level transaction setting is Committed, you must re-start the data load from the beginning. If the Isolation Level is Uncommitted, and you are adding to or subtracting from the existing values in the database when the server crashes, do the following:

Determine how much data Hyperion Essbase loaded before the crash. Compare the values in the data source with the values in the database. If the values you are adding to or subtracting from were not changed, restart the data load.
If the values you are adding to or subtracting from were changed, you must clear the values that loaded and reload the previous data sources. If, for example, you derive the monthly sales figures by adding the sales figures for each week as they are loaded, clear the sales figures in the database and re-load the sales figures for each week up to the current week.

For more information on Isolation Level settings, see About Isolation Levels.

Problems Validating a Rules File

If you cannot validate your rules file, check to make sure that it is set up correctly. Make sure that:

All members and dimensions are spelled correctly.
All members are quoted, if they contain numbers or file delimiters.
The rules file is associated with the correct outline.
There are no extra delimiters in the data source.
Each record contains only one member from each dimension.
No members from the same dimension appear more than once in a record.
Any dimensions specified in the header record are not also in the data records.
You did not define more than one field as a data field.

Ignoring End of File Markers

Some SQL data sources may have end of file markers made up of special characters that can cause a data load or dimension build to fail. To fix this problem, define a rejection criterion to reject that record.

Find the end of file marker in your SQL data source.
Determine how to search for it using the Hyperion Essbase search command. This may be difficult as the end of file marker may be composed of one or more special characters. See Ignoring Fields Based on String Matching for information on how to do this.
Define a rejection criterion that rejects the end of file marker. See Rejecting Records for information on how to do this.

Optimizing Data Loads

Loading large data sources into an Hyperion Essbase database can take a great deal of time. However, you can speed up the data loading process. To speed up a data load, it is important to:

Minimize the time spent reading and parsing the data source.
Minimize the time spent reading and writing to the database.

This section contains the following subsections:

How Does Hyperion Essbase Load Data?

When Hyperion Essbase loads a data source, it does the following:

Uses the index to find the correct block on the disk. The index is composed of the sparse dimensions.
Loads the block into the cache, and loads the data into it.
Loads the block on the disk that the next record corresponds to. If the correct block is already in the cache, Hyperion Essbase does not have to load the first block.
Repeats this process until each field is loaded.

See Basic Architectural Elements for information on sparse and dense data combinations.

Grouping Sparse Member Combinations Together

If you arrange your data source so that records with the same sparse member combinations are consecutively grouped, the data loads more quickly. The order of your fields is irrelevant, so long as the data that is changing from record to record is in the same block as the previous record. Continue this until there is no more data for the block and then change one of the sparse dimensions to access a different block, and so on. Thus, your data source should group all records by block.

The file in Figure 24-1, for illustration purposes, displays its fields such that you can easily see the sparse, non-attribute dimensions on the left and dense dimensions on the right, based on the structure of the Sample Basic database.

Note:

Because you do not load data into attribute dimensions, even though they are set as sparse, they are not relevant to this discussion.

Sparse, non-attribute dimensions:

Scenario
Product
Market

Dense dimensions:

Measures

Year

Figure 24-1: Sample Basic Database Showing Sparse and Dense Dimensions

Scenario   Product   Market   Measures   Year
Actual     Cola      Ohio     Sales      Jan    25

Sort the records in the data source so that records with like values in the sparse dimensions are together. Then specify all the combinations of members in the dense dimensions, before specifying a different member in a sparse dimension. Figure 24-2, for example, sorts the records to put like records together. The values for the Measures dense dimension change first. Hyperion Essbase accesses one block.

Figure 24-2: Sorted Records

Jan
Actual     Cola     Ohio     Sales     25
Actual     Cola     Ohio     Margin    18
Actual     Cola     Ohio     COGS      20
Actual     Cola     Ohio     Profit    5

Figure 24-3, on the other hand, does not sort its records and changes its sparse dimensions before the dense ones. It loads more slowly than Figure 24-2, because Hyperion Essbase accesses four different blocks instead of one.

Figure 24-3: Unsorted Records

Jan
Actual    Cola          Ohio      Sales    100
Budget    "Root Beer"   Florida   Sales    96
Actual    "Root Beer"   Ohio      Sales    145
Budget    Cola          Florida   Sales    85

If you are using a data source that loads more than one cell per record, use the same idea as Figure 24-2, but arrange the data so that the expanded dimension in the record is a dense dimension.

For more information on creating rules files, see Introduction to Rules Files. For more information on dense and sparse dimensions, see Basic Architectural Elements

Why Does This Speed up the Data Load?

Positioning your data based on sparse member combinations in the data source speeds up the data load because of how Hyperion Essbase stores sparse and dense dimensions. All data is stored in blocks. A block contains cells for all possible dense dimension intersections. Hyperion Essbase creates a block offset that points to the intersections in the block where the data is stored. The intersection of the sparse dimensions forms an index entry that points to the block where the data is stored.

So, when you put the sparse member combinations together, Hyperion Essbase uses the index to find the block where the data is stored and loads that block into its cache. Hyperion Essbase then uses the block offset to determine which parts of the block to change and writes to that block multiple times. Because the correct block is in the cache, you do not have to open it each time you write to parts of the block. This reduces the number of physical disk I/O's required, which speeds up your data load.

Positioning Data in the Same Order as the Outline

After you arrange your data source so that the sparse data combinations are together, rearrange it so that sparse dimensions are in the same order as the outline.

Why Does This Speed up the Data Load?

Positioning fields in the data source to match sparse dimensions in the outline speeds up the data load because of the way Hyperion Essbase accesses data using the index. Hyperion Essbase uses the index cache size to determine how much of the index can be paged into memory. Hyperion Essbase pages portions of the index in and out of memory as requested by the data load or other operations. If the data in your data source lists records in the same order as the outline, then less paging of the index occurs, thus reducing the I/O's required.

Note:

If the index cache size is large enough to hold the index in memory, then this method does not speed data loading.

For more information about setting the index cache size, see Using Hyperion Essbase Application Manager for Database Settings. For more information about choosing the size of the index cache, see Estimating Disk and Memory Requirements for a Database.

Loading from the Server

If you load the data source from the server instead of the client, the data loads more quickly. To load a data source from the server, move the data source to the server and then start the load.

Why Does This Speed up the Data Load?

Using the server speeds up the data load because the data does not have to be transported over the network from the client to the server.

Making the Data Source as Small as Possible

Make your data source as small as possible.

If you are using free-form data, set up ranges in the data source. Ranges reduce the number of fields Hyperion Essbase must read before loading data values. Figure 24-4 shows a file that does not use ranges and Figure 24-5 shows the same file optimized to use ranges. Figure 24-4 contains 32 fields that Hyperion Essbase must read in order to load the data values properly.

Figure 24-4: Data Source Without Ranges

Jan     "New York"   Cola          4
Jan     "New York"   "Diet Cola"   3
Jan     Ohio         Cola          8
Jan     Ohio         "Diet Cola"   7
Feb     "New York"   Cola          6
Feb     "New York"   "Diet Cola"   8
Feb     Ohio         Cola          7
Feb     Ohio         "Diet Cola"   9

Figure 24-5 contains only 22 fields that Hyperion Essbase must read in order to load the data values properly. In ranges, the last range cycles the fastest. In Figure 24-5, for example, the first range encountered is Jan and Feb (from the Year dimension), the second range is New York and Ohio (from the Market dimension), and the third range is Cola and Diet Cola (from the Product dimension). The last range encountered, in this case, is Cola and Diet Cola. So Hyperion Essbase assigns the first value, 4, to Jan, New York, Cola. Hyperion Essbase assigns the second value, 3, to Jan, New York, Diet Cola. The third value, 8, is assigned to Jan, Ohio, Cola. Hyperion Essbase continues in this order until the file is loaded.

Figure 24-5: Data Source with Ranges

Jan   "New York"   Cola          4
                   "Diet Cola"   3
      Ohio         Cola          8
                   "Diet Cola"   7
Feb   "New York"   Cola          6
                   "Diet Cola"   8
      Ohio         Cola          7
                   "Diet Cola"   9

Why Does This Speed up the Data Load?

The less there is to read in a data source, the less time it takes Hyperion Essbase to read it. Therefore, as long as a data source is complete, the smaller it is, the faster Hyperion Essbase can read and load it.

Making the Fields as Small as Possible

Make the fields in the data source as small as possible by:

Removing excess white space in the data source.
Rounding off computer-generated numbers to only the precision you need. For example, if the data value has nine decimal points and you only care about two, round the number off.
Using #MI instead of #MISSING.

Why Does This Speed up the Data Load?