Change Data Capture

Data needs to be extracted periodically from the source system(s) and transformed to the data warehouse. This process is commonly referred to as refreshing the data warehouse. The most efficient refresh method is to extract and transform only the data that has changed since the last extraction. Change Data Capture identifies and processes only the data that has changed in each of the tables in a database and makes the changed data available to the Data Warehouse. BIA Reporting has been designed with the intention that the refresh will take place on a nightly basis. However, the implementation is flexible and it is possible to run the refresh at a different frequency.

BIA Reporting 'Change Data Capture' techniques include using a control table which stores a last written date for each table that is being populated. When an ETL runs, the last written field for that table is also updated. The next time the ETL runs, it first reads from this control table and then extracts the data that has been updated since the previous ETL run.

It is important to note that for change data capture to work in BIA Reporting all the last written fields must be populated in the source tables that the reporting solution extract data from.

There are three control tables provided with BIA Reporting. Each of the control tables contain a list of all the tables that are populated in that database:

As already stated, a row in the ETL Control table is updated before and after every ETL run for the table which is being updated. This works by the ETL's calling a pre-mapping transformation to read the previous last written date and setting the extract time. After the ETL has run, a post-mapping transformation is called which updates the last written date to the current date. This functionality is not supported by the ETL tool. These transformations are custom BIA Reporting transformations written in java called from the ETL tool.

After the ETL Control table has been initially populated with data (see next section) the last written date is reset to a start date to ensure that the ETL's extract all data updated after this date. The developer can manually set the last written date or use the resetetl.XXX build command (where x is staging, central, or datamarts) which resets the last written date for all tables in that database to the 1st of January, 1934.