Daylight saving time and network time servers

Daylight saving time and network time servers both affect the current time on a server machine. In data movement services, timestamps are used to determine when the servers check for new data and when they perform the data movement. Changes to the system clock affect the data services components and their perceived performance.

Manually moving the system clock has the same effect; the system behaves the same way regardless of the method used to change the clock. Undesired delays to data movement services occur when the system clock is moved into the past. Delays do not occur when the system clock is moved forward, although the data movement services components reach their scheduled intervals sooner.

Effect of changing times on components

If the system clock is moved backward in time, the affected components are left with internal timestamps in the future while the system clock sits in the past. Scheduling calculations combine the system clock, these internal timestamps, and the parameters that define intervals, so all such calculations are affected by the system clock change.

If the system clock is moved forward in time, the affected components have internal timestamps set in the past while the current timestamp of the system clock sits in the future. This resembles the normal progression of the system clock; the difference is that time measurements such as determining the next scheduled run, or determining the life cycle of a database record, are affected. The actual elapsed time will be less than the calculated elapsed time.

Example: A record is to remain on the system for 24 hours after it becomes eligible for deletion. Assume that a record becomes eligible for deletion at 3 p.m. on Tuesday. At this point, due to the 24-hour retention policy, the record may be deleted any time after 3 p.m. on Wednesday. Also assume that at 3:01 p.m. on Tuesday the system clock is set forward to 3:01 p.m. on the following Friday. Even though only 1 minute has passed in actual elapsed time, the system elapsed time is 3 days and 1 minute, so the 24-hour retention period for that record has been satisfied. Due to the time change, the record can be removed sooner than if the system clock had not been changed.
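To see the arithmetic in one place, here is a minimal sketch of this kind of retention check in SQL. The RECORDS table and its RECORD_ID and ELIGIBLE_TS columns are hypothetical names used only for illustration, not the actual product schema:

  -- Hypothetical retention check: a record may be pruned once 24 hours
  -- of system time have elapsed since it became eligible for deletion.
  -- RECORDS, RECORD_ID, and ELIGIBLE_TS are illustrative names.
  SELECT R.RECORD_ID
  FROM RECORDS R
  WHERE R.ELIGIBLE_TS + 24 HOURS <= CURRENT TIMESTAMP;
  -- After the clock jumps from Tuesday 3:01 p.m. to Friday 3:01 p.m.,
  -- CURRENT TIMESTAMP satisfies this predicate after only 1 minute of
  -- actual elapsed time, so the record becomes removable early.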

Another example: A particular ETL component runs every 24 hours, and immediately after it runs, the system clock jumps ahead 26 hours. Instead of waiting the required 24 hours of actual time before its next run, the ETL component starts again as soon as possible, because the system calculates that at least 26 hours have passed since the last run. In effect, moving the clock forward has decreased the ETL interval in this case. After this run, the ETL component returns to its defined interval, provided no further changes are made to the system clock.
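In terms of the WBIRMADM.RMCONTROL table used later in this topic, the scheduling decision amounts to a comparison like the following query. The table and columns appear in the corrective action section; this particular query is an illustrative sketch, not part of the product:

  -- Illustrative check of which ETL components are due to run: a
  -- component starts as soon as the system clock reaches NEXTSTARTTIME.
  SELECT TARGETTABLE, NEXTSTARTTIME
  FROM WBIRMADM.RMCONTROL
  WHERE NEXTSTARTTIME <= CURRENT TIMESTAMP;
  -- A 26-hour forward jump makes CURRENT TIMESTAMP exceed NEXTSTARTTIME
  -- immediately, so the component runs without waiting 24 actual hours.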

The following sections focus only on the case where the system clock is moved backward while the components are running.

The Capture component will continue to capture changes to all tables that it is tracking. A time calculation is used to determine when to commit those changes and make them available to the Apply component and the Source Life Cycle component. Because the clock was turned back, the Capture component has most likely marked an internal timestamp that is now in the future. It will not commit the transaction until the system clock is greater than that marked timestamp plus an internal parameter for the interval between commits. In this case, if the system clock moved backward 1 hour, the worst outcome is that any transactions occurring in the repeated hour will not be available until after that hour passes. If the clock was set to 1 year in the past, it would take 1 year for the system to catch up.
Note: The Capture component commits after a prescribed number of transactions; the default is 120. It is possible that data will be available to the Apply and Source Life Cycle components earlier than defined.
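If the Capture component is the standard DB2 replication Capture program, as the asncap commands later in this topic suggest, the interval between commits can be reviewed in the replication control tables. This is a hedged sketch; <capture_schema> stands for the capture schema named on your asncap command (for example, CAPTURE_1):

  -- Review the Capture program's interval between commits (in seconds),
  -- assuming the standard DB2 replication control table exists under
  -- the capture schema used on the asncap command line.
  CONNECT TO <source_database>
  SELECT COMMIT_INTERVAL, SLEEP_INTERVAL
  FROM <capture_schema>.IBMSNAP_CAPPARMS;
  CONNECT RESET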

The Apply component also maintains an internal timestamp that determines when it will check for new records. In this scenario, this internal timestamp will be greater than the current system timestamp. The Apply component waits until the system timestamp catches up to its internal timestamp, even if new records are available. Once the current timestamp has caught up, the component starts looking for records that are available for data movement.
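For a standard DB2 replication Apply program, the internal timestamps in question are visible in the subscription set control table. This is a hedged sketch, assuming the Apply control tables use the default ASN schema and reside on the target database:

  -- Inspect the Apply program's last run and last success times for
  -- each subscription set; values later than CURRENT TIMESTAMP indicate
  -- that the system clock was moved backward after they were recorded.
  CONNECT TO <target_database>
  SELECT APPLY_QUAL, SET_NAME, LASTRUN, LASTSUCCESS,
  CURRENT TIMESTAMP as "CURRENT TIMESTAMP"
  FROM ASN.IBMSNAP_SUBS_SET;
  CONNECT RESET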

The timestamps do not determine which rows are to be replicated. That is determined by an internal value that is not affected by the system clock. The Source and Target Life Cycle components also use timestamps to determine when to start and what records are ready to be pruned.

The Source Life Cycle component in the State to Runtime data movement service uses timestamps only to determine when to start; it does not use timestamps to determine what to prune, because it does not support the retention policy feature that keeps information eligible for pruning on the system for a defined retention period. The Source Life Cycle component in the Runtime to Historical data movement service does support this feature, so some of its records do not meet the retention policy criteria until the current system clock catches up. The Target Life Cycle components in both data movement services support retention policy definitions, so any change in time affects both when they run and what they prune.
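In terms of the WBIRMADM.RMPRUNECTRL table modified in the corrective action below, the pruning decision can be pictured as a comparison like this query (an illustrative sketch, not part of the product):

  -- Illustrative check of which pruning cycles are due: a component
  -- prunes again once PRUNE_INTERVAL minutes have passed since
  -- LAST_PRUNED.
  SELECT PC.TABLE_NAME
  FROM WBIRMADM.RMPRUNECTRL PC
  WHERE PC.LAST_PRUNED + PC.PRUNE_INTERVAL MINUTES <= CURRENT TIMESTAMP;
  -- If the clock was moved backward, LAST_PRUNED sits in the future and
  -- this predicate stays false until the system clock catches up.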

The ETL Components use timestamps as part of their internal scheduling. Once started, these components expect time to be increasing. When the system clock moves into the past, the ETL scheduling is affected, and no ETL is performed until the system catches up.

The possible system clock scenarios are small adjustments made by a network time server, the one-hour shift at a daylight saving time change, and manual changes that move the clock forward or backward in time.

Recovery

Recovery after a change made by a time server should not be necessary because the time differentials are very small -- just a few minutes. The effect is a short period during which nothing happens while the components catch up. When the clock goes back an hour at a daylight saving time change, the components stop replicating for an hour, after which they need to catch up with the system. Whether this is a problem depends upon the system.

One scenario where this wait could be significant is when a mistake is made and the system time is set far forward into the future while the component servers are running. Then (regardless of whether the servers are running) the time is restored to the current time. In this case, the components will have set their internal timestamps into the future but will be running in the current time frame. There will be a long delay before the data movement service processes any rows again. This delay could cause a buildup in the system that would affect recovery time. The administrator may have to take corrective action.
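Before taking corrective action, the extent of the problem can be gauged by comparing the stored timestamps to the system clock. The following diagnostic is a sketch that uses the same control tables the corrective action modifies:

  -- Find pruning and ETL timestamps that are still in the future, which
  -- indicates that the clock was set forward and then restored.
  CONNECT TO <target_database>
  -- RMPRUNECTRL also exists in the source database; repeat this first
  -- query there as well.
  SELECT TABLE_NAME, LAST_PRUNED
  FROM WBIRMADM.RMPRUNECTRL
  WHERE LAST_PRUNED > CURRENT TIMESTAMP;

  SELECT TARGETTABLE, NEXTSTARTTIME, LASTUPDATED
  FROM WBIRMADM.RMCONTROL
  WHERE LASTUPDATED > CURRENT TIMESTAMP;
  CONNECT RESET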

Corrective action

One option is to simply force the Capture and Apply components to initiate a full refresh, and then to update the internal timestamps for the Source Life Cycle, Target Life Cycle, and the ETL components.
  1. Identify all the databases that are running on the server where the time was moved into the future and then pushed back into the past. There are two possibilities: State and Runtime or Runtime and Historical.
  2. Stop all the servers supporting the data movement services on the affected system. During this process, you will be modifying internal parameters, and some components may be out of synchronization if they are allowed to run. For more information, refer to Starting and stopping a data movement service.
  3. Modify the internal timestamps of the Source Life Cycle components and the Target Life Cycle components.
    Note: This action is subject to change from release to release.
    1. Updating Source Life Cycle pruning timestamps. This will modify the settings for all Source Life Cycle components serving all business measures models on the system.
      Verify the current settings:
      CONNECT TO <source_database>
      SELECT PC.TABLE_NAME, PC.RETENTION_IN_MINUTES, PC.LAST_PRUNED, PC.PRUNE_INTERVAL,
      CURRENT TIMESTAMP as "CURRENT TIMESTAMP"
      FROM WBIRMADM.RMPRUNECTRL PC
      WHERE PC.TABLE_NAME LIKE 'APP%';
      Note: Review the values of LAST_PRUNED, PRUNE_INTERVAL, and CURRENT TIMESTAMP. Decide whether you want to prune immediately or at the next interval.
      -- To run as soon as possible
      UPDATE WBIRMADM.RMPRUNECTRL SET (LAST_PRUNED)=(CURRENT TIMESTAMP - PRUNE_INTERVAL MINUTES)
      WHERE TABLE_NAME LIKE 'APP%';
      
      -- To run at the next interval
      UPDATE WBIRMADM.RMPRUNECTRL SET (LAST_PRUNED)=(CURRENT TIMESTAMP)
      WHERE TABLE_NAME LIKE 'APP%';
      CONNECT RESET
    2. Updating Target Life Cycle pruning timestamps.
      CONNECT TO <target_database>
      SELECT PC.TABLE_NAME, PC.RETENTION_IN_MINUTES, PC.LAST_PRUNED, PC.PRUNE_INTERVAL,
      CURRENT TIMESTAMP as "CURRENT TIMESTAMP"
      FROM WBIRMADM.RMPRUNECTRL PC
      WHERE PC.TABLE_NAME NOT LIKE 'APP%';
      Note: Review the values of LAST_PRUNED, PRUNE_INTERVAL, and CURRENT TIMESTAMP. Decide whether to prune immediately or at the next interval.
      -- To run as soon as possible
      UPDATE WBIRMADM.RMPRUNECTRL SET (LAST_PRUNED)=(CURRENT TIMESTAMP - PRUNE_INTERVAL MINUTES)
      WHERE TABLE_NAME NOT LIKE 'APP%';
      
      -- To run at the next interval
      UPDATE WBIRMADM.RMPRUNECTRL SET (LAST_PRUNED)=(CURRENT TIMESTAMP)
      WHERE TABLE_NAME NOT LIKE 'APP%';
      CONNECT RESET
    3. Updating the ETL Schedule.
      Note: This will affect all ETL activities across all models.
      CONNECT TO <target_database>
      -- This query shows the current ETL schedule settings
      SELECT TARGETTABLE, TGT_RM_SPETL_NAME, ETL_0_MINUTES, NEXTSTARTTIME, LASTUPDATED,
      CURRENT TIMESTAMP as "CURRENT TIMESTAMP" FROM WBIRMADM.RMCONTROL;
      Note: Compare the ETL_0_MINUTES, NEXTSTARTTIME, and LASTUPDATED values with the CURRENT TIMESTAMP. Decide whether to run the ETL immediately or at the next interval.
      -- To run at the next interval
      UPDATE WBIRMADM.RMCONTROL SET (NEXTSTARTTIME, LASTUPDATED)=
      (CURRENT TIMESTAMP + ETL_0_MINUTES MINUTES, CURRENT TIMESTAMP);
      
      -- To run as soon as possible
      UPDATE WBIRMADM.RMCONTROL SET (NEXTSTARTTIME, LASTUPDATED)=
      (CURRENT TIMESTAMP, CURRENT TIMESTAMP - ETL_0_MINUTES MINUTES);
      
      CONNECT RESET
    4. Force a full refresh. The simplest way to force a full refresh of the replication Capture and Apply servers is to copy and modify the StartCapture scripts for every business measures model. Find every start capture script for every model deployed on the system (if you have followed the instructions in the consolidating start and stop scripts section, simply find each asncap command), and add the parameter STARTMODE=COLD to the end of the command line.
      Note: A full refresh is an extreme measure and can lead to poor performance until it finishes, because it adds overhead that is not present during normal data services operations. It is therefore important to perform the full refresh at a time when the system is not under heavy use. Full refresh performance depends heavily on the size of the data in the source database of the data movement service.

      Example:

      Before:
      db2cmd asncap CAPTURE_SERVER=STATE CAPTURE_SCHEMA=CAPTURE_1 CAPTURE_PATH="."
      After:
      db2cmd asncap CAPTURE_SERVER=STATE CAPTURE_SCHEMA=CAPTURE_1 CAPTURE_PATH="." STARTMODE=COLD
      Then start all of the scripts. This full refresh will cause the Capture and Apply components to reset all of their internal timestamps, but will incur the extra cost of moving and reprocessing the data. It is important to plan for potential performance decreases while the system catches up.

Copyright IBM Corporation 2005. All Rights Reserved.