Think about recoverability as early as possible during the application design stages. This topic covers a number of aspects of design planning to consider.
For ease of presentation, the following questions assume a single application.
Question 1: Does the application update data in the system? If the application is to perform no updating (that is, it is an inquiry-only application), recovery and restart functions are not needed within CICS®. (But you should take backup copies of non-updated data sets in case they become unreadable.) The remaining questions assume that the application does perform updates.
Question 2: Does this application update data sets that other online applications access? If yes, does the business require updates to be made online, and then to be immediately available to other applications--that is, as soon as the application has made them? This could be a requirement in an online order entry system where it is vital for inventory data sets to be as up-to-date as possible for use by other applications at all times.
Alternatively, can updates be stored temporarily and used to modify the data set(s) later--perhaps using offline batch programs? This might be acceptable for an application that records only data not needed immediately by other applications.
Question 3: Does this application update data sets that batch applications access? If yes, establish whether the batch applications are to access the data sets concurrently with the online applications. If accesses made by the batch applications are limited to read-only, the data sets can be shared between online and batch applications, although read integrity may not be guaranteed. If you intend to update data sets concurrently from both online and batch applications, consider using DL/I or DB2®, which ensure both read and write integrity.
Question 4: Does the application access any confidential data? Files that contain confidential data, and the applications having access to those files, must be clearly identified at this stage. You may need to ensure that only authorized users may access confidential data when service is resumed after a failure, by asking for re-identification in a sign-on message.
Question 5: If a data set becomes unusable, should all applications be terminated while recovery is performed? If degraded service to any application must be preserved while recovery of the data set takes place, you will need to include procedures to do this.
Question 6: Which of the files to be updated are to be regarded as vital? Identify any files that are so vital to the business that they must always be recoverable.
Question 7: How important is data integrity, compared to availability? Consider how long the business can afford to wait for a record that is locked, and weigh this against the risks to data integrity if the normal resynchronization process is overridden.
The acceptable waiting time will vary depending on the value of the data, and the number of users whom you expect to be affected. If the data is very valuable or infrequently accessed, the acceptable waiting time will be longer than if the data is of low value or accessed by many business-critical processes.
Question 8: How long can the business tolerate being unable to use the application in the event of a failure? Indicate (approximately) the maximum time that the business can allow the system to be out of service after a failure. Is it minutes or hours? The time allowed may have to be negotiated according to the types of failure and the ways in which the business can continue without the online application.
Question 9: How is the user to continue or restart entering data after a failure? This is an important part of a recovery requirements statement because it can affect the amount of programming required. The terminal user’s restart procedure will depend largely on what is feasible--for example:
Such factors define the point where the user restarts work. This could be at a point that is as close as possible to the point reached before the system failure. The best point could be determined with the aid of a progress transaction. Or it could be at some point earlier in the application--even at the start of the transaction.
These considerations should be in the external design statement.
Question 10: During what periods of the day do users expect online applications to be available? This is an important consideration when applications (online and batch) require so much of the available computer time that difficulties can arise in scheduling precautionary work for recovery (taking backup copies, for example). See The RLS quiesce and unquiesce functions.
After considering the above questions, produce a formal statement of application and recovery requirements. Before any design or programming work begins, all interested parties should agree on the statement--including:
Decide how the user is to restart work on the application after a system failure. Points to consider are:
When designing the user’s restart procedure (including the progress transaction, if used) include precautions to ensure that each input data item is processed once only.
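One way to picture the "processed once only" precaution is a progress record that stores the last input sequence number committed for each user. The following Python sketch is illustrative only and is not CICS code: the `ProgressLog` class, its fields, and the use of sequence numbers are all assumptions made for this example. It assumes that each input item carries a sequence number, and that the real system commits the data update and the progress record in the same unit of work.

```python
from dataclasses import dataclass, field


@dataclass
class ProgressLog:
    """Hypothetical stand-in for a recoverable progress record, keyed by user."""

    last_done: dict = field(default_factory=dict)  # user id -> last sequence number committed
    applied: list = field(default_factory=list)    # stand-in for the updated data set

    def process(self, user, seq, item):
        """Apply item once; return False if this sequence number was already processed."""
        if seq <= self.last_done.get(user, 0):
            return False  # item re-entered after a restart: skip it
        # Assumption: in a real system these two updates belong to one unit
        # of work, so a failure cannot leave them out of step.
        self.applied.append(item)
        self.last_done[user] = seq
        return True

    def restart_point(self, user):
        """Sequence number from which the user should resume after a failure."""
        return self.last_done.get(user, 0) + 1
```

A progress transaction built along these lines could report `restart_point` to the user after a failure, and any item re-entered from before that point would be rejected rather than applied a second time.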
Decide how application work might continue in the event of a prolonged failure of the system. For example, for an order-entry application, it might be practical (for a limited time) to continue taking orders offline--by manual methods. If you plan such an approach, specify how the offline data is to be subsequently entered into the system; it might be necessary to provide a catch-up function.
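A catch-up function of the kind mentioned above might look like the following Python sketch. It is a sketch under stated assumptions, not a prescribed design: the function name `catch_up`, the idea that each offline order is given a manual reference, and the in-memory set of references are all hypothetical. The point it illustrates is that offline work should be entered in the order it was taken, and that an order already entered must not be entered again.

```python
def catch_up(offline_orders, already_entered):
    """Enter offline orders, skipping any reference already in the system.

    offline_orders: iterable of (reference, order) pairs, in the order taken.
    already_entered: set of references already in the system; updated in place.
    Returns (entered, skipped) lists of references.
    """
    entered, skipped = [], []
    for reference, _order in offline_orders:
        if reference in already_entered:
            skipped.append(reference)  # entered before, or a duplicate slip
            continue
        already_entered.add(reference)  # stand-in for the real update
        entered.append(reference)
    return entered, skipped
```

The returned `skipped` list gives operations staff something to reconcile against the manual records once the backlog has been entered.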
For each application, specify the type of terminal the user is to work with.
Decide if you will provide special procedures to overcome communication problems; for example:
Decide the security procedures for an emergency restart or a break in communications. For example, when confidential data is at risk, specify that the users should sign on again and have their passwords rechecked.
Bear in mind the security requirements when a user needs to use an alternative terminal if a failure is confined to one terminal (or to a few terminals).