There are four types of error that can occur in service integration: errors that a messaging engine can recover from while it is running, errors that can be resolved by an automatic restart of the messaging engine, errors that require the user's intervention, and errors that are not detectable by the messaging engine.
Errors that a messaging engine can recover from while it is running
The system rectifies these recoverable errors without restarting or failing over the messaging engine. It automatically takes action to correct the error and adds an entry to the system error log that explains the error and suggests any actions that the user should take.
The messaging engine continues to run and to honor the quality of service specified for the messages it is processing.
Errors that can be resolved by an automatic restart of the messaging engine (local errors)
A local error can be resolved by restarting the messaging engine, either on its current server or on an alternative server. For example, if a messaging engine cannot connect to its data store, the cause might be that the server in which it is running cannot create a connection, while another server in the same cluster still has access. In this case the HAManager fails over the messaging engine and shuts down the server on which it was running. If the configured deployment does not have failover capability, for example if there is only one server rather than a cluster, the server is shut down and the messaging engine is restarted only after the server is restarted.
Errors that require the user's intervention (global errors)
A global error cannot be fixed by restarting or failing over the messaging engine. For example, if a messaging engine's data store becomes corrupted, the messaging engine cannot run on a different server either, because it would encounter the same problem there. If a messaging engine in this situation were failed over, it would be failed over continually, because it could not run in any server. Servers would repeatedly attempt to run the messaging engine and be shut down, causing unwanted disruption to the cluster. To avoid this, when a global error is encountered, the messaging engine logs an error, stops processing messages, and is not failed over. The messaging engine cannot be restarted until you have corrected the global error condition and restarted the server.
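The response to each of the first three error types amounts to a small decision on error class and deployment topology. The sketch below is a hypothetical Python model written only to illustrate that decision; `ErrorClass`, `Deployment`, and `respond` are invented names, not part of the HAManager or any product API, and the action strings paraphrase the behavior described above. Errors that the messaging engine cannot detect are deliberately left out, because they are found by health monitoring rather than classified by the engine.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ErrorClass(Enum):
    # Hypothetical labels for the first three error types described above.
    RECOVERABLE = auto()  # rectified while the engine keeps running
    LOCAL = auto()        # resolved by restarting the engine, here or elsewhere
    GLOBAL = auto()       # needs user intervention; failover cannot help


@dataclass
class Deployment:
    # True when another server in a cluster could host the engine.
    clustered: bool


def respond(error: ErrorClass, deployment: Deployment) -> list[str]:
    """Return the ordered actions for an error (illustrative model only)."""
    if error is ErrorClass.RECOVERABLE:
        return ["rectify automatically", "log error with suggested actions", "keep running"]
    if error is ErrorClass.LOCAL:
        if deployment.clustered:
            # Another server may still reach the data store, so move the engine.
            return ["fail over engine to another server", "shut down current server"]
        # No failover capability: the engine waits for its own server to restart.
        return ["shut down server", "restart engine after server restarts"]
    # GLOBAL: every server would hit the same fault, so failing over would only
    # bounce the engine around the cluster; stop instead.
    return ["log error", "stop processing messages", "do not fail over"]


if __name__ == "__main__":
    for err in ErrorClass:
        print(err.name, respond(err, Deployment(clustered=True)))
```

The key design point the model captures is that only local errors depend on topology: a global error gets the same "stop, do not fail over" response whether or not a cluster is available.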
Errors not detectable by the messaging engine
Errors such as a spinning thread (a thread trapped in a tight loop that no longer performs useful work) or a deadlock (two threads blocking each other) might be detectable only by explicit health monitoring. The HAManager provides such monitoring and periodically tests the health of the messaging engine.
If the HAManager detects that the messaging engine is not able to run properly, it shuts down the server that is hosting the messaging engine. If the server is in a cluster, the messaging engine is restarted on an alternative server if its policy allows, and the node agent restarts the server that was shut down. If the server is not in a cluster, the server must be restarted, after which the messaging engine restarts on that server.
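A spinning or deadlocked thread is still alive, so a monitor cannot simply check that it exists; one common approach is to check that useful work is actually being completed. The following is a minimal Python sketch of such a progress watchdog, assuming the worker increments a counter after each unit of useful work. `ProgressWatchdog`, `record_progress`, and `check` are hypothetical names; this is a generic technique for illustration, not a description of how the HAManager tests messaging engine health.

```python
import time
from typing import Callable


class ProgressWatchdog:
    """Hypothetical watchdog: unhealthy if no progress within timeout_s.

    The worker calls record_progress() after each unit of useful work, and a
    monitor calls check() periodically. A spinning or deadlocked worker stops
    advancing the counter, so check() eventually returns False.
    """

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._progress = 0
        self._last_seen = 0
        self._last_change = clock()

    def record_progress(self) -> None:
        # Only the worker writes this counter, so no lock is used in this sketch.
        self._progress += 1

    def check(self) -> bool:
        # Called by the monitor; True while progress has been seen recently.
        now = self.clock()
        if self._progress != self._last_seen:
            self._last_seen = self._progress
            self._last_change = now
        return now - self._last_change <= self.timeout_s


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example is deterministic
    dog = ProgressWatchdog(timeout_s=5.0, clock=lambda: now[0])
    dog.record_progress()
    now[0] = 1.0
    print(dog.check())  # True: progress seen at t=1
    now[0] = 10.0
    print(dog.check())  # False: no progress for 9 s
```

A real monitor must also tell an idle engine (no messages to process) apart from a stuck one, for example by expecting progress only while work is queued; the sketch omits this, and when it reports unhealthy, the recovery actions are those described above (shut down the hosting server, then restart the engine).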