WebSphere Application Server Network Deployment, Version 6.0.x
Operating Systems: AIX, HP-UX, Linux, Solaris, Windows

Service integration error types

Four types of error can occur in service integration: errors that a messaging engine can recover from while it is running, errors that can be resolved by an automatic restart of the messaging engine, errors that require the user's intervention, and errors that are not detectable by the messaging engine.

Errors that a messaging engine can recover from while it is running

The system can rectify these recoverable errors automatically, without restarting or failing over the messaging engine. The system also adds an entry to the system error log that explains the error and suggests any actions that the user should take. The messaging engine continues to run and to honor the quality of service specified for the messages it is processing.

Errors that can be resolved by an automatic restart of the messaging engine (local errors)

A local error can be resolved by restarting the messaging engine, either on its current server or on an alternative server. For example, if a messaging engine cannot connect to its data store, the cause might be that the server in which it is running cannot create a connection, although another server in the same cluster might still have access. In that case the HAManager fails over the messaging engine and shuts down the server on which it was running. If the configured deployment has no failover capability, for example because there is only one server rather than a cluster, the server is shut down and the messaging engine is restarted only after the server is restarted.
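
The two outcomes described above can be sketched as a small decision function. This is an illustrative model only, not the HAManager's implementation: the function name handle_local_error, the server names, and the dictionary of running flags are all assumptions made for the example.

```python
def handle_local_error(servers, host):
    """Model the response to a local error on `host`.

    `servers` is a hypothetical dict mapping server name -> True if running.
    Returns (new_host, waits_for_server_restart).
    """
    # The server that hit the local error is shut down in either deployment.
    servers[host] = False
    # Any other server that is still running is a failover candidate.
    candidates = [name for name, running in servers.items() if running]
    if candidates:
        return candidates[0], False
    # No failover capability: the engine restarts only after the server does.
    return None, True


clustered = {"server1": True, "server2": True}
single = {"server1": True}

failover_result = handle_local_error(clustered, "server1")
single_result = handle_local_error(single, "server1")
print(failover_result)  # ('server2', False)
print(single_result)    # (None, True)
```

In the clustered case the engine moves to server2. In the single-server case there is nowhere to move, so the engine stays down until server1 is restarted.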

Errors that require the user's intervention (global errors)

A global error cannot be fixed by restarting or failing over the messaging engine. For example, if a messaging engine's data store becomes corrupted, the messaging engine cannot run on a different server because it encounters the same problem there. If a messaging engine in this situation were failed over, it would be failed over repeatedly because it could not run in any server. This would disrupt the cluster, because each server that attempted to run the messaging engine would be shut down in turn. To avoid such a situation, when a global error is encountered the messaging engine logs an error, stops processing messages, and is not failed over. The messaging engine cannot be restarted until you have corrected the global error condition and restarted the server.
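
The reasoning above, that failing over on a global error would only spread the disruption, can be illustrated with a short simulation. It is a deliberately simplified sketch: the function run_engine, its parameters, and the server names are hypothetical, and the model ignores server restarts.

```python
def run_engine(servers, store_corrupted, fail_over_on_error):
    """Return the servers that are shut down while trying to host the engine."""
    shut_down = []
    for server in servers:
        if not store_corrupted:
            # The engine starts normally on this server.
            return shut_down
        if not fail_over_on_error:
            # Global-error handling: log, stop processing, do not fail over.
            return shut_down
        # Every server sees the same corrupted data store, so the start
        # fails again and this server is shut down as well.
        shut_down.append(server)
    return shut_down


servers = ["server1", "server2", "server3"]
cascade = run_engine(servers, store_corrupted=True, fail_over_on_error=True)
contained = run_engine(servers, store_corrupted=True, fail_over_on_error=False)
print(cascade)    # ['server1', 'server2', 'server3']
print(contained)  # []
```

With failover enabled, every server in the model is lost to the same fault. With failover suppressed, the fault stays with the one messaging engine until the data store is repaired.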

Errors not detectable by the messaging engine

Errors such as a spinning thread (a thread that is trapped in a tight loop and no longer performs useful work) or a deadlock (two threads that are blocking each other) might be detectable only by explicit health monitoring. The HAManager provides such monitoring and periodically tests the health of the messaging engine. If the HAManager detects that the messaging engine cannot run properly, it shuts down the server that is hosting the messaging engine. If the server was in a cluster, the messaging engine is restarted on an alternative server, if its policy allows, and the node agent restarts the server that was shut down. If the server was not in a cluster, the server must be restarted; the messaging engine then restarts on that server.
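
This topic does not describe how the HAManager tests health. One common way to detect a component that is hung or no longer making progress is a heartbeat check, sketched below. The class HeartbeatMonitor, its methods, and the 30-second timeout are assumptions chosen for illustration, not part of the product.

```python
import time


class HeartbeatMonitor:
    """Hypothetical watchdog that flags a component whose heartbeat is stale."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_beat = clock()

    def beat(self):
        """Called by the monitored component each time it makes progress."""
        self.last_beat = self.clock()

    def is_healthy(self):
        """A component that has stopped making progress stops calling beat()."""
        return self.clock() - self.last_beat <= self.timeout


# A fake clock keeps the demonstration deterministic.
now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=30, clock=lambda: now[0])

now[0] = 10.0
monitor.beat()
healthy_after_beat = monitor.is_healthy()

now[0] = 100.0  # 90 seconds pass with no heartbeat
healthy_after_silence = monitor.is_healthy()
print(healthy_after_beat, healthy_after_silence)  # True False
```

The point of the sketch is that the check runs outside the monitored component, which is why it can notice a deadlock or a tight loop that the component itself cannot report.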

Related tasks
Injecting failures into a high availability system

Concept topic

Last updated: 5 Oct 2005
http://publib.boulder.ibm.com/infocenter/ws60help/index.jsp?topic=/com.ibm.websphere.pmc.nd.doc\concepts\cjt0004_.html

© Copyright IBM Corporation 2004, 2005. All Rights Reserved.