If the workload management component is not properly distributing the workload
across servers in a multi-node configuration, use the following options to isolate
the problem.
Eliminate environment or configuration issues
Determine
if the servers are capable of serving the applications for which they have
been enabled. Identify the cluster that has the problem.
- Are there network connection problems with the members of the cluster
or the administrative servers, for example, the deployment manager or node agents?
- If so, ping the machines to ensure that they are properly connected
to the network (a scripted version of this check is sketched after this list).
- Is there other activity on the machines where the servers are installed
that is impacting the servers' ability to service a request? For example, check
the processor utilization as measured by the task manager, processor ID, or
some other outside tool to see if:
- It is not what is expected, or is erratic rather than constant.
- It shows that a newly added, installed, or upgraded member of the cluster
is not being utilized.
- Are all of the application servers you started on each node running, or are some stopped?
- Are the applications installed
and operating?
- If the problem relates to distributing workload across container-managed
persistence (CMP) or bean-managed persistence (BMP) enterprise beans, have
you configured the supporting JDBC providers and data sources on each server? For problems relating to data access,
review the topic Cannot access a data source.
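As a quick way to run the connectivity check mentioned in the list above, a short script can ping every machine involved. The following is a minimal sketch in plain Python; the host names are placeholders for your own cluster members and administrative servers, and the -c flag assumes a Linux or UNIX ping (use -n on Windows).

```
import subprocess

# Placeholder host names: substitute your cluster members, deployment
# manager, and node agent machines.
hosts = ["node1.example.com", "node2.example.com", "dmgr.example.com"]

for host in hosts:
    # Send a single ICMP echo request; a nonzero return code means the
    # host did not answer.
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    status = "reachable" if result.returncode == 0 else "UNREACHABLE"
    print(f"{host}: {status}")
```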
If you are experiencing workload management problems related to
HTTP requests, such as HTTP requests not being served by all members of the
cluster, be aware that the HTTP plug-in balances the load across all servers
that are defined in the PrimaryServers list if affinity has not been established.
If you do not have a PrimaryServers list defined, the plug-in balances the load
across all servers that are defined in the cluster, again provided that affinity
has not been established. If affinity has been established, the plug-in routes
all requests directly to the server with which affinity was established.
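The following sketch, in plain Python, illustrates the selection rule just described. It is an illustration of the documented behavior only, not the plug-in's actual implementation, and it simplifies the plug-in's weighted load balancing to a plain round robin.

```
import itertools

def make_router(cluster_servers, primary_servers=None):
    # Without affinity, the plug-in balances across the PrimaryServers
    # list if one is defined, otherwise across all servers in the cluster.
    pool = primary_servers if primary_servers else cluster_servers
    rotation = itertools.cycle(pool)

    def route(affinity_server=None):
        # Once affinity is established, every request goes directly to
        # that server.
        if affinity_server is not None:
            return affinity_server
        return next(rotation)

    return route

route = make_router(["Server1", "Server2", "Server3"],
                    primary_servers=["Server1", "Server2"])
print([route() for _ in range(4)])        # rotates over the primary list
print(route(affinity_server="Server3"))   # affinity overrides the rotation
```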
For workload management problems relating
to enterprise bean requests, such as enterprise bean requests not being
served by all members of a cluster:
- Are the weights set to the allowed values?
- For the cluster in question, log onto the administrative console and:
- Select Servers > Clusters.
- Select your cluster from the list.
- Select Cluster members.
- For each server in the cluster, click on server_name and note the
assigned weight of the server.
- Ensure that the weights are within the valid range of 0-20. If a server
has a weight of 0, no requests are routed to it. Weights greater than 20 are
treated as 0. (A scripted way to check each member's weight is sketched after this list.)
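If you prefer to check the weights with a script rather than the console, the following is a minimal sketch of a wsadmin Jython script, assuming a cluster named MyCluster (a placeholder for your cluster name). Run it with wsadmin -lang jython -f; it is only runnable inside the wsadmin environment.

```
# Sketch of a wsadmin Jython script: print the weight of each member of
# a cluster. 'MyCluster' is a placeholder for your cluster name.
clusterId = AdminConfig.getid('/ServerCluster:MyCluster/')
members = AdminConfig.list('ClusterMember', clusterId).splitlines()
for member in members:
    name = AdminConfig.showAttribute(member, 'memberName')
    weight = AdminConfig.showAttribute(member, 'weight')
    print '%s: weight=%s' % (name, weight)
```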
Browse log files for WLM errors and CORBA minor codes
If you still encounter
problems with enterprise bean workload management, the next step is to check
the activity log for entries that show:
- A server that has been marked unusable more than once and remains unusable.
- All servers in a cluster have been marked unusable and remain unusable.
- A Location Service Daemon (LSD) has been marked unusable more than once
and remains unusable.
If any of these warnings are encountered, follow the user response
given in the log. If, after following the user response, the warnings persist,
examine the other errors and warnings in the Log Analyzer on the affected
servers to look for:
- A possible user response, such as changing a configuration setting.
- Base class exceptions that might indicate a WebSphere Application Server
defect.
You may also see exceptions with "CORBA" as part of the exception
name, since WLM uses the Common Object Request Broker Architecture (CORBA) to
communicate between processes. Look for a statement in the exception stack
specifying a "minor code". These codes denote the specific reason a CORBA
call or response could not complete. WLM minor codes fall in the range 0x4921040
- 0x492104F. For an explanation of the minor codes related to WLM, see the Reference: Generated API documentation for
the package and class com.ibm.websphere.wlm.WsCorbaMinorCodes.
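As a quick illustration of the documented range check (plain Python, not a WebSphere utility), a minor code pulled from an exception stack can be tested as follows:

```
WLM_MINOR_CODE_MIN = 0x4921040
WLM_MINOR_CODE_MAX = 0x492104F

def is_wlm_minor_code(minor_code):
    # True when the CORBA minor code falls in the documented WLM range.
    return WLM_MINOR_CODE_MIN <= minor_code <= WLM_MINOR_CODE_MAX

print(is_wlm_minor_code(0x4921042))  # True: WLM-related
print(is_wlm_minor_code(0x4F000000)) # False: outside the WLM range
```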
Analyze PMI data
The purpose of analyzing the PMI data is to understand
the workload arriving at each member of a cluster. The data for any one
member of the cluster is only useful within the context of the data for all
the members of the cluster. To
obtain PMI data for all members of a cluster, see Performance Monitoring Infrastructure (PMI).
Use the Tivoli Performance Viewer (TPV), as described in Monitoring performance with Tivoli Performance Viewer, to verify that, based
on the weights assigned to the cluster members (the steady-state weights),
each server is receiving the correct proportion of the requests.
To turn
on PMI metrics using the Tivoli Performance Viewer:
- Select Data Collection in the tree view. Servers that do not have
PMI enabled are grayed out.
- For each server that you wish to collect data on, click Specify...
- You can now enable the metrics. Set the monitoring level to low on
the Performance Monitoring Setting panel.
- Click OK.
- Click Apply to save the changes you have made.
WLM PMI metrics can be viewed on a server-by-server basis. In
the Tivoli Performance Viewer, select Node > Server > WorkloadManagement
> Server/Client. By default, the data is shown in raw form in a table, collected
every 10 seconds, as an aggregate number. You can also choose to see the data
as a delta or rate, add or remove columns, clear the buffer, reset the metrics
to zero, and change the collection rate and buffer size.
After you have obtained the PMI data, calculate
the percentage of numIncomingRequests for each member of the cluster relative to the
total numIncomingRequests for all members of the cluster. Comparing
this percentage with the percentage of the total weight assigned to each member
of the cluster provides an initial look at how well the workload is balanced
across the members of the cluster.
In
addition to numIncomingRequests, two other metrics show how work is balanced
between the members of a cluster: numIncomingStrongAffinityRequests and numIncomingNonWLMObjectRequests.
These two metrics show the number of requests directed to a specific member
of a cluster that could only be serviced by that member.
For example, consider a 3-server cluster. The following weights
are assigned to each of these three servers:
- Server1 = 5
- Server2 = 3
- Server3 = 2
Allow this cluster of
servers to start servicing requests, and wait for the system to reach a steady
state, that is, the point at which the number of incoming requests to the cluster
equals the number of responses from the servers. In this situation, we would
expect the percentage of requests routed to each server to be:
- % routed to Server1 = weight1 / (weight1+weight2+weight3) = 5/10 or 50%
- % routed to Server2 = weight2 / (weight1+weight2+weight3) = 3/10 or 30%
- % routed to Server3 = weight3 / (weight1+weight2+weight3) = 2/10 or 20%
Now let us consider a
case where there are no incoming requests with strong affinity and no
non-WLM object requests.
In
this scenario, let us assume that the PMI metrics gathered show the number
of incoming requests for each server to be:
- numIncomingRequestsServer1 = 390
- numIncomingRequestsServer2 = 237
- numIncomingRequestsServer3 = 157
Thus, the total number
of requests coming into the cluster is: numIncomingRequestsCluster = numIncomingRequestsServer1
+ numIncomingRequestsServer2 + numIncomingRequestsServer3 = 784
numIncomingStrongAffinityRequests = 0
numIncomingNonWLMObjectRequests = 0
Can we decide, based on this data, whether WLM is properly balancing
the incoming requests among the servers in our cluster? Since there are no
requests with strong affinity, the question we need to answer is: are the
requests arriving in the ratios we expect, given the assigned weights? The computation
to answer that question is straightforward:
- % (actual) routed to Server1 = 390 / 784 = 49.7%
- % (actual) routed to Server2 = 237 / 784 = 30.2%
- % (actual) routed to Server3 = 157 / 784 = 20.0%
So WLM is behaving as designed: the data match what is expected,
based on the weights assigned to the servers. (A small script performing this comparison is sketched below.)
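The comparison just performed can be scripted. The following is a small sketch in plain Python using this example's numbers; the variable names are illustrative, not PMI API calls.

```
# Assigned weights and observed numIncomingRequests from the example.
weights  = {"Server1": 5, "Server2": 3, "Server3": 2}
incoming = {"Server1": 390, "Server2": 237, "Server3": 157}

total_weight = sum(weights.values())      # 10
total_incoming = sum(incoming.values())   # 784

for server in sorted(weights):
    expected = 100.0 * weights[server] / total_weight
    actual = 100.0 * incoming[server] / total_incoming
    print(f"{server}: expected {expected:.1f}%, actual {actual:.1f}%")
```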
Now
let us consider another 3-server cluster. We have assigned the following weights
to each of these three servers:
- Server1 = 5
- Server2 = 5
- Server3 = 5
Allow this cluster of
servers to start servicing requests, and wait for the system to reach a steady
state, that is, the point at which the number of incoming requests to the cluster
equals the number of responses from the servers. In this situation, we would
expect the percentage of requests routed to each of the three servers to be:
- % routed to Server1 = weight1 / (weight1+weight2+weight3) = 5/15 or 1/3
of the requests.
- % routed to Server2 = weight2 / (weight1+weight2+weight3) = 5/15 or 1/3
of the requests.
- % routed to Server3 = weight3 / (weight1+weight2+weight3) = 5/15 or 1/3
of the requests.
In this scenario, let
us assume that the PMI metrics gathered show the number of incoming requests
for each server to be:
- numIncomingRequestsServer1 = 1236
- numIncomingRequestsServer2 = 1225
- numIncomingRequestsServer3 = 1230
Thus, the totals for requests coming into the cluster are:
- numIncomingRequestsCluster = numIncomingRequestsServer1 + numIncomingRequestsServer2
+ numIncomingRequestsServer3 = 3691
- numIncomingStrongAffinityRequests = 445, with all 445 requests directed
at Server1.
- numIncomingNonWLMObjectRequests = 0.
In this case, we see
that the total number of requests was split almost evenly among the three servers:
- % (actual) routed to Server1 = 1236 / 3691 = 33.49%
- % (actual) routed to Server2 = 1225 / 3691 = 33.19%
- % (actual) routed to Server3 = 1230 / 3691 = 33.32%
However, the correct
interpretation of this data is that the routing of new, WLM-routed requests
was not perfectly balanced, because Server1 also had several hundred strong
affinity requests to service. WLM attempts to compensate for strong affinity
requests directed to one or more servers by preferentially distributing new
incoming requests to the servers that are not participating in transactional
affinity. In a case with both incoming strong affinity requests and non-WLM
object requests, the analysis would be analogous to this one. (A sketch of this
adjusted comparison follows.)
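One way to account for the affinity traffic (an illustrative approach, not an official formula) is to subtract each server's strong affinity requests before comparing shares, so that only the requests WLM was free to route are compared against the weights:

```
incoming = {"Server1": 1236, "Server2": 1225, "Server3": 1230}
affinity = {"Server1": 445, "Server2": 0, "Server3": 0}

# Requests WLM was actually free to route.
routable = {s: incoming[s] - affinity[s] for s in incoming}
total_routable = sum(routable.values())   # 3246

for server in sorted(routable):
    share = 100.0 * routable[server] / total_routable
    print(f"{server}: {share:.1f}% of WLM-routed requests")
```

The skewed shares of WLM-routed work (about 24% to Server1 versus roughly 38% to each of the other two servers) show the compensation at work: WLM sent fewer new requests to Server1 precisely because it was already servicing the affinity traffic, which is why the overall totals came out nearly even.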
If, after you
have analyzed the PMI data and accounted for transactional affinity and non-WLM
object requests, the percentage of actual incoming requests to the servers in
a cluster does not reflect the assigned weights, then requests
are not being properly balanced. If this is the case, repeat the steps
described above for eliminating environment and configuration
issues and for browsing log files before proceeding.
For current information available from IBM Support on known problems and
their resolution, see the IBM Support page.
IBM Support has documents that can save you time gathering information
needed to resolve this problem. Before opening a PMR, see the IBM Support page.