If an application server dies (its process spontaneously closes), or freezes
(its Web modules stop responding to new requests):
- Isolate the problem by installing Web modules on different servers, if
possible.
- Read the Monitoring performance with Tivoli Performance Viewer (formerly Resource
Analyzer) topic. You can use the performance viewer to determine which resources
have reached their maximum capacity, such as Java heap memory (indicating a
possible memory leak) and database connections. If a particular resource appears
to have reached its maximum capacity, review the application code for a possible cause:
- If database connections are used and never freed, ensure that application
code performs a close() on any opened Connection object within
a finally{} block (see the sketch following this list).
- If there is a steady increase in servlet engine threads in use, review
the application's synchronized code blocks for possible deadlock conditions.
- If there is a steady increase in the JVM heap size, review the application code
for memory leak opportunities, such as static (class-level) collections, which
can cause objects to never be garbage collected.
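The following sketch shows the connection-cleanup pattern described above: every
opened JDBC object is closed in a finally{} block so that connections return to
the pool even when the query fails. The class name, the DataSource JNDI name
(jdbc/SampleDS), and the query are hypothetical and stand in for
application-specific values.

  import java.sql.Connection;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.sql.Statement;
  import javax.naming.InitialContext;
  import javax.naming.NamingException;
  import javax.sql.DataSource;

  public class CustomerLookup {
      public int countCustomers() throws NamingException, SQLException {
          DataSource ds = (DataSource) new InitialContext().lookup("jdbc/SampleDS");
          Connection conn = null;
          Statement stmt = null;
          ResultSet rs = null;
          try {
              conn = ds.getConnection();
              stmt = conn.createStatement();
              rs = stmt.executeQuery("SELECT COUNT(*) FROM CUSTOMER");
              return rs.next() ? rs.getInt(1) : 0;
          } finally {
              // Close in reverse order of creation; swallow close failures so
              // that an exception from the query itself is not masked.
              if (rs != null) { try { rs.close(); } catch (SQLException e) { } }
              if (stmt != null) { try { stmt.close(); } catch (SQLException e) { } }
              if (conn != null) { try { conn.close(); } catch (SQLException e) { } }
          }
      }
  }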
- As an alternative
to using the performance viewer to detect
memory leak problems, enable verbose garbage collection on the application
server. This feature adds detailed statements to the JVM error log file of
the application server about the amount of available and in-use memory. To
set up verbose garbage collection:
- Select Servers > Application Servers > server_name > Process
Definition > Java Virtual Machine, and enable Verbose Garbage Collection.
- Stop and restart the application server.
Periodically, or after the application server stops, browse the log
file for garbage collection statements. Look for statements beginning with
"allocation failure". The string indicates that a need for memory allocation
has triggered a JVM garbage collection (freeing of unused memory). Allocation
failures themselves are normal and not necessarily indicative of a problem.
The allocation failure statement is followed by statements showing how many
bytes are needed and how many are allocated.
If there is a steady increase
in the total amount of free and used memory (the JVM keeps allocating more
memory for itself), or if the JVM becomes unable to allocate as much memory
as it needs (indicated by the bytes needed statement), there might be a memory
leak.
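Purely as an illustration of the "steady increase in the total amount of free and
used memory" criterion, the following standalone sketch samples the heap with
java.lang.Runtime. It is not a WebSphere facility and does not replace verbose
garbage collection; the class name, sample count, and interval are arbitrary.

  public class HeapTrend {
      public static void main(String[] args) throws InterruptedException {
          Runtime rt = Runtime.getRuntime();
          for (int i = 0; i < 10; i++) {
              long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
              long totalMb = rt.totalMemory() / (1024 * 1024);
              long maxMb = rt.maxMemory() / (1024 * 1024);
              // A used figure that keeps climbing toward max across samples,
              // and never falls back after collections, suggests a leak.
              System.out.println("used=" + usedMb + "MB total=" + totalMb
                      + "MB max=" + maxMb + "MB");
              Thread.sleep(5000);
          }
      }
  }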
- If either
the performance viewer or verbose garbage collection output indicates that
the application server is running out of memory, one of the following problems
might be present:
- There is a memory
leak in application code that you must address (a sketch of a typical leak
pattern follows this list). To pinpoint the cause of a
memory leak, enable the RunHProf function in the Servers > Application Servers > server_name > Process Definition
> Java Virtual Machine pane of the problem application server:
- In the same JVM pane, set the HProf Arguments field to a value
similar to depth=20,file=heapdmp.txt. This value shows exception
stacks to a maximum of 20 levels, and saves the heapdump output to the install_root/bin/heapdmp.txt file.
- Save the settings.
- Stop and restart the application server.
- Reenact the scenario or access the resource that causes the hang or crash,
if possible, and then stop the application server. If this is not possible, wait
until the hang or crash happens again, and then stop the application server.
- Examine the file into which the heapdump was saved. For example, examine
the install_root/bin/heapdmp.txt file:
- Search for the string "SITES BEGIN". This locates a list
of Java objects in memory, which shows the amount of memory allocated to the
objects.
- The list contains an entry for each memory allocation site in the JVM. Each
entry records the type of object that was instantiated and an identifier of a
trace stack, listed elsewhere in the dump, that shows the Java method that made
the allocation.
- The list of Java objects is in descending order by number of bytes allocated.
Depending on the nature of the leak, the problem class should show up near
the top of the list, but this is not always the case. Look throughout the
list for large amounts of memory or frequent instances of the same class being
instantiated. In the latter case, use the ID in the trace stack column
to identify allocations occurring repeatedly in the same class and method.
- Examine the source code indicated in the related trace stacks for the
possibility of memory leaks.
- The default maximum heap size of the application
server needs to be increased.
- There
is a defect in the WebSphere Application Server product that you must either
report or correct by installing
a fix or fix pack from a maintenance download. Contact IBM support.
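As an illustration of the memory-leak pattern called out earlier, the following
hypothetical class keeps adding entries to a static (class-level) collection and
never removes them, so the cached objects are never eligible for garbage
collection. The class name, field, and entry size are invented for the example.

  import java.util.HashMap;
  import java.util.Map;

  public class ResultCache {
      // Class-level reference: entries live as long as the class itself is loaded.
      private static final Map<String, byte[]> CACHE = new HashMap<String, byte[]>();

      public static byte[] get(String key) {
          byte[] value = CACHE.get(key);
          if (value == null) {
              value = new byte[64 * 1024]; // stands in for a real computed result
              CACHE.put(key, value);       // added on every miss, never removed or bounded
          }
          return value;
      }
  }

In an HProf dump, a leak like this typically appears as repeated allocations
traced back to the same method (ResultCache.get in this sketch); bounding the
collection or adding an eviction policy addresses it.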
- If an application
server spontaneously dies, look for a Java thread dump file. The JVM creates
the file in the product directory structure, with a name like javacore[number].txt.
- Force the application server
to create a thread dump (or javacore). Here is the process for forcing a thread
dump, which is different from the process in earlier releases of the product:
- Using the wsadmin command prompt, get a handle to the problem application
server: wsadmin>set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
- Generate the thread dump: wsadmin>$AdminControl invoke $jvm dumpThreads.
- Look for an output file in the installation root directory with a name
like javacore.date.time.id.txt.
- Browse the thread
dump for clues:
- If the JVM creates the thread dump as it closes (the thread dump is not
manually forced), there might be "error" or "exception information" strings
at the beginning of the file. These strings indicate the thread that caused
the application server to die.
- The thread dump contains a snapshot of each thread in the process, starting
in the section labeled "Full thread dump."
- Look for threads with a description that contains "state:R". Such threads
were active and running when the dump was forced or when the process exited.
- Look for multiple threads in the same Java application code source location.
Multiple threads from the same location might indicate a deadlock condition
(multiple threads waiting on a monitor) or an infinite loop, and help identify
the application code with the problem (a sketch of this deadlock pattern follows
this list).
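As an illustration of the deadlock condition described above, the following
hypothetical class acquires two monitors in opposite orders from two threads.
A thread dump taken while it hangs shows both threads blocked at the same source
location, each waiting on a monitor that the other holds. All names are invented
for the example.

  public class TransferDeadlock {
      private static final Object ACCOUNT_A = new Object();
      private static final Object ACCOUNT_B = new Object();

      public static void main(String[] args) {
          Thread t1 = new Thread(new Runnable() {
              public void run() {
                  synchronized (ACCOUNT_A) {
                      pause();                      // widen the window for the race
                      synchronized (ACCOUNT_B) { }  // blocks while transfer-2 holds ACCOUNT_B
                  }
              }
          }, "transfer-1");
          Thread t2 = new Thread(new Runnable() {
              public void run() {
                  synchronized (ACCOUNT_B) {
                      pause();
                      synchronized (ACCOUNT_A) { }  // blocks while transfer-1 holds ACCOUNT_A
                  }
              }
          }, "transfer-2");
          t1.start();
          t2.start();
      }

      private static void pause() {
          try { Thread.sleep(1000); } catch (InterruptedException e) { }
      }
  }

Acquiring the monitors in the same order on every code path removes the deadlock.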
If these steps
do not fix your problem, search to see if the problem is known and documented,
using the methods identified in the available
online support (hints and tips, technotes, and fixes) topic. If you
find that your problem is not known, contact
IBM support to report it.
For current information available from IBM Support on known problems and
their resolution, see the IBM Support page.
IBM Support has documents that can save you time gathering information
needed to resolve this problem. Before opening a PMR, see the IBM Support page.