QUESTION: DDTS problem-solving and debugging: tools and techniques
ANSWER


Part 2 of 2 (part 1 is Technote 6111)
-------------------------------------------------------------------------

OTHER LOG FILES

There are also several other LOG files in ~ddts/spool:

ADMINLOG - Log of administrative actions such as rebuilding the SQL
                   database, connecting DDTS sites together, etc.

CMLOG    - Log of CM activities, if you are using any of DDTS's
                   integrations to CM systems (such as the integration with
                   ClearCase).

Each of these also has corresponding *.old files, which are managed
by ddtsclean as described above.

-------------------------------------------------------------------------

PROBLEMS WITH DAEMONS

As mentioned above, when problems occur involving DDTS's background
daemons, the main place to look for symptoms is the LOG. For certain
conditions, however, DDTS also sends an email message to the DDTS
administrator. Make sure you are on the 'administrator' list: check
that your email address is listed in the file ~ddts/conf/administrator.
If it is not, or if the file lists userids of people who are no longer
in your organization, use "adminbug mins" to correct the entries. Then
check your email from around the time the suspected problem occurred.

Typically, if a daemon was unable to process a particular transaction,
it will send a message to the administrator, put some messages in the
LOG, and move the file for the transaction to a *.lost directory.

For example, the queue of work for the 'bugs.in' daemon is the group of
files in ~ddts/spool/bugs.in/. If bugs.in is processing a transaction
(a file) and an error is encountered, then after retrying the operation
a couple of times the bugs.in daemon will move the file to
~ddts/spool/bugs.lost/. The other daemons work the same way: each has
a directory holding its "normal" spool of incoming work, and a
corresponding ".lost" directory to which it moves transactions that
cannot be processed.

Daemon       Spool       "Lost"
-----------  ----------  ----------
bugmail      bugmail     bugmail.lost
bugs.in      bugs.in     bugs.lost
cm.in        cm.in       cm.lost
dadmin.in    dadmin.in   dadmin.lost
ddtsbackend  dbackend    dbackend.lost
ddupd        ddupd       ddupd.lost
mail.in      --          mail.lost
net.out      net.out     net.lost

NOTE: mail.in is a special case. Its queue of incoming work is not a
set of separate files in a directory; rather, it is a system mailbox
(typically /usr/spool/mail/ddts). However, for the purposes of this
discussion it is similar to the others in the sense that it has a
".lost" directory where it puts transactions it cannot handle.

When you see files in the *.lost directories, this indicates that a
daemon was unable to handle the corresponding transaction. You should
check the creation timestamp of the file in *.lost, look in the LOG
from around the same timeframe for additional symptoms, and look at
the email you would have received (in certain situations).

If the information from those sources enables you to solve the problem,
go ahead and do it. If not, then please have all that information 
available when you call DDTS Technical Support.
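
The check of the *.lost directories described above can be scripted.
The following is a minimal sketch, not a DDTS-supplied tool: the
function name list_lost is hypothetical, and it assumes only the
directory layout described in this technote (a spool directory
containing subdirectories whose names end in ".lost"). Note that
"ls -l" reports each file's modification time, which is used here as
the closest readily available stand-in for the time the transaction
was set aside.

```shell
#!/bin/sh
# list_lost SPOOL_DIR   (hypothetical helper, not part of DDTS)
# Print a long listing (with timestamps) of every file found in the
# *.lost directories under SPOOL_DIR, or a note if none are found.
list_lost() {
    spool=$1
    found=0
    for dir in "$spool"/*.lost; do
        # Skip the unexpanded pattern if no *.lost directory exists.
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            [ -f "$f" ] || continue
            ls -l "$f"
            found=1
        done
    done
    [ "$found" -eq 1 ] || echo "no lost transactions under $spool"
}
```

As user 'ddts' it could be invoked as "list_lost ~ddts/spool"; the
timestamps shown can then be compared against the LOG and against any
email sent to the administrator.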

-------------------------------------------------------------------------

THE WEB ERROR_LOG

In diagnosing situations involving the web interface to DDTS, it is
often useful to view the web server's error log file. Different
servers (Apache, Netscape, etc.) have different names, locations, and
methods for accessing this file, so consult your specific server's
documentation for details.

For the Apache web server, the error log is a file located under the
web server's home directory, in a subdirectory called 'logs', with the
filename 'error_log'. There is also a file called 'access_log' which 
logs accesses to the web server by client browsers.

The "tail -f" trick described above for the DDTS LOG is also useful
on the error_log and access_log files:

  cd DIRECTORY_CONTAINING_MY_WEB_STUFF/logs
  tail -f error_log&
  tail -f access_log&

After running the above commands, hit Enter a couple of times to get
some command prompts back (this creates some white space and makes it
visually easier to distinguish messages that were already there from
new messages), then go to your web browser and create the error
situation.

Watch the error_log and access_log as your request is processed. You
may see intelligible error messages, or you may see UNIX-looking 
errors like "permission denied" or "COMMAND: cannot execute". These
are extremely useful in tracking down the real root cause of the
problem.
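
When the log is long, it can help to pull out only the UNIX-style
errors. The following is a minimal sketch under stated assumptions:
the function name scan_web_log is hypothetical, and the two search
strings are simply the examples quoted above, so extend the pattern to
suit the messages your own server produces.

```shell
#!/bin/sh
# scan_web_log LOGFILE   (hypothetical helper)
# Print, with line numbers, the lines of LOGFILE that contain either of
# the UNIX-style errors mentioned above (matched case-insensitively).
scan_web_log() {
    grep -i -n -E 'permission denied|cannot execute' "$1"
}
```

For example, "scan_web_log error_log" run from the web server's logs
directory. The function returns a non-zero status when nothing matches.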

To take this one step further: when diagnosing problems in WebDDTS, if
there are no visible symptoms in the browser and no helpful messages in
the web server's error_log or access_log, it can be useful to determine
where the problem is occurring by modifying the WebDDTS programs to
produce diagnostic/debugging output. The best way to do that is to
insert statements that write to STDERR (filehandle 2 in UNIX). In the
context of a web application, such output is directed to the web
server's error log and may be viewed in the manner described above.

To write to STDERR, if you are working with a bourne shell script you
could insert a statement like:

  >&2 echo "Hello, yes you did in fact get here."

In the perl-based parts of the code, you could insert:

  print STDERR "Hello world.\n";

These examples write only static text, which is useful in the sense
that it tells you whether you reached that part of the code. You can
also use any of the normal functions of echo and print; specifically,
you can print the values of variables, environment information, etc.

NOTE: in release 4.1, any modifications that you make to the perl code
will take effect only after the program wt_perl has been restarted. To
do this cleanly, simply run the command 'webconfigure -noprompt' as
user 'ddts'.
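
To show how variable values and environment information might be
written to STDERR from a shell script, here is a small sketch. The
helper name dbg and the variable 'project' are made up for
illustration and do not appear in the WebDDTS code.

```shell
#!/bin/sh
# dbg MESSAGE...   (hypothetical helper)
# Write a tagged diagnostic line to STDERR (filehandle 2). Under a web
# server, this output lands in the server's error log. The tag carries
# the shell's process ID so that interleaved requests can be told apart.
dbg() {
    echo "DEBUG[$$]: $*" >&2
}

# Example calls: report a variable's value and part of the environment.
project="my.project"   # hypothetical variable, for illustration only
dbg "reached the lookup step, project=$project"
dbg "PATH=$PATH"
```

Tagging every line with a fixed word such as DEBUG makes the added
output easy to find in the error log and easy to remove from the
script once the problem is solved.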

-------------------------------------------------------------------------

XDDTS ERROR MESSAGES

For the UNIX-hosted user interface 'xddts', sometimes error
conditions occur which are recognized by the xddts program and presented
in popup dialog boxes. Those messages are vital in solving the problem!

In some cases, however, errors occur that are not trapped or
recognized by xddts, so you may not get an error popup at all, or the
error popup may contain only *some* of the information needed to solve
the problem.

In such a case, look at the terminal window from which you invoked
xddts (UNIX calls this the "parent process" or "owning terminal
process"). Useful messages will often be written there.

If there is no owning terminal process (for example, if you started 
xddts and then quit the xterm window, or if you started xddts from an
OpenWindows popup menu) then you should shut down xddts, restart it
from a terminal window, and recreate the error, being careful to note
any messages that appear in the terminal window at the same point in
time when the error occurs in xddts.
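
If the terminal messages scroll by too quickly to note down, they can
be saved to a file while still being shown in the terminal window. The
following is a minimal sketch; the function name run_logged and the
output file name are assumptions for illustration.

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]   (hypothetical helper)
# Run COMMAND with its standard output and standard error combined,
# showing them on the terminal and saving a copy in LOGFILE.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}
```

For example, "run_logged xddts.out xddts -f", using the "-f" switch
covered in the debugging tools section of this technote so that xddts
stays attached to the terminal. The saved file can then be read at
leisure or passed on to DDTS Technical Support.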

-------------------------------------------------------------------------

CUSTOMIZATION PROBLEMS: WHERE TO LOOK?

For issues involving a particular DDTS function on the user interface,
or a problem that occurs for a particular user or group of users, it
can be confusing trying to find out where to look to find symptoms and
correct the problem.

Examples include:
1. I've added a state, and now some of my fields don't show up on
   xddts in certain situations.
2. When I bring up a list of defects in a particular project, after
   building the index window xddts just hangs and I have to kill it.

To solve problems like this, there are two main questions to answer:

1. If the symptom did not exist before but does now, then what
   changed? Finding the answer to that question invariably leads to
   the solution to the problem. You can look at file date/time stamps,
   the ADMINLOG (tells you if any new classes/projects were created
   or removed), talk to your system administrator (they are usually
   not very helpful in this regard until you pin them down with
   evidence; see the next section), etc.

2. Is the problem general to all users of all functions for all defects,
   or is it localized in some way? Specifically, is it localized:
   - by user?
   - by machine?
   - by project?
   - by class?
   Answering this question helps isolate whether a system-level factor
   (PATH, filesystem mounting, security permissions) is at work or
   whether the problem is in one of the files you modify in the process
   of customizing DDTS (e.g. master.tmpl, states/statenames, etc.).
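
For the first question ("what changed?"), file date/time stamps can be
checked in bulk with the standard 'find' command. This is a minimal
sketch; the function name recent_changes is hypothetical, and which
directory to search is left to you.

```shell
#!/bin/sh
# recent_changes DIR DAYS   (hypothetical helper)
# List regular files under DIR that were modified within the last
# DAYS days, using find's -mtime test.
recent_changes() {
    find "$1" -type f -mtime "-$2" -print
}
```

For example, "recent_changes ~ddts 7" lists files under the DDTS home
directory that were modified in roughly the last week. Files that
change constantly (the LOG, spool files) will also appear, so look for
configuration and template files in the output.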

To check if a problem is "localized" to a certain factor, vary that factor
and see if the problem changes or disappears. For example, if a particular
problem happens for one UNIX user but not another, then that narrows
the scope of the problem to aspects that are user-specific. That 
includes filesystem permission issues, DDTS-specific security issues,
settings in the users' .ddtsrc or web preferences files, etc.

It would specifically *exclude* system-level factors such as UNIX
workstation configuration, or DDTS-based factors common to the two
users such as master.tmpl, states, oneofs files, etc.

By determining which factors influence the situation you can rule out
certain factors as being relevant. Doing that allows you to focus on
the remaining set of possibly-relevant factors and find the problem.

Here are some specific guidelines:

1. If the problem is class-specific, then all users would be impacted
when doing operations to defects in that class, but would be able to
perform the same operation in a different class. If that's the case,
this indicates the problem is somewhere in the class-specific files
such as master.tmpl, states, statenames, or oneofs.

2. If the problem is project-specific (occurs for one project in a
class but not for another project in the same class) then that further
narrows the scope: the problem is probably caused by the project
definition files (proj.info, proj.control, proj.notify). For example,
proj.control governs DDTS security access to defects (for viewing
and for modification). If two projects have different proj.control
files and only one has the problem, you can test whether security is
a factor by changing the problematic project's proj.control to match
the non-problematic one. If the problem goes away, then clearly it
lies with DDTS security.

3. Particularly for problems with the UNIX-based user interfaces like
'xddts', it is important to consider the host UNIX workstation on
which the client program (xddts) is running, the workstation on 
which the program's windows are being viewed (may not be the same,
if you have used rlogin to log in to another workstation and set your
DISPLAY back), and the userid you are logged in as. In such a
circumstance, try varying one factor while holding the others
constant, and gauge the result.

For example, if a user can't bring up xddts on a particular display,
have the same person go to a different workstation, log in, and
start xddts. If it starts OK then that indicates the user's personal
DDTS file (~/.ddtsrc) is OK but that there is something about the
X server on the first workstation that is not working correctly with
xddts. If the workstation is running an older operating system or
X environment it may need to be upgraded.

On the other hand, if the anomalous behavior "follows" the user when
they move to a different workstation, then clearly the behavior must
have something to do with the user's personal DDTS startup file
(~/.ddtsrc) or X configuration files (~/XDdts, .Xdefaults, .xinitrc,
etc). In that event, try moving these files to a different name
and retrying; if the problem goes away consult the DDTS or UNIX
docs to try to discern the problem with the user's original files.
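
Moving the personal files aside and putting them back can be done
consistently with a pair of small shell functions. This is a sketch
only: the function names and the ".aside" suffix are arbitrary choices
made for illustration.

```shell
#!/bin/sh
# set_aside DIR NAME...   (hypothetical helper)
# Rename each existing DIR/NAME to DIR/NAME.aside so the program starts
# without it. Names that do not exist are skipped.
set_aside() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            mv "$dir/$name" "$dir/$name.aside" && echo "set aside: $dir/$name"
        fi
    done
}

# restore_aside DIR NAME...   (hypothetical helper)
# Undo set_aside by renaming DIR/NAME.aside back to DIR/NAME.
restore_aside() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name.aside" ]; then
            mv "$dir/$name.aside" "$dir/$name" && echo "restored: $dir/$name"
        fi
    done
}
```

For example, the affected user could run
"set_aside $HOME .ddtsrc XDdts .Xdefaults .xinitrc", retry xddts, and
afterwards run restore_aside with the same arguments. Setting the
files aside one at a time shows which file is responsible.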

-------------------------------------------------------------------------

TOOLS FOR DEBUGGING: DEBUG MODE, TRUSS, AND GDB

Particularly when working with system-level problems or problems on
the UNIX-hosted user interfaces, there are a couple of undocumented
flags for the DDTS daemons and client programs, and a couple of UNIX
tools that can be used to gain insight into problems and lead to
solutions.

First, command switches and debug mode:

- The ddtsd daemon can be switched into "debug mode" where it gives
a very verbose accounting of its activities in the DDTS LOG. This 
will show where it is looking to see if any of its subordinate
daemons have any work to do, when it is invoking the daemons, and
other activities like checking for "stale" lock files and for 
backgrounded transactions.

To activate debug mode, get the process ID (PID) of the ddtsd daemon
either from a 'ps' command, from the DDTS LOG startup messages for
ddtsd, or from the file ~ddts/spool/ddtsd.pid. Then, issue the
following:

  kill -USR2 PID

...where PID is the process ID.  Then, watch the DDTS LOG (see above).

To turn debug mode off, send the same signal (it's an on/off toggle).
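
The toggle can be wrapped in a small script that takes the PID from
the ddtsd.pid file. This is a minimal sketch with hypothetical
function names; it assumes that the first line of the file contains
just the process ID, which you should confirm on your own system
before relying on it.

```shell
#!/bin/sh
# read_ddtsd_pid PIDFILE   (hypothetical helper)
# Print the PID stored in PIDFILE; fail if the file is unreadable or
# its first line is not a plain number (an assumption about the file's
# format, see the text above).
read_ddtsd_pid() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
    read -r pid < "$1" || true
    case $pid in
        ''|*[!0-9]*) echo "unexpected contents in $1" >&2; return 1 ;;
    esac
    echo "$pid"
}

# toggle_ddtsd_debug PIDFILE   (hypothetical helper)
# Send USR2 to ddtsd; the same signal switches debug mode on and off.
toggle_ddtsd_debug() {
    pid=$(read_ddtsd_pid "$1") || return 1
    kill -USR2 "$pid"
}
```

For example, "toggle_ddtsd_debug ~ddts/spool/ddtsd.pid", run as a user
permitted to signal the daemon. Run it a second time to switch debug
mode back off.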

Several of the daemons can also be started with an undocumented
"-d" option, which instructs them to run in debug mode. In some 
cases debug mode doesn't produce any additional useful output but
in other cases it does. Use this option only at the direction of
DDTS Technical Support.

Several of the DDTS client commands (xddts in particular) have an
optional "-f" switch which tells the command not to do an immediate
fork(). Normally these commands do a fork() so as to disconnect
themselves from the command shell so the user can get the command
prompt back to run additional commands (this will usually be
accompanied by a message about "backgrounding..."). In debugging,
there are situations where this behavior is disruptive to the
diagnosis, so it can be turned off by specifying "-f".

For example:

  xddts -f

This is most useful in combination with gdb or truss; see below.

Two favorite UNIX commands for this purpose are truss and gdb:

'truss' is a command only available on Solaris (SunOS has a similar 
command called 'trace' and SGI has a command called 'par' but there is
no version for HP-UX or any of the other UNIX environments on which
DDTS runs). 'truss' runs a command and traces its execution, showing
information about all UNIX system calls it makes. This can be 
extremely enlightening particularly where PATH or security aspects
are in question (the truss output will show specifically where a file
was found, and will expose the details of any permission-denied
problems).

For example, to run the 'findbug' command under truss to see what
files it's reading, you could use a command like:

  truss -o findbug.truss -a -f -rall -wall findbug -p my.project

This command would run 'findbug -p my.project', capturing all the
system-call activity to a file called findbug.truss, showing all
parameters to exec() functions ("-a" operand), following all 
fork()'ed child processes ("-f"), and showing the full contents of
all read and write buffers ("-rall" and "-wall").

See the truss manpage for more information.
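
With "-rall" and "-wall" the output file can be very large, so it is
usually quicker to pull out the failed system calls first. The sketch
below assumes the usual Solaris truss convention in which a failed
call is annotated with "Err#" followed by the errno number and name
(for example ENOENT for a missing file or EACCES for permission
denied). The function name truss_errors is hypothetical.

```shell
#!/bin/sh
# truss_errors TRUSSFILE   (hypothetical helper)
# Print the lines of TRUSSFILE that record a failed system call,
# assuming failures are marked with the string "Err#".
truss_errors() {
    grep 'Err#' "$1"
}
```

For example, "truss_errors findbug.truss | grep EACCES" narrows the
output to permission-denied failures. Many ENOENT lines are normal,
since a program searching a PATH tries several locations before it
finds a file.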

gdb, the GNU debugger, is useful to analyze core files or to 
interactively debug programs.

To use gdb on a core file, do the following:

  gdb          (starts gdb in interactive mode)
  core core    (at the gdb prompt, loads the core file named 'core')

Assuming the core file is readable, you will see output showing
the program that produced the core file and some information about
the conditions at the time of the error. Note that you will probably
not see complete information about function names and variables,
because the DDTS binaries are distributed with this information
"stripped" (see the 'ld' and 'strip' manpages for details). Even so,
just knowing which binary produced the core file is usually valuable.
If it's a DDTS program, you should try to track down the cause of the
error by looking in the LOG or contacting Support for help. If it's
not a DDTS program, you should contact the program's vendor for
assistance.

-------------------------------------------------------------------------
This ends technote part 2 of 2 parts.