QUESTION: DDTS problem-solving and debugging: tools and techniques

ANSWER

Part 2 of 2 (part 1 is Technote 6111)

-------------------------------------------------------------------------
OTHER LOG FILES

There are also several other log files in ~ddts/spool:

    ADMINLOG - Log of administrative actions such as rebuilding the SQL database, connecting DDTS sites together, etc.

    CMLOG    - Log of CM activities, if you are using any of DDTS's integrations to CM systems (such as the integration with ClearCase).

Each of these also has corresponding *.old files, which are managed by ddtsclean as described above.

-------------------------------------------------------------------------
PROBLEMS WITH DAEMONS

When problems occur involving DDTS's background daemons, as mentioned above, the main place to look for symptoms is in the LOG. However, for certain conditions DDTS also sends an email message to the DDTS administrator. Make sure you are on the 'administrator' list: check the file ~ddts/conf/administrator and make sure your email address is listed there. If it is not, or if userids are listed there for people who are no longer in your organization, use "adminbug mins" to correct the entries. Then check your email from around the time the suspected problem occurred.

Typically, if a daemon is unable to process a particular transaction, it sends a message to the administrator, puts some messages in the LOG, and moves the file for the transaction to a *.lost directory. For example, the queue of work for the 'bugs.in' daemon is the group of files in ~ddts/spool/bugs.in/. If bugs.in is processing a transaction (a file) and encounters an error, then after retrying the operation a couple of times it moves the file to ~ddts/spool/bugs.lost/.

The same applies to the other daemons: each has a directory holding its "normal" spool of incoming work, and a corresponding ".lost" directory to which it moves transactions that cannot be processed.

    Daemon        Spool        "Lost"
    -----------   ----------   -------------
    bugmail       bugmail      bugmail.lost
    bugs.in       bugs.in      bugs.lost
    cm.in         cm.in        cm.lost
    dadmin.in     dadmin.in    dadmin.lost
    ddtsbackend   dbackend     dbackend.lost
    ddupd         ddupd        ddupd.lost
    mail.in       --           mail.lost
    net.out       net.out      net.lost

NOTE: mail.in is a special case. Its queue of incoming work is not a set of separate files in a directory; rather, it is a system mailbox (typically /usr/spool/mail/ddts). For the purposes of this discussion, however, it is similar to the others in that it has a ".lost" directory where it puts transactions it cannot handle.

Files in the *.lost directories indicate that a daemon was unable to handle the corresponding transaction. Check the creation timestamp of the file in *.lost, look in the LOG from around the same timeframe for additional symptoms, and look at the email you would have received (in certain situations). If the information from those sources enables you to solve the problem, go ahead and do it. If not, please have all that information available when you call DDTS Technical Support.

-------------------------------------------------------------------------
THE WEB ERROR_LOG

In diagnosing situations involving the web interface to DDTS, it is often useful to view the web server's error log file. Different servers (Apache, Netscape, etc.) have different names, locations, and methods for accessing this file, so consult your specific server's documentation for details.

For the Apache web server, the error log is a file located under the web server's home directory, in a subdirectory called 'logs', with the filename 'error_log'. There is also a file called 'access_log' which logs accesses to the web server by client browsers.

It is useful to apply the "tail -f" trick, described above for the DDTS LOG, to the error_log and access_log files too:

    cd DIRECTORY_CONTAINING_MY_WEB_STUFF/logs
    tail -f error_log&
    tail -f access_log&

After running the above commands, hit Enter a couple of times to get some command prompts back (this creates some white space and makes it visually easier to distinguish messages that were already there from new messages), then go to your web browser and create the error situation. Watch the error_log and access_log as your request is processed. You may see intelligible error messages, or you may see UNIX-looking errors like "permission denied" or "COMMAND: cannot execute". These are extremely useful in tracking down the real root cause of the problem.

To take this one step further: when diagnosing problems in WebDDTS, if there are no visible symptoms in the browser and no helpful messages in the web server's error_log or access_log, it can be useful to determine where the problem is occurring by modifying the WebDDTS programs to produce diagnostic/debugging output. The best way to do that is to insert statements that write to STDERR (file descriptor 2 in UNIX). In the context of a web application, such output is directed to the web server's error log and may be viewed in the manner described above.

To write to STDERR from a Bourne shell script, you could insert a statement like:

    >&2 echo "Hello, yes you did in fact get here."

In the perl-based parts of the code, you could insert:

    print STDERR "Hello world.\n";

These examples write only static text (which is useful in the sense that it tells you whether you reached that part of the code or not), but you can also use any of the normal functions of echo and print. Specifically, you can print the values of variables, environment information, etc.

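As a minimal sketch of printing variable values to STDERR from a Bourne-compatible shell script: the helper name 'debug' and the variable 'project' below are hypothetical, chosen only for illustration, and are not part of DDTS or WebDDTS.

```shell
#!/bin/sh
# Hypothetical helper (not part of DDTS): write a tagged diagnostic
# line to STDERR. Under a web server, STDERR normally ends up in the
# server's error log. $$ is the process ID of the running script.
debug() {
    echo "DEBUG [$$] $*" >&2
}

# Hypothetical variable standing in for a value your script computes.
project="my.project"

debug "reached the project lookup"
debug "project=$project"
debug "PATH=$PATH"
```

Tagging each line with a fixed word such as DEBUG makes the added output easy to pick out of the error_log with grep, and easy to find and remove from the script once the problem is solved.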
(NOTE: in release 4.1, any modifications that you make to the perl code will take effect only after the program wt_perl has been restarted. To do this cleanly, simply run the command 'webconfigure -noprompt' as user 'ddts'.)

-------------------------------------------------------------------------
XDDTS ERROR MESSAGES

For the UNIX-hosted user interface 'xddts', error conditions sometimes occur which are recognized by the xddts program and presented in popup dialog boxes. Those messages are vital in solving the problem!

In some cases, however, errors occur that are not trapped or recognized by xddts, so you may not get an error popup at all, or the error popup may contain only *some* of the information needed to solve the problem. In such a case, look at the terminal window from which you invoked xddts (UNIX calls this the "parent process" or "owning terminal process"). Useful messages will often be written there.

If there is no owning terminal process (for example, if you started xddts and then quit the xterm window, or if you started xddts from an OpenWindows popup menu), shut down xddts, restart it from a terminal window, and recreate the error, being careful to note any messages that appear in the terminal window at the moment the error occurs in xddts.

-------------------------------------------------------------------------
CUSTOMIZATION PROBLEMS: WHERE TO LOOK?

For issues involving a particular DDTS function on the user interface, or a problem that occurs for a particular user or group of users, it can be confusing to work out where to look for symptoms and how to correct the problem. Examples include:

1. I've added a state, and now some of my fields don't show up on xddts in certain situations.

2. When I bring up a list of defects in a particular project, after building the index window xddts just hangs and I have to kill it.

To solve problems like this, there are two main questions to answer:

1. If the symptom did not exist before but does now, what changed? Finding the answer to that question invariably leads to the solution to the problem. You can look at file date/time stamps, check the ADMINLOG (which tells you if any new classes/projects were created or removed), talk to your system administrator (they are usually not very helpful in this regard until you pin them down with evidence; see the next section), etc.

2. Is the problem general to all users of all functions for all defects, or is it localized in some way? Specifically, is it localized:

    - by user?
    - by machine?
    - by project?
    - by class?

Answering this question helps isolate whether a system-level factor (PATH, filesystem mounting, security permissions) is at work or whether the problem is in one of the files you modify in the process of customizing DDTS (e.g. master.tmpl, states/statenames, etc.).

To check whether a problem is "localized" to a certain factor, vary that factor and see if the problem changes or disappears. For example, if a particular problem happens for one UNIX user but not another, that narrows the scope of the problem to aspects that are user-specific. That includes filesystem permission issues, DDTS-specific security issues, settings in the users' .ddtsrc or web preferences files, etc. It would specifically *exclude* system-level factors such as UNIX workstation configuration, and DDTS-based factors common to the two users such as master.tmpl, states, oneofs files, etc.

By determining which factors influence the situation, you can rule out the factors that are not relevant. Doing that allows you to focus on the remaining set of possibly-relevant factors and find the problem. Here are some specific guidelines:

1. If the problem is class-specific, then all users would be impacted when doing operations on defects in that class, but would be able to perform the same operation in a different class.
If that's the case, the problem is somewhere in the class-specific files such as master.tmpl, states, statenames, or oneofs.

2. If the problem is project-specific (it occurs for one project in a class but not for another project in the same class), that further narrows the scope: the problem is probably caused by the project definition files (proj.info, proj.control, proj.notify). For example, proj.control governs DDTS security access to defects (for viewing and for modification). If two projects have different proj.control files and one has the problem while the other does not, you can determine whether security is a factor by changing the problematic project's proj.control to match the non-problematic one. If the behavior stops, then clearly the problem is with DDTS security.

3. Particularly for problems with the UNIX-based user interfaces like 'xddts', it is important to consider the host UNIX workstation on which the client program (xddts) is running, the workstation on which the program's windows are being viewed (which may not be the same, if you have rlogin'ed to another workstation and set your DISPLAY back), and the userid you are logged in as. In such a circumstance, try varying one factor while holding the others constant, and gauge the result.

For example, if a user can't bring up xddts on a particular display, have the same person go to a different workstation, log in, and start xddts. If it starts OK, that indicates the user's personal DDTS file (~/.ddtsrc) is OK but that something about the X server on the first workstation is not working correctly with xddts. If the workstation is running an older operating system or X environment, it may need to be upgraded.

On the other hand, if the anomalous behavior "follows" the user when they move to a different workstation, then clearly the behavior must have something to do with the user's personal DDTS startup file (~/.ddtsrc) or X configuration files (~/XDdts, .Xdefaults, .xinitrc, etc).
In that event, try moving these files to a different name and retrying; if the problem goes away consult the DDTS or UNIX docs to try to discern the problem with the user's original files.

-------------------------------------------------------------------------
TOOLS FOR DEBUGGING: DEBUG MODE, TRUSS, AND GDB

Particularly when working with system-level problems or problems on the UNIX-hosted user interfaces, there are a couple of undocumented flags for the DDTS daemons and client programs, and a couple of UNIX tools that can be used to gain insight into problems and lead to solutions.

First, command switches and debug mode:

- The ddtsd daemon can be switched into "debug mode" where it gives a very verbose accounting of its activities in the DDTS LOG. This will show where it is looking to see if any of its subordinate daemons have any work to do, when it is invoking the daemons, and other activities like checking for "stale" lock files and for backgrounded transactions. To activate debug mode, get the process ID (PID) of the ddtsd daemon either from a 'ps' command, from the DDTS LOG startup messages for ddtsd, or from the file ~ddts/spool/ddtsd.pid. Then, issue the following:

    kill -USR2 PID

  ...where PID is the process ID. Then, watch the DDTS LOG (see above). To turn debug mode off, send the same signal (it's an on/off toggle).

- Several of the daemons can also be started with an undocumented "-d" option, which instructs them to run in debug mode. In some cases debug mode doesn't produce any additional useful output but in other cases it does. Use this option only at the direction of DDTS Technical Support.

- Several of the DDTS client commands (xddts in particular) have an optional "-f" switch which tells the command not to do an immediate fork().
Normally these commands do a fork() so as to disconnect themselves from the command shell, letting the user get the command prompt back to run additional commands (this is usually accompanied by a message about "backgrounding..."). In debugging there are situations where this behavior disrupts the diagnosis, so it can be turned off by specifying "-f". For example:

    xddts -f

This is most useful in combination with gdb or truss; see below.

Two favorite UNIX commands for this purpose are truss and gdb:

'truss' is a command available only on Solaris (SunOS has a similar command called 'trace' and SGI has a command called 'par', but there is no version for HP-UX or any of the other UNIX environments on which DDTS runs). 'truss' runs a command and traces its execution, showing information about all UNIX system calls it makes. This can be extremely enlightening, particularly where PATH or security aspects are in question (the truss output will show specifically where a file was found, and will expose the details of any permission-denied problems). For example, to run the 'findbug' command under truss to see what files it's reading, you could use a command like:

    truss -o findbug.truss -a -f -rall -wall findbug -p my.project

This command would run 'findbug -p my.project', capturing all the system-call activity to a file called findbug.truss, showing all parameters to exec() functions ("-a" operand), following all fork()'ed child processes ("-f"), and showing the full contents of all read and write buffers ("-rall" and "-wall"). See the truss manpage for more information.

gdb, the GNU debugger, is useful for analyzing core files or for debugging programs interactively. To use gdb on a core file, do the following:

    gdb         -- To get into GDB interactive mode
    core core   -- To load the core file called 'core' into memory.

Assuming the core file is readable, you will see output showing the program that produced the core file and some information about the conditions at the time of the error. Note that you will probably not see complete information about function names and variables, because the DDTS binaries are distributed with this information "stripped" (see the 'ld' and 'strip' manpages for details). Even so, just knowing which binary produced the core file is usually valuable. If it's a DDTS program, try to track down the cause of the error by looking in the LOG or contacting Support for help. If it's not a DDTS program, contact the program's vendor for assistance.

.........................................................................
This ends technote part 2 of 2 parts.