-----Original Message-----
From: Weiss, Paul 
Sent: Friday, April 25, 2003 11:28 AM
To: Bickford, Fred; Faria, Mike; chat_cc; tech-cc-all
Subject: RE: SOLUTION:182359002(DRAFT) - WINDOWS: Can ClearCase be configured not to use the XML and HTML diffmerge ?


Folks, there are trade-offs to changing the default magic file entries. I asked our expert in this area for input and have included it below. Please make sure our customers understand the limitations before doing this. I am concerned about customers making this change and then generating calls when they do not follow the rules/limits.

*********
	As to tradeoffs and benefits:

	There are several benefits to the XML diff merge tool:

	* Diffs are independent of XML input formatting
	  - XML Diff Merge has no line length restrictions, and the diffs
	    will appear the same, whether the file is nicely indented,
	    or all on a single line, or in any state in between.

	    Most XML is produced by machine. It is often not formatted
	    to be understood by humans (i.e., not indented in a semantically
	    meaningful way).

	    We have learned from experience that many XML files have
  	    either VERY long lines (thousands of characters long),
	    or no line termination at all (e.g., XML generated by MSXML).

	  - Text diff merge has a limitation of around 3000 characters
	    as the max line length. Text_file containers have a limitation
	    of 8000 characters as the max line length.

	  - XML Diff Merge can therefore process some files that
	    text diff merge cannot.

	  - XML Diff Merge parses the XML into a tree, so the result
	    is always "pretty-printed", while maintaining all whitespace
	    exactly as in the original files (e.g., during a merge). 
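
	To illustrate the formatting-independence point above, here is a minimal Python sketch. It is not the XML Diff Merge algorithm; it only shows the general idea of comparing parsed trees instead of lines. The helper names (canonical, same_xml) and the sample documents are made up for illustration, and treating whitespace-only text as insignificant is a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def canonical(elem):
    """Describe an element as a nested tuple, ignoring surrounding whitespace.

    Assumption: leading/trailing whitespace in text is treated as
    insignificant, which is a simplification for this sketch.
    """
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    children = tuple(canonical(child) for child in elem)
    return (elem.tag, tuple(sorted(elem.attrib.items())), text, tail, children)


def same_xml(a, b):
    """True if two XML strings parse to the same tree under canonical()."""
    return canonical(ET.fromstring(a)) == canonical(ET.fromstring(b))


# The same document, once on a single line and once indented.
one_line = '<config><item id="1">alpha</item><item id="2">beta</item></config>'
indented = """<config>
    <item id="1">alpha</item>
    <item id="2">beta</item>
</config>"""

# A document with a real content change.
changed = '<config><item id="1">alpha</item><item id="2">gamma</item></config>'

print(same_xml(one_line, indented))  # formatting differs, content does not
print(same_xml(one_line, changed))   # content differs
```

	A line-oriented diff would report every line of the two formatted variants as different, while the tree comparison reports a difference only for the changed element.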


	* XML Diff Merge understands different XML encodings, especially UTF-16
	  - If your XML files use UTF-16, you can't use text_file containers,
	    and you can't use the text diff merge tool. UTF-16 looks like 
	    "binary" to these tools. 

	  - XML Diff Merge can therefore process some files that
	    text diff merge cannot.

	  - XML Diff Merge understands and normalizes several XML encodings.
	    You can even compare a version encoded as UTF-8 vs. one in ASCII
	    vs. UTF-16, and get only "real" differences, not differences caused
	    by the same information being encoded in multiple ways.

	  - When merging, XML Diff Merge can even convert from one encoding to 
 	    another. For example, if your inputs are UTF-16, you can write
	    the output in UTF-8. Or any other supported encoding.
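
	The encoding point can be sketched with Python's standard XML parser. This is only an illustration of encoding normalization in general, not how XML Diff Merge is implemented; the sample document and variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

# The same logical content (three Japanese characters), stored three ways.
body = '<note lang="ja">\u65e5\u672c\u8a9e</note>'

utf8_bytes = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
utf16_bytes = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")
ascii_bytes = (
    '<?xml version="1.0" encoding="US-ASCII"?>'
    '<note lang="ja">&#x65e5;&#x672c;&#x8a9e;</note>'
).encode("ascii")

# The raw bytes are all different...
print(len(utf8_bytes), len(utf16_bytes), len(ascii_bytes))

# ...but after parsing, the text is identical Unicode in every case.
root_utf8 = ET.fromstring(utf8_bytes)
root_utf16 = ET.fromstring(utf16_bytes)
root_ascii = ET.fromstring(ascii_bytes)
print(root_utf8.text == root_utf16.text == root_ascii.text)

# Converting on output: read UTF-16, write UTF-8.
converted = ET.tostring(root_utf16, encoding="utf-8")
```

	A byte-level or line-level comparison of the three inputs sees three unrelated files; a comparison after parsing sees no differences at all.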


	* XML Diff Merge "looks through the eyes of the XML parser".
	  - XML Diff Merge parses the XML and breaks it down into its syntactic 
	    components (e.g., elements, attributes, attribute values).
	    It can, for example, auto-merge differences in attributes, 
	    even if those attributes are formatted all on one line.
	    Text diff merge would simply report a conflict.

	  - XML Diff Merge also resolves such XML-isms as character references.
	    For example, a "copyright" character may be placed in a file 
	    directly (©) or by character reference (&#169; or &#xa9;). When placed
	    directly, the actual sequence of bytes used may be different, 
	    according to the encoding used. In XML Diff Merge, you'll always
	    see the © (assuming your font has that glyph, of course), and you don't
	    have to worry about things like encodings and references.

	  - XML Diff Merge is very UNICODE-aware. It is capable of displaying the
	    full 16-bit character space directly, on any system. Most 
	    MBCS/I18N apps can display, say, Japanese text ONLY on a machine
	    running *Japanese* Windows. XML Diff Merge can display the Japanese text
	    on an English system (using, say, Arial Unicode MS font that has
	    the Japanese glyphs).

	  - Text diff merge MAY be able to display the Japanese characters
	    if the file contains a BOM. XML Diff Merge will work for any
	    supported XML encoding.

	  - XML Diff Merge can therefore give a correct display for some files
	    that text diff merge cannot.
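
	Two of the points above (character references, and auto-merging attributes that sit on one line) can be sketched in Python as follows. This is a hedged illustration only: merge_attrs is a hypothetical helper written for this example, and the simple three-way rule it uses is an assumption, not the actual XML Diff Merge logic.

```python
import xml.etree.ElementTree as ET

# One character, written directly and as two kinds of character reference.
direct = "<footer>\u00a9 Acme</footer>"
decimal = "<footer>&#169; Acme</footer>"
hexref = "<footer>&#xa9; Acme</footer>"
texts = [ET.fromstring(s).text for s in (direct, decimal, hexref)]
print(texts[0] == texts[1] == texts[2])  # the parser resolves all three alike


def merge_attrs(base, ours, theirs):
    """Three-way merge of attribute dicts (hypothetical helper).

    A change made on only one side is accepted; a conflict is reported
    only when both sides changed the same attribute differently.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:
            value = o
        elif o == b:
            value = t
        elif t == b:
            value = o
        else:
            conflicts.append(key)
            continue
        if value is not None:  # None means the attribute was deleted
            merged[key] = value
    return merged, conflicts


# Both contributors edited the same single line, but different attributes.
base = ET.fromstring('<img width="10" height="20"/>').attrib
ours = ET.fromstring('<img width="15" height="20"/>').attrib
theirs = ET.fromstring('<img width="10" height="25"/>').attrib

merged, conflicts = merge_attrs(base, ours, theirs)
print(merged, conflicts)
```

	Because both edits touch the same line, a line-oriented merge has to report a conflict here; at the attribute level the two changes do not overlap.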

	
	* The "tree view" of XML Diff Merge has the potential for tree-editing
	  - XML is structured as a tree. If you wanted to edit that tree, 
          adding elements or moving them around, for example, you would need to 
	    be careful to get both the starting and ending elements, or you
	    would wind up with an invalid XML file. 
	
	    With a tree view, the user could edit that tree without worry
	    of making such errors, no matter how badly the source file
	    was formatted.

	  - I say "potential", however, since these tree-edit operations are
	    mostly not-yet-implemented. But this was part of the idea with 
	    going with a tree view.

	  - The tree view also allows expand/collapse of different parts of the
	    tree, and this is often useful for understanding the structure
	    and the diffs. This part *is* implemented today.


*********
	I suggest we be careful how we word a technote about the current set of limitations/tradeoffs in a solution.   
-Paul
**********
 
   Unfortunately, there are also several serious drawbacks:

	* The XML Diff Merge algorithm has several serious, fundamental limitations
	  and shortcomings. The current algorithm is often fooled by whitespace differences,
	  for example, and this often makes merging difficult. There are, in fact,
	  a number of scenarios that cause poor diff and merge results.

	* The algorithm can take a long time to run. There may also be one or more
	  bugs that cause the compute time to be *very* much longer than necessary.

	* The algorithm can consume a lot of memory. This fact, coupled with the
	  runtime problem above, imposes a practical upper limit of about 1 MB on 
	  the size of XML files that can be diff/merged.

	* Algorithm problems can be corrected over time, given resources, etc.

	* However, since it parses the XML file, XML Diff Merge cannot operate if
	  the XML has a syntax error, or is in an unsupported encoding,
	  or contains some XML structure that requires what is called a "validating"
	  parse (i.e., macro expansion).

	  In such cases, the file cannot be interpreted as XML, so the user must resort
	  to a "lower level" of diff, such as line-oriented text diff.
	
	  This problem can't really be corrected, except by making it easier to 
	  "fall back" to a text diff. The current tool tries hard to make this
	  case work, and to make it easy, but of course it always tries the XML
	  way first. If a user knows *a priori* that the XML way won't work,
	 it would be better to allow them to go straight to the text method.
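
	The "try XML first, fall back to text" behavior described above, plus the option of going straight to text, could look roughly like the Python sketch below. All names here (choose_diff_method, text_diff, force_text, MAX_XML_BYTES) are hypothetical, and the size threshold is an assumption that simply mirrors the "about 1 MB" practical limit mentioned earlier; none of this is ClearCase configuration or the tool's real code.

```python
import difflib
import xml.etree.ElementTree as ET

# Assumption: mirrors the practical limit of about 1 MB noted above.
MAX_XML_BYTES = 1_000_000


def choose_diff_method(data: bytes, force_text: bool = False) -> str:
    """Return "xml" if the data can be handled as XML, else "text"."""
    if force_text:
        # The user knows up front that the XML way will not work.
        return "text"
    if len(data) > MAX_XML_BYTES:
        return "text"
    try:
        ET.fromstring(data)
    except (ET.ParseError, LookupError, ValueError):
        # Syntax error or an encoding the parser cannot handle.
        return "text"
    return "xml"


def text_diff(old: bytes, new: bytes) -> list:
    """Plain line-oriented diff used as the fallback."""
    a = old.decode("utf-8", errors="replace").splitlines()
    b = new.decode("utf-8", errors="replace").splitlines()
    return list(difflib.unified_diff(a, b, fromfile="old", tofile="new", lineterm=""))


well_formed = b"<a><b/></a>"
broken = b"<a><b></a>"  # mismatched tags: a syntax error

print(choose_diff_method(well_formed))
print(choose_diff_method(broken))
print(choose_diff_method(well_formed, force_text=True))
```

	The force_text flag corresponds to the last point above: when the user already knows the file will not parse, skipping the XML attempt avoids the wasted parse time.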


Hope the above information helps.
Thanks,
Paul