Posted to j-dev@xerces.apache.org by Nigel Kerr <ni...@jstor.org> on 2000/08/25 14:35:45 UTC

Tree Differencing with org.apache.crimson.treediff.*

good gentles,

i write with respect to the two classes in the
org.apache.crimson.treediff package in the xml-contrib area.  i'm not
sure if this is the best place to write about this, but here goes.
apologies if this is the wrong forum.

my interest in the two classes there, TreeDiff and DocumentTree, stems
from wanting a way to test two trees for changes: an ongoing process
here exports data for external users, and that data changes and is
revised on an on-going basis.  the external user needs to be able to
see what chunks have changed, and then, within a given chunk, what
specifically changed.  these two classes, originally by Ram Jeyaraman
of Sun, seemed like a good place to start.

they are commented out of the build.xml file for the crimson package,
with the note that they are "not part of parser".  are there plans to
place it anywhere else within apache's code?  has/does anyone use it
or plan to use it, or would anyone find it useful?

i ask these questions because it has been worthwhile to me to spend
some time refactoring these two classes to make them easier to use,
and to present the differencing results in a more directly usable
form.  at the end of this message are three URLs for java source
files that are the present state of my work.  i'd like to see this all
find its way back to somewhere it would be useful, if indeed it could.

in the refactoring, my goals were:

        make the diff-er easier to use inside other programs:
        submitting DOM trees, for instance, instead of files for
        differencing.

        make the results easier to use: handing back a DOM tree of the
        change operations between the two trees differenced.

        make the whole independent of any particular DOM
        implementation, so that it could be used in interestingly
        heterogeneous environments (someone here in my group favors the
        oracle code, for example).

        have the code throw exceptions usefully and descriptively when
        things go awry.

        bring the 1998 code up to date with respect to current DOM and
        java2 Collections.  Jeyaraman had suggestions in the code for
        where these sorts of things could happen, and i've tried to
        make some of my own contributions.  i feel that there are
        possibly ways to make the internals sleeker and more
        OO-svelte, but this may just be wide-eyed refactoring zeal.

        understand better exactly how the algorithm works.
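
to make the first three goals concrete, here is a minimal sketch of the
shape of interface they imply: two DOM nodes in, a DOM document of
change operations out, touching only org.w3c.dom interfaces on the
input side.  it is not the code at the urls below and not Jeyaraman's
algorithm; it is a naive position-by-position comparison that ignores
attributes.  the class name NaiveDomDiff, the element names (changes,
replace, update, delete, insert), and the path attribute are all
hypothetical, invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: a naive positional DOM diff, not the treediff algorithm. */
public class NaiveDomDiff {

    /**
     * Compares two DOM trees child by child and returns a new DOM document
     * whose root element <changes> holds one child per detected difference.
     * Inputs are read only through org.w3c.dom interfaces, so any DOM
     * implementation can supply them.
     */
    public static Document diff(Node oldRoot, Node newRoot)
            throws ParserConfigurationException {
        if (oldRoot == null || newRoot == null) {
            throw new IllegalArgumentException("both trees must be non-null");
        }
        Document ops = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element changes = ops.createElement("changes");
        ops.appendChild(changes);
        walk(oldRoot, newRoot, "", changes);
        return ops;
    }

    // Recursively compares nodes a and b, which sit at the same child-index path.
    private static void walk(Node a, Node b, String path, Element out) {
        if (a.getNodeType() != b.getNodeType()
                || !a.getNodeName().equals(b.getNodeName())) {
            record(out, "replace", path);   // different kind of node entirely
            return;
        }
        String av = a.getNodeValue();
        String bv = b.getNodeValue();
        if (av == null ? bv != null : !av.equals(bv)) {
            record(out, "update", path);    // same node, different value (e.g. text)
        }
        NodeList ac = a.getChildNodes();
        NodeList bc = b.getChildNodes();
        int common = Math.min(ac.getLength(), bc.getLength());
        for (int i = 0; i < common; i++) {
            walk(ac.item(i), bc.item(i), path + "/" + i, out);
        }
        // Children beyond the shared prefix exist in only one of the trees.
        for (int i = common; i < ac.getLength(); i++) {
            record(out, "delete", path + "/" + i);
        }
        for (int i = common; i < bc.getLength(); i++) {
            record(out, "insert", path + "/" + i);
        }
    }

    // Appends one change element carrying the child-index path of the node.
    private static void record(Element out, String op, String path) {
        Element e = out.getOwnerDocument().createElement(op);
        e.setAttribute("path", path.isEmpty() ? "/" : path);
        out.appendChild(e);
    }
}
```

a caller would hand NaiveDomDiff.diff two parsed documents (or any two
subtrees) and walk the children of the returned changes element.  a
real differencer would also want to match up moved or shifted
children, which this positional walk does not attempt.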

i'm making progress.  the first three have been accomplished in this
first cut; the last three have seen some progress.  structurally, the
code flows much like Jeyaraman's original, and Jeyaraman's comments
are largely intact.  i need to write comments of my own to explain
what i'm up to.

i welcome comments or suggestions about how to refactor this all, and
indeed where this could usefully go in the apache xml landscape.

the source files:

   http://www-personal.umich.edu/~nigelk/treediff/treeDiffAlgorithm.java
   http://www-personal.umich.edu/~nigelk/treediff/treeDiffDriver.java
   http://www-personal.umich.edu/~nigelk/treediff/treeDiffTree.java

cheers,
nigel kerr
software developer, jstor.org




Re: Tree Differencing with org.apache.crimson.treediff.*

Posted by Edwin Goei <Ed...@eng.sun.com>.
Nigel Kerr wrote:
> 
> my interest in the two classes there, TreeDiff and DocumentTree, stems
> from wanting a way to test two trees for changes: an ongoing process
> here exports data for external users, and that data changes and is
> revised on an on-going basis.  the external user needs to be able to
> see what chunks have changed, and then, within a given chunk, what
> specifically changed.  these two classes, originally by Ram Jeyaraman
> of Sun, seemed like a good place to start.
> 
> they are commented out of the build.xml file for the crimson package,
> with the note that they are "not part of parser".  are there plans to
> place it anywhere else within apache's code?  has/does anyone use it
> or plan to use it, or would anyone find it useful?

As far as I can tell, the treediff package allows you to get a diff of
two DOM trees.  However, I am not familiar with the code and no one
maintains it as far as I know.  I took it out of the build.xml file
because it is a separate module independent of an XML parser.

> 
> i ask these questions because it has been worthwhile to me to spend
> some time refactoring these two classes to make them easier to use,
> and to present the differencing results in a more directly usable
> form.  at the end of this message are three URLs for java source
> files that are the present state of my work.  i'd like to see this all
> find its way back to somewhere it would be useful, if indeed it could.

Sounds good.  I'll see if I can find out if the original author is
interested in helping to maintain the code.  I'll forward your email to
him.

-Edwin