Posted to j-users@xerces.apache.org by Rob Davis-5 <te...@robertjdavis.co.uk> on 2008/12/05 16:57:29 UTC

How to compare Documents? Existing library/method available? or use DOMTreeWalker?

I want to compare two Document objects.

That is, compare their contents to see whether they are exactly the same. This
would mean recursively traversing each node in one Document and comparing it
with the corresponding node in the other Document.

Is there a routine, method or library to do this already?

If not, I could write my own using a DOMTreeWalker for each Document: the two
would be iterated through together, and as soon as an inequality was
encountered the code would report that they are not equal.
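A minimal sketch of that lockstep walk in Java, offered as an illustration rather than code from this thread. It assumes the JDK's built-in DOM, whose Document implements org.w3c.dom.traversal.DocumentTraversal (in Java the interface is TreeWalker rather than DOMTreeWalker), and namespace-aware parsing; the class and method names are hypothetical. A TreeWalker never visits attributes, so they are compared per element, and child counts are compared so that trees which visit the same nodes in the same order but nest them differently are not reported as equal.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;
import org.xml.sax.InputSource;

class WalkerCompare {

    // Parses an XML string into a namespace-aware DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Walks both trees in document order and stops at the first difference.
    static boolean sameContent(Document a, Document b) {
        TreeWalker wa = ((DocumentTraversal) a).createTreeWalker(a, NodeFilter.SHOW_ALL, null, true);
        TreeWalker wb = ((DocumentTraversal) b).createTreeWalker(b, NodeFilter.SHOW_ALL, null, true);
        Node na = wa.getCurrentNode();
        Node nb = wb.getCurrentNode();
        while (na != null && nb != null) {
            if (!sameNodeFields(na, nb)) {
                return false; // first inequality: stop walking
            }
            na = wa.nextNode();
            nb = wb.nextNode();
        }
        return na == null && nb == null; // both walks must run out together
    }

    // Compares one pair of nodes without descending into their children.
    private static boolean sameNodeFields(Node x, Node y) {
        if (x.getNodeType() != y.getNodeType()
                || !Objects.equals(x.getNodeName(), y.getNodeName())
                || !Objects.equals(x.getNamespaceURI(), y.getNamespaceURI())
                || !Objects.equals(x.getNodeValue(), y.getNodeValue())
                // Equal child counts stop <r><a/><b/></r> matching <r><a><b/></a></r>,
                // which visit the same nodes in the same order.
                || x.getChildNodes().getLength() != y.getChildNodes().getLength()) {
            return false;
        }
        if (x.getNodeType() == Node.ELEMENT_NODE) {
            // A TreeWalker never visits attributes, so compare them here, ignoring order.
            NamedNodeMap ax = x.getAttributes();
            NamedNodeMap ay = y.getAttributes();
            if (ax.getLength() != ay.getLength()) {
                return false;
            }
            for (int i = 0; i < ax.getLength(); i++) {
                Node attr = ax.item(i);
                Node other = ay.getNamedItemNS(attr.getNamespaceURI(), attr.getLocalName());
                if (other == null || !attr.getNodeValue().equals(other.getNodeValue())) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameContent(parse("<r id=\"1\"><a>x</a></r>"), parse("<r id=\"1\"><a>x</a></r>"))); // true
        System.out.println(sameContent(parse("<r id=\"1\"><a>x</a></r>"), parse("<r id=\"1\"><a>y</a></r>"))); // false
    }
}
```

Text nodes are compared by value, so this walk treats added whitespace between elements as a difference.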

Another way might be to generate a checksum and compare, or perhaps even use
the checksum available from the file that the Document is created from.
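The checksum idea could be sketched with java.security.MessageDigest; SHA-256 is an arbitrary choice here, and the class name is hypothetical. For a file, the input bytes could come from java.nio.file.Files.readAllBytes(path). Because the digest covers raw bytes, any byte-level edit changes it, including whitespace that leaves the XML content the same, as the sample shows.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumCompare {

    // Computes a SHA-256 digest of raw bytes (for example, a file's contents).
    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // True when both byte sequences produce the same digest.
    static boolean sameChecksum(byte[] a, byte[] b) {
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }

    public static void main(String[] args) {
        byte[] v1 = "<r><a>x</a></r>".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "<r>\n  <a>x</a>\n</r>".getBytes(StandardCharsets.UTF_8);
        System.out.println(sameChecksum(v1, v1.clone())); // true
        System.out.println(sameChecksum(v1, v2));         // false: only whitespace differs
    }
}
```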

I've already searched Google and this forum using the terms "compare" and
"comparison". The closest I got was:
http://www.nabble.com/Node-equals()---to7319083.html#a7319083

The original post of that thread seems to ask what I'm looking for, but the
responses seem off topic and talk about "serializing entity defs with quotes".

Thoughts? Thanks in advance.

-- 
View this message in context: http://www.nabble.com/How-to-compare-Documents--Existing-library-method-available--or-use-DOMTreeWalker--tp20856968p20856968.html
Sent from the Xerces - J - Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org


Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Rob Davis-5 <te...@robertjdavis.co.uk>.
Thank you, Andy. I have used Michael's solution, but the link you posted is
interesting: it could be useful for other things!



Andy Stevens-2 wrote:
> 
> The difference engine from XMLUnit?
> http://xmlunit.sourceforge.net/userguide/html/ar01s03.html
> 
> 
> Andy



Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Andy Stevens <in...@googlemail.com>.
2008/12/5 Rob Davis-5 <te...@robertjdavis.co.uk>:
>
> I want to compare two Document objects.
>
> That is compare their contents to see if they are exactly the same. This
> would mean recursively traversing through each node in the Document and
> comparing it with the other Document.
>
> Is there a routine, method or library to do this already?

The difference engine from XMLUnit?
http://xmlunit.sourceforge.net/userguide/html/ar01s03.html


Andy
-- 
http://pseudoq.sourceforge.net/  Open source java Sudoku solver



Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Rob Davis-5 <te...@robertjdavis.co.uk>.
Unfortunately this did not work, but I have another solution; see the other
thread:
http://www.nabble.com/Filtering-whitespace-outside-of-xml-elements-using-LSParserFilter-td20918689.html#a20933774

More detail:
When I tried normalizeDocument() I got an LSException with "Premature end of
file":

Document mydoc = methodToGetDocument();
// at this point we have a Document ready for use
mydoc.normalizeDocument(); 



Sometimes I didn't get the exception, but my original problem was not solved:
the whitespace was still there.




Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Jacob Kjome <ho...@visi.com>.
Have you tried using normalizeDocument() with validation enabled (assuming you
have a DTD and/or Schema)? It is supposed to consolidate adjacent text nodes
into a single node.

http://www.w3.org/TR/2003/CR-DOM-Level-3-Core-20031107/core.html#Document3-normalizeDocument
http://www.ibm.com/developerworks/xml/library/x-keydom2.html
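The text-node merging mentioned above can be seen in isolation with Node.normalize() (DOM Level 2); the DOM Level 3 specification describes normalizeDocument() as also normalizing Text nodes in that way. The sketch below (hypothetical class name, JDK built-in parser assumed) builds the same text once as a single Text node and once as two adjacent ones: isEqualNode() reports them as different until the split pair is merged. Merging alone does not remove whitespace-only text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NormalizeDemo {

    // Parses an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Builds <r>ab</r> with the text split into two adjacent Text nodes.
    static Document splitText() {
        Document doc = parse("<r/>");
        Element r = doc.getDocumentElement();
        r.appendChild(doc.createTextNode("a"));
        r.appendChild(doc.createTextNode("b"));
        return doc;
    }

    public static void main(String[] args) {
        Element whole = parse("<r>ab</r>").getDocumentElement();
        Element split = splitText().getDocumentElement();
        System.out.println(whole.isEqualNode(split)); // false: one child vs two children
        split.normalize();                            // merges "a" and "b" into "ab"
        System.out.println(whole.isEqualNode(split)); // true
    }
}
```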


Jake

On Tue, 9 Dec 2008 09:00:46 -0800 (PST)
  Rob Davis-5 <te...@robertjdavis.co.uk> wrote:
> 
> 
> Rob Davis-5 wrote:
>> 
>> This works. Thank you Michael!
>> 
> 
> Actually it partially works - it doesn't ignore whitespace outside of
> elements - which is what I require. I have started a new thread.
> http://www.nabble.com/Filtering-whitespace-outside-of-xml-elements-using-LSParserFilter-td20918689.html


Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Rob Davis-5 <te...@robertjdavis.co.uk>.

Rob Davis-5 wrote:
> 
> This works. Thank you Michael!
> 

Actually, it only partially works: it doesn't ignore whitespace outside of
elements, which is what I require. I have started a new thread:
http://www.nabble.com/Filtering-whitespace-outside-of-xml-elements-using-LSParserFilter-td20918689.html
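One possible workaround, sketched here as an assumption rather than the solution from the follow-up thread: compare copies of the two document elements after deleting every whitespace-only Text node. The helper names are hypothetical, and the approach is blunt: it also drops whitespace-only text inside mixed content (for example the space in <p><b>a</b> <i>b</i></p>), so it suits data-style XML such as a small configuration file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WhitespaceInsensitiveCompare {

    // Parses an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Removes every Text node that contains only whitespace, at any depth below node.
    static void stripWhitespaceText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // saved before child may be removed
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }

    // Compares deep copies of the document elements, so the callers' Documents are not modified.
    static boolean equalIgnoringWhitespace(Document a, Document b) {
        Node ra = a.getDocumentElement().cloneNode(true);
        Node rb = b.getDocumentElement().cloneNode(true);
        ra.normalize(); // merge adjacent Text nodes first
        rb.normalize();
        stripWhitespaceText(ra);
        stripWhitespaceText(rb);
        return ra.isEqualNode(rb);
    }

    public static void main(String[] args) {
        Document compact = parse("<r><a>x</a><b y=\"1\"/></r>");
        Document indented = parse("<r>\n  <a>x</a>\n  <b y=\"1\"/>\n</r>");
        System.out.println(compact.isEqualNode(indented));              // false
        System.out.println(equalIgnoringWhitespace(compact, indented)); // true
    }
}
```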



Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Right. You can only count on what is in the DOM specification. Java-isms like
equals(), hashCode() and toString() aren't defined there and may behave
differently in each implementation, so you cannot rely on them.
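To illustrate staying within the DOM specification: DOM Level 3 defines isSameNode() for identity and isEqualNode() for structural equality, so neither question needs equals(). A small sketch contrasting the two, assuming the JDK's built-in parser; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IdentityVsEquality {

    // Parses an XML string and returns its document element.
    static Element root(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Element first = root("<r><a>x</a></r>");
        Element second = root("<r><a>x</a></r>");
        System.out.println(first.isSameNode(second));  // false: two distinct node objects
        System.out.println(first.isEqualNode(second)); // true: same names, attributes and children
        System.out.println(first.isSameNode(first));   // true: a node is the same as itself
    }
}
```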

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Rob Davis-5 <te...@robertjdavis.co.uk> wrote on 12/09/2008 05:47:42 AM:

> This works. Thank you Michael!
>
> Sorry I missed it. I was looking for an overridden equals() method but I
> guess the naming has to comply with the language-independent W3C
> specification.
>
> If interested, what I'm doing is polling for a smallish XML file (2KB) being
> changed on a Windows file system. Date stamp checking and File content
> String comparison with a previous version is clearly not robust: what if a
> whitespace is inserted - the Document is still the same. So your answer to
> my problem is precisely what I need. Once again thank you.

Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Rob Davis-5 <te...@robertjdavis.co.uk>.
This works. Thank you, Michael!

Sorry I missed it. I was looking for an overridden equals() method, but I
guess the naming has to comply with the language-independent W3C
specification.

If you're interested, what I'm doing is polling a smallish XML file (2KB) for
changes on a Windows file system. Checking the date stamp, or comparing the
file content as a String with a previous version, is clearly not robust: if
whitespace is inserted, the Document is still the same. So your answer to my
problem is precisely what I need. Once again, thank you.
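For the polling use case described here, a hypothetical detector could re-parse the file on each poll and compare the result with the Document from the previous poll. This sketch assumes a plain java.io.File input and leaves scheduling (a timer, or a loop with Thread.sleep) to the caller; as Rob reports elsewhere in this thread, isEqualNode() on its own still treats whitespace inserted between elements as a change.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: reports whether the file's DOM differs from the previous poll.
class XmlChangeDetector {
    private final File file;
    private Document previous; // null until the first successful poll

    XmlChangeDetector(File file) {
        this.file = file;
    }

    // Parses the file; returns true on the first poll or when the content has changed.
    boolean poll() {
        Document current;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            current = f.newDocumentBuilder().parse(file);
        } catch (Exception e) {
            throw new RuntimeException("could not parse " + file, e);
        }
        boolean changed = previous == null || !previous.isEqualNode(current);
        previous = current;
        return changed;
    }
}
```

The first call returns true because there is no earlier Document to compare against.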



Michael Glavassevich-3 wrote:
> 
> Have you tried Node.isEqualNode() [1]?
> 
> [1]
> http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/Node.html#isEqualNode(org.w3c.dom.Node)



Re: How to compare Documents? Existing library/method available? or use DOMTreeWalker?

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Have you tried Node.isEqualNode() [1]?

[1]
http://xerces.apache.org/xerces2-j/javadocs/api/org/w3c/dom/Node.html#isEqualNode(org.w3c.dom.Node)
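A minimal sketch of this suggestion, assuming the JDK's built-in DOM parser; the class name and sample XML are illustrative. Per the DOM Level 3 definition of isEqualNode(), attributes may appear in any order, but every child node, including whitespace-only Text nodes, must match.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocumentEquality {

    // Parses an XML string with the JDK's default DOM parser.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document d1 = parse("<order id=\"7\" ref=\"a\"><item>pen</item></order>");
        Document d2 = parse("<order ref=\"a\" id=\"7\"><item>pen</item></order>");
        Document d3 = parse("<order id=\"7\" ref=\"a\"><item>ink</item></order>");
        System.out.println(d1.isEqualNode(d2)); // true: attribute order is ignored
        System.out.println(d1.isEqualNode(d3)); // false: text content differs
    }
}
```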

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

Rob Davis-5 <te...@robertjdavis.co.uk> wrote on 12/05/2008 10:57:29 AM:

> I want to compare two Document objects.