Posted to users@subversion.apache.org by "Thamm, Russell" <ru...@dsto.defence.gov.au> on 2005/09/07 01:19:16 UTC

Version Control of XML data

Hi,

I'm investigating ways of providing version control for XML documents residing on an eXist database.

As I use Subversion for source version control, I am looking at Subversion for this job.

The basic idea is to have a standard working copy on disk and write a special tool that:

A) on checkout/update, does a standard subversion checkout/update and then copies any updated files from the WC to the database
B) on commit, copies any modified files from the database to the WC and then performs a subversion commit
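
Steps A) and B) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `push_to_db` and `pull_from_db` are hypothetical callbacks standing in for whatever eXist import/export mechanism is used, and the regular expression assumes `svn update` prints lines shaped like `U    path`, a layout that varies between Subversion versions.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable, List

# Assumption: `svn update` reports added/updated/merged files as lines such as
# "U    wc/a.xml" or "A  wc/b.xml". Deleted (D) and conflicted (C) files are
# deliberately not matched; a real tool would have to handle them too.
_UPDATE_LINE = re.compile(r"^[AUG][ADUGCEB ]{0,3}\s+(\S.*)$")


def updated_paths(update_output: str) -> List[str]:
    """Return the paths that `svn update` reported as added, updated or merged."""
    paths = []
    for line in update_output.splitlines():
        match = _UPDATE_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def update_and_push(wc: Path, push_to_db: Callable[[Path], None],
                    run=subprocess.run) -> List[str]:
    """Step A: update the working copy, then copy changed files to the database."""
    result = run(["svn", "update", str(wc)],
                 capture_output=True, text=True, check=True)
    paths = updated_paths(result.stdout)
    for path in paths:
        push_to_db(Path(path))  # hypothetical: store the file in the database
    return paths


def pull_and_commit(wc: Path, pull_from_db: Callable[[Path], None],
                    message: str, run=subprocess.run) -> None:
    """Step B: copy modified documents into the working copy, then commit."""
    pull_from_db(wc)  # hypothetical: export changed documents into the WC
    run(["svn", "commit", "-m", message, str(wc)], check=True)
```

The `run` parameter exists only so that the Subversion calls can be replaced by a stub when trying the logic out without a repository.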

This all seems relatively straightforward. The problem is differencing/merging.

The standard diff tools (although usable) suffer a number of drawbacks when handling XML data.

For example, many XML tools like to sort attributes. Two documents with identical content will be determined to have many differences by traditional tools because the attributes have been reordered.

There are differencing/merging tools for XML. The difference records produced by these tools are completely different from those produced by diff.

I understand that subversion stores the complete HEAD revision and differences for previous revisions and that old revisions are generated by applying the differences in reverse.

I understand that subversion allows you to override the default diff/merge tools.

If I configure Subversion to use external differencing tools, will it use these exclusively for all operations involving differences?
In particular, does Subversion store the differences supplied by the external tools, and does it use these tools for reconstructing old revisions?

Thanks
Russell Thamm


Re: Version Control of XML data

Posted by Robert Koberg <ro...@koberg.com>.
Hi,

I have my app 'pretty print' when serializing to disk. Basically 
everything gets its own line unless the element contains mixed content. 
If someone works in a text editor, they are encouraged to pretty print 
before committing (heh...). This way you minimize conflicts to the node 
level as best as possible.
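
A rough Python sketch of this kind of pretty printing is shown below. It is a guess at the rule described above, not the actual code of the application: elements that contain only child elements get one child per line, while leaf elements and elements with mixed content are left untouched. The function names are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_mixed(elem):
    """True if the element has children and non-whitespace text around them."""
    if len(elem) == 0:
        return False
    texts = [elem.text] + [child.tail for child in elem]
    return any(text and text.strip() for text in texts)


def pretty(elem, level=0, space="  "):
    """Indent element-only content in place; skip leaves and mixed content."""
    if len(elem) == 0 or is_mixed(elem):
        return
    inner = "\n" + space * (level + 1)
    elem.text = inner
    for child in elem:
        pretty(child, level + 1, space)
        child.tail = inner
    # The last child is followed by the parent's closing tag, one level out.
    elem[-1].tail = "\n" + space * level


root = ET.fromstring("<r><a/><p>Some <b>bold</b> text</p></r>")
pretty(root)
print(ET.tostring(root, encoding="unicode"))
```

Here `<a/>` and `<p>` each end up on their own line, while the mixed content inside `<p>` stays on a single line.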

best,
-Rob

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Version Control of XML data

Posted by Mark Phippard <Ma...@softlanding.com>.
Dave Pawson <da...@gmail.com> wrote on 09/07/2005 08:52:01 AM:

> On 07/09/05, Mark Phippard <Ma...@softlanding.com> wrote:
>
> > > Which makes it unsuitable for handling XML files, if a diff is needed?
> >
> > Subversion may very well be unsuitable for handling certain types of XML
> > files, but it isn't for this reason.  The internal storage format used by
> > Subversion does nothing that will prevent you getting a proper diff
> > between revisions.
> Fair comment.
> I guess it is XML definitions that make 'identical' different.... if
> you see what I mean :-)
>
> <x><a:b/></x>
> can be 'identical' to
>
> <x>
>     <a:b/>
> </x>
>
> > If you had a diff tool that gave the results you are
> > looking for and could be plugged into Subversion, then it would work
> > regardless of the Subversion internal formats.
>
> and svn accepted the output of that diff engine, yes.
>
> Ryan said
> No, Subversion always uses its internal differencing engine to store
> diffs. Specifying alternate diff tools is solely for your benefit
> when you're examining files.
>
> which implies that is not the case?

No.  You are comparing two different things.  How Subversion internally 
stores its revisions is irrelevant (unless you are making a disk space 
argument).  Subversion would never allow an external program to be 
involved in this, as you could not guarantee integrity.  Subversion uses 
the same binary differencing algorithm for all file types.  The output of 
this is used to store the revisions internally in the repository, but this 
has nothing to do with svn diff.  svn diff just constructs the full texts 
of the revisions being compared and then diffs them.  The internal 
representation is only used to construct the full texts.  If you plug in 
an external diff tool, Subversion will just be feeding the full texts to 
that tool so that it can diff them.
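
As an illustration of that last point, here is a minimal Python sketch of an XML-aware wrapper that could be given to svn diff as the external diff command. It assumes Subversion calls the command with GNU diff style arguments in which the last two are the paths of the two full-text files, and that an exit code of 0 means "no differences" and 1 means "differences found". The canonicalization step (Python 3.8+) and the one-tag-per-line splitting are choices made for this example, not anything Subversion prescribes.

```python
import difflib
from xml.etree.ElementTree import canonicalize


def canonical_lines(path):
    """Canonicalize an XML file and split it into one tag per line."""
    # strip_text=True discards leading/trailing whitespace in text nodes;
    # only appropriate where that whitespace is not significant.
    text = canonicalize(from_file=path, strip_text=True)
    # In canonical form "<" is always escaped inside text and attribute
    # values, so "><" can only occur between two adjacent tags.
    return text.replace("><", ">\n<").splitlines()


def main(argv):
    """Diff the two files named by the last two arguments."""
    left, right = argv[-2], argv[-1]
    diff = list(difflib.unified_diff(canonical_lines(left),
                                     canonical_lines(right),
                                     fromfile=left, tofile=right,
                                     lineterm=""))
    if diff:
        print("\n".join(diff))
    return 1 if diff else 0
```

To use it as a script, it would end with `sys.exit(main(sys.argv))` (plus `import sys`) and be passed along the lines of `svn diff --diff-cmd /path/to/xmldiffwrap.py`, where the script name is made up for this example.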

Mark


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________


Re: Version Control of XML data

Posted by Dave Pawson <da...@gmail.com>.
On 07/09/05, Mark Phippard <Ma...@softlanding.com> wrote:

> > Which makes it unsuitable for handling XML files, if a diff is needed?
> 
> Subversion may very well be unsuitable for handling certain types of XML
> files, but it isn't for this reason.  The internal storage format used by
> Subversion does nothing that will prevent you getting a proper diff
> between revisions.
Fair comment.
I guess it is XML definitions that make 'identical' different.... if
you see what I mean :-)

<x><a:b/></x>
can be 'identical' to

<x>
    <a:b/>
</x>

> If you had a diff tool that gave the results you are
> looking for and could be plugged into Subversion, then it would work
> regardless of the Subversion internal formats.

and svn accepted the output of that diff engine, yes.

Ryan said
No, Subversion always uses its internal differencing engine to store
diffs. Specifying alternate diff tools is solely for your benefit
when you're examining files.

which implies that is not the case?



regards 

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



Re: Version Control of XML data

Posted by Mark Phippard <Ma...@softlanding.com>.
Dave Pawson <da...@gmail.com> wrote on 09/07/2005 07:35:47 AM:

> > No, Subversion always uses its internal differencing engine to store
> > diffs. Specifying alternate diff tools is solely for your benefit
> > when you're examining files.
> 
> Which makes it unsuitable for handling XML files, if a diff is needed?

Subversion may very well be unsuitable for handling certain types of XML 
files, but it isn't for this reason.  The internal storage format used by 
Subversion does nothing that will prevent you getting a proper diff 
between revisions.  If you had a diff tool that gave the results you are 
looking for and could be plugged into Subversion, then it would work 
regardless of the Subversion internal formats.

On Windows, TortoiseSVN lets you associate different diff tools for 
different file extensions.  As an example, it comes with scripts to diff 
MS Word as well as OpenOffice.org documents.

Mark



Re: Version Control of XML data

Posted by Dave Pawson <da...@gmail.com>.
On 07/09/05, Ryan Schmidt <su...@ryandesign.com> wrote:

> > For example, many XML tools like to sort attributes. Two documents
> > with identical content will be determined to have many differences
> > by traditional tools because the attributes have been reordered.
> 
> You could build functionality into your wrapper script to normalize
> your XML files into a predictable format. If the tools you use like
> to write entries in an unpredictable order, for example, then your
> wrapper could sort the entries before doing anything. That would make
> the diff make sense again, and show only real differences.
Except for whitespace handling in XML.
XML diff tools are really a long way from text file diff tools.
There is quite a history of this on the xml-dev list.
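
A small Python illustration of the whitespace point, using the canonicalizer from the standard library (Python 3.8+). The namespace URI `urn:example:a` is invented for the example. Without special handling, the indented and unindented forms stay different even after canonicalization; they only compare equal if whitespace around text is explicitly treated as insignificant, which is a decision about the document type and not something a generic tool can know.

```python
from xml.etree.ElementTree import canonicalize

compact = '<x xmlns:a="urn:example:a"><a:b/></x>'
indented = '<x xmlns:a="urn:example:a">\n    <a:b/>\n</x>'

# Plain canonicalization keeps whitespace-only text nodes, so the two
# serializations are still reported as different documents.
same_by_default = canonicalize(compact) == canonicalize(indented)

# strip_text=True removes leading/trailing whitespace from text content,
# which is only safe when that whitespace carries no meaning.
same_when_stripped = (canonicalize(compact, strip_text=True)
                      == canonicalize(indented, strip_text=True))

print(same_by_default, same_when_stripped)
```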


> > If I configure subversion to use external differencing tools, will
> > it use these exclusively for all operations involving differences?
>
> No, Subversion always uses its internal differencing engine to store
> diffs. Specifying alternate diff tools is solely for your benefit
> when you're examining files.

Which makes it unsuitable for handling XML files, if a diff is needed?

http://www.jclark.com/xml/ has xmltest.java which provides
a canonical version of an instance.
It may help.

-- 
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk



Re: Version Control of XML data

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Sep 7, 2005, at 03:19, Thamm, Russell wrote:

> I'm investigating ways of providing version control for XML  
> documents residing on an eXist database.

[snip]

> The standard diff tools (although usable) suffer a number of  
> drawbacks when handling XML data.
>
> For example, many XML tools like to sort attributes. Two documents  
> with identical content will be determined to have many differences  
> by traditional tools because the attributes have been reordered.

You could build functionality into your wrapper script to normalize  
your XML files into a predictable format. If the tools you use like  
to write entries in an unpredictable order, for example, then your  
wrapper could sort the entries before doing anything. That would make  
the diff make sense again, and show only real differences.
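
One way such a normalization step might look in Python is sketched below. It relies on `xml.etree.ElementTree.canonicalize` (Python 3.8+), which writes attributes in a fixed sorted order and uses a single quoting style. The function names are made up for the example, and using canonical XML as the "predictable format" is an assumption, not something the wrapper has to do.

```python
from xml.etree.ElementTree import canonicalize


def normalize_xml(text):
    """Return the document in canonical form (attributes in sorted order)."""
    return canonicalize(text)


def normalize_file(path):
    """Rewrite the file in canonical form; return True if it changed."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    normalized = normalize_xml(original)
    if normalized == original:
        return False
    # Note: canonical XML drops the XML declaration and any DOCTYPE.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(normalized)
    return True


# Attribute order and quoting style no longer matter after normalization.
print(normalize_xml('<doc b="2" a="1"/>') == normalize_xml("<doc a='1'   b='2'></doc>"))
```

Running `normalize_file` on every XML file before the commit step would keep reordered attributes from showing up as changes.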


> I understand that subversion stores the complete HEAD revision and  
> differences for previous revisions and that old revisions are  
> generated by applying the differences in reverse.

That is only the case for BerkeleyDB repositories. It is not the case  
for FSFS repositories, which do the opposite (store the initial  
version, and then the differences from that to get to HEAD). Unless  
you have a specific reason not to, you should be using FSFS, because  
there are circumstances under which BDB repositories can get wedged  
or worse, and this has not so far been seen with FSFS.


> I understand that subversion allows you to override the default  
> diff/merge tools.
>
> If I configure subversion to use external differencing tools, will  
> it use these exclusively for all operations involving differences?
> In particular, does subversion use store differences supplied by  
> the external tools and does it use these tools for reconstructing  
> old revisions?

No, Subversion always uses its internal differencing engine to store  
diffs. Specifying alternate diff tools is solely for your benefit  
when you're examining files.


