Posted to dev@subversion.apache.org by Sebastian Rönnau <Se...@unibw-muenchen.de> on 2004/09/26 20:35:28 UTC

Implementing XML-versioning?

Hello,

I'm writing my master's thesis on versioning of XML documents, especially 
OpenOffice files.
So far I have tested several diff algorithms, and I'm trying to figure out 
whether it is possible to integrate one of them into Subversion.
The problems I see are:
- the delta produced by the xml-diff has a different format (but it is still 
an XML file)
- the xml-diff can't handle three-way merges
- so far there is no way to check whether two deltas commute (however, this 
should be solvable)

My main questions are:
- is it possible to change Subversion's diff algorithm at all? Jim Blandy et 
al. mention client-side diff-plugin support which should be implemented 
after 1.0. Where can I find more information about it? 
- where would be the best place to start the implementation? 

I would be grateful if anyone could give me some hints concerning the above 
questions.

Sebastian Roennau

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Implementing XML-versioning?

Posted by Paul de Vrieze <pa...@gentoo.org>.
On Friday 01 October 2004 21:17, Ben Reser wrote:
> >
> > For two reasons the answer might be no:
> > 1. OOo files are zip archives containing several XML files.
> > 2. OOo uses XML files without any whitespace or newlines. If you apply
> > GNU diff to such a file, it marks the whole file (which consists of only
> > one line) as changed. As a result, the delta is twice the size of
> > the original file. I don't know whether svndiff performs better.
>
> svndiff is a binary difference.  Whitespace and CR/LF have no relevance
> to how it determines a difference.  You could have no whitespace or
> whitespace and the difference in svndiff size will be negligible.
>
> Now the zip compression will indeed probably make svndiff behave poorly
> when keeping lots of history of OOo files, simply because even a small
> change in a file will produce large differences in the compressed copy
> of it.
>
> While we could use less storage by using different delta algorithms for
> different types of data, it would be at the cost of significantly more
> complexity.  I seriously doubt that it's worth it.  One of the mantras
> of this project is that disk space is cheap.  The effort to implement
> content-specific deltas is far more expensive in terms of actual
> implementation, maintenance and operation than buying more disk space
> would be.

I basically agree, although it might be nice to do compression-aware diffing. 
The only sensible way to implement it would be through some kind of plugin. 
Such plugins would then perform the compression and decompression. This could 
probably even be done client-side, conditionally on a property (binary 
stability is not guaranteed).
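
A minimal sketch of what such a decompress-then-compare step could look like, assuming the plugin simply opens both archives and compares their decompressed members. The helper names (make_zip, compare_members) and the sample documents are hypothetical; nothing here is an existing Subversion interface.

```python
import io
import zipfile


def make_zip(members):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def compare_members(old_zip, new_zip):
    """Compare two archives by their decompressed members.

    Returns {member name: "added" | "removed" | "modified" | "unchanged"}.
    """
    with zipfile.ZipFile(io.BytesIO(old_zip)) as old, \
            zipfile.ZipFile(io.BytesIO(new_zip)) as new:
        old_names, new_names = set(old.namelist()), set(new.namelist())
        status = {}
        for name in sorted(old_names | new_names):
            if name not in old_names:
                status[name] = "added"
            elif name not in new_names:
                status[name] = "removed"
            elif old.read(name) != new.read(name):
                status[name] = "modified"
            else:
                status[name] = "unchanged"
        return status


# Hypothetical stand-ins for two revisions of an OOo document.
rev1 = make_zip({"content.xml": b"<doc><p>Hello</p></doc>",
                 "styles.xml": b"<styles/>"})
rev2 = make_zip({"content.xml": b"<doc><p>Hello!</p></doc>",
                 "styles.xml": b"<styles/>"})
print(compare_members(rev1, rev2))
```

Only the members reported as modified would need a content-level delta. Re-zipping the members afterwards is not guaranteed to reproduce the original archive byte for byte (timestamps and compression settings can differ), which is presumably the binary-stability caveat.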

Paul

-- 
Paul de Vrieze
Gentoo Developer
Mail: pauldv@gentoo.org
Homepage: http://www.devrieze.net

Re: Implementing XML-versioning?

Posted by Ben Reser <be...@reser.org>.
On Fri, Oct 01, 2004 at 08:23:15PM +0200, Sebastian Rönnau wrote:
> > Maybe this answers your question, maybe not! Is plain old text merging
> > not sufficient for OpenOffice's file format?
> For two reasons the answer might be no:
> 1. OOo files are zip archives containing several XML files. 
> 2. OOo uses XML files without any whitespace or newlines. If you apply GNU 
> diff to such a file, it marks the whole file (which consists of only one line) 
> as changed. As a result, the delta is twice the size of the original 
> file. I don't know whether svndiff performs better.

svndiff is a binary difference.  Whitespace and CR/LF have no relevance
to how it determines a difference.  You could have no whitespace or
whitespace and the difference in svndiff size will be negligible.

Now the zip compression will indeed probably make svndiff behave poorly
when keeping lots of history of OOo files, simply because even a small
change in a file will produce large differences in the compressed copy
of it.
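
One way to check this claim on your own data is to compare how much two revisions have in common before and after compression. The sketch below uses zlib (the same deflate method zip archives normally use) on a made-up payload and counts the bytes shared at the start and end of the two streams. That count is only a crude proxy for what a binary delta could reuse, not how svndiff works, and shared_ends is a hypothetical helper.

```python
import zlib


def shared_ends(a, b):
    """Bytes shared at the start plus bytes shared at the end of a and b."""
    limit = min(len(a), len(b))
    prefix = 0
    while prefix < limit and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and a[-1 - suffix] == b[-1 - suffix]:
        suffix += 1
    return prefix + suffix


# Hypothetical XML payload; the edit inserts one byte in the middle.
paragraphs = [b"<p>paragraph %d</p>" % i for i in range(2000)]
head = b"<doc>" + b"".join(paragraphs[:1000])
tail = b"".join(paragraphs[1000:]) + b"</doc>"
old = head + tail
new = head + b"!" + tail

old_z = zlib.compress(old)
new_z = zlib.compress(new)

# Uncompressed, everything except the inserted byte is shared.
print("uncompressed: %d of %d bytes shared" % (shared_ends(old, new), len(old)))
# Compressed, print whatever the measurement shows for this input.
print("compressed:   %d of %d bytes shared" % (shared_ends(old_z, new_z), len(old_z)))
```

Running it on real OOo archives instead of the synthetic payload would show how much the effect matters for a given document.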

While we could use less storage by using different delta algorithms for
different types of data, it would be at the cost of significantly more
complexity.  I seriously doubt that it's worth it.  One of the mantras
of this project is that disk space is cheap.  The effort to implement
content-specific deltas is far more expensive in terms of actual
implementation, maintenance and operation than buying more disk space
would be.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken


Re: Implementing XML-versioning?

Posted by Sebastian Rönnau <Se...@unibw-muenchen.de>.
> >- is it possible to change Subversion's diff algorithm at all? 

> On the client, "svn diff" takes a --diff-cmd switch which allows you to
> use a different diff tool when displaying differences on the client.
But that doesn't seem to affect the way Subversion handles two revisions of 
a file, does it?

> "svn update" takes a --diff3-cmd switch which allows you to use a 3-way
> merge tool other than the one built into Subversion. These are all
> client-side only, they won't change the way Subversion stores deltas
> internally (binary diffs between revisions).
My thought was to implement the xml-diff directly in Subversion, including its 
own diff format. But I fear that this would involve reimplementing all of 
libsvn_client...
I also think that the svndiff format is not ideal for storing XML files (the 
deltas), is it? I think that my initial idea was too naive. If anyone has 
a new idea for how to hook a new diff/patch tool into Subversion, please let me know.

> Maybe this answers your question, maybe not! Is plain old text merging
> not sufficient for OpenOffice's file format?
For two reasons the answer might be no:
1. OOo files are zip archives containing several XML files. 
2. OOo uses XML files without any whitespace or newlines. If you apply GNU 
diff to such a file, it marks the whole file (which consists of only one line) 
as changed. As a result, the delta is twice the size of the original 
file. I don't know whether svndiff performs better.
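
The whole-file behaviour described in point 2 can be reproduced with any line-based diff. A minimal sketch, using Python's difflib as a stand-in for GNU diff (an assumption: the two tools differ in detail, but both compare line by line) and two made-up one-line documents:

```python
import difflib


def line_diff(old, new):
    """Return a unified diff of two texts, compared line by line."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old.xml", tofile="new.xml", lineterm=""))


# Hypothetical documents: a single line each, differing by one character.
OLD = "<doc><p>Hello</p><p>World</p></doc>"
NEW = "<doc><p>Hello</p><p>World!</p></doc>"

diff = line_diff(OLD, NEW)
# Keep only the change lines, skipping the "---"/"+++" file headers.
removed = [l for l in diff if l.startswith("-") and not l.startswith("---")]
added = [l for l in diff if l.startswith("+") and not l.startswith("+++")]

# The one-character edit is reported as the whole line removed and re-added,
# so the delta carries roughly len(OLD) + len(NEW) characters.
print(len(removed), len(added), sum(len(l) for l in removed + added))
```

Because the entire document is one line, the delta contains both the old and the new version in full, which matches the "twice the size" observation.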

> Cheers,
> Mike.
Thank you for the answer,

Sebastian


Re: Implementing XML-versioning?

Posted by Mike Mason <mg...@thoughtworks.net>.
Sebastian Rönnau wrote:

>My main questions are:
>- is it possible to change Subversion's diff algorithm at all? Jim Blandy et 
>al. mention client-side diff-plugin support which should be implemented 
>after 1.0. Where can I find more information about it? 

On the client, "svn diff" takes a --diff-cmd switch which allows you to 
use a different diff tool when displaying differences on the client. 
"svn update" takes a --diff3-cmd switch which allows you to use a 3-way 
merge tool other than the one built into Subversion. These are all 
client-side only; they won't change the way Subversion stores deltas 
internally (binary diffs between revisions).
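
As an illustration of what a client-side tool plugged in via --diff-cmd might do for XML, here is a minimal sketch that pretty-prints both documents before running a line-based diff. The function names are hypothetical, and main() assumes the two file paths arrive as the last two command-line arguments; check which arguments your Subversion version actually passes to the external command.

```python
import difflib
from xml.dom import minidom


def pretty_lines(xml_data):
    """Re-indent an XML document so each element starts on its own line."""
    return minidom.parseString(xml_data).toprettyxml(indent="  ").splitlines()


def xml_diff(old_xml, new_xml, old_label="old", new_label="new"):
    """Unified diff of two XML documents after pretty-printing both."""
    return list(difflib.unified_diff(
        pretty_lines(old_xml), pretty_lines(new_xml),
        fromfile=old_label, tofile=new_label, lineterm=""))


def main(argv):
    # ASSUMPTION: the two files to compare are the last two arguments;
    # any options passed before them are ignored.
    old_path, new_path = argv[-2], argv[-1]
    with open(old_path, "rb") as f:
        old_xml = f.read()
    with open(new_path, "rb") as f:
        new_xml = f.read()
    print("\n".join(xml_diff(old_xml, new_xml, old_path, new_path)))


# Hypothetical one-line documents differing by a single character.
for line in xml_diff("<doc><p>Hello</p><p>World</p></doc>",
                     "<doc><p>Hello</p><p>World!</p></doc>"):
    print(line)
```

Saved as a script that calls main(sys.argv), this could be passed to "svn diff --diff-cmd". It only changes how differences are displayed on the client, not how deltas are stored.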

Maybe this answers your question, maybe not! Is plain old text merging 
not sufficient for OpenOffice's file format?

Cheers,
Mike.
