Posted to users@subversion.apache.org by Chris Jensen <cj...@edex.com.au> on 2004/07/07 00:23:32 UTC

Diffing XML files or storing binary documents

Hi,
Has anyone else here had experience with storing documentation in
Subversion? Specifically, I'm looking at OpenOffice files.
I was thinking of using the FlatXMLFilter, which avoids the zipping,
so the files should diff better and thus be stored more efficiently in
the repository, shouldn't they?

Any suggestions, caveats, neat tricks when doing this?

-- 
---------------------------------------------------------------------
Chris Jensen cjensen@edex.com.au

Educational Experience (Australia)
Postal Address: PO Box 860, Newcastle NSW 2300
Freecall:       1-800-025 270      International: +61-2-4923 8222
Fax:            (02) 4942 1991     International: +61-2-4942 1991

Visit our online Toy store! http://www.toysandmore.com.au/
---------------------------------------------------------------------

Re: Diffing XML files or storing binary documents

Posted by Paul Bijnens <pa...@xplanation.com>.
Chris Jensen wrote:

> Has anyone else here had experience with trying to store documentation 
> into subversion? Specifically I'm looking at OpenOffice files.
> I was thinking of using the FlatXMLFilter which will avoid the zipping, 
> so the files should diff better and thus store more efficiently in the 
> repository shouldn't they?
> 
> Any suggestions, caveats, neat tricks when doing this?


I'm currently testing some versioning of XML documents, where I
like to have a fine-grained view of the differences in the XML files.

I don't use OOo for this (although some people suggest I could/should).

I make sure I always save my XML files in the format produced by the
"nsgmls" style of the Perl module XML::Twig; from the man page:

   Line breaks are inserted in safe places: that is within tags, between
   a tag and an attribute, between attributes and before the > at the
   end of a tag.
   This is quite ugly but better than "none", and it is very safe, the
   document will still be valid (conforming to its DTD).
   This is how the SGML parser "sgmls" splits documents, hence the name.

Diffing these line-oriented XML files works well for my application.

Currently I have to make sure that my xml-application always writes
the files in that format.  My users however can, and are permitted,
to edit the xml-files using any xml-editor or even simple text editor.
Because the xml is easier to read when using other formatting
conventions, they sometimes forget to reformat the file into that
ugly "nsgmls" format.  Tracing the "real" diffs is difficult for
such files.

I'm new to Subversion, and if I could somehow automate Subversion to
always preprocess a file to force the "nsgmls" format before checking
in, that would help a lot (and do a DTD check at the same time too?).
Does anybody have an idea how to handle this using Subversion?
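
As a rough illustration of the reformatting step only (not the check-in
automation, and not XML::Twig's actual implementation), the sketch below
approximates one of the "safe places" from the man page excerpt: a line
break before the > at the end of each tag. The function names are
hypothetical, and the sketch assumes any DOCTYPE has no internal subset.
A client-side wrapper could run something like this on each file before
committing.

```python
import xml.etree.ElementTree as ET

# Constructs copied through unchanged: comments, CDATA sections, processing
# instructions, and declarations such as <!DOCTYPE ...>.
# Assumption: a DOCTYPE, if present, has no internal subset ("[ ... ]").
_VERBATIM = (("<!--", "-->"), ("<![CDATA[", "]]>"), ("<?", "?>"), ("<!", ">"))


def _verbatim_end(text, i):
    """Return the index just past a verbatim construct at i, or None."""
    for opener, closer in _VERBATIM:
        if text.startswith(opener, i):
            end = text.find(closer, i + len(opener))
            return len(text) if end == -1 else end + len(closer)
    return None


def _tag_end(text, i):
    """Return the index of the '>' closing the tag at i (-1 if missing),
    ignoring any '>' that appears inside a quoted attribute value."""
    quote = None
    for j in range(i + 1, len(text)):
        ch = text[j]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return j
    return -1


def break_before_tag_close(xml_text):
    """Insert a newline before the '>' (or '/>') that ends each tag."""
    out = []
    i, n = 0, len(xml_text)
    while i < n:
        end = _verbatim_end(xml_text, i)
        if end is not None:
            out.append(xml_text[i:end])
            i = end
        elif xml_text[i] == "<":
            j = _tag_end(xml_text, i)
            if j == -1:  # unterminated tag: copy the rest untouched
                out.append(xml_text[i:])
                break
            tag = xml_text[i:j]
            if tag.endswith("/"):  # empty-element tag: keep "/>" together
                out.append(tag[:-1].rstrip() + "\n/>")
            else:
                out.append(tag.rstrip() + "\n>")
            i = j + 1
        else:  # character data: copy up to the next '<'
            j = xml_text.find("<", i)
            j = n if j == -1 else j
            out.append(xml_text[i:j])
            i = j
    return "".join(out)


if __name__ == "__main__":
    sample = '<doc><p class="a>b">x > y</p><br/><!-- a > b --></doc>'
    print(break_before_tag_close(sample))
    # The reformatted text should parse to the same document:
    print(ET.canonicalize(xml_data=sample)
          == ET.canonicalize(xml_data=break_before_tag_close(sample)))
```

Because whitespace inside a tag is not part of the document content, the
output stays equivalent to the input, and running the function on its own
output should leave the text unchanged.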

Using OOo with some filter similar to FlatXMLFilter (which I'm
unfamiliar with too) is one of the possibilities, but still leaves
an opportunity to mess with the xml-file using other means.

I'm just testing out some possibilities, nothing real to show yet.

-- 
Paul Bijnens, Xplanation                            Tel  +32 16 397.511
Technologielaan 21 bus 2, B-3001 Leuven, BELGIUM    Fax  +32 16 397.512
http://www.xplanation.com/          email:  Paul.Bijnens@xplanation.com
***********************************************************************
* I think I've got the hang of it now:  exit, ^D, ^C, ^\, ^Z, ^Q, F6, *
* quit,  ZZ, :q, :q!,  M-Z, ^X^C,  logoff, logout, close, bye,  /bye, *
* stop, end, F3, ~., ^]c, +++ ATH, disconnect, halt,  abort,  hangup, *
* PF4, F20, ^X^X, :D::D, KJOB, F14-f-e, F8-e,  kill -1 $$,  shutdown, *
* kill -9 1,  Alt-F4,  Ctrl-Alt-Del,  AltGr-NumLock,  Stop-A,  ...    *
* ...  "Are you sure?"  ...   YES   ...   Phew ...   I'm out          *
***********************************************************************



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Diffing XML files or storing binary documents

Posted by John Szakmeister <jo...@szakmeister.net>.
> This is currently being worked on, and is planned for 1.2.  As always,
> there are no guarantees it'll actually make into 1.2 though.

s/into/it into/

-John


Re: Diffing XML files or storing binary documents

Posted by John Szakmeister <jo...@szakmeister.net>.
[snip]
> I think a locking scheme (exclusive checkout) would be better for OOo
> docs than the normal SVN functionality.  Although having said this, it
> would be possible to take the OOo source and add support to SVN for
> intelligent merging and conflict resolution.  Anyone feel up to the
> challenge?  This would give subversion a major advantage over any other
> revision control package out there.  Even MS Source Safe doesn't have
> support for MS Word docs (well not last time I checked).

This is currently being worked on, and is planned for 1.2.  As always, 
there are no guarantees it'll actually make into 1.2 though.

-John


RE: Diffing XML files or storing binary documents

Posted by Vincent Thornley <vt...@iee.org>.
Felix Collins wrote:
> I think a locking scheme (exclusive checkout) would be better for OOo
> docs than the normal SVN functionality.  Although having said this, it
> would be possible to take the OOo source and add support to SVN for
> intelligent merging and conflict resolution.  Anyone feel up to the
> challenge?  This would give subversion a major advantage over any other
> revision control package out there.  Even MS Source Safe doesn't have
> support for MS Word docs (well not last time I checked).
>

Perhaps a more generally applicable solution would be for subversion to
recognise archive files (by content not extension to include .sx*) and delve
into them when performing merges, diffs, etc. Then the command can be
applied individually to each component of the archive file.
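
A minimal sketch of that idea in Python, outside Subversion itself: it
detects zip archives by content rather than by extension and compares
them member by member. The function names and the member names used in
the usage lines (content.xml, styles.xml, Pictures/shot.png) are
illustrative assumptions, not part of any Subversion API.

```python
import io
import zipfile


def build_archive(members):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def compare_archives(old, new):
    """Return {member name: 'added' | 'removed' | 'changed'} for two archives.

    Unchanged members are left out of the report.
    """
    # is_zipfile inspects the data itself, so the file extension is irrelevant.
    if not (zipfile.is_zipfile(io.BytesIO(old))
            and zipfile.is_zipfile(io.BytesIO(new))):
        raise ValueError("both inputs must be zip archives")
    with zipfile.ZipFile(io.BytesIO(old)) as old_zip, \
            zipfile.ZipFile(io.BytesIO(new)) as new_zip:
        old_names = set(old_zip.namelist())
        new_names = set(new_zip.namelist())
        report = {}
        for name in sorted(old_names | new_names):
            if name not in old_names:
                report[name] = "added"
            elif name not in new_names:
                report[name] = "removed"
            elif old_zip.read(name) != new_zip.read(name):
                # Compare the uncompressed member contents.
                report[name] = "changed"
        return report


if __name__ == "__main__":
    before = build_archive({"content.xml": b"<doc>one</doc>",
                            "Pictures/shot.png": b"\x89PNG-data"})
    after = build_archive({"content.xml": b"<doc>two</doc>",
                           "Pictures/shot.png": b"\x89PNG-data",
                           "styles.xml": b"<s/>"})
    print(compare_archives(before, after))
```

A text diff or merge could then be applied to each member reported as
changed, while untouched members such as images are skipped.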

Vince



Re: Diffing XML files or storing binary documents

Posted by Felix Collins <fe...@keyghost.com>.
Roger Keays wrote:

> Hi Felix,
> 
> OOo documents under svn sound very interesting!

> Do you do merges often? How often do you get conflicts and is svn an
> effective way to allow a document to be edited by different users in
> parallel?

Any merge will result in a conflict because SVN only does text merges.
We don't have too many files that are worked on by multiple people yet.
I have had to do three conflict resolution/merge operations so far.
It is quite easy using the Compare Documents command in OOo (at least
in OOo Writer).

I think a locking scheme (exclusive checkout) would be better for OOo 
docs than the normal SVN functionality.  Although having said this, it 
would be possible to take the OOo source and add support to SVN for 
intelligent merging and conflict resolution.  Anyone feel up to the 
challenge?  This would give subversion a major advantage over any other 
revision control package out there.  Even MS Source Safe doesn't have 
support for MS Word docs (well not last time I checked).

> Also, where can you get the FlatXMLFilter ? My debian distribution doesn't
> seem to include it.

Don't know.


Felix


Re: Diffing XML files or storing binary documents

Posted by Roger Keays <ro...@ninthave.net>.
Hi Felix,

OOo documents under svn sound very interesting!

> Yep we store open office files in subversion.  We just check in the
> *.sx* file and hope for the best.  The only tricky thing is when you
> have a conflict.  You have to open one of the files in OO and select the
>   compare documents menu option.  The merging interface of OO is a
> little tricky to follow so stay alert when merging.

Do you do merges often? How often do you get conflicts and is svn an
effective way to allow a document to be edited by different users in
parallel?

Also, where can you get the FlatXMLFilter ? My debian distribution doesn't
seem to include it.

> It would be cool if OO could have the compare command started from the
> command line so that the merge could be started by your subversion client.

Cheers,

Roger

--
Ninth Avenue Software
http://www.ninthave.net



Re: Diffing XML files or storing binary documents

Posted by Felix Collins <fe...@keyghost.com>.
Chris Jensen wrote:

> I doubt it matters how good subversions diffing is, correct me if I'm
> wrong, but because the nature of compression, a small change in the 

In general, yes, I agree.  But it depends on how the compression is
implemented.  If it has a block structure, for instance, then you may
change just one block.  Actually, this has been discussed on this list
before without any conclusion, as I recall.

You are correct, though, that it doesn't matter how good Subversion's
diff is.




Re: Diffing XML files or storing binary documents

Posted by Chris Jensen <cj...@edex.com.au>.
> Just how much benefit would have to be 
> tested as apparently Subversion's diffing is pretty good and is binary 
> for all files anyway.

I doubt it matters how good Subversion's diffing is. Correct me if I'm
wrong, but because of the nature of compression, a small change in the
document could potentially result in the entire compressed file being
changed, so you're almost always going to be storing a completely new
file in the repository.
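
One way to test this rather than guess is to measure it. The sketch
below changes a single byte in some sample text and counts how many bytes
differ before and after compression. It uses zlib (deflate) on a plain
byte string as a stand-in for a zipped document, which is an assumption;
the sample text and function names are made up for illustration, and the
result will vary with the data and the compressor.

```python
import random
import zlib


def count_differing_bytes(a, b):
    """Count positions where a and b differ, plus any difference in length."""
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def make_sample_text(words=2000, seed=0):
    """Build deterministic, repetitive sample text (a stand-in for a document)."""
    rng = random.Random(seed)
    vocabulary = ["svn", "commit", "merge", "diff",
                  "office", "document", "xml", "zip"]
    return " ".join(rng.choice(vocabulary) for _ in range(words)).encode("ascii")


def compare_plain_and_compressed(original, position):
    """Flip one bit of one byte, then compare plain and compressed forms."""
    edited = bytearray(original)
    edited[position] ^= 1  # guaranteed to change exactly this one byte
    edited = bytes(edited)
    packed_original = zlib.compress(original)
    packed_edited = zlib.compress(edited)
    return {
        "plain_size": len(original),
        "plain_differing": count_differing_bytes(original, edited),
        "compressed_size": len(packed_original),
        "compressed_differing": count_differing_bytes(packed_original,
                                                      packed_edited),
    }


if __name__ == "__main__":
    print(compare_plain_and_compressed(make_sample_text(), position=10))
```

Comparing compressed_differing with compressed_size for real documents
gives a rough idea of how much of the stored file a small edit touches.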

Re: Diffing XML files or storing binary documents

Posted by Felix Collins <fe...@keyghost.com>.
Chris Jensen wrote:

>> Yep we store open office files in subversion.  We just check in the 
>> *.sx* file and hope for the best.
> 
> 
> What sort of content is there to the docs? Do you see much repository 
> bloat with this?
> Some of the files we'll be storing are quite graphic heavy (over 20 
> screen shots), I would;ve if these are just zipped up with the rest of 
> the document in the sxc, then they're going to go into subversion every 
> time, and I would've thought this would bloat the repository pretty 
> quickly.
> 

We have some drawings etc.  I agree that opening up the OOo file would
gain you compression benefits.  Just how much benefit would have to be
tested, as apparently Subversion's diffing is pretty good and is binary
for all files anyway.  We haven't noticed any great growth in the
repository, but then it depends on how often you commit too.




Re: Diffing XML files or storing binary documents

Posted by Felix Collins <fe...@keyghost.com>.
Chris Jensen wrote:

> Hi,
> Has anyone else here had experience with trying to store documentation 
> into subversion? Specifically I'm looking at OpenOffice files.

Yep, we store OpenOffice files in Subversion.  We just check in the
*.sx* file and hope for the best.  The only tricky thing is when you
have a conflict.  You have to open one of the files in OO and select the
Compare Documents menu option.  The merging interface of OO is a
little tricky to follow, so stay alert when merging.

It would be cool if OO could have the compare command started from the 
command line so that the merge could be started by your subversion client.

Regards,
Felix
