Posted to dev@subversion.apache.org by David Brown <op...@davidb.org> on 2002/08/14 16:20:44 UTC

Re: [opencm-dev] Re: [A few SCM lists] Diff/Comparison of file formats others than ASCII/source code?

On Wed, Aug 14, 2002 at 09:07:53AM +0200, Alessandro Bottoni wrote:

> Strangely enough, there are very few such "file-format-specific"
> Diff/Merge tools around. This is strange because such tools clearly
> could have a huge market. Just think of how many companies have large
> repositories of CAD drawings, RTF (or, worse, MS Word) documents and HTML
> files (that is, web sites). An RCS tool able to manage such file
> formats would be of great help to a lot of people.
> 
> I hope that some developer of the list will think over this market 
> niche (even if as a commercial, not open source, one).

Part of the reason may just be the definition of "difference".  For text
files, it is simple to define a difference: break the file into lines
and report which lines are different.  Most of the additional features
of modern text diff tools have to do with how to present the information.
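
As a rough illustration of that line-based definition, here is a minimal
sketch using Python's standard difflib module.  The function name
line_diff and the sample strings are made up for the example; this is
not how any particular diff tool is implemented.

```python
import difflib


def line_diff(old_text, new_text):
    """Break both texts into lines and report which lines differ."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    # unified_diff yields header lines, then context lines (prefixed " "),
    # removed lines (prefixed "-") and added lines (prefixed "+").
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="old", tofile="new", lineterm=""
        )
    )


if __name__ == "__main__":
    # Hypothetical sample input: only the second line changes.
    for line in line_diff("alpha\nbeta\ngamma\n", "alpha\nBETA\ngamma\n"):
        print(line)
```

Everything beyond this (side-by-side views, intra-line highlighting,
ignoring whitespace) is presentation layered on the same comparison.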

Other file formats make the issue significantly more
complicated.  Take MS Word, for example.  Word itself will gladly show
you the differences between two documents where small changes have
been made to the text.  But what if one document has had
significant formatting changes (say, a changed paragraph style)?  How would
you do a merge if one branch changed the formatting and the other
branch changed the text, or maybe a different aspect of the formatting?

CAD drawings are even more difficult.  It may not be difficult to
determine what has changed, but how do you represent that change?  A
diff tool is plausible (show both drawings, in different colors, for
example), but a merge is harder still.

Dave Brown

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [opencm-dev] Re: [A few SCM lists] Diff/Comparison of file formats others than ASCII/source code?

Posted by Noel Yap <ya...@yahoo.com>.
--- David Brown <op...@davidb.org> wrote:
> Other file formats cause the issue to become
> significantly more
> complicated.  Take MS-word for example.  Word itself
> will gladly show
> you document differences between two documents where
> small changes have
> been made to the contents of text.  But what if one
> document has had
> significant formatting changes (change the paragraph
> style).  How would
> you do a merge if one branch changed the formatting,
> and the other
> branch changed the text, or maybe a different aspect
> of the formatting.

This would really depend on how the info is stored.
It sounds like there's a viable algorithm for
tree-based representations, so if Word stored its data
in trees (isn't MS moving towards XML
representations?), such an algorithm would exist for Word.
Doesn't ClearCase have a diffing algorithm for Word and other
MS docs?
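
To make the tree idea concrete, here is a deliberately naive sketch
that compares two XML trees node by node, matching children by
position.  It assumes a made-up XML layout (a <doc> of <p> elements
with a style attribute), not Word's actual format, and it is not a real
tree-edit-distance algorithm: a single inserted child shifts every
later sibling.

```python
import xml.etree.ElementTree as ET


def tree_diff(old, new, path="", index=0):
    """Return a list of human-readable differences between two elements."""
    here = f"{path}/{old.tag}[{index}]"
    if old.tag != new.tag:
        return [f"{here}: replaced by <{new.tag}>"]
    changes = []
    if old.attrib != new.attrib:
        changes.append(f"{here}: attributes changed")
    # Only leading text is compared; tail text is ignored in this sketch.
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append(f"{here}: text changed")
    old_kids, new_kids = list(old), list(new)
    # Children are paired by position, which is the naive part.
    for i, (a, b) in enumerate(zip(old_kids, new_kids)):
        changes.extend(tree_diff(a, b, here, i))
    if len(old_kids) != len(new_kids):
        changes.append(
            f"{here}: child count {len(old_kids)} -> {len(new_kids)}"
        )
    return changes


if __name__ == "__main__":
    # Hypothetical documents: one paragraph changes both style and text.
    old_doc = ET.fromstring('<doc><p style="Normal">Hello</p></doc>')
    new_doc = ET.fromstring('<doc><p style="Heading1">Hello there</p></doc>')
    for change in tree_diff(old_doc, new_doc):
        print(change)
```

Note that the formatting change and the text change are reported
separately, which is what a format-aware merge would need.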

> CAD drawings are even more difficult.  It may not be
> difficult to
> determine what has changed, but how do you represent
> that change.  A
> diff tool is plausible (show both drawings, in
> different colors, for
> example), but a merge is even more difficult.

I partially agree.  The real question is, "What does
'merge' mean for images?"

Technically, if a diffing algorithm exists, a merge
algorithm can also exist (e.g. by doing something similar
to diff3).  The difficulty is whether the merge algorithm
conforms to the human interpretation of merging for
the particular item.
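
A minimal sketch of the diff3-style rule, applied to a record of named
fields rather than to lines.  The field names ("text", "style") and the
function three_way_merge are assumptions for illustration only; diff3
itself works on lines of text.

```python
_MISSING = object()  # marks a field that is absent from a version


def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, field by field.

    Returns (merged, conflicts).  A field is a conflict when both sides
    changed it in different ways; conflicting fields are left out of
    `merged` for a human to resolve.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o is t or o == t:
            chosen = o  # both sides agree (or neither changed it)
        elif o == b:
            chosen = t  # only theirs changed it
        elif t == b:
            chosen = o  # only ours changed it
        else:
            conflicts.append(key)  # both changed it, differently
            continue
        if chosen is not _MISSING:
            merged[key] = chosen
    return merged, conflicts


if __name__ == "__main__":
    # One branch changes the formatting, the other changes the text.
    base = {"text": "Hello world", "style": "Normal"}
    ours = {"text": "Hello world", "style": "Heading 1"}
    theirs = {"text": "Hello, world!", "style": "Normal"}
    print(three_way_merge(base, ours, theirs))
```

The mechanics are easy once the item is split into independent fields;
whether those fields match what a person means by "merging" a drawing
or a document is the hard part.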

MTC,
Noel


Re: [A few SCM lists] Diff/Comparison of file formats others than ASCII/source code?

Posted by Michael Poole <po...@troilus.org>.
Alan Langford <ja...@ambitonline.com> writes:

> I've been pondering this and it's even more fun than that. A good
> number of Word documents contain drawn graphics,  embedded images,
> embedded spreadsheets, embedded Visio diagrams... you name it. Any
> diff utility that understands Word needs to be able to accommodate
> plug-ins that understand most major types of embedded object. It seems
> that implementing anything close to a comprehensive Word diff is
> prohibitively complex.

That is a poor reason to not try it.  It is much more useful to say
"This sub-object is being treated as an opaque blob, and differs
between these revisions" than to say "We cannot handle some cases of
this file type, so we won't try at all."  You can have the same
problem in any compound document.  Although I don't use Word much, the
number of cases for which it *will* work seems to justify the effort.
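
A sketch of that fallback, assuming a compound document can be
unpacked into named parts, each tagged with a kind.  The part names,
the kinds ("text", "visio") and the DIFFERS registry are hypothetical;
real Word files are not structured this simply.

```python
import difflib


def diff_text(old, new):
    """Line diff for parts whose format is understood (UTF-8 text here)."""
    return list(
        difflib.unified_diff(
            old.decode("utf-8").splitlines(),
            new.decode("utf-8").splitlines(),
            lineterm="",
        )
    )


# Registry of format-aware differs; plug-ins would add entries here.
DIFFERS = {"text": diff_text}


def diff_compound(old_parts, new_parts):
    """Compare two documents given as {name: (kind, data_bytes)}."""
    report = {}
    for name in sorted(set(old_parts) | set(new_parts)):
        if name not in old_parts:
            report[name] = "added"
        elif name not in new_parts:
            report[name] = "removed"
        else:
            old_kind, old_data = old_parts[name]
            new_kind, new_data = new_parts[name]
            if old_kind == new_kind and old_data == new_data:
                continue  # identical part, nothing to report
            differ = DIFFERS.get(old_kind) if old_kind == new_kind else None
            if differ is not None:
                report[name] = differ(old_data, new_data)
            else:
                # No plug-in for this kind: treat it as an opaque blob.
                report[name] = "opaque blob differs"
    return report


if __name__ == "__main__":
    old_doc = {"body": ("text", b"one\ntwo\n"), "chart": ("visio", b"\x00\x01")}
    new_doc = {"body": ("text", b"one\n2\n"), "chart": ("visio", b"\x00\x02")}
    for part, result in diff_compound(old_doc, new_doc).items():
        print(part, result)
```

The text part gets a real diff, and the embedded diagram is still
reported as changed even though nothing here understands its format.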

Michael Poole


Re: [opencm-dev] Re: [A few SCM lists] Diff/Comparison of file formats others than ASCII/source code?

Posted by Alan Langford <ja...@ambitonline.com>.
At 2002/08/14 09:20 -0700, David Brown wrote:
>Other file formats cause the issue to become significantly more
>complicated.  Take MS-word for example.  Word itself will gladly show
>you document differences between two documents where small changes have
>been made to the contents of text.  But what if one document has had
>significant formatting changes (change the paragraph style).  How would
>you do a merge if one branch changed the formatting, and the other
>branch changed the text, or maybe a different aspect of the formatting.

I've been pondering this and it's even more fun than that. A good number of 
Word documents contain drawn graphics,  embedded images, embedded 
spreadsheets, embedded Visio diagrams... you name it. Any diff utility that 
understands Word needs to be able to accommodate plug-ins that understand 
most major types of embedded object. It seems that implementing anything 
close to a comprehensive Word diff is prohibitively complex.

