Posted to dev@subversion.apache.org by Kevin Pilch-Bisson <ke...@pilch-bisson.net> on 2002/05/15 20:59:12 UTC

Problem invoking diff3.

I recently bootstrapped svn onto a Solaris machine which I have non-root
access to.  I happily compiled and installed all of the utils I needed,
including diffutils.

Then I got the svn tarball, bootstrapped to HEAD (revision 1954 at the time), and
compiled again.  Everything up till here went great.  The problem came when I
ran make check.

I got a whole bunch of failed tests, which according to the log seemed to be
related to diff/diff3.

After much frustration, I discovered something strange about the way diff3
works.  It doesn't automatically invoke the diff from the same diffutils
package.  Instead it runs the first diff it finds in PATH if PATH is set,
otherwise, it checks some hardcoded locations.  The first two are
/usr/ccs/bin/diff and /usr/bin/diff.  This is all I know about, since
/usr/bin/diff exists and is Solaris diff, which doesn't understand the
arguments that diff3 tries to pass it.

I fixed this for that particular machine by changing the inherit_environment
flag in svn_io_run_diff3's call to svn_io_run_cmd to TRUE, but I don't think
this is the best solution.
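
The lookup described above can be sketched as follows. This is only an illustration of the reported behaviour, not diff3's actual code: `locate_diff`, `make_fake_diff` and `demo` are hypothetical names, and returning nothing when PATH is set but contains no diff is an assumption. The `env` argument stands in for whatever environment the child process receives, so an empty mapping models a call made without inheriting the environment.

```python
import os
import tempfile

# Fallback locations named in the message above; the list is illustrative.
FALLBACK_DIFFS = ("/usr/ccs/bin/diff", "/usr/bin/diff")


def is_executable_file(path):
    """Return True if path is an existing file we are allowed to execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def locate_diff(env, fallbacks=FALLBACK_DIFFS):
    """Pick a diff the way the message describes: PATH first, else fixed paths."""
    path_value = env.get("PATH")
    if path_value:
        for directory in path_value.split(os.pathsep):
            candidate = os.path.join(directory or os.curdir, "diff")
            if is_executable_file(candidate):
                return candidate
        return None  # assumption: no fallback search once PATH is set
    for candidate in fallbacks:
        if is_executable_file(candidate):
            return candidate
    return None


def make_fake_diff(directory):
    """Create a placeholder executable named diff, for demonstration only."""
    path = os.path.join(directory, "diff")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, 0o755)
    return path


def demo():
    """Compare the result with an inherited PATH against an empty environment."""
    with tempfile.TemporaryDirectory() as gnu_dir, \
            tempfile.TemporaryDirectory() as sys_dir:
        gnu_diff = make_fake_diff(gnu_dir)
        sys_diff = make_fake_diff(sys_dir)
        inherited = {"PATH": os.pathsep.join([gnu_dir, sys_dir])}
        with_path = locate_diff(inherited, fallbacks=(sys_diff,))
        without_path = locate_diff({}, fallbacks=(sys_diff,))
        return with_path == gnu_diff, without_path == sys_diff
```

With the environment inherited, the diff installed first in PATH wins; with an empty environment, the first existing fallback is used, which on the Solaris machine above would be the system diff.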

Note that this could also bite FreeBSD users who will end up with diff3
running the hacked BSD version of diff instead of the gdiff found by
configure.

Anyone have an idea as to what a good solution is?
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: Problem invoking diff3.

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
Hey

Bill Tutt wrote:

> It certainly would help out certain cases involving differencing
> possibly small changes to very large binary files, but there's still a
> very important problem with librsync: it uses the LGPL. License
> incompatibility is so annoying.

I read somewhere in their page about contacting them if you had licensing
issues.
Maybe they would change the licensing.

gat

> Thankfully, most of the logic for rsync
> is specified in Tridge's Ph.D. which is just ideas that are in the
> public domain, and not a specific code implementation.
>
> Bill
> ----
> Do you want a dangerous fugitive staying in your flat?
> No.
> Well, don't upset him and he'll be a nice fugitive staying in your flat.
>
>
> > -----Original Message-----
> > From: Glenn A. Thompson [mailto:gthompson@cdr.net]
> > Sent: Thursday, May 16, 2002 3:12 PM
> > To: Subversion Dev list
> > Subject: Re: Problem invoking diff3.
> >
> > Hey:
> >
> > I don't know anything about delta generation.  But I was curious.  Have
> > you guys looked at:
> >
> > http://rproxy.samba.org/doxygen/librsync
> >
> > Heck, subversion could be way ahead of this method and I wouldn't know it.
> >
> > gat


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Problem invoking diff3.

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
Whoops.

I searched the archive.  Should have done that before posting.  Sorry.

gat


Bill Tutt wrote:

> It certainly would help out certain cases involving differencing
> possibly small changes to very large binary files, but there's still a
> very important problem with librsync: it uses the LGPL. License
> incompatibility is so annoying. Thankfully, most of the logic for rsync
> is specified in Tridge's Ph.D. which is just ideas that are in the
> public domain, and not a specific code implementation.
>
> Bill
> ----
> Do you want a dangerous fugitive staying in your flat?
> No.
> Well, don't upset him and he'll be a nice fugitive staying in your flat.
>
>
> > -----Original Message-----
> > From: Glenn A. Thompson [mailto:gthompson@cdr.net]
> > Sent: Thursday, May 16, 2002 3:12 PM
> > To: Subversion Dev list
> > Subject: Re: Problem invoking diff3.
> >
> > Hey:
> >
> > I don't know anything about delta generation.  But I was curious.  Have
> > you guys looked at:
> >
> > http://rproxy.samba.org/doxygen/librsync
> >
> > Heck, subversion could be way ahead of this method and I wouldn't know it.
> >
> > gat

RE: Re: Problem invoking diff3.

Posted by Bill Tutt <ra...@lyra.org>.
It certainly would help out certain cases involving differencing
possibly small changes to very large binary files, but there's still a
very important problem with librsync: it uses the LGPL. License
incompatibility is so annoying. Thankfully, most of the logic for rsync
is specified in Tridge's Ph.D. thesis, which is just ideas that are in the
public domain, and not a specific code implementation.

Bill
----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your flat.
 

> -----Original Message-----
> From: Glenn A. Thompson [mailto:gthompson@cdr.net]
> Sent: Thursday, May 16, 2002 3:12 PM
> To: Subversion Dev list
> Subject: Re: Problem invoking diff3.
> 
> Hey:
> 
> I don't know anything about delta generation.  But I was curious.  Have
> you guys looked at:
> 
> http://rproxy.samba.org/doxygen/librsync
> 
> Heck, subversion could be way ahead of this method and I wouldn't know it.
> 
> gat

Re: Problem invoking diff3.

Posted by "Glenn A. Thompson" <gt...@cdr.net>.
Hey:

I don't know anything about delta generation.  But I was curious.  Have
you guys looked at:

http://rproxy.samba.org/doxygen/librsync

Heck, subversion could be way ahead of this method and I wouldn't know it.

gat




Re: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.

RE: Diff code overhaul, WAS: RE: Problem invoking diff3.

Posted by Sander Striker <st...@apache.org>.
> From: Philip Martin [mailto:pm@localhost]On Behalf Of Philip Martin
> Sent: 17 May 2002 20:59

> "Sander Striker" <st...@apache.org> writes:
>>> While the token/vtable interface in svn_diff.h provides a generic diff
>>> capability, I think it may be a performance limitation. My original
>>> C++ code was template based, the "function" interface was non-virtual
>>> and could be inlined. It will be interesting to see if the algorithm
>>> used by Sander's code is similarly affected by this interface.
>> 
>> It could get a speedup from inlining, although I don't know how
>> much.  That's something I will try.  We might need an API change there,
>> although I most certainly hope we can keep it generic.
> 
> The speedup obviously depends on how many token lookups and
> comparisons the algorithm requires.  It's important to remember just
> how many that can be for 'difficult' files, it's lots and lots.  The
> current interface requires a function call to get each token, and a
> function call to compare them.  My O(NP) implementation did something
> similar, removing the function calls more than doubled the speed.  The
> algorithm can't really cache the tokens, that more or less defeats the
> purpose of providing the function interface in the first place.

The BV-HS function is single pass, top-down, so no token is required more
than once.  With a small trick we can even do merging without rereading.
However, other algorithms (like O(NP)) need random access to both files.
It all comes down to which algorithm we end up using.
 
> I worry that we have an overly generic interface, that will only be
> used for one specific problem.  Producing conventional diff output
> pretty much requires a line based diff. It is unlikely that the tokens
> will ever be anything other than a variable length string of
> characters.  I guess if one is comparing digital images, say, then the
> tokens might be different.  However, that would use a totally
> different algorithm so the ability to support such tokens isn't
> terribly useful.

In subversion that is pretty much true.  For a visual diff/merge tool
I guess it holds as well.  If we gain much performance by moving to a
less generic API, we'll move to that.

>>> It may be that Subversion needs different algorithms depending on the
>>> size of the file. A minimal match algorithm for small/medium files to
>>> reduce the number of conflicts, and a non-minimal algorithm for large
>>> files to get better speed/memory performance.
>> 
>> Yes, that may be the case.  I received a book I ordered this week:
>> "Algorithms on strings, trees, and sequences" by Dan Gusfield
>> (ISBN 0-521-58519-8).
>> 
>> This is really enlightening me as to what (should) work and what not.  Both
>> our selected algorithms are suboptimal in speed, judging from what I
>> read.
>> 
>> I've spent a reasonable amount of time researching what we need for the
>> diff lib and would like to continue doing some more research for a little while.
>> Like I said, next week has some time allocated for this.
> 
> Fine, I'm not trying to tread on your toes :)

Don't worry, you didn't :)

> The only reason I looked at my code again was because you asked me about
> performance.  I then only posted it when I realised that it was a) so close
> to GNU diff in performance

For the --minimal case.  I want to see if we can get their performance all
the time.  Not a showstopper though, if it works at a reasonable speed I'll
be satisfied for alpha.

> and b) only a few hundred lines to provide diff and merge.
>
> It's quite possible there are better algorithms, I'm happy that you
> are working on it and I hope you come up with a solution.

Me too! :)
 
> Philip

Sander


Re: Diff code overhaul, WAS: RE: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.
"Sander Striker" <st...@apache.org> writes:

> > While the token/vtable interface in svn_diff.h provides a generic diff
> > capability, I think it may be a performance limitation. My original
> > C++ code was template based, the "function" interface was non-virtual
> > and could be inlined. It will be interesting to see if the algorithm
> > used by Sander's code is similarly affected by this interface.
> 
> It could get a speedup from inlining, although I don't know how
> much.  That's something I will try.  We might need an API change there,
> although I most certainly hope we can keep it generic.

The speedup obviously depends on how many token lookups and
comparisons the algorithm requires.  It's important to remember just
how many that can be for 'difficult' files, it's lots and lots.  The
current interface requires a function call to get each token, and a
function call to compare them.  My O(NP) implementation did something
similar, removing the function calls more than doubled the speed.  The
algorithm can't really cache the tokens, that more or less defeats the
purpose of providing the function interface in the first place.

I worry that we have an overly generic interface, that will only be
used for one specific problem.  Producing conventional diff output
pretty much requires a line based diff. It is unlikely that the tokens
will ever be anything other than a variable length string of
characters.  I guess if one is comparing digital images, say, then the
tokens might be different.  However, that would use a totally
different algorithm so the ability to support such tokens isn't
terribly useful.

> 
> > It may be that Subversion needs different algorithms depending on the
> > size of the file. A minimal match algorithm for small/medium files to
> > reduce the number of conflicts, and a non-minimal algorithm for large
> > files to get better speed/memory performance.
> 
> Yes, that may be the case.  I received a book I ordered this week:
> "Algorithms on strings, trees, and sequences" by Dan Gusfield
> (ISBN 0-521-58519-8).
> 
> This is really enlightening me as to what (should) work and what not.  Both
> our selected algorithms are suboptimal in speed, judging from what I
> read.
> 
> I've spent a reasonable amount of time researching what we need for the
> diff lib and would like to continue doing some more research for a little while.
> Like I said, next week has some time allocated for this.

Fine, I'm not trying to tread on your toes :)  The only reason I looked
at my code again was because you asked me about performance.  I then
only posted it when I realised that it was a) so close to GNU diff in
performance and b) only a few hundred lines to provide diff and merge.
It's quite possible there are better algorithms, I'm happy that you
are working on it and I hope you come up with a solution.

-- 
Philip


Diff code overhaul, WAS: RE: Problem invoking diff3.

Posted by Sander Striker <st...@apache.org>.
> From: Philip Martin [mailto:pm@localhost]On Behalf Of Philip Martin
> Sent: 16 May 2002 14:24

> Philip Martin <ph...@codematters.co.uk> writes:
> 
> Eek!  I don't know what went wrong, but the body of my last mail
> appears to have gone walkies :-(  Here is what I meant to say...

*grin*
 
> "Sander Striker" <st...@apache.org> writes:
> 
>> I will have time somewhere next week to continue work on it.  One
>> of the things I want to try is to implement the algorithm described
>> in "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers and
>> Miller.
> 
> I have some old C++ code that implements O(NP). I was porting this to
> Subversion when Sander's code appeared. He and I discussed our
> approaches and I decided to stop work on my code.  Sander contacted me
> yesterday, asking about the performance of O(NP). This prompted me to
> look out my old code and test it against GNU diff.  At first my code
> used a vtable function based approach to tokens, a bit like the one in
> svn_diff.h.  However that proved to be a performance killer, the
> algorithm has a tight loop comparing tokens and the function call
> overhead is significant.  Yesterday I stripped out the token
> interface, moving to an array interface (which is also more like GNU
> diff) and added some simple line hashing. It now runs almost as fast
> as GNU diff --minimal, it's about 5% slower.

I want to see if my BV-HS implementation will benefit from some optimizations
I have in mind.
 
> While the token/vtable interface in svn_diff.h provides a generic diff
> capability, I think it may be a performance limitation. My original
> C++ code was template based, the "function" interface was non-virtual
> and could be inlined. It will be interesting to see if the algorithm
> used by Sander's code is similarly affected by this interface.

It could get a speedup from inlining, although I don't know how
much.  That's something I will try.  We might need an API change there,
although I most certainly hope we can keep it generic.

> It may be that Subversion needs different algorithms depending on the
> size of the file. A minimal match algorithm for small/medium files to
> reduce the number of conflicts, and a non-minimal algorithm for large
> files to get better speed/memory performance.

Yes, that may be the case.  I received a book I ordered this week:
"Algorithms on strings, trees, and sequences" by Dan Gusfield
(ISBN 0-521-58519-8).

This is really enlightening me as to what (should) work and what not.  Both
our selected algorithms are suboptimal in speed, judging from what I
read.

I've spent a reasonable amount of time researching what we need for the
diff lib and would like to continue doing some more research for a little while.
Like I said, next week has some time allocated for this.

In the mean time I'll commit a fix for a bug in the current codebase that
I stumbled over.  That way we still have something that works, albeit on
the slow side.

Sander

PS.  For the impatient and interested, these papers deserve some attention:
     
     "A sublinear algorithm for approximate keyword searching." by E.W. Myers
     Algorithmica, 12:345-74, 1994

     "Algorithmic advances for searching biosequence databases" by E. Myers
     Computational Methods in Genome Research, 121-35, 1994.


Re: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.
Philip Martin <ph...@codematters.co.uk> writes:

Eek!  I don't know what went wrong, but the body of my last mail
appears to have gone walkies :-(  Here is what I meant to say...

"Sander Striker" <st...@apache.org> writes:

> I will have time somewhere next week to continue work on it.  One
> of the things I want to try is to implement the algorithm described
> in "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers and
> Miller.

I have some old C++ code that implements O(NP). I was porting this to
Subversion when Sander's code appeared. He and I discussed our
approaches and I decided to stop work on my code.  Sander contacted me
yesterday, asking about the performance of O(NP). This prompted me to
look out my old code and test it against GNU diff.  At first my code
used a vtable function based approach to tokens, a bit like the one in
svn_diff.h.  However that proved to be a performance killer, the
algorithm has a tight loop comparing tokens and the function call
overhead is significant.  Yesterday I stripped out the token
interface, moving to an array interface (which is also more like GNU
diff) and added some simple line hashing. It now runs almost as fast
as GNU diff --minimal, it's about 5% slower.
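
One way the "array interface plus simple line hashing" idea could look is sketched below. It is not the code discussed in this message: `intern_lines` is a hypothetical helper, and a Python dict stands in for the hash table. Each distinct line is mapped once to a small integer, so the comparison loop afterwards only compares integers in plain arrays and needs no per-token function call.

```python
def intern_lines(*files):
    """Replace every line by an integer id shared across all input files.

    Each argument is a list of lines. Equal lines receive equal ids, so the
    diff loop can compare integers instead of calling a comparison function.
    """
    table = {}    # line text -> integer id
    encoded = []  # one list of ids per input file
    for lines in files:
        # setdefault assigns the next free id the first time a line is seen.
        encoded.append([table.setdefault(line, len(table)) for line in lines])
    return encoded, table


def count_common_prefix(ids_a, ids_b):
    """Example of a tight loop that touches only integer arrays."""
    count = 0
    for x, y in zip(ids_a, ids_b):
        if x != y:
            break
        count += 1
    return count
```

For example, interning `["a\n", "b\n", "a\n"]` and `["b\n", "c\n"]` together yields the id arrays `[0, 1, 0]` and `[1, 2]`.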

While the token/vtable interface in svn_diff.h provides a generic diff
capability, I think it may be a performance limitation. My original
C++ code was template based, the "function" interface was non-virtual
and could be inlined. It will be interesting to see if the algorithm
used by Sander's code is similarly affected by this interface.

It may be that Subversion needs different algorithms depending on the
size of the file. A minimal match algorithm for small/medium files to
reduce the number of conflicts, and a non-minimal algorithm for large
files to get better speed/memory performance.

Anyway here's my code(*). It contains the "library" code implementing
svn_diff/svn_merge, and a command line test harness.  It's certainly
not fully tested, but here's the performance on some "hard" files,
ones with 10000 lines chosen from a set of 500 different lines, i.e.,
lots of matching lines.

% time diff --minimal foo bar > /dev/null

real    0m1.894s
user    0m1.890s
sys     0m0.010s

% time ./svn-diff foo bar > /dev/null 

real    0m1.971s
user    0m1.970s
sys     0m0.020s
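
Test inputs of the kind described above (10000 lines drawn from a pool of 500 distinct lines) could be generated roughly as follows. The line contents, the seeds and the function names are assumptions made for illustration; the files used for the timings above are not shown in the message.

```python
import random


def make_hard_file(num_lines=10000, num_distinct=500, seed=0):
    """Return num_lines lines drawn at random from num_distinct distinct lines."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pool = ["pool line %d\n" % i for i in range(num_distinct)]
    return [rng.choice(pool) for _ in range(num_lines)]


def write_lines(path, lines):
    """Write a list of lines to path."""
    with open(path, "w") as handle:
        handle.writelines(lines)
```

Calling `write_lines("foo", make_hard_file(seed=1))` and `write_lines("bar", make_hard_file(seed=2))` would produce two files with many matching lines in different positions, which is what makes the comparison expensive.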

-- 
Philip


Re: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.

RE: Problem invoking diff3.

Posted by Sander Striker <st...@apache.org>.
> From: sussman@collab.net [mailto:sussman@collab.net]
> Sent: 16 May 2002 00:23

> Garrett Rooney <ro...@electricjellyfish.net> writes:
> 
> > seriously, i think this is a hell of a good reason to start putting
> > more work into sander's internal diff library.  being dependent on
> > external tools like this is an inherently problematic arrangement.
> 
> The Collabnet folks (me, kfogel, gstein, cmpilato) don't really have
> time to spend on making Sander's code work, but I would be *thrilled*
> if someone else got it up to speed... preferably sooner rather than
> later, because Alpha is approaching soon, and it would be nice to have
> a long testing period.  The API is fully documented in svn_diff.h.
> 
> Here's the todo list:
> 
>  1. Sander already wrote an output-vtable that produces unified diff
>     between two sources.  Someone needs to write an output-vtable to
>     produce a 'merged' file with conflict markers when doing a 3 way
>     diff.  Sander and I have already discussed how to do it;  the
>     interface is clear.  It's just a matter of someone spending a day
>     or two writing it.

The merge table can be written in a few hours.  That's not really a
big problem.
 
>  2. Need to wrap svn_io_run_diff[3] functions around the new
>     svn_diff.h API.  Pretty easy.

*nod*

>  3. Need one hell of a test suite for the library.
> 
> Number 3 is the Big Problem.  Even Sander himself admits that there
> are bugs in RAM consumption in his algorithms.  They need to be
> optimized and tested to *death*.

I have some (a _lot_ of) local changes because the time spent in
the code was unacceptable.  Philip pointed out to me that running
diff with the --minimal option brings the times of diff more into
the range for comparison.

My local changes consist of the following:

 - reimplementation of the LCS algorithm, doing BV-HS instead of
   HS.  This algorithm is described in "Speeding-up Hirschberg and
   Hunt-Szymanski LCS Algorithms" by Crochemore, Iliopoulos and Pinzon.

 - use of a red-black tree for token/position storage.

 - removal of the svn_diff__hat support functions.

 - reimplementation of diff_file.c such that the file is read into
   memory and lines are compared in a more direct fashion.  md5 was
   overkill and way too time consuming.

 - breakage of diff3... :(  [needs some work to get it in operable
   state again]

> Maybe someone will get inspired here... I hate having external
> dependencies, especially on Win32.

I will have time somewhere next week to continue work on it.  One
of the things I want to try is to implement the algorithm described
in "An O(NP) Sequence Comparison Algorithm" by Wu, Manber, Myers and
Miller.
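
For readers unfamiliar with the paper, a compact sketch of the O(NP) idea follows. It is written from the published description, computes only the edit distance (insertions plus deletions) rather than an edit script, and is not taken from any Subversion code; the function name is made up for this example.

```python
def onp_edit_distance(a, b):
    """Insert/delete edit distance between sequences a and b, O(NP) style.

    P is the number of deletions from the shorter sequence; the loop below
    raises P until the furthest point on diagonal delta reaches the end.
    """
    if len(a) > len(b):
        a, b = b, a  # the algorithm assumes len(a) <= len(b)
    m, n = len(a), len(b)
    delta = n - m
    offset = m + 1  # shifts diagonal k = -(m+1) to list index 0
    fp = [-1] * (m + n + 3)  # furthest y reached on each diagonal

    def snake(k, y):
        """Follow matching elements along diagonal k, starting at row y."""
        x = y - k
        while x < m and y < n and a[x] == b[y]:
            x += 1
            y += 1
        return y

    def step(k):
        """Extend diagonal k from the better of its two neighbours."""
        start = max(fp[k - 1 + offset] + 1, fp[k + 1 + offset])
        fp[k + offset] = snake(k, start)

    p = -1
    while fp[delta + offset] != n:
        p += 1
        for k in range(-p, delta):            # diagonals below delta
            step(k)
        for k in range(delta + p, delta, -1):  # diagonals above delta
            step(k)
        step(delta)
    return delta + 2 * p
```

The inputs can be strings or lists of lines (or of integer line ids). For instance, `onp_edit_distance("abc", "abd")` is 2: one deletion and one insertion.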

Sander



Re: Problem invoking diff3.

Posted by Ben Collins-Sussman <su...@collab.net>.
Garrett Rooney <ro...@electricjellyfish.net> writes:

> seriously, i think this is a hell of a good reason to start putting
> more work into sander's internal diff library.  being dependent on
> external tools like this is an inherently problematic arrangement.

The Collabnet folks (me, kfogel, gstein, cmpilato) don't really have
time to spend on making Sander's code work, but I would be *thrilled*
if someone else got it up to speed... preferably sooner rather than
later, because Alpha is approaching soon, and it would be nice to have
a long testing period.  The API is fully documented in svn_diff.h.

Here's the todo list:

 1. Sander already wrote an output-vtable that produces unified diff
    between two sources.  Someone needs to write an output-vtable to
    produce a 'merged' file with conflict markers when doing a 3 way
    diff.  Sander and I have already discussed how to do it;  the
    interface is clear.  It's just a matter of someone spending a day
    or two writing it.

 2. Need to wrap svn_io_run_diff[3] functions around the new
    svn_diff.h API.  Pretty easy.

 3. Need one hell of a test suite for the library.

Number 3 is the Big Problem.  Even Sander himself admits that there
are bugs in RAM consumption in his algorithms.  They need to be
optimized and tested to *death*.
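
To make item 1 of the list above more concrete, here is a rough sketch of what producing a merged file with conflict markers involves. It does not use the svn_diff.h vtable interface, whose details are not shown in this thread: the chunk tuples, the labels and `render_merge` are all invented for the illustration, and it assumes the 3-way diff has already split the input into resolved and conflicting chunks.

```python
def render_merge(chunks, mine_label="mine", yours_label="yours"):
    """Join merge chunks into one text, marking conflicts.

    chunks is a list of tuples, either
      ("resolved", lines)        lines that merged cleanly, or
      ("conflict", mine, yours)  two competing lists of lines.
    """
    out = []
    for chunk in chunks:
        kind = chunk[0]
        if kind == "resolved":
            out.extend(chunk[1])
        elif kind == "conflict":
            _, mine, yours = chunk
            out.append("<<<<<<< %s\n" % mine_label)
            out.extend(mine)
            out.append("=======\n")
            out.extend(yours)
            out.append(">>>>>>> %s\n" % yours_label)
        else:
            raise ValueError("unknown chunk kind: %r" % (kind,))
    return "".join(out)
```

A real output vtable would receive these chunks one callback at a time instead of as a list, but the text it writes for a conflict would have the same shape.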

Maybe someone will get inspired here... I hate having external
dependencies, especially on Win32.


Re: Problem invoking diff3.

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On Wed, May 15, 2002 at 10:59:43PM +0100, Philip Martin wrote:
 
> Yuck indeed.  Debian still ships 2.7 and RedHat appears to ship 2.7.2,
> so the obvious move of simply requiring the latest version would not
> be popular in Linux land :-(

oh sure, it's fine to require a specific kind of diff that FreeBSD
people have to go out and install, but if the Linux people have to,
it's a tragedy ;-)

seriously, i think this is a hell of a good reason to start putting
more work into sander's internal diff library.  being dependent on
external tools like this is an inherently problematic arrangement.

-garrett

-- 
garrett rooney                    Remember, any design flaw you're 
rooneg@electricjellyfish.net      sufficiently snide about becomes  
http://electricjellyfish.net/     a feature.       -- Dan Sugalski


Re: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.
Kevin Pilch-Bisson <ke...@pilch-bisson.net> writes:

> Actually, before making the fix I just mentioned I decided to investigate the
> matter a little more.
> 
> ftp.gnu.org has three versions of diffutils available: 2.7, 2.8, and 2.8.1.  
> 
> 2.7 behaves as you describe; 2.8 and 2.8.1 do as I describe, but have a
> --diff-program=/path/to/diff argument which we could use.  However, this means
> we need to check the version of diff3 in configure.  Yuck :(

Yuck indeed.  Debian still ships 2.7 and RedHat appears to ship 2.7.2,
so the obvious move of simply requiring the latest version would not
be popular in Linux land :-(

-- 
Philip


Re: Problem invoking diff3.

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.
On Wed, May 15, 2002 at 10:31:44PM +0100, Philip Martin wrote:
> Kevin Pilch-Bisson <ke...@pilch-bisson.net> writes:
> 
> > After much frustration, I discovered something strange about the way diff3
> > works.  It doesn't automatically invoke the diff from the same diffutils
> > package.  Instead it runs the first diff it finds in PATH if PATH is set,
> > otherwise, it checks some hardcoded locations.  The first two are
> > /usr/ccs/bin/diff and /usr/bin/diff.  This is all I know about, since
> > /usr/bin/diff exists and is Solaris diff, which doesn't understand the
> > arguments that diff3 tries to pass it.
> 
> This sounds like a broken diff3 installation.  The code for GNU diff3
> calls execve() with an absolute path to diff.  When I build it locally
> this is what I see

Actually, before making the fix I just mentioned I decided to investigate the
matter a little more.

ftp.gnu.org has three versions of diffutils available: 2.7, 2.8, and 2.8.1.  

2.7 behaves as you describe; 2.8 and 2.8.1 do as I describe, but have a
--diff-program=/path/to/diff argument which we could use.  However, this means
we need to check the version of diff3 in configure.  Yuck :(
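
The version check mentioned above might look something like the sketch below. The function names are hypothetical, the 2.8 threshold is taken from the observation in this message, and the sample banner strings in the usage note are only illustrative of what a version banner might contain.

```python
import re


def parse_diffutils_version(version_output):
    """Extract the first dotted version number, e.g. (2, 8, 1), or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def supports_diff_program(version_output):
    """True if the reported version is 2.8 or later (threshold per the message)."""
    version = parse_diffutils_version(version_output)
    return version is not None and version >= (2, 8)


def diff3_command(diff3_path, diff_path, version_output, files):
    """Build a diff3 argv, adding --diff-program only when it is supported."""
    argv = [diff3_path]
    if supports_diff_program(version_output):
        argv.append("--diff-program=" + diff_path)
    argv.extend(files)
    return argv
```

Given a banner such as "diff3 (GNU diffutils) 2.8.1" the option is added; given one mentioning 2.7 or 2.7.2 it is left out, since those versions are reported to use a compiled-in path instead.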
> 
> Breakpoint 1, read_diff (filea=0xbffffcd8 "b", fileb=0xbffffcda "c", 
>     output_placement=0xbffffa68) at ./diff3.c:1151
> 1151      ap = argv;
> (gdb) n
> 1152      *ap++ = diff_program;
> (gdb) 
> 1153      if (always_text)
> (gdb) 
> 1155      sprintf (horizon_arg, "--horizon-lines=%d", horizon_lines);
> (gdb) 
> 1156      *ap++ = horizon_arg;
> (gdb) 
> 1157      *ap++ = "--";
> (gdb) 
> 1158      *ap++ = filea;
> (gdb) 
> 1159      *ap++ = fileb;
> (gdb) 
> 1160      *ap = 0;
> (gdb) 
> 1162      if (pipe (fds) != 0)
> (gdb) p argv[0]
> $1 = 0x804bc80 "/usr/local/bin/diff"
> (gdb) p argv[1]
> $2 = 0xbffff974 "--horizon-lines=10"
> (gdb) p argv[2]
> $3 = 0x804c4c4 "--"
> (gdb) p argv[3]
> $4 = 0xbffffcd8 "b"
> (gdb) p argv[4]
> $5 = 0xbffffcda "c"
> 
> The path is a -D define in the Makefile:
> 
> $ grep -u3 DIFF_PROGRAM Makefile
> infodir = $(prefix)/info
> 
> DEFAULT_EDITOR_PROGRAM = ed
> DIFF_PROGRAM = $(bindir)/`echo diff | $(edit_program_name)`
> NULL_DEVICE = /dev/null
> PR_PROGRAM = /usr/bin/pr
> 
> --
>         $(COMPILE) -DNULL_DEVICE=\"$(NULL_DEVICE)\" $(srcdir)/cmp.c
> 
> diff3.o: diff3.c
>         $(COMPILE) -DDIFF_PROGRAM=\"$(DIFF_PROGRAM)\" $(srcdir)/diff3.c
> 
> sdiff.o: sdiff.c
>         $(COMPILE) -DDEFAULT_EDITOR_PROGRAM=\"$(DEFAULT_EDITOR_PROGRAM)\" \
>                 -DDIFF_PROGRAM=\"$(DIFF_PROGRAM)\" $(srcdir)/sdiff.c
> 
> util.o: util.c
>         $(COMPILE) -DPR_PROGRAM=\"$(PR_PROGRAM)\" $(srcdir)/util.c
> 
> 
> which gives me the compile command:
> 
> gcc -c  -DHAVE_CONFIG_H -I. -I. -g -DDIFF_PROGRAM=\"/usr/local/bin/`echo diff | sed 's,x,x,'`\" ./diff3.c 
> 
> > 
> > I fixed this for that particular machine by changing the inherit_environment
> > flag in svn_io_run_diff3's call to svn_io_run_cmd to TRUE, but I don't think
> > this is the best solution.
> > 
> > Note that this could also bite FreeBSD users who will end up with diff3
> > running the hacked BSD version of diff instead of the gdiff found by
> > configure.
> > 
> > Anyone have an idea as to what a good solution is?
> 
> Fix your diff3 installation.
> 
> -- 
> Philip

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.
Kevin Pilch-Bisson <ke...@pilch-bisson.net> writes:

> After much frustration, I discovered something strange about the way diff3
> works.  It doesn't automatically invoke the diff from the same diffutils
> package.  Instead it runs the first diff it finds in PATH if PATH is set,
> otherwise, it checks some hardcoded locations.  The first two are
> /usr/ccs/bin/diff and /usr/bin/diff.  This is all I know about, since
> /usr/bin/diff exists and is Solaris diff, which doesn't understand the
> arguments that diff3 tries to pass it.

This sounds like a broken diff3 installation.  The code for GNU diff3
calls execve() with an absolute path to diff.  When I build it locally,
this is what I see:

Breakpoint 1, read_diff (filea=0xbffffcd8 "b", fileb=0xbffffcda "c", 
    output_placement=0xbffffa68) at ./diff3.c:1151
1151      ap = argv;
(gdb) n
1152      *ap++ = diff_program;
(gdb) 
1153      if (always_text)
(gdb) 
1155      sprintf (horizon_arg, "--horizon-lines=%d", horizon_lines);
(gdb) 
1156      *ap++ = horizon_arg;
(gdb) 
1157      *ap++ = "--";
(gdb) 
1158      *ap++ = filea;
(gdb) 
1159      *ap++ = fileb;
(gdb) 
1160      *ap = 0;
(gdb) 
1162      if (pipe (fds) != 0)
(gdb) p argv[0]
$1 = 0x804bc80 "/usr/local/bin/diff"
(gdb) p argv[1]
$2 = 0xbffff974 "--horizon-lines=10"
(gdb) p argv[2]
$3 = 0x804c4c4 "--"
(gdb) p argv[3]
$4 = 0xbffffcd8 "b"
(gdb) p argv[4]
$5 = 0xbffffcda "c"

The path is a -D define in the Makefile:

$ grep -u3 DIFF_PROGRAM Makefile
infodir = $(prefix)/info

DEFAULT_EDITOR_PROGRAM = ed
DIFF_PROGRAM = $(bindir)/`echo diff | $(edit_program_name)`
NULL_DEVICE = /dev/null
PR_PROGRAM = /usr/bin/pr

--
        $(COMPILE) -DNULL_DEVICE=\"$(NULL_DEVICE)\" $(srcdir)/cmp.c

diff3.o: diff3.c
        $(COMPILE) -DDIFF_PROGRAM=\"$(DIFF_PROGRAM)\" $(srcdir)/diff3.c

sdiff.o: sdiff.c
        $(COMPILE) -DDEFAULT_EDITOR_PROGRAM=\"$(DEFAULT_EDITOR_PROGRAM)\" \
                -DDIFF_PROGRAM=\"$(DIFF_PROGRAM)\" $(srcdir)/sdiff.c

util.o: util.c
        $(COMPILE) -DPR_PROGRAM=\"$(PR_PROGRAM)\" $(srcdir)/util.c


which gives me the compile command:

gcc -c  -DHAVE_CONFIG_H -I. -I. -g -DDIFF_PROGRAM=\"/usr/local/bin/`echo diff | sed 's,x,x,'`\" ./diff3.c 
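
As a minimal sketch of the mechanism shown in the gdb session and the
Makefile excerpt above (this is not the diffutils source; the function
name build_diff_argv and the fallback value of DIFF_PROGRAM are
assumptions made for illustration), the compile-time define ends up as
argv[0] of the child process roughly like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Normally supplied by the build, e.g.
   -DDIFF_PROGRAM=\"/usr/local/bin/diff\".
   This fallback is only here so the sketch compiles on its own. */
#ifndef DIFF_PROGRAM
#define DIFF_PROGRAM "/usr/local/bin/diff"
#endif

/* Hypothetical helper: fill argv (room for at least 6 pointers) in the
   same order the gdb session shows, and return the number of arguments
   stored before the terminating NULL. */
size_t
build_diff_argv (const char **argv, char *horizon_arg, size_t horizon_size,
                 int horizon_lines, const char *filea, const char *fileb)
{
  const char **ap = argv;

  assert (argv != NULL && horizon_arg != NULL);

  *ap++ = DIFF_PROGRAM;          /* absolute path fixed at compile time */
  snprintf (horizon_arg, horizon_size, "--horizon-lines=%d", horizon_lines);
  *ap++ = horizon_arg;
  *ap++ = "--";                  /* end of options */
  *ap++ = filea;
  *ap++ = fileb;
  *ap = NULL;                    /* exec-style argument lists end with NULL */

  return (size_t) (ap - argv);
}
```

Because argv[0] is a string constant baked in at build time, PATH plays
no part in choosing which diff gets run in a build like this one.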

> 
> I fixed this for that particular machine by changing the inherit_environment
> flag in svn_io_run_diff3's call to svn_io_run_cmd to TRUE, but I don't think
> this is the best solution.
> 
> Note that this could also bite FreeBSD users who will end up with diff3
> running the hacked BSD version of diff instead of the gdiff found by
> configure.
> 
> Anyone have an idea as to what a good solution is?

Fix your diff3 installation.

-- 
Philip

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Problem invoking diff3.

Posted by Philip Martin <ph...@codematters.co.uk>.
Kevin Pilch-Bisson <ke...@pilch-bisson.net> writes:

> Never mind, diff3 has an apparently undocumented argument called --diff-program
> which should do the trick.  Commit coming up shortly.

Careful, it's not merely undocumented...

% diff3 --diff-program /usr/bin/diff a b c
diff3: unrecognized option `--diff-program'
diff3: Try `diff3 --help' for more information.
% diff3 -v
diff3 - GNU diffutils version 2.7
% uname -a
Linux debian2 2.4.18-rc4 #1 SMP Mon May 13 19:30:13 BST 2002 i686 unknown
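
Since support for the option evidently varies between diff3 versions, a
caller could probe for it before relying on it.  The following is only a
sketch under stated assumptions: both function names are hypothetical,
it assumes that a diff3 which accepts --diff-program also lists it in
its --help output, and it assumes diff3_path is a trusted string, since
it is passed to the shell through popen().

```c
#define _POSIX_C_SOURCE 200809L  /* for popen() and pclose() */
#include <stdio.h>
#include <string.h>

/* Return 1 if help_text mentions option, 0 otherwise. */
int
help_mentions_option (const char *help_text, const char *option)
{
  return strstr (help_text, option) != NULL;
}

/* Run "<diff3_path> --help" and report whether the usage text mentions
   --diff-program.  Returns 1 or 0, or -1 if the command could not be
   built or started.  Output beyond the buffer size is ignored. */
int
diff3_supports_diff_program (const char *diff3_path)
{
  char cmd[512];
  char buf[4096];
  size_t used = 0;
  size_t n;
  int len;
  FILE *fp;

  len = snprintf (cmd, sizeof cmd, "%s --help 2>&1", diff3_path);
  if (len < 0 || (size_t) len >= sizeof cmd)
    return -1;

  fp = popen (cmd, "r");
  if (fp == NULL)
    return -1;

  /* Collect the help text, leaving room for the terminating NUL. */
  while ((n = fread (buf + used, 1, sizeof buf - 1 - used, fp)) > 0)
    used += n;
  buf[used] = '\0';
  pclose (fp);

  return help_mentions_option (buf, "--diff-program");
}
```

Something like diff3_supports_diff_program("/usr/local/bin/diff3") could
be called once at configure or startup time, falling back to another
approach when the result is not 1.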


-- 
Philip


Re: Problem invoking diff3.

Posted by Kevin Pilch-Bisson <ke...@pilch-bisson.net>.
Never mind, diff3 has an apparently undocumented argument called --diff-program
which should do the trick.  Commit coming up shortly.

On Wed, May 15, 2002 at 04:59:12PM -0400, Kevin Pilch-Bisson wrote:
> I recently bootstrapped svn onto a Solaris machine which I have non-root
> access to.  I happily compiled and installed all of the utils I needed,
> including diffutils.
> 
> Then I got the svn tarball, bootstrapped to HEAD (1954 at the time), and
> compiled again.  Everything up till here went great.  The problem came when I
> ran make check.
> 
> I got a whole bunch of failed tests, which according to the log seemed to be
> related to diff/diff3.
> 
> After much frustration, I discovered something strange about the way diff3
> works.  It doesn't automatically invoke the diff from the same diffutils
> package.  Instead it runs the first diff it finds in PATH if PATH is set,
> otherwise, it checks some hardcoded locations.  The first two are
> /usr/ccs/bin/diff and /usr/bin/diff.  This is all I know about, since
> /usr/bin/diff exists and is Solaris diff, which doesn't understand the
> arguments that diff3 tries to pass it.
> 
> I fixed this for that particular machine by changing the inherit_environment
> flag in svn_io_run_diff3's call to svn_io_run_cmd to TRUE, but I don't think
> this is the best solution.
> 
> Note that this could also bite FreeBSD users who will end up with diff3
> running the hacked BSD version of diff instead of the gdiff found by
> configure.
> 
> Anyone have an idea as to what a good solution is?



-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Kevin Pilch-Bisson                    http://www.pilch-bisson.net
     "Historically speaking, the presences of wheels in Unix
     has never precluded their reinvention." - Larry Wall
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~