You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2011/02/04 07:56:19 UTC
Re: diff4-optimization-bytes
Johan Corveleyn wrote on Mon, Jan 31, 2011 at 02:47:50 +0100:
> On Fri, Jan 28, 2011 at 7:58 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Johan Corveleyn wrote on Fri, Jan 28, 2011 at 14:04:07 +0100:
> >> On Fri, Jan 28, 2011 at 9:29 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >> > May I suggest that, if this code is to be released, then you validate
> >> > its correctnss? �For example, a minimal regression test that is written
> >> > to record trunk's pre-branch behaviour might be sufficient.
> >>
> >> ... I'll accept your suggestion. I'll try to take some time for that
> >> this weekend. If anyone beats me to it, that would be fine as well of
> >> course :).
> >>
>
> Sorry, didn't get around to it yet. Maybe sometime during next week.
>
> > Thanks :-). �I've looked into it, but it seems the output functions in
> > svn_diff.h are oriented to diff2/diff3 only (they don't have an
> > 'ancestor' enum or parameters); and in a quick test, I couldn't get
> > tools/diff/diff4 to output anything sensible:
> >
> > % for i in 1 2 3 4; do echo $i>$i; done
> > % ./tools/diff/diff4 1 2 3 4
> > 3
> > %
>
> I think it's more or less sensible, although I'm not completely sure.
> If you look at the "usage" of diff4, you see:
>
> Usage: /path/to/diff4 <mine> <older> <yours> <ancestor>
>
> I think what you did is: take the difference between 2 and 3 (older
> and yours), and apply that to 1 (mine), using 4 as the common
> ancestor. Apparently that yields "3" :-).
>
> Compare that with the use of diff3, with the files 1, 2 and 3:
>
> Usage: /path/to/diff3 <mine> <older> <yours>
>
> $ diff3 1 2 3
> <<<<<<< 1
> 1
> =======
> 3
> >>>>>>> 3
>
> Hmmm, that looks like a merge conflict, which seems logical (trying to
> apply the diff between 2 and 3, to the file 1 which doesn't contain a
> line '2').
>
Agreed.
> So now I also don't really understand why diff4 successfully merges
> it, given the ancestor 4. Maybe the reasoning is as follows, given
> that 4 is the common ancestor:
>
> - 1 has evolved out of 4 (so '1' is the mine's replacement for '4')
>
> - 2 has evolved out of 4 (so in the merge source, '2' is the
> replacement for '4')
>
> - So with "variance-adjusted patching", the diff between 2 and 3
> translates into: "replace whatever '4' was transformed into (in this
> case '1') into '3'". That would yield '3'.
>
> Not sure though.
>
> > How can we test diff4 then? �Do we have to add public diff4 APIs in
> > order to be able to test svn_diff_diff4_2()?
>
> Maybe we should come up with a better example. In the meantime, I'm
> trying to understand diff4 (reading kfogel's article at [1], as
> referenced in diff4.c).
>
That article was very helpful --- thanks for the pointer!
> > Daniel
> > (on the one hand I'd much prefer to test the API changes before
> > releasing them; on the other hand, adding API's related to an API
> > that no one (to our knowledge) uses seems odd)
>
> Yes. I think we should give it some effort to come up with some
> minimal testing. But if it takes too long, we should probably not let
> this be blocking....
>
I've went ahead and made a patch out of the second example in that
article. However, I admit that I got it to work by fiddling with the
order of the four parameters until the test passed :-)
Could you have a look? (attached)
Thanks,
Daniel
> Cheers,
> --
> Johan
>
> [1] http://svn.apache.org/repos/asf/subversion/trunk/notes/variance-adjusted-patching.html
Re: diff4-optimization-bytes
Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Feb 9, 2011 at 10:20 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> In other news, I looked into the cause --- tried to make
> datasource_get_next_token() do one more loop in the place where
> currently it does 'return if at_start_of_suffix()' --- but that didn't
> fix the truncation...
Indeed, that won't fix it.
I think the best (cleanest, most robust, least special-cased) approach
is to leave the suffix stripping as it is (and let
datasource_get_next_token() stop when it encounters the suffix), but
to count the number of suffix lines while scanning them (like we do
for prefix). Then that suffix_lines result can be passed around, like
the prefix_lines, so it ends up in the call to lcs.c#svn_diff__lcs,
where a piece of "suffix_lcs" can be added to the lcs chain (just like
the prefix_lcs is prepended to it). If we get there, the rest will
work automagically :-).
The only drawback of this is a minor loss of performance of the
optimization, since we now have to count newlines while scanning the
suffix (i.e. need to compare every byte scanned). But I can live with
that :-). I'll see how much impact this has once I get it working.
I've done part of this locally (adapting the function parameters and
passing the value around), but still have to do the hard parts:
- actually counting the newlines in find_identical_suffix
- create and append the suffix_lcs in lcs.c#svn_diff__lcs.
It might take me a couple more days to finish that.
> In the meantime, I tweaked a test to make it XFail (r1068798). From
> a quick glance it seemed to be relevant, but in second thought I'll
> admit I didn't study the test thoroughly before making the patch.
Great, thanks. The fact that it's random is a bit annoying, but at
least it's something. I was thinking of writing a specific merge test
(or another one in diff-diff3-test.c) with say 100 lines of identical
suffix. But for now I don't have the time to do that. Maybe later ...
--
Johan
Re: diff4-optimization-bytes
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
In other news, I looked into the cause --- tried to make
datasource_get_next_token() do one more loop in the place where
currently it does 'return if at_start_of_suffix()' --- but that didn't
fix the truncation...
In the meantime, I tweaked a test to make it XFail (r1068798). From
a quick glance it seemed to be relevant, but in second thought I'll
admit I didn't study the test thoroughly before making the patch.
Daniel
Daniel Shahaf wrote on Wed, Feb 09, 2011 at 11:10:43 +0200:
> Johan Corveleyn wrote on Wed, Feb 09, 2011 at 08:42:20 +0100:
> > On Wed, Feb 9, 2011 at 4:54 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > > The experimental code is svn_diff_diff4_2(); AFAIK svn_diff_diff4() is
> > > as stable as ever.
> >
> > I have no objections on adding such an annotation. However, from where
> > I'm sitting, svn_diff_diff4() is/was just as under-exercised as
> > svn_diff_diff4_2(), i.e. no known callers, only one unit test (thanks
> > for adding that test, BTW). Of course it's into the codebase a lot
> > longer than *_2, but it has never had any core code calling it, and no
> > unit tests (so could have been broken by any number of commits after
> > its inception till now).
> >
>
> In other words, svn_diff_diff4() is as experimental as svn_diff_diff4_2().
> Agreed.
>
> IMO the issue boils down to representation: if we haven't tested a given
> piece of code (it has no callers and a surfacial unit test), then we
> shouldn't represent to API consumers otherwise.
>
> Daniel
> (and yes, I respect all the work you've been doing; my opinion of
> diff4_2() is orthogonal to that)
>
> > --
> > Johan
Re: diff4-optimization-bytes
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Wed, Feb 09, 2011 at 08:42:20 +0100:
> On Wed, Feb 9, 2011 at 4:54 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > The experimental code is svn_diff_diff4_2(); AFAIK svn_diff_diff4() is
> > as stable as ever.
>
> I have no objections on adding such an annotation. However, from where
> I'm sitting, svn_diff_diff4() is/was just as under-exercised as
> svn_diff_diff4_2(), i.e. no known callers, only one unit test (thanks
> for adding that test, BTW). Of course it's into the codebase a lot
> longer than *_2, but it has never had any core code calling it, and no
> unit tests (so could have been broken by any number of commits after
> its inception till now).
>
In other words, svn_diff_diff4() is as experimental as svn_diff_diff4_2().
Agreed.
IMO the issue boils down to representation: if we haven't tested a given
piece of code (it has no callers and a surfacial unit test), then we
shouldn't represent to API consumers otherwise.
Daniel
(and yes, I respect all the work you've been doing; my opinion of
diff4_2() is orthogonal to that)
> --
> Johan
Re: diff4-optimization-bytes
Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Feb 9, 2011 at 4:54 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Hyrum K Wright wrote on Tue, Feb 08, 2011 at 21:47:12 -0600:
>> On Tue, Feb 8, 2011 at 9:26 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> > Johan Corveleyn wrote on Fri, Feb 04, 2011 at 13:20:29 +0100:
>> >> On Fri, Feb 4, 2011 at 7:56 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> >> > Could you have a look? (attached)
>> >>
>> >> Nice. It looks good to me (haven't tested it, just looked at the code;
>> >> I assume it passes with trunk?)
>> >>
>> >
>> > Thanks, yes, r1068749.
>> >
>> > While I'm here, in light of the truncation bug in diff3 earlier today,
>> > how about adding a warning to svn_diff_diff4_2() family of API's to the
>> > effect of "@warning This code is experimental"?
>>
>> How much more experimental is it than other recent code we've added to
>> trunk?
>
> s/experimental/under-exercised/
> (no known callers, only one unit test)
>
>> (And I hope that such an appellation would be only temporary;
>> experimental code close to a release is a Bad Thing. Johan is working
>> hard to fix whatever Badness there may be.)
>>
>
> The experimental code is svn_diff_diff4_2(); AFAIK svn_diff_diff4() is
> as stable as ever.
I have no objections on adding such an annotation. However, from where
I'm sitting, svn_diff_diff4() is/was just as under-exercised as
svn_diff_diff4_2(), i.e. no known callers, only one unit test (thanks
for adding that test, BTW). Of course it's into the codebase a lot
longer than *_2, but it has never had any core code calling it, and no
unit tests (so could have been broken by any number of commits after
its inception till now).
--
Johan
Re: diff4-optimization-bytes
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Hyrum K Wright wrote on Tue, Feb 08, 2011 at 21:47:12 -0600:
> On Tue, Feb 8, 2011 at 9:26 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Johan Corveleyn wrote on Fri, Feb 04, 2011 at 13:20:29 +0100:
> >> On Fri, Feb 4, 2011 at 7:56 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> >> > Could you have a look? (attached)
> >>
> >> Nice. It looks good to me (haven't tested it, just looked at the code;
> >> I assume it passes with trunk?)
> >>
> >
> > Thanks, yes, r1068749.
> >
> > While I'm here, in light of the truncation bug in diff3 earlier today,
> > how about adding a warning to svn_diff_diff4_2() family of API's to the
> > effect of "@warning This code is experimental"?
>
> How much more experimental is it than other recent code we've added to
> trunk?
s/experimental/under-exercised/
(no known callers, only one unit test)
> (And I hope that such an appellation would be only temporary;
> experimental code close to a release is a Bad Thing. Johan is working
> hard to fix whatever Badness there may be.)
>
The experimental code is svn_diff_diff4_2(); AFAIK svn_diff_diff4() is
as stable as ever.
> -Hyrum
Re: diff4-optimization-bytes
Posted by Hyrum K Wright <hy...@hyrumwright.org>.
On Tue, Feb 8, 2011 at 9:26 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Fri, Feb 04, 2011 at 13:20:29 +0100:
>> On Fri, Feb 4, 2011 at 7:56 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> > Could you have a look? (attached)
>>
>> Nice. It looks good to me (haven't tested it, just looked at the code;
>> I assume it passes with trunk?)
>>
>
> Thanks, yes, r1068749.
>
> While I'm here, in light of the truncation bug in diff3 earlier today,
> how about adding a warning to svn_diff_diff4_2() family of API's to the
> effect of "@warning This code is experimental"?
How much more experimental is it than other recent code we've added to
trunk? (And I hope that such an appellation would be only temporary;
experimental code close to a release is a Bad Thing. Johan is working
hard to fix whatever Badness there may be.)
-Hyrum
Re: diff4-optimization-bytes
Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Johan Corveleyn wrote on Fri, Feb 04, 2011 at 13:20:29 +0100:
> On Fri, Feb 4, 2011 at 7:56 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Could you have a look? (attached)
>
> Nice. It looks good to me (haven't tested it, just looked at the code;
> I assume it passes with trunk?)
>
Thanks, yes, r1068749.
While I'm here, in light of the truncation bug in diff3 earlier today,
how about adding a warning to svn_diff_diff4_2() family of API's to the
effect of "@warning This code is experimental"?
> The order of the parameters is correct: tools/diff/diff4.c does
> exactly the same thing (switch the first and second argument), before
> it passes them to svn_diff_file_diff4.
Re: diff4-optimization-bytes
Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Feb 4, 2011 at 7:56 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Johan Corveleyn wrote on Mon, Jan 31, 2011 at 02:47:50 +0100:
>> On Fri, Jan 28, 2011 at 7:58 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> > Johan Corveleyn wrote on Fri, Jan 28, 2011 at 14:04:07 +0100:
>> >> On Fri, Jan 28, 2011 at 9:29 AM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>> >> > May I suggest that, if this code is to be released, then you validate
>> >> > its correctnss? For example, a minimal regression test that is written
>> >> > to record trunk's pre-branch behaviour might be sufficient.
>> >>
>> >> ... I'll accept your suggestion. I'll try to take some time for that
>> >> this weekend. If anyone beats me to it, that would be fine as well of
>> >> course :).
>> >>
>>
>> Sorry, didn't get around to it yet. Maybe sometime during next week.
>>
>> > Thanks :-). I've looked into it, but it seems the output functions in
>> > svn_diff.h are oriented to diff2/diff3 only (they don't have an
>> > 'ancestor' enum or parameters); and in a quick test, I couldn't get
>> > tools/diff/diff4 to output anything sensible:
>> >
>> > % for i in 1 2 3 4; do echo $i>$i; done
>> > % ./tools/diff/diff4 1 2 3 4
>> > 3
>> > %
>>
>> I think it's more or less sensible, although I'm not completely sure.
>> If you look at the "usage" of diff4, you see:
>>
>> Usage: /path/to/diff4 <mine> <older> <yours> <ancestor>
>>
>> I think what you did is: take the difference between 2 and 3 (older
>> and yours), and apply that to 1 (mine), using 4 as the common
>> ancestor. Apparently that yields "3" :-).
>>
>> Compare that with the use of diff3, with the files 1, 2 and 3:
>>
>> Usage: /path/to/diff3 <mine> <older> <yours>
>>
>> $ diff3 1 2 3
>> <<<<<<< 1
>> 1
>> =======
>> 3
>> >>>>>>> 3
>>
>> Hmmm, that looks like a merge conflict, which seems logical (trying to
>> apply the diff between 2 and 3, to the file 1 which doesn't contain a
>> line '2').
>>
>
> Agreed.
>
>> So now I also don't really understand why diff4 successfully merges
>> it, given the ancestor 4. Maybe the reasoning is as follows, given
>> that 4 is the common ancestor:
>>
>> - 1 has evolved out of 4 (so '1' is the mine's replacement for '4')
>>
>> - 2 has evolved out of 4 (so in the merge source, '2' is the
>> replacement for '4')
>>
>> - So with "variance-adjusted patching", the diff between 2 and 3
>> translates into: "replace whatever '4' was transformed into (in this
>> case '1') into '3'". That would yield '3'.
>>
>> Not sure though.
>>
>> > How can we test diff4 then? Do we have to add public diff4 APIs in
>> > order to be able to test svn_diff_diff4_2()?
>>
>> Maybe we should come up with a better example. In the meantime, I'm
>> trying to understand diff4 (reading kfogel's article at [1], as
>> referenced in diff4.c).
>>
>
> That article was very helpful --- thanks for the pointer!
>
>> > Daniel
>> > (on the one hand I'd much prefer to test the API changes before
>> > releasing them; on the other hand, adding API's related to an API
>> > that no one (to our knowledge) uses seems odd)
>>
>> Yes. I think we should give it some effort to come up with some
>> minimal testing. But if it takes too long, we should probably not let
>> this be blocking....
>>
>
> I've went ahead and made a patch out of the second example in that
> article. However, I admit that I got it to work by fiddling with the
> order of the four parameters until the test passed :-)
>
> Could you have a look? (attached)
Nice. It looks good to me (haven't tested it, just looked at the code;
I assume it passes with trunk?)
The order of the parameters is correct: tools/diff/diff4.c does
exactly the same thing (switch the first and second argument), before
it passes them to svn_diff_file_diff4.
tools/diff/diff4.c, lines 75-77:
[[[
svn_err = do_diff4(ostream,
argv[2], argv[1], argv[3], argv[4],
pool);
]]]
And do_diff4 look thusly, mirroring exactly what you're doing in the test:
[[[
static svn_error_t *
do_diff4(svn_stream_t *ostream,
const char *original,
const char *modified,
const char *latest,
const char *ancestor,
apr_pool_t *pool)
{
svn_diff_t *diff;
SVN_ERR(svn_diff_file_diff4(&diff,
original, modified, latest, ancestor,
pool));
SVN_ERR(svn_diff_file_output_merge(ostream, diff,
original, modified, latest,
NULL, NULL, NULL, NULL,
FALSE,
FALSE,
pool));
return NULL;
}
]]]
So apparently, if we compare the arguments from diff4's usage message
with the names of the parameters, we have:
mine == modified
older == original
yours == latest
ancestor == ancestor
Cheers,
Johan
> Thanks,
>
> Daniel
>
>> Cheers,
>> --
>> Johan
>>
>> [1] http://svn.apache.org/repos/asf/subversion/trunk/notes/variance-adjusted-patching.html
>