You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Tom Lord <lo...@emf.net> on 2003/04/08 22:15:21 UTC

isn't variance adjusted patching horribly dangerous?


I've been reading:

          http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html


There are times when variance adjusted patching would certainly do
"the right thing" --- and if you know you have such a situation, well,
heck it's a fine tool to add to the toolbox.

But in general, it seems quite dangerous precisely because it
accomplishes its goal of eliminating conflicts that would otherwise
arise from contextual changes.

Perhaps I've misunderstood how it works, so let me illustrate what I'm
thinking of:

Consider, for example, a common ancestor that starts with:


	T:8 == B:1
        ----------

	last = x;
        val = 2 * x;
        foo (val);



On the trunk:

	T:19  (unchanged from T:8)
        ----

        last = x;
        val = 2 * x;
        foo (val);


	T:20 (second line changed from T:19)
        ----

	last = x;
        val = 2 * (counter += x);
        foo (val);



And on the branch:

	B:15  (first line changed from B:1)
        ----

	last = x, counter += x;
        val = 2 * x;
        foo (val);


In this case, both the trunk and branch programmers added a needed
increment of `counter', but they did it in different ways, on nearby
but different lines.

The programmer wants to apply T:19-T:20 to B:15.

Between T:19 and T:20, only one line changed here (the assignment to
val).   The context diff will be something like:

        *** T:19
        --- T:20
	*****
	    last = x;
        !   val = 2 * x;
            foo (val);
        -----
	    last = x;
        !   val = 2 * (counter += x);
            foo (val);

That will be adjusted to account for the changes between T:8 and T:19
(there are none), and then for the changes from B:1 to B:15.  After
that adjustment (for an "in-range editted line"), the patch will look
like this:

        *** T:19
        --- T:20
	*****
	    last = x, counter += x;
        !   val = 2 * x;
            foo (val);
        -----
	    last = x, counter += x;
        !   val = 2 * (counter += x);
            foo (val);

which will apply without conflict to B:15, yielding:

	B:16
        ----

	last = x, counter += x;
        val = 2 * (counter += x);
        foo (val);

which (because of the duplicated assignments to counter) is likely to
be the wrong result.

Now, to be fair -- even regular textual patching, without the variance
algorithm, can introduce surprising bugs.   Patching is inherently
dangerous that way.

But one of the ways we traditionally reduce that risk is by using
context diffs -- and variance adjusted patching appears to be a
mechanism precisely designed to undo the benefits of that safety
measure.

The example above illustrates a case where ordinary patching would do
the right thing, generate a conflict and help a programmer to notice
the bug, but variance adjusted patching will silently hide the bug.

A less aggressive form of variance adjustment would seem to me to be
safer: and that would be just to adjust the line offsets of hunks (not
their contents) in response to out-range additions and deletions.
That would defeat the window-size limitation of the patching tool

Referring to 

          http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html

that would be cases 1 and 2 of the two 7-case adjustment algorithms
(The text starting with "To adjust P, we first walk over the diff
T:8-T:19,").

Yet even _that_ kind of adjustment presents new dangers, and I don't
think I'd want it to be the default.

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Matt Kraai <kr...@alumni.cmu.edu>.
On Thu, Apr 10, 2003 at 11:08:14AM +0200, Sander Striker wrote:
> > From: Lars Segerlund [mailto:lars.segerlund@comsys.se]
> >   Where is this doc referenced, I would like to find it :-).
> 
> http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html
> 
> But currently svn.collab.net is offline for maintainence.  Check
> back when the 'it's back up' message is posted on this list.

Also

 http://subversion.tigris.org/variance-adjusted-patching.html

which is still up.

Matt
-- 
It's most certainly GNU/Linux, not Linux.  Read more at
http://www.gnu.org/gnu/why-gnu-linux.html.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
"Sander Striker" <st...@apache.org> writes:
> http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html
> 
> But currently svn.collab.net is offline for maintainence.  Check
> back when the 'it's back up' message is posted on this list.

It's also in your working copy or Subversion distribution, in www/ :-)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: isn't variance adjusted patching horribly dangerous?

Posted by Sander Striker <st...@apache.org>.
> From: Lars Segerlund [mailto:lars.segerlund@comsys.se]
> Sent: Thursday, April 10, 2003 11:00 AM
> To: dev@subversion.tigris.org
> Subject: Re: isn't variance adjusted patching horribly dangerous?
> 
> 
> 
>   Where is this doc referenced, I would like to find it :-).

http://svn.collab.net/repos/svn/trunk/www/variance-adjusted-patching.html

But currently svn.collab.net is offline for maintainence.  Check
back when the 'it's back up' message is posted on this list.


Sander

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Lars Segerlund <la...@comsys.se>.
  Where is this doc referenced, I would like to find it :-).

  / Lars Segerlund.

Tom Lord wrote:
snip
> 
> You need not.  Kfogel's document contains some examples.  Mentioning
> the existence of such examples is a logical non-sequitor.
> 
> Another example of where v.a.p. will silently and incorrectly supress
> a desirable conflict would be the ordering of two function calls.
> 
snip
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: isn't variance adjusted patching horribly dangerous?

Posted by Sander Striker <st...@apache.org>.
> From: Tom Lord [mailto:lord@emf.net]
> Sent: Thursday, April 10, 2003 2:19 AM

[...]
> All that said: I'd surely like a stand-alone diff4 and even a built-in
> diff4 (that emits a patch set rather than munging my tree) as a tool
> to have around -- it's just a bloody mess, though to apply diff4
> patches blindly as a day-to-day merging technique.  My (possibly
> faulty) understanding from off-list conversations was that blindly
> applying diff4 was begining to be thought of as a way around the "back
> and forth" merging problem.

I am going to propose that in a few days time yes.  Possibly with options
to mark adjustments as conflicts.  This would be the same as someone
requested for our internal diff3, always mark all changes as conflicts
(even when they don't conflict).  Every user can decide for themselves
what their default is going to be.  At installation we will set it to
the 'safest' value.

Anyway, the proposal isn't even out yet because I simply don't have the
time to catch up right now :(.

Sander


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

       Philip Martin:

       > Tom's example was contrived to give a conflict, I can
       > contrive one that does not


You need not.  Kfogel's document contains some examples.  Mentioning
the existence of such examples is a logical non-sequitor.

Another example of where v.a.p. will silently and incorrectly supress
a desirable conflict would be the ordering of two function calls.


If the trunk has:


	foo (a, b);
	bar (x, y);

but the branch:

        bar (x, y);
	foo (a, b);


and the trunk changeset makes that `foo (a, b, c)', then v.a.p. can
silently create:


	foo (a, b, c);
	foo (a, b);

(By the "in-range edited line" rules.)


For that matter, with:


	foo (a, b);

in the trunk, and 

	foo (munge(a), b);

in the branch, and a trunk change to:

	foo (a, b, c);

v.a.p. can silently give you:


	foo (a, b, c);

where

	foo (munge(a), b, c);

is what a human forced to resolve a conflict would want.

All that said: I'd surely like a stand-alone diff4 and even a built-in
diff4 (that emits a patch set rather than munging my tree) as a tool
to have around -- it's just a bloody mess, though to apply diff4
patches blindly as a day-to-day merging technique.  My (possibly
faulty) understanding from off-list conversations was that blindly
applying diff4 was begining to be thought of as a way around the "back
and forth" merging problem.

-t


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Philip Martin <ph...@codematters.co.uk>.
Garrett Rooney <ro...@electricjellyfish.net> writes:

> > Are you saying that with internal diff Philip's example (one
> > containing "svn merge -r 2:3 wc/bar wc/foo") will not work as he
> > says it does?
> 
> i have no idea, i haven't tried it ;-)

It's what one gets from HEAD using the builtin diff.  If you want to
reproduce it then create wc/foo containing my "Original" and commit,
copy wc/foo to wc/bar and commit, edit wc/foo and wc/bar to create
"Change 1" and "Change 2" and commit.  Then run merge.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On Wednesday, April 9, 2003, at 08:29 PM, Sergey wrote:

>
>
> Garrett Rooney wrote:
>> our internal diff code is currently used by default.  you can 
>> override it on the command line or via a config file option.  this is 
>> not v.a.p. though.  we do have an implementation of v. a. p. in the 
>> tree, but it is not well tested, and the command line client does not 
>> use it yet (and as far as i know there has not yet been any 
>> discussion of the best way to work it into the interface).
> Are you saying that with internal diff Philip's example (one 
> containing "svn merge -r 2:3 wc/bar wc/foo") will not work as he says 
> it does?

i have no idea, i haven't tried it ;-)

-garrett


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Sergey <se...@pisem.net>.

Garrett Rooney wrote:
> 
> our internal diff code is currently used by default.  you can override 
> it on the command line or via a config file option.  this is not v.a.p. 
> though.  we do have an implementation of v. a. p. in the tree, but it is 
> not well tested, and the command line client does not use it yet (and as 
> far as i know there has not yet been any discussion of the best way to 
> work it into the interface).
> 
Are you saying that with internal diff Philip's example (one containing "svn merge -r 2:3 wc/bar wc/foo") will not work as he 
says it does?



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On Wednesday, April 9, 2003, at 08:26 PM, Tom Lord wrote:

>
>
>        I have to be careful then. After experiencing third-party
>        source maintenance with CVS, it looks like I trust version
>        control systems too much. CVS 1.11.5 that I use doesn't work
>        this way, it surely generates a conflict on merge. Will
>        Subversion generate a conflict if the external diff is used? I
>        really hope to hear "yes".
>
> I'll speak out of turn here and give the answer as I understand it:
> "Yes".  (A core developer can correct me if I'm wrong.)  I'm not too
> clear on when or whether external vs. internal diff is used --- just
> that I asked and got the answer that svn does not currently use v.a.p.

our internal diff code is currently used by default.  you can override 
it on the command line or via a config file option.  this is not v.a.p. 
though.  we do have an implementation of v. a. p. in the tree, but it 
is not well tested, and the command line client does not use it yet 
(and as far as i know there has not yet been any discussion of the best 
way to work it into the interface).

-garrett


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

       I have to be careful then. After experiencing third-party
       source maintenance with CVS, it looks like I trust version
       control systems too much. CVS 1.11.5 that I use doesn't work
       this way, it surely generates a conflict on merge. Will
       Subversion generate a conflict if the external diff is used? I
       really hope to hear "yes".

I'll speak out of turn here and give the answer as I understand it:
"Yes".  (A core developer can correct me if I'm wrong.)  I'm not too
clear on when or whether external vs. internal diff is used --- just
that I asked and got the answer that svn does not currently use v.a.p.

The dramatic title of the thread isn't intended as FUD about what svn
currently does.

-t


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: isn't variance adjusted patching horribly dangerous?

Posted by Jack Repenning <jr...@collab.net>.
Serkey said:
> [quote of Philip's adjoining changes example]
> I'd like the system to generate a conflict here, I really do. 
> Wouldn't you?

I certainly would like a conflict in the case of adjoining deltas (this
is a common case of widening, which I just wrote about in another
response).

I once had this discussion with another merge implementer.  He pointed
out "if this were a wedding invitation list, there would be no
conflict."  I asked him if there was much money to be made in
version-control of wedding lists....


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Sergey <se...@pisem.net>.

Philip Martin wrote:
> 
> Tom's example was contrived to give a conflict, I can contrive one
> that does not
> 
> Original                   Change 1                   Change 2
> 
> const char*email[]={       const char*email[]={       const char*email[]={
>   "foo@bar.com",             "foo@bar.com",             "foo@bar.com",
>   "zig@zag.com",             "new_zig@zag.com",         "zig@zag.com",
>   "bif@bof.com",             "bif@bof.com",             "new_bif@bof.com",
>   NULL                       NULL                       NULL
> };                         };                         };
> 

I'd like the system to generate a conflict here, I really do. Wouldn't you?

> 
>>I would refrain from using the system that would throw the context
>>away.
> 
> 
> Are you using Subversion?
> 

I have to be careful then. After experiencing third-party source maintenance with CVS, it looks like I trust version control 
systems too much. CVS 1.11.5 that I use doesn't work this way, it surely generates a conflict on merge. Will Subversion generate 
a conflict if the external diff is used? I really hope to hear "yes".
Thanks!

Sergey.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

    >>> Karl Fogel:

    >>> There are a lot more important things to be worrying about
    >>> than this right now, even just in the area of improved merge
    >>> support.

    >> Tom Lord:
    >> Neat.  Like what?

    > Karl Fogel:
    > Like exactly how to record prior merge history :-).


The proposal on the table that I've heard about used the mechanism of
a property named (I think) "svn:mr" ("most recent"), attached to
individual file and directory nodes, recording the high-water-marks of
merges to those nodes from related branches.  And I believe it's well
known that that proposal has to be extended in order to handle cherry
picking.   My svn notes don't record any discussion of transitivity of
merges:  when you set svn:mr to reflect a merge, either (a), you'll
have to include the data about third branches from the svn:mr of the
merged-in node, or (b) when you look at a svn:mr value, you'll have to
transitively search the svn:mr properties of the nodes it refers to.

That does, indeed, sound like a useful mechanism: but not a sufficient
one, and one that will be difficult to design and implement compared
to the alternative I'll propose below.

The reasons it's difficult to fully design that mechanism have to do
with the potentially different granularity of changes being merged and
merges being applied.  Suppose my trunk has:

	proj/A/hello.c
	proj/B/

and my branch renames A/hello.c to B/hello.c I merge the changes to
directory A from the branch, but not to directory B.  How should that
be handled?  At a later time, I merge the changes to directory B.
What is the final result?  There are multiple logically consistent
answers, none of which seems to me to be ideal: (a) such partial
merges are forbidden; (b) hello.c disappears, then in the second merge
a version of hello.c is added to directory B: but what version of
hello.c  (b1) the version from the branch;  (b2) the resurrected
version from my tree;  (b3) a merge of those?   Supposing we want to
avoid the three subcases of (b), we might have (c) such partial merges
are sometimes permitted, but not if they would delete a file that
differs between the merged-into and merged-from branch (but then how
do I reasonablly and convenient work around that?).   I'm not saying
that problems such as this make it impossible to design solutions --
just that the solutions are going to hard to explain, and hard to
design to be maximally useful.

The granularity also interferes with project mgt/business rules.
Tracking of changes as they propogate between branches and out into
deliverables has meaning well beyond the scope of low-level version
control -- it's part of how progress is monitored and part of how bug
archaeology is conducted.  But with fine-grain patching, when you ask
"is change such and such present in this tree", (a) instead of "yes"
or "no" the answer can be "partly"; (b) computing that yes/no/partly
answer accurately involves searching the svm:mr-ish properties of
every node in the tree.

And of course, it isn't easy to implement that mechanism just for the
simple reason that it involves adding new features to your
database-schema, and surfacing APIs at multiple layers to deal with
those new properties, then writing custom merge tools that consult
those properties using those APIs.

A complementary and simpler to implement approach, that also simply
avoids the granularity issues listed above, is to take the route 
we talked about earlier:  add some higher-level structure to the
layout of the top-level organizational directories in a repository,
and some conventional structure to project trees, and add a layer
which does whole-project-tree merging.    Merge history for
whole-project-tree merging doesn't have to be recorded as new
properties:  it can be stored in ordinary (controlled) text files
along side the project trees themselves.    Tools which perform merges
using and extending that history barely need to interact with svn at
all -- and can conceivably do so using only the ordinary CLI interface
or a scripting language binding for that interface.   

You could have history-sensitive whole-tree merging, _and_ a big
chunk of what's needed for distributed branching, with maybe a couple
10K lines of python or scheme code.

The "patch log" mechanism in arch records whole tree merge history
exactly as I've suggested: in ordinary "source" files stored along
with the project tree itself.   In the context of arch, it
demonstrates its usefulness for both smart merging and
project-mgt/business rule auditting.    And, by keeping the data in
regular files (and representing the data carefully) -- the patch log
mechanism helps make distributed branching a snap.

To maybe give you a taste of how nicely this mechanism works out, in
the first paragraph I talked about the "transitivity" problem with
svn:mr-ish approaches.   The patch-log mechanism solves that problem
automatically -- it "falls out" of the solution for free:  a generic
whole-tree "patch" algorithm doesn't know anything special about patch
logs, it treats them just like regular files.   But in treating patch
logs as ordinary files, it implicitly records the transitive merge
history just as part of its default operation (i.e., a merged-into
tree picks up all the patch log entries its missing from the
merged-in changes.)


-t


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Tom Lord <lo...@emf.net> writes:
> 	> There are a lot more important things to be worrying about
> 	> than this right now, even just in the area of improved merge
> 	> support.
> 
> Neat.  Like what?

Like exactly how to record prior merge history :-).

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

	Karl:

	> There are a lot more important things to be worrying about
	> than this right now, even just in the area of improved merge
	> support.

Neat.  Like what?

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: isn't variance adjusted patching horribly dangerous?

Posted by Sander Striker <st...@apache.org>.
> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> Sent: Thursday, April 10, 2003 6:13 PM

> I feel like this whole discussion of whether variance-adjusted
> patching is "dangerous" or not is something of a useless tangent right
> now.
> 
> The paper itself points out that the algorithm may be overeager for
> some use cases, and recommends attaching "selectivity knobs" to the
> various steps in the algorithm, to avoid this overeagerness when
> appropriate.  If you're versioning wedding invitations, then turn the
> knobs down; for code, turn them up.
> 
> As to where the knobs' defaults should be, that's a question that can
> be settled by experience and/or debate when the feature is actually
> available.  There are a lot more important things to be worrying about
> than this right now, even just in the area of improved merge support.

+1.

That having said, those knobs really aren't rocket science and frankly
easily implemented.

Improved merge support proposal comming up in about an hour...

Sander


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
I feel like this whole discussion of whether variance-adjusted
patching is "dangerous" or not is something of a useless tangent right
now.

The paper itself points out that the algorithm may be overeager for
some use cases, and recommends attaching "selectivity knobs" to the
various steps in the algorithm, to avoid this overeagerness when
appropriate.  If you're versioning wedding invitations, then turn the
knobs down; for code, turn them up.

As to where the knobs' defaults should be, that's a question that can
be settled by experience and/or debate when the feature is actually
available.  There are a lot more important things to be worrying about
than this right now, even just in the area of improved merge support.

Just my $0.02,
-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

       jrepenning@collab.net:

       > If variance adjusting really turns an ugly delta into an
       > unsignaled lie, that would be a major offense; this is, I
       > think, a restatement of Tom's concern.

Heh.  We speak similar languages.  In a very general sense, a common
design failure of programs is to try to hide what's going on at the
low level, and make it look like something subtlely (but deeply)
different is going on at the user interface level.  That's a design
that "lies" and it's always a bad idea (but sometimes sells for a
while).   Users eventually bump into the edge cases of the lie, and
then they get into trouble.

The alternative to lying is to design user interfaces that _explain_
and help manage the implications of the low level.


	> My question would be, "how often do these conflicting fixes
	> in the same area arise?"  If often, we should worry; if
	> rarely, perhaps not.

The in-range rules are particularly problematic and I would be
prepared to bet my last $110 that they will cause problems often.
The function call examples in my last mail illustrate.   They
effectively say "Discard or ignore your changes in this code -- where
the same lines haven't changed in the changeset, ignore your changes;
where the same lines have changed, discard your changes."   That's not
never the right thing, but it's often not.


	> But this is exactly and solely the question that has divided
	> "change-based" advocates from "version-based" advocates for
	> fifty years;

?!?  I don't think it's "exactly and solely" any such thing ... or
even that that question really dominates much literature ... but
you've made me curious what you're looking at from 1953....


-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: isn't variance adjusted patching horribly dangerous?

Posted by Jack Repenning <jr...@collab.net>.
Phillip Martin writes:
> Tom's example was contrived to give a conflict, I can 
> contrive one that does not

It's not clear to me that that's relevent.  We can't any of us come up
with a perfect scheme, so what we must settle for is an optimal one,
which:
(1) Does the right thing as much as possible.
(2) Clearly signals the other cases, so a human can resolve the matter.
(3) Really, really minimize the cases where it thinks it got it right,
but is in fact wrong.

I claim that (1) is the ultimate goal, but perversely enough (2) is more
important than (1), and (3) is the most important of all.

That there are non-conflicting instances is good, of course, we want as
many of them as possible, but the real measure of the system is (3): not
lying.

I agree with Tom, that conflicting changes (in this hard-to-detect
sense) will arise, and need to be taken into account.  I won't quibble
with Philip over whether they represent process flaws, I will only point
out that this tool exists, in part, to identify and protect us from our
errors, while getting as much as possible right to save us time.

The classic strategy for dealing with these conflicts is "region
widening" -- not only the elsewhere-quoted widening of the context, but
also widening of the delta region itself.  This often "works" only in
that it produces an uglier merge failure, satisfying (2).  If variance
adjusting really turns an ugly delta into an unsignaled lie, that would
be a major offense; this is, I think, a restatement of Tom's concern.

My question would be, "how often do these conflicting fixes in the same
area arise?"  If often, we should worry; if rarely, perhaps not.  I have
a lot of experience in a situation where this sort of thing was in fact
the norm, representing well over 50% of a change rate of several
thousand deltas a day.  While I recognize this was an extreme case, I
have to say it makes me very chary of stepping into this mess.  But this
is exactly and solely the question that has divided "change-based"
advocates from "version-based" advocates for fifty years; I dare suggest
we won't actually resolve it on this list, either.

What I do suggest is that the variance adjusting might be tweakable, to
reduce its exposure here.  A possibility: use a word-list culled from
the various regions, to see if any of the variances being "based" refer
to words also in the putative delta region; if so, then decide that
these variances are too risky for the tool to autodecide: widen the
delta region and try again.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Philip Martin <ph...@codematters.co.uk>.
"Sergey A. Lipnevich" <se...@pisem.net> writes:

> Philip Martin wrote:
> > Fixing the same bug in two different ways is a mistake, it should
> > be fixed in one place and merged to the other.  Yes, mistakes will
> > sometimes happen, but at other times adjacent changes like those
> > above will not be a conflict.
> 
> This is the process flaw, you can't control it.

That's what I said.

> From common sense, they /are/ a conflict.

That's not true in general.  It is only a conflict when one interprets
the text in a particular way, and any such interpretation is beyond
the scope of version control.

(It appears that your "common sense" and my "common sense" don't have
all that much in common!)

> > I disagree, this is not the wrong result--from a version control
> > point of view the changes have been combined perfectly.
> 
> No matter what the theory is behind this, Tom's example definitely
> looks right to me,

Tom's example was contrived to give a conflict, I can contrive one
that does not

Original                   Change 1                   Change 2

const char*email[]={       const char*email[]={       const char*email[]={
  "foo@bar.com",             "foo@bar.com",             "foo@bar.com",
  "zig@zag.com",             "new_zig@zag.com",         "zig@zag.com",
  "bif@bof.com",             "bif@bof.com",             "new_bif@bof.com",
  NULL                       NULL                       NULL
};                         };                         };

> I would refrain from using the system that would throw the context
> away.

Are you using Subversion?

$ svn diff -r 2:3 wc/bar
Index: wc/bar
===================================================================
--- wc/bar      (revision 2)
+++ wc/bar      (revision 3)
@@ -1,7 +1,7 @@
 const char*email[]={
   "foo@bar.com",
   "zig@zag.com",
-  "bif@bof.com",
+  "new_bif@bof.com",
   NULL
 };
$ svn merge -r 2:3 wc/bar wc/foo 
U  wc/foo
$ svn diff wc/
Index: wc/foo
===================================================================
--- wc/foo      (revision 3)
+++ wc/foo      (working copy)
@@ -1,7 +1,7 @@
 const char*email[]={
   "foo@bar.com",
   "new_zig@zag.com",
-  "bif@bof.com",
+  "new_bif@bof.com",
   NULL
 };

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by "Sergey A. Lipnevich" <se...@pisem.net>.

Philip Martin wrote:
> 
> Fixing the same bug in two different ways is a mistake, it should be
> fixed in one place and merged to the other.  Yes, mistakes will
> sometimes happen, but at other times adjacent changes like those above
> will not be a conflict.

This is the process flaw, you can't control it. From common sense, they /are/ a conflict.

> 
> I disagree, this is not the wrong result--from a version control point
> of view the changes have been combined perfectly.
> 

No matter what the theory is behind this, Tom's example definitely looks right to me, I would refrain from using the system that 
would throw the context away. Practical experience shows that at least three lines guaranteed by `diff -u' are necessary. 
Consider also the fact that Mozilla developers usually demand 30 (!) lines context on the patches.

Thanks!

Sergey.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

       Philip Martin <ph...@codematters.co.uk>:

       Traditional patch works without having access to the full text
       of all the files involved, it only has one full text.  The
       multiple merge implied by variance adjusted patching involves
       access to all the full texts, the line numbers can be adjusted
       and the hunks applied without resorting to the fuzzy context
       matching used by patch.  It's not clear to me why one would
       want to use context at all if the full texts are available.

Because textual patching is semantically insensitive, but context
provides a partial (but significant) compensation for that.

Indeed -- the revision history enables tricks like adjusting hunk line
numbers or tweaking context in such a way to help figure out where a
conflicting hunk would otherwise go -- but context also helps to 
avoid blindly applying inappropriate changes.   That's why I like the
"soft conflict" idea.

-t

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Philip Martin <ph...@codematters.co.uk>.
Tom Lord <lo...@emf.net> writes:

You quote me without using my name.  Your quoting style makes it
awkward to distinguish my material from the bits of your material that
you decided to indent.  This does not encourage me to participate in
the discussion.

> 	It boils down to the question: how often should adjacent,
> 	non-overlapping changes result in a conflict?  I prefer the
> 	answer "never", you appear to prefer "always".  I don't
> 	suppose either of us can prove which is the most likely to be
> 	correct in the real world.
> 
> There is considerable empirical evidence that I'm basically right.

I suspect we may have to agree to disagree.

> For example, for how many years now has the documentation for Larry
> Wall's patch suggested -- what is it ... uh -- three lines of context?
> Or for how many years has that been the default in various versions of
> diff?  And in that time, how much folklore has spread around to the
> effect that that's a lousy default and you should override those
> parameters when using those tools?

Traditional patch works without having access to the full text of all
the files involved, it only has one full text.  The multiple merge
implied by variance adjusted patching involves access to all the full
texts, the line numbers can be adjusted and the hunks applied without
resorting to the fuzzy context matching used by patch.  It's not clear
to me why one would want to use context at all if the full texts are
available.

An example

  # Create a file on the trunk
  rm -rf repo wc
  svnadmin create repo
  svn mkdir -m rev1 file://`pwd`/repo/trunk
  svn checkout file://`pwd`/repo/trunk wc
  printf "foo\nbar\n" > wc/file
  svn add wc/file
  svn commit -m rev2 wc

  # Copy to file to branch
  svn copy -m rev3 file://`pwd`/repo/trunk file://`pwd`/repo/branch

  # Commit a trunk change
  printf "foo2\nbar\n" > wc/file
  svn commit -m rev4 wc

  # Commit a branch change
  svn switch file://`pwd`/repo/branch wc
  printf "foo\nbar2\n" > wc/file
  svn commit -m rev5 wc
  svn switch file://`pwd`/repo/trunk wc

  # Local, uncommitted trunk change
  printf "zig\nfoo2\nbar\n" > wc/file

So now I have a local, uncommitted change

svn diff wc/file
Index: wc/file
===================================================================
--- wc/file     (revision 5)
+++ wc/file     (working copy)
@@ -1,2 +1,3 @@
+zig
 foo2
 bar

and a change on the branch

svn diff -r4:5 file://`pwd`/repo/branch/file
Index: file
===================================================================
--- file        (revision 4)
+++ file        (revision 5)
@@ -1,2 +1,2 @@
 foo
-bar
+bar2

Running Subversion's builtin merge 

svn merge -r4:5 file://`pwd`/repo/branch wc

produces the result

cat wc/file
zig
foo2
bar2

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Tom Lord <lo...@emf.net>.

	Fixing the same bug in two different ways is a mistake, it
	should be fixed in one place and merged to the other.

That detail of the example -- that a bug was fixed in two different
ways -- is in no way essential to the analysis.   I mentioned it only
to make it easier to read all the code fragments without getting lost.

We could easilly construct a realistic example that had nothing to do
with redundant bug fixes.   For example, the context lines may be
obtaining and releasing various locks -- the code in the middle
requiring certain locks to be held.   A patch from T:19-T:20 that
relies one one set of locks being held may, by the same merging trick,
be silently applied to a B:15 in which different locks are held.


	Context diffs could be viewed as a safety measure, but they
	can also be viewed as way of allowing a patch to be applied to
	a modified source.  Personally I value the latter more than
	the former.

        [....]

        Whether it is a bug depends on the interpretation of the code,
        and that is beyond the scope of version control.  There will
        be cases where adjacent changes should behave exactly as you
        show above.

Indeed there may be.  The context mechanism and the finite window for
hunk offsets are proven good heuristics for detecting cases where a
tool, such as `patch' or a version control system, should not presume to
guess whether or not the changes should be combined.   It was
considered quite a cool thing -- what 15 or 20 years ago? -- when
those heuristics were realized.   Variance adjusted patching is a
mechanism designed specifically to disable those heuristics in exactly
the kind of context where they are almost always winning.

That said, you've given me a good idea for something better than
variance adjusted patching.   When adjusting a patch, there are two
kinds of changes made:  the line numbers of hunks may be altered, and
the contents of hunks may be altered.

A variation on variance adjusted patching would follow these three
rules:

	1) Whenever the _contents_ of a hunk are modified, the 
           hunk is marked "soft conflict".

	2) Whenever the line numbers of a hunk are modified 
           such that, instead of being a mere performance enhancement,
           application of the hunk would violate the usual search
           window of the patch algorithm, the hunk is marked "soft
           conflict".

	3) The outcome of patching has three, instead of two kinds of
           result for each hunk.  A hunk may generate a conflict or
           apply cleanly, as before -- but if a hunk is marked "soft
           conflict", then it is treated as a conflict, but the
           conflict record is distinct from ordinary conflicts. (In
           terms of conflict markers, instead of "mine" and "your"
           sections, you'd have "mine", "yours", and "variance
           adjusted".)

Those rules would give users the opportunity to review those cases
where variance adjusted patching has overridden the usual semantics of
patch, but they would also be able to suggest the result of the
current variance adjusted patching algorithm as a resolution.  


	It's not as if merge affects the repository, the user gets to
	compare the sources of the merge before merging, and also gets
	to view the result of the merge before committing.

Indeed they do.  But such reviews always imply a cost/benefit
trade-off.  In a large merge, the cost of a detailed review can be
quite high.  Context is used to help optimize the use of resources
dedicated to review.  Conflicts highlight areas of the code where,
probabilistically, detailed review and human intervention is most
appropriate.  Soft conflicts, as described above, could enhance this.


	It boils down to the question: how often should adjacent,
	non-overlapping changes result in a conflict?  I prefer the
	answer "never", you appear to prefer "always".  I don't
	suppose either of us can prove which is the most likely to be
	correct in the real world.

There is considerable empirical evidence that I'm basically right.
For example, for how many years now has the documentation for Larry
Wall's patch suggested -- what is it ... uh -- three lines of context?
Or for how many years has that been the default in various versions of
diff?  And in that time, how much folklore has spread around to the
effect that that's a lousy default and you should override those
parameters when using those tools?

Just in general, textual patching is inherently risky business.
Variance adjusted patching increases that risk in some notable ways.
While for many projects, the costs of introducing a bug are
inconsequential, for many others, the costs are quite high.  It seems
to me to be a cavalier attitude to regard these costs as a mere matter
of subjective opinion -- and a more sensible approach to manage them
with care.


-t



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: isn't variance adjusted patching horribly dangerous?

Posted by Philip Martin <ph...@codematters.co.uk>.
Tom Lord <lo...@emf.net> writes:

> Consider, for example, a common ancestor that starts with:
> 
> 	T:8 == B:1
>         ----------
> 
> 	last = x;
>         val = 2 * x;
>         foo (val);
> 
> On the trunk:
> 
> 	T:20 (second line changed from T:19)
>         ----
> 
> 	last = x;
>         val = 2 * (counter += x);
>         foo (val);
> 
> And on the branch:
> 
> 	B:15  (first line changed from B:1)
>         ----
> 
> 	last = x, counter += x;
>         val = 2 * x;
>         foo (val);
> 
> In this case, both the trunk and branch programmers added a needed
> increment of `counter', but they did it in different ways, on nearby
> but different lines.

Fixing the same bug in two different ways is a mistake, it should be
fixed in one place and merged to the other.  Yes, mistakes will
sometimes happen, but at other times adjacent changes like those above
will not be a conflict.

[...]
> which will apply without conflict to B:15, yielding:
> 
> 	B:16
>         ----
> 
> 	last = x, counter += x;
>         val = 2 * (counter += x);
>         foo (val);
> 
> which (because of the duplicated assignments to counter) is likely to
> be the wrong result.

I disagree, this is not the wrong result--from a version control point
of view the changes have been combined perfectly.

> Now, to be fair -- even regular textual patching, without the variance
> algorithm, can introduce surprising bugs.   Patching is inherently
> dangerous that way.
> 
> But one of the ways we traditionally reduce that risk is by using
> context diffs -- and variance adjusted patching appears to be a
> mechanism precisely designed to undo the benefits of that safety
> measure.

Context diffs could be viewed as a safety measure, but they can also
be viewed as way of allowing a patch to be applied to a modified
source.  Personally I value the latter more than the former.

> The example above illustrates a case where ordinary patching would do
> the right thing, generate a conflict and help a programmer to notice
> the bug, but variance adjusted patching will silently hide the bug.

Whether it is a bug depends on the interpretation of the code, and
that is beyond the scope of version control.  There will be cases
where adjacent changes should behave exactly as you show above.

> A less aggressive form of variance adjustment would seem to me to be
> safer: and that would be just to adjust the line offsets of hunks (not
> their contents) in response to out-range additions and deletions.
> That would defeat the window-size limitation of the patching tool

I've not yet examined the Subversion merge code, but I know that when
I wrote some merge code in the past I did it without context at all, I
just did line adjustment as you describe above.  I was pleased with
the results it produced, I considered your "wrong result" above to be
a positive advantage.

It's not as if merge affects the repository, the user gets to compare
the sources of the merge before merging, and also gets to view the
result of the merge before committing.

[...]
> Yet even _that_ kind of adjustment presents new dangers, and I don't
> think I'd want it to be the default.

It boils down to the question: how often should adjacent,
non-overlapping changes result in a conflict?  I prefer the answer
"never", you appear to prefer "always".  I don't suppose either of us
can prove which is the most likely to be correct in the real world.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org