You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Stefan Sperling <st...@elego.de> on 2008/04/02 17:29:53 UTC

tree conflicts and directories discussion

Hello tree conflict fans,

I've started thinking a bit about tree conflicts and directories.

The problem: We want to be able to also mark directories as
tree conflict victims. The current code, however, only considers
files. See http://subversion.tigris.org/issues/show_bug.cgi?id=3150

I thought it might be a good idea for all of us to agree on
how directories can become tree conflict victims before we
start implementing anything :)

To kick off a discussion, I've translated the entries about
each use case in notes/tree-conflicts/detection.txt from
files to directories.

Here is the result, please note the open question in use case 5.

Comments and additions welcome, as usual.

-----------------------------------------------------------------------

In all the cases below (except possibly one exception, noted inline)
we never descend into directories recursively. We only consider
the direct children of the directory in question.

==========
USE CASE 1
==========

File:
If 'svn update' modifies a file that has been scheduled for deletion
in the working copy, the file is a tree conflict victim.

Directory:
If 'svn update' modifies a file in a directory that has been scheduled
for deletion in the working copy, the directory is a tree conflict victim.

==========
USE CASE 2
==========

File:
If 'svn update' deletes a file that has local modifications, the file
is a tree conflict victim.

Directory:
If 'svn update' deletes a directory that contains a file with local
modifications, the directory is a tree conflict victim.

==========
USE CASE 3
==========

File:
If 'svn update' deletes a file that has been scheduled for deletion in
the working copy, the file is a tree conflict victim.

Directory:
If 'svn update' deletes a directory that has been scheduled for deletion in
the working copy, the directory is a tree conflict victim.

==========
USE CASE 4
==========

Files:
If 'svn merge' tries to modify a file that does not exist in the
target working copy, then the target file is a tree conflict victim.

Directories:
If 'svn merge' tries to modify a file in a directory that does not exist
in the target working copy, then the target directory is a tree conflict victim.

==========
USE CASE 5
==========

Files:
If 'svn merge' deletes an existing file, the file is a tree conflict
victim if its text is different from the corresponding file on the left
side of the merge source.

Directories:
If 'svn merge' deletes an existing directory, the directory is a tree conflict
victim if its content is different from the corresponding directory on the left
side of the merge source.

Question here: What makes two directories "equal"? Do we need to consider
all subdirectories of a directory? Or should we keep it simple and only
consider direct children as in all the other cases?
Warning: The former might perform quite badly!

==========
USE CASE 6
==========

File:
If 'svn merge' tries to delete a file that does not exist in the
target working copy, then the target file is a tree conflict victim.

Directory:
If 'svn merge' tries to delete a directory that does not exist in the
target working copy, then the target directory is a tree conflict victim.

=========================
OBSTRUCTIONS DURING MERGE
=========================

File:
If 'svn merge' fails to apply an operation to a file because the
file is obstructed (i.e. an unversioned item of the same name is
in the file's place), the obstructed file is a tree conflict victim.

Directory:
If 'svn merge' fails to apply an operation to a directory because the
directory is obstructed (i.e. an unversioned item of the same name is
in the directory's place), the obstructed directory is a tree conflict
victim.


-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Apr 14, 2008 at 02:03:16PM +0100, Julian Foad wrote:
> Stefan Sperling wrote:
>> I think this needs to be decided on a case-by-case basis. E.g. we had
>> some discussion about what to do with deleted files, and ended up
>> carrying out the delete, but leaving the file behind unversioned. This was 
>> a good decision in my opinion.
> There are too many possible cases to discuss them all on a case-by-case 
> basis. That's why I want some rules or principles instead.

I feel like I'd only be able to make up a good rule by considering
all possible cases, taking the common denominator, and defining
special cases the common rule does not cover.

Otherwise I'd be afraid of missing something that we could run into later.

> We need to be careful about whether we're talking about a definition of 
> what it means for a directory to be "modified", or about how the 
> implementation can detect such a modification.

True. I am probably thinking too much in terms of implementation.

The limitations of the current design and implementation of Subversion
as a whole has repercussions on how we can handle tree conflicts in a
reasonable way without changing an awful lot of things. So in my thinking
I tend to take my idea of the current status of Subversion as a base to
start off from, rather than looking at the tree conflict requirements
in isolation of what's already there. This may explain why I am wording
some things differently than you do, even though we mean the same thing :)

> So we don't need to debate our definition, we only need to decide how to 
> implement it.

Sure.

>>> I have attached some notes on what tree-conflicts work is happening 
>>> where, and my opinion of the level of agreement we've reached on the 
>>> various aspects of this work.
>> 
>> Very nice. The doc overhaul may need an issue in the tracker.
> 
> Nah. It's not clear exactly what needs to be done to "overhaul" the docs, 
> so we can't know when we've finished doing it.

OK. So "The docs need to be synced with current discussion and code"
is a constant TODO item.
 
-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Julian Foad <ju...@btopenworld.com>.
Stefan Sperling wrote:
> On Thu, Apr 10, 2008 at 11:32:06PM +0100, Julian Foad wrote:
> 
>>I had been thinking the following would be a clean and desirable design, 
>>bearing in mind that the topic is tree-conflict _detection_ only:
>>
>>  - "Leave the WC untouched by this attempted incoming change."
>>
>>However, the way text conflicts work is:
>>
>>  - "Leave the WC containing (files that have) all of the information about 
>>both sides of the conflict."
>>
>>The doc <notes/tree-conflicts/scratch-pad.txt> contains a "Scenario 
>>Playground" in which the thoughts appear to be more along the lines of:
>>
>>  - "Leave the WC as if the incoming change had been applied first and then 
>>the user had made the necessary changes to get back to the state he 
>>wanted."
>>
>>I have not yet tried to define how either of these other styles would apply 
>>consistently to all cases.
>>
>>Your thoughts, please.
> 
> 
> I think this needs to be decided on a case-by-case basis. E.g. we had
> some discussion about what to do with deleted files, and ended up
> carrying out the delete, but leaving the file behind unversioned. 
> This was a good decision in my opinion.
> 
> If we decided now not to make any changes to the working copy in case
> of tree conflicts, we'd have to reexamine each use case and have a huge
> discussion whether this behaviour is desirable in all use cases.
> 
> If we keep on discussing the correct behaviour on a case-by-case basis,
> we likely won't end up changing former decisions we've made -- and thus
> we won't end up doing work again that's already been done.

There are too many possible cases to discuss them all on a case-by-case basis. 
That's why I want some rules or principles instead. I want us to be able to say 
something like:

   * The aim is to leave the WC in a state where a simple "svn resolved; svn 
commit" would bring the branch back to the way the user had it, overriding the 
repository changes that were merged/updated into the WC.

or some other set of statements that guide what we do in all cases. (I don't 
necessarily agree with this statement that I used as an example.)


>>WHICH CASES WE WANT TO DETECT
>>
>>(This section is long. A practical difficulty is below, marked with "!!!".)
>>
>>I think the conflicts we want to detect on directory operations are exactly 
>>analagous to those on files, with the difference that to "modify" a 
>>directory means to modify anything in the whole directory tree.

(I meant to include adding and deleting children, grandchildren, etc. More 
formally: to "modify" a parent directory P means to modify P's properties or to 
add or delete or modify any node in the whole directory tree inside P.)

> I'm still not sure on this definition of "modify". I admit that I
> probably haven't considered all possible scenarios, but I think it
> would be nice to try to define modification of a directory as
> "a direct child of the directory was modified". If this definition

We need to be careful about whether we're talking about a definition of what it 
means for a directory to be "modified", or about how the implementation can 
detect such a modification.

If you intended your definition to be read as a recursive (self-referential) 
definition, as I'm sure you did, then it is the same as mine. The only 
difference between our definitions is that yours is expressed in recursive 
wording, and mine in non-recursive wording. They describe the same thing.

So we don't need to debate our definition, we only need to decide how to 
implement it.

> turns out to be enough (which it may well turn out to be) we can
> save us the trouble of recursing into deep directory trees to detect
> modifications.

Whether the implementation has to recurse in order to detect this modification 
depends on what information is already available in the part of the 
implementation we're considering.

For example, let's considering analysing the incoming changes in a merge, where 
the diff-callbacks provide the information. If we guarantee that a child 
modification is always done inside a (dir-open, dir-close) pair (recursively) 
and that such a pair always contains a modification, then we only need to see 
the dir-open and/or dir-close and we don't need to recurse any further to know 
that there is a modification somewhere in this directory tree. (I don't know 
how we could prevent recursion in this case, where the driver of the diff 
callbacks is in control, but that's not the point.)

Now let's consider the incoming changes in an update, where the "delta editor" 
provides the information. This is similar, in that activation of the "open dir" 
function necessarily means there is going to be a modification inside the dir.

Now let's consider analysing the local WC mods in an update. Here, the "open 
dir" while recursing into the WC does NOT mean there is going to be a 
modification found somewhere inside this dir, so we have to complete the 
recursion before we know the answer.


> Note that I consider "equality" between directories a different issue
> than "a directory has been modified". Detection equality may well need
> recursion as outlined elsewhere in this thread.

Yes. This is a significant difference between the "update/switch" cases where 
we need to detect modifications (which is easy because the WC tells us if a 
modification is scheduled), and the "merge" cases where we need to look for 
equality instead.


>>For both files and directories, and for all of "update", "switch" and 
>>"merge", the same principle applies: if the incoming change is trying to 
>>delete/move/modify something that has already gone away from the equivalent 
>>path in the target, or to create something that is already created in the 
>>target, that's a tree conflict.
>>
>>Formally:
>>
>>  | Change ... merged onto ... TargetChange
>>  |   |     |
>>  |   v     | CREATE  GO-AWAY REPLACE MODIFY
>>  | ------- + ------- ------- ------- -------
>>  | CREATE  |   C       X       X       X
>>  |         |
>>  | GO-AWAY |   X       C       C       C
>>  |         |
>>  | REPLACE |   X       C       C       C
>>  |         |
>>  | MODIFY  |   X       C       C     merge
> 
> 
> Yes.
> 
> 
>>!!!
>>A practical difficulty arises with determining the "target change" in a 
>>"merge" situation. The correct behaviour is to compare the merge-left 
>>source and the target in order to decide whether the target path under 
>>consideration is to be considered "created" or "gone away" (these being 
>>easy to determine) or "replaced" or "modified" (these being potentially 
>>much more expensive to determine). A directory tree could be modified only 
>>somewhere deep in its hierarchy and there is no way to determine this just 
>>by looking at the information immediately available for the directory 
>>itself.
>>
>>The implementation uses the diff_callbacks mechanism to communicate the 
>>change being applied by the merge. We are adding (in the "diff-callbacks3" 
>>branch) "dir open" and "dir close" callbacks. Perhaps we can use something 
>>like this to provide enough information to more quickly determine whether a 
>>given directory D in the merge-left source is in fact identical to the 
>>corresponding one in the target.
> 
>>The current approach taken in the implementation is, when a case occurs 
>>that might be a conflict (and the only such case is a merge trying to 
>>delete a file that might have be "modified" (meaning "different") on the 
>>target), we do a full-text comparison if necessary. Now that we want to 
>>extend this to the case of a directory tree, we think a full comparison may 
>>be prohibitively expensive.
>>
>>Thoughts?
> 
> 
> I think we will have to walk the directory tree, querying the repository
> for each node about the equivalent on the merge left. As this is very
> expensive, I really hope there is a more efficient way to do this.
> 
> I have no better idea currently, though :(

Yes, I think you are right. I will look for better ways when I get to that part.


>>MISC.
>>
>>I have attached some notes on what tree-conflicts work is happening where, 
>>and my opinion of the level of agreement we've reached on the various 
>>aspects of this work.
> 
> Very nice. The doc overhaul may need an issue in the tracker.

Nah. It's not clear exactly what needs to be done to "overhaul" the docs, so we 
can't know when we've finished doing it.


>>How Subversion presents tree conflicts:
>>
>>  - General agreement. The precise form of the messages and status indications
>>    are not seen as important as long as they give enough information.
>>
>>  - Exception: Unclear on the expected WC state when a conflict is raised, which
>>    is an important issue related to resolving. e.g. the outcomes proposed in
>>    <trunk/notes/tree-conflicts/scratch-pad.txt> are radically different from
>>    some of our thinking.
> 
> The scratch-pad contains very early notes by C. Michael Pilato.
> Since he has been giving feedback about the current implementation,
> I guess he won't object with the current state of things, even if
> some of it contradicts the scratch pad.

OK.


- Julian

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 10, 2008 at 11:32:06PM +0100, Julian Foad wrote:
> I had been thinking the following would be a clean and desirable design, 
> bearing in mind that the topic is tree-conflict _detection_ only:
> 
>   - "Leave the WC untouched by this attempted incoming change."
> 
> However, the way text conflicts work is:
> 
>   - "Leave the WC containing (files that have) all of the information about 
> both sides of the conflict."
> 
> The doc <notes/tree-conflicts/scratch-pad.txt> contains a "Scenario 
> Playground" in which the thoughts appear to be more along the lines of:
> 
>   - "Leave the WC as if the incoming change had been applied first and then 
> the user had made the necessary changes to get back to the state he 
> wanted."
> 
> I have not yet tried to define how either of these other styles would apply 
> consistently to all cases.
> 
> Your thoughts, please.

I think this needs to be decided on a case-by-case basis. E.g. we had
some discussion about what to do with deleted files, and ended up
carrying out the delete, but leaving the file behind unversioned. 
This was a good decision in my opinion.

If we decided now not to make any changes to the working copy in case
of tree conflicts, we'd have to reexamine each use case and have a huge
discussion whether this behaviour is desirable in all use cases.

If we keep on discussing the correct behaviour on a case-by-case basis,
we likely won't end up changing former decisions we've made -- and thus
we won't end up doing work again that's already been done.
 
> WHICH CASES WE WANT TO DETECT
> 
> (This section is long. A practical difficulty is below, marked with "!!!".)
> 
> I think the conflicts we want to detect on directory operations are exactly 
> analagous to those on files, with the difference that to "modify" a 
> directory means to modify anything in the whole directory tree.

I'm still not sure on this definition of "modify". I admit that I
probably haven't considered all possible scenarios, but I think it
would be nice to try to define modification of a directory as
"a direct child of the directory was modified". If this definition
turns out to be enough (which it may well turn out to be) we can
save us the trouble of recursing into deep directory trees to detect
modifications.

Note that I consider "equality" between directories a different issue
than "a directory has been modified". Detection equality may well need
recursion as outlined elsewhere in this thread.

> For both files and directories, and for all of "update", "switch" and 
> "merge", the same principle applies: if the incoming change is trying to 
> delete/move/modify something that has already gone away from the equivalent 
> path in the target, or to create something that is already created in the 
> target, that's a tree conflict.
> 
> Formally:
> 
>   | Change ... merged onto ... TargetChange
>   |   |     |
>   |   v     | CREATE  GO-AWAY REPLACE MODIFY
>   | ------- + ------- ------- ------- -------
>   | CREATE  |   C       X       X       X
>   |         |
>   | GO-AWAY |   X       C       C       C
>   |         |
>   | REPLACE |   X       C       C       C
>   |         |
>   | MODIFY  |   X       C       C     merge

Yes.

> !!!
> A practical difficulty arises with determining the "target change" in a 
> "merge" situation. The correct behaviour is to compare the merge-left 
> source and the target in order to decide whether the target path under 
> consideration is to be considered "created" or "gone away" (these being 
> easy to determine) or "replaced" or "modified" (these being potentially 
> much more expensive to determine). A directory tree could be modified only 
> somewhere deep in its hierarchy and there is no way to determine this just 
> by looking at the information immediately available for the directory 
> itself.
> 
> The implementation uses the diff_callbacks mechanism to communicate the 
> change being applied by the merge. We are adding (in the "diff-callbacks3" 
> branch) "dir open" and "dir close" callbacks. Perhaps we can use something 
> like this to provide enough information to more quickly determine whether a 
> given directory D in the merge-left source is in fact identical to the 
> corresponding one in the target.

> The current approach taken in the implementation is, when a case occurs 
> that might be a conflict (and the only such case is a merge trying to 
> delete a file that might have be "modified" (meaning "different") on the 
> target), we do a full-text comparison if necessary. Now that we want to 
> extend this to the case of a directory tree, we think a full comparison may 
> be prohibitively expensive.
> 
> Thoughts?

I think we will have to walk the directory tree, querying the repository
for each node about the equivalent on the merge left. As this is very
expensive, I really hope there is a more efficient way to do this.

I have no better idea currently, though :(
 
> MISC.
> 
> I have attached some notes on what tree-conflicts work is happening where, 
> and my opinion of the level of agreement we've reached on the various 
> aspects of this work.

Very nice. The doc overhaul may need an issue in the tracker.

> How Subversion presents tree conflicts:
> 
>   - General agreement. The precise form of the messages and status indications
>     are not seen as important as long as they give enough information.
> 
>   - Exception: Unclear on the expected WC state when a conflict is raised, which
>     is an important issue related to resolving. e.g. the outcomes proposed in
>     <trunk/notes/tree-conflicts/scratch-pad.txt> are radically different from
>     some of our thinking.

The scratch-pad contains very early notes by C. Michael Pilato.
Since he has been giving feedback about the current implementation,
I guess he won't object with the current state of things, even if
some of it contradicts the scratch pad.

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Julian Foad <ju...@btopenworld.com>.
Dear tree conflict fans,

Can you help me check whether we're going the right way about detecting 
conflicting directories? I have a few points in particular to raise.


HOW A TREE CONFLICT IS REPORTED

The reporting is not something we particularly need to discuss in this thread.


HOW A TREE CONFLICT AFFECTS THE WC

When a conflict occurs, the WC should be left in a state that enables the user 
to resolve the conflict. What does this mean in practice?

I had been thinking the following would be a clean and desirable design, 
bearing in mind that the topic is tree-conflict _detection_ only:

   - "Leave the WC untouched by this attempted incoming change."

However, the way text conflicts work is:

   - "Leave the WC containing (files that have) all of the information about 
both sides of the conflict."

The doc <notes/tree-conflicts/scratch-pad.txt> contains a "Scenario Playground" 
in which the thoughts appear to be more along the lines of:

   - "Leave the WC as if the incoming change had been applied first and then 
the user had made the necessary changes to get back to the state he wanted."

I have not yet tried to define how either of these other styles would apply 
consistently to all cases.

Your thoughts, please.


WHICH CASES WE WANT TO DETECT

(This section is long. A practical difficulty is below, marked with "!!!".)

I think the conflicts we want to detect on directory operations are exactly 
analagous to those on files, with the difference that to "modify" a directory 
means to modify anything in the whole directory tree.

For both files and directories, and for all of "update", "switch" and "merge", 
the same principle applies: if the incoming change is trying to 
delete/move/modify something that has already gone away from the equivalent 
path in the target, or to create something that is already created in the 
target, that's a tree conflict.

Formally:

   | Change ... merged onto ... TargetChange
   |   |     |
   |   v     | CREATE  GO-AWAY REPLACE MODIFY
   | ------- + ------- ------- ------- -------
   | CREATE  |   C       X       X       X
   |         |
   | GO-AWAY |   X       C       C       C
   |         |
   | REPLACE |   X       C       C       C
   |         |
   | MODIFY  |   X       C       C     merge

   Change = incoming change being applied by "update" or "switch" or "merge"
   TargetChange = how the target looks, compared with merge-left source

   C = tree conflict
   X = can't happen, by definition
   merge = the kind of merge that Subversion already does


   CREATE:
   # file-add(F) = add-new(F)        or copy(to F)(and modify?)
   # dir-add(D)  = add-new(D)(deep?) or copy(to D)(and modify?)

   GO-AWAY:
   # file-del(F) = del(F) or move(F away)
   # dir-del(D)  = del(D) or move(D away)

   REPLACE:
   # file-rep(F) = (file-del(F) + file-add(F))
   # dir-rep(D)  = (dir-del(D) + dir-add(D))

   MODIFY:
   # file-mod(F) = text-mod(F) and/or prop-mod(F)
   # dir-mod(D)  = prop-mod(D) and/or file-mod(child-F) and/or dir-mod(child-D)


Meaning of "target change":

For "update" and "switch", the base for both the incoming change and the target 
change is the same thing: the WC "base". The "target change" is the local 
change scheduled in the WC.

For "merge", the "target change" against a "base" is in fact easy to define. 
There may be a common ancestor of the two branches, but Subversion doesn't care 
about that when applying the change. All it cares about is whether the target 
looks the same as the merge-left source or whether it looks like it has already 
been changed, and that is what we mean here by a "target change". This "target 
change" that Subversion sees can be due to what the target branch already looks 
like in the repository, and/or due to local scheduled WC changes.


!!!
A practical difficulty arises with determining the "target change" in a "merge" 
situation. The correct behaviour is to compare the merge-left source and the 
target in order to decide whether the target path under consideration is to be 
considered "created" or "gone away" (these being easy to determine) or 
"replaced" or "modified" (these being potentially much more expensive to 
determine). A directory tree could be modified only somewhere deep in its 
hierarchy and there is no way to determine this just by looking at the 
information immediately available for the directory itself.

The implementation uses the diff_callbacks mechanism to communicate the change 
being applied by the merge. We are adding (in the "diff-callbacks3" branch) 
"dir open" and "dir close" callbacks. Perhaps we can use something like this to 
provide enough information to more quickly determine whether a given directory 
D in the merge-left source is in fact identical to the corresponding one in the 
target.

The current approach taken in the implementation is, when a case occurs that 
might be a conflict (and the only such case is a merge trying to delete a file 
that might have be "modified" (meaning "different") on the target), we do a 
full-text comparison if necessary. Now that we want to extend this to the case 
of a directory tree, we think a full comparison may be prohibitively expensive.

Thoughts?


MISC.

I have attached some notes on what tree-conflicts work is happening where, and 
my opinion of the level of agreement we've reached on the various aspects of 
this work.

- Julian

Re: tree conflicts and directories discussion

Posted by Nico Schellingerhout <ni...@philips.com>.
Stefan Sperling <st...@elego.de> wrote on 04/04/2008 12:29:36 PM:

> On Fri, Apr 04, 2008 at 11:08:58AM +0200, Nico Schellingerhout wrote:
> > Stefan Sperling <st...@elego.de> wrote on 04/03/2008 06:28:13 PM:
> > > I think your example is a good one, because I hadn't thought of this
> > > scenario yet. But all that comes to my mind about it is that,
> > > when dir1 is deleted, which contains tree conflict victim dir1/b',
> > > and the deletion of dir1 itself causes a tree conflict, we can 
either
> > >
> > > 1) forget about that dir1/b' is a tree conflict victim,
> > >    saying that the tree conflict with victim dir1 supersedes it
> > >
> > > or
> > >
> > > 2) don't discard any information but record dir1 as
> > >    an additional tree conflict victim
> > I think the second option is better: the user needs to understand that
> > the action that led to the conflict is the delete of dir1, but also
> > should get hints at what he has to do to fix the conflicts. Therefore,
> > a conflict on the modified files/dirs should be raised as well.
> > 
> > Let me see if I can sketch an example, to start off the discussion:
> > 
> > Start on source:
> > dir1/
> > dir1/a
> > dir1/b
> > dir1/dir2/
> > dir1/dir2/c
> > 
> > Revision to merge:
> > D dir1
> > 
> > Start on target:
> > dir1/
> > dir1/a
> > dir1/b'
> > dir1/dir2/
> > dir1/dir2/c
> > dir1/dir2/d
> > 
> > I would like to see the merge to have something like the following 
result:
> 
> > C dir1/ [could not be deleted because of local modifications in b'and 
dir2]
> 
> With "local modifications", you mean both modifications local to the
> working copy, and modifications made in the history of the branch the
> working copy is based on, which are not present in the source branch
> of the merge. Right?

Right.

> 
> In which case, "dir1 could not be deleted because of local modifications
> in b' and dir2" is the same as "dir1 could not be deleted because there
> are tree conflicts in the tree rooted at dir1". If we could use such a
> recursive definition, the implementation should not be too hard, and
> an extra crawl would not be required, since we already have information
> about tree conflicts inside a directory in the close_dir() callback
> during merge anyway.

Ok.

> 
> Also, I think the output should be:
> 
>   C . [dir1 could not be deleted because of local modifications in 
> b' and dir2]
>   D dir1/ 

You're right.

> 
> since the conflict is marked at the parent directory of dir1,
> which is the working copy root in this case.
> 
> Keep in mind that we never delete files that are tree conflict
> victims from disk (at least since r30144), see
> http://subversion.tigris.org/issues/show_bug.cgi?id=3149
> We just mark them for deletion.

Ok.

> 
> So we could define the same behaviour for directories, which would
> be consistent with both the behaviour for files, and your example.
> If we did that, dir1 (and all its children) would be marked for
> deletion, but dir1 and its children would not be removed from disk.
> 
> But I'm not sure yet what happens if we try to mark a tree conflicted
> item for deletion. Will the delete bail out?

That's what I was wondering about as well.

> 
> I think we'd need to do this though -- the current code will
> unconditionally mark the whole tree rooted at a directory deleted
> during merge for deletion. I don't really want to change that code
> unless we absolutely have to. We have changed a lot of code already.
> Merging the tree-conflicts branch into trunk will get increasingly
> complex if we touch even more code. So if at all possible, let's try
> to find a way of dealing with directories that makes as little
> changes to the current way of doing things as possible.

I think marking for deletion of non-conflicting files/dirs is ok, but
let's make sure that it is consistent: either everything (that is not
conflicting) is marked for deletion, or everything is left in place.

> 
> >   dir1/a
> >   dir1/b'
> > C dir1/dir2/ [could not be deleted because file d is not in the source
> > branch]
> 
> In this case, "could not be deleted because file d is not in the source
> branch" is the same as: "could not be deleted because there is a tree
> conflict in the tree rooted at dir2". Is it? If so, the recursive

Correct.

> definition I proposed above would also work here.

- Nico

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Fri, Apr 04, 2008 at 11:08:58AM +0200, Nico Schellingerhout wrote:
> Stefan Sperling <st...@elego.de> wrote on 04/03/2008 06:28:13 PM:
> > I think your example is a good one, because I hadn't thought of this
> > scenario yet. But all that comes to my mind about it is that,
> > when dir1 is deleted, which contains tree conflict victim dir1/b',
> > and the deletion of dir1 itself causes a tree conflict, we can either
> >
> > 1) forget about that dir1/b' is a tree conflict victim,
> >    saying that the tree conflict with victim dir1 supersedes it
> >
> > or
> >
> > 2) don't discard any information but record dir1 as
> >    an additional tree conflict victim
> I think the second option is better: the user needs to understand that
> the action that led to the conflict is the delete of dir1, but also
> should get hints at what he has to do to fix the conflicts. Therefore,
> a conflict on the modified files/dirs should be raised as well.
> 
> Let me see if I can sketch an example, to start off the discussion:
> 
> Start on source:
> dir1/
> dir1/a
> dir1/b
> dir1/dir2/
> dir1/dir2/c
> 
> Revision to merge:
> D dir1
> 
> Start on target:
> dir1/
> dir1/a
> dir1/b'
> dir1/dir2/
> dir1/dir2/c
> dir1/dir2/d
> 
> I would like to see the merge to have something like the following result:

> C dir1/ [could not be deleted because of local modifications in b' and dir2]

With "local modifications", you mean both modifications local to the
working copy, and modifications made in the history of the branch the
working copy is based on, which are not present in the source branch
of the merge. Right?

In which case, "dir1 could not be deleted because of local modifications
in b' and dir2" is the same as "dir1 could not be deleted because there
are tree conflicts in the tree rooted at dir1". If we could use such a
recursive definition, the implementation should not be too hard, and
an extra crawl would not be required, since we already have information
about tree conflicts inside a directory in the close_dir() callback
during merge anyway.

Also, I think the output should be:

  C . [dir1 could not be deleted because of local modifications in b' and dir2]
  D dir1/ 

since the conflict is marked at the parent directory of dir1,
which is the working copy root in this case.

Keep in mind that we never delete files that are tree conflict
victims from disk (at least since r30144), see
http://subversion.tigris.org/issues/show_bug.cgi?id=3149
We just mark them for deletion.

So we could define the same behaviour for directories, which would
be consistent with both the behaviour for files, and your example.
If we did that, dir1 (and all its children) would be marked for
deletion, but dir1 and its children would not be removed from disk.

But I'm not sure yet what happens if we try to mark a tree conflicted
item for deletion. Will the delete bail out?

I think we'd need to do this though -- the current code will
unconditionally mark the whole tree rooted at a directory deleted
during merge for deletion. I don't really want to change that code
unless we absolutely have to. We have changed a lot of code already.
Merging the tree-conflicts branch into trunk will get increasingly
complex if we touch even more code. So if at all possible, let's try
to find a way of dealing with directories that makes as little
changes to the current way of doing things as possible.

>   dir1/a
>   dir1/b'
> C dir1/dir2/ [could not be deleted because file d is not in the source
> branch]

In this case, "could not be deleted because file d is not in the source
branch" is the same as: "could not be deleted because there is a tree
conflict in the tree rooted at dir2". Is it? If so, the recursive
definition I proposed above would also work here.

>   dir1/dir2/c
>   dir1/dir2/d
> 
> This gives the user the information needed to solve all problems in a
> systematic manner.

Yes.

> > My arguments may well be incorrect as long as I haven't fully understood
> > all problems involved yet. Which I apparently haven't, since I have
> > trouble understanding your point against not doing a crawl :(
> >
> Perhaps the above example helps. The problem is that I do not understand
> the merge algorithm in sufficient detail to predict what would happen,
> and that is why I raised the issue. Given the complexity of merging,
> we're probably both wrong ;)

:)

Well, I hope we can get some comments by people who know more
about merging than we do!

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Nico Schellingerhout <ni...@philips.com>.
Stefan Sperling <st...@elego.de> wrote on 04/03/2008 06:28:13 PM:

> On Thu, Apr 03, 2008 at 05:30:17PM +0200, Nico Schellingerhout wrote:
> > Stefan Sperling <st...@elego.de> wrote on 04/03/2008 11:31:08 AM:
> > > The recursive behaviour you want should come for free with the
> > > the way the merge operation walks the tree (depth first).
> > > I mean, suppose we had the following scenario during a merge:
> > >
> > > The diff wants to change the files 'a' and 'b' in these directories:
> > >
> > > dir1/dir2/dir3/a
> > > dir1/dir4/dir5/b
> > >
> > > The user has locally delted dir1/dir4/dir5 (and hence all its 
content).
> > >
> > > Now, if I understand correctly, with your definition, flagging the 
tree
> > > conflict at dir1 would be allowed: "modification anywhere in the 
subtree
> > > rooted at the directory under consideration" -- the directory under
> > > consideration happening to be dir1, for example.
> > >
> > > But this is redundant. The conflict can also be flagged directly
> > > at dir4, the direct parent of dir5. And this not only has the same
> > > effect as flagging it at dir1 (the effect being that a tree conflict
> > > is signalled), but it's also a much simpler design that is much 
easier
> > > to implement.
> > This looks ok for UC 1 and 4, but what about 2 and 5:
> >
> > Source:
> > dir1/
> > dir1/a
> > dir1/b
> >
> > Target:
> > dir1/
> > dir1/a
> > dir1/b'
> >
> > Revision on Source to be merged to target:
> > D dir1
> >
> > What happens?
> > If the algorithm crawls in the order dir1/a, dir1/b, dir, then dir1/a
> > will be deleted (no modification, therefore delete is ok), and tree
> > conflict when dir1/b is reached. The WC will then look like:
> >
> > C dir1/
> >   dir1/b'
> >
> > Right?
> 
> Seems to be the case. Michael Pilato just told me that Subversion
> visits children of deleted directories to check for unversioned
> items before pruning the subtree. So yes, with the current code,
> this is probably what would happen.
> 

Ok. That's good.

> But I don't understand why you stop your example at a point before
> the deletion of dir1 is actually carried out. Because that is what
> I think we are trying to define: The behaviour we want from Subversion
> in all our tree conflict use cases, but with directories instead of
> files as victims. Our entire current tree conflict design is only
> about files as victims. We cannot implement anything before we've
> defined how tree conflicts should behave with directories as victims.
> 
> > The same argument could be given for modifications at any depth in the
> > tree. Hence, the necessity to check the _entire_ subtree.
> 
> I don't see how your example contradicts with not doing a full subtree
> crawl anytime we find a use case 2 or use case 5 tree conflict with
> a directory as the victim.

You're right, of course. I was thinking out aloud, not having reached
a definite conclusion. My implied suggestion was that dir1 should not
be deleted, but left in conflict in the working copy.

> 
> I think your example is a good one, because I hadn't thought of this
> scenario yet. But all that comes to my mind about it is that,
> when dir1 is deleted, which contains tree conflict victim dir1/b',
> and the deletion of dir1 itself causes a tree conflict, we can either
> 
> 1) forget about that dir1/b' is a tree conflict victim,
>    saying that the tree conflict with victim dir1 supersedes it
> 
> or
> 
> 2) don't discard any information but record dir1 as
>    an additional tree conflict victim

I think the second option is better: the user needs to understand that
the action that led to the conflict is the delete of dir1, but also
should get hints at what he has to do to fix the conflicts. Therefore,
a conflict on the modified files/dirs should be raised as well.

Let me see if I can sketch an example, to start off the discussion:

Start on source:
dir1/
dir1/a
dir1/b
dir1/dir2/
dir1/dir2/c

Revision to merge:
D dir1

Start on target:
dir1/
dir1/a
dir1/b'
dir1/dir2/
dir1/dir2/c
dir1/dir2/d

I would like to see the merge to have something like the following result:

C dir1/ [could not be deleted because of local modifications in b' and 
dir2]
  dir1/a
  dir1/b'
C dir1/dir2/ [could not be deleted because file d is not in the source 
branch]
  dir1/dir2/c
  dir1/dir2/d

This gives the user the information needed to solve all problems in a
systematic manner.

> 
> We'd need to think through the consequences of either action
> and then decide what we want. There may be other things we could
> do that I've overlooked. But the point is that we need to define
> some behaviour.

See my proposal above.

> 
> > I think we agree on what a modified dir is, but I'm not convinced that
> > your arguments about the crawling are correct.
> 
> My arguments may well be incorrect as long as I haven't fully understood
> all problems involved yet. Which I apparently haven't, since I have
> trouble understanding your point against not doing a crawl :(
> 

Perhaps the above example helps. The problem is that I do not understand
the merge algorithm in sufficient detail to predict what would happen,
and that is why I raised the issue. Given the complexity of merging,
we're probably both wrong ;)

- Nico

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 03, 2008 at 05:30:17PM +0200, Nico Schellingerhout wrote:
> Stefan Sperling <st...@elego.de> wrote on 04/03/2008 11:31:08 AM:
> > The recursive behaviour you want should come for free with the
> > the way the merge operation walks the tree (depth first).
> > I mean, suppose we had the following scenario during a merge:
> >
> > The diff wants to change the files 'a' and 'b' in these directories:
> >
> > dir1/dir2/dir3/a
> > dir1/dir4/dir5/b
> >
> > The user has locally delted dir1/dir4/dir5 (and hence all its content).
> >
> > Now, if I understand correctly, with your definition, flagging the tree
> > conflict at dir1 would be allowed: "modification anywhere in the subtree
> > rooted at the directory under consideration" -- the directory under
> > consideration happening to be dir1, for example.
> >
> > But this is redundant. The conflict can also be flagged directly
> > at dir4, the direct parent of dir5. And this not only has the same
> > effect as flagging it at dir1 (the effect being that a tree conflict
> > is signalled), but it's also a much simpler design that is much easier
> > to implement.
> This looks ok for UC 1 and 4, but what about 2 and 5:
>
> Source:
> dir1/
> dir1/a
> dir1/b
>
> Target:
> dir1/
> dir1/a
> dir1/b'
>
> Revision on Source to be merged to target:
> D dir1
>
> What happens?
> If the algorithm crawls in the order dir1/a, dir1/b, dir, then dir1/a
> will be deleted (no modification, therefore delete is ok), and tree
> conflict when dir1/b is reached. The WC will then look like:
>
> C dir1/
>   dir1/b'
>
> Right?

Seems to be the case. Michael Pilato just told me that Subversion
visits children of deleted directories to check for unversioned
items before pruning the subtree. So yes, with the current code,
this is probably what would happen.

But I don't understand why you stop your example at a point before
the deletion of dir1 is actually carried out. Because that is what
I think we are trying to define: The behaviour we want from Subversion
in all our tree conflict use cases, but with directories instead of
files as victims. Our entire current tree conflict design is only
about files as victims. We cannot implement anything before we've
defined how tree conflicts should behave with directories as victims.

> The same argument could be given for modifications at any depth in the
> tree. Hence, the necessity to check the _entire_ subtree.

I don't see how your example contradicts with not doing a full subtree
crawl anytime we find a use case 2 or use case 5 tree conflict with
a directory as the victim.

I think your example is a good one, because I hadn't thought of this
scenario yet. But all that comes to my mind about it is that,
when dir1 is deleted, which contains tree conflict victim dir1/b',
and the deletion of dir1 itself causes a tree conflict, we can either

1) forget about that dir1/b' is a tree conflict victim,
   saying that the tree conflict with victim dir1 supersedes it

or

2) don't discard any information but record dir1 as
   an additional tree conflict victim

We'd need to think through the consequences of either action
and then decide what we want. There may be other things we could
do that I've overlooked. But the point is that we need to define
some behaviour.

> I think we agree on what a modified dir is, but I'm not convinced that
> your arguments about the crawling are correct.

My arguments may well be incorrect as long as I haven't fully understood
all problems involved yet. Which I apparently haven't, since I have
trouble understanding your point against not doing a crawl :(

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Nico Schellingerhout <ni...@philips.com>.
Stefan Sperling <st...@elego.de> wrote on 04/03/2008 11:31:08 AM:

> On Thu, Apr 03, 2008 at 08:57:47AM +0200, Nico Schellingerhout wrote:
> > Stefan Sperling <st...@elego.de> wrote on 04/02/2008 07:46:17 PM:
> > >
> > > How do we define modification of a directory?
> > > Is a directory modified only when direct children are removed/added?
> > > Or are there other cases?
> > I think modification means: modification anywhere in the subtree
> > rooted by the directory under consideration, including properties
> > on that directory.
> 
> Hey Nico,
> 
> I think defining the behaviour for only for a single directory is
> enough. I mean, for any kind of directory modification, we need
> only look at the direct children of a given directory.
> 
> The recursive behaviour you want should come for free with the
> the way the merge operation walks the tree (depth first).
> I mean, suppose we had the following scenario during a merge:
> 
> The diff wants to change the files 'a' and 'b' in these directories:
> 
> dir1/dir2/dir3/a
> dir1/dir4/dir5/b
> 
> The user has locally delted dir1/dir4/dir5 (and hence all its content).
> 
> Now, if I understand correctly, with your definition, flagging the tree
> conflict at dir1 would be allowed: "modification anywhere in the subtree
> rooted at the directory under consideration" -- the directory under
> consideration happening to be dir1, for example.
> 
> But this is redundant. The conflict can also be flagged directly
> at dir4, the direct parent of dir5. And this not only has the same
> effect as flagging it at dir1 (the effect being that a tree conflict
> is signalled), but it's also a much simpler design that is much easier
> to implement.

This looks ok for UC 1 and 4, but what about 2 and 5:

Source:
dir1/
dir1/a
dir1/b

Target:
dir1/
dir1/a
dir1/b'

Revision on Source to be merged to target:
D dir1

What happens?
If the algorithm crawls in the order dir1/a, dir1/b, dir, then dir1/a
will be deleted (no modification, therefore delete is ok), and tree
conflict when dir1/b is reached. The WC will then look like:

C dir1/
  dir1/b'

Right?

The same argument could be given for modifications at any depth in the
tree. Hence, the necessity to check the _entire_ subtree.

> 
> The corresponding 'svn merge' output to the above scenario might look
> like this:
> 
> G dir1/dir2/dir3/a
> C dir1/dir4

That is what I would expect.

> 
> Starting another subtree crawl while we are already crawling the tree
> anyway is not a good idea, IMHO. My concerns have nothing to do with
> performance here, I'm simply arguing about the complexity of the design
> and the complexity of the resulting implementation.

If it can be done in a simple manner, I agree, of course.

> 
> I'm not sure if you were implying spawning another crawl, but I want to
> make sure we have the same idea about what "modification of a directory"
> is supposed to mean.

I think we agree on what a modified dir is, but I'm not convinced that
your arguments about the crawling are correct.

- Nico

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 03, 2008 at 08:57:47AM +0200, Nico Schellingerhout wrote:
> Stefan Sperling <st...@elego.de> wrote on 04/02/2008 07:46:17 PM:
> >
> > How do we define modification of a directory?
> > Is a directory modified only when direct children are removed/added?
> > Or are there other cases?
> I think modification means: modification anywhere in the subtree
> rooted by the directory under consideration, including properties
> on that directory.

Hey Nico,

I think defining the behaviour for only for a single directory is
enough. I mean, for any kind of directory modification, we need
only look at the direct children of a given directory.

The recursive behaviour you want should come for free with the
the way the merge operation walks the tree (depth first).
I mean, suppose we had the following scenario during a merge:

The diff wants to change the files 'a' and 'b' in these directories:

dir1/dir2/dir3/a
dir1/dir4/dir5/b

The user has locally delted dir1/dir4/dir5 (and hence all its content).

Now, if I understand correctly, with your definition, flagging the tree
conflict at dir1 would be allowed: "modification anywhere in the subtree
rooted at the directory under consideration" -- the directory under
consideration happening to be dir1, for example.

But this is redundant. The conflict can also be flagged directly
at dir4, the direct parent of dir5. And this not only has the same
effect as flagging it at dir1 (the effect being that a tree conflict
is signalled), but it's also a much simpler design that is much easier
to implement.

The corresponding 'svn merge' output to the above scenario might look
like this:

G dir1/dir2/dir3/a
C dir1/dir4

Starting another subtree crawl while we are already crawling the tree
anyway is not a good idea, IMHO. My concerns have nothing to do with
performance here, I'm simply arguing about the complexity of the design
and the complexity of the resulting implementation.

I'm not sure if you were implying spawning another crawl, but I want to
make sure we have the same idea about what "modification of a directory"
is supposed to mean.

Thanks,
-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Nico Schellingerhout <ni...@philips.com>.
Stefan Sperling <st...@elego.de> wrote on 04/02/2008 07:46:17 PM:

> On Wed, Apr 02, 2008 at 07:29:53PM +0200, Stefan Sperling wrote:
> > ==========
> > USE CASE 1
> > ==========
> > 
> > File:
> > If 'svn update' modifies a file that has been scheduled for deletion
> > in the working copy, the file is a tree conflict victim.
> > 
> > Directory:
> > If 'svn update' modifies a file in a directory that has been scheduled
> > for deletion in the working copy, the directory is a tree conflict 
victim.
> 
> This would mean, of course, to mark the same conflict redundantly.
> 
> Sorry.
> 
> This should probably have been:
> 
>  If 'svn update' modifies a directory that has been scheduled for
>  deletion in the working copy, the directory is a tree conflict victim.

Agreed.

> 
> How do we define modification of a directory?
> Is a directory modified only when direct children are removed/added?
> Or are there other cases?

I think modification means: modification anywhere in the subtree
rooted by the directory under consideration, including properties
on that directory.

> 
> > ==========
> > USE CASE 4
> > ==========
> > 
> > Files:
> > If 'svn merge' tries to modify a file that does not exist in the
> > target working copy, then the target file is a tree conflict victim.
> > 
> > Directories:
> > If 'svn merge' tries to modify a file in a directory that does not 
exist
> > in the target working copy, then the target directory is a tree 
> conflict victim.
> 
> Same here:
> 
>  If 'svn merge' tries to modify a directory that does not exist in the
>  target working copy, then the target directory is a tree conflict 
victim.

Agreed.

- Nico
 

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Apr 02, 2008 at 07:29:53PM +0200, Stefan Sperling wrote:
> ==========
> USE CASE 1
> ==========
> 
> File:
> If 'svn update' modifies a file that has been scheduled for deletion
> in the working copy, the file is a tree conflict victim.
> 
> Directory:
> If 'svn update' modifies a file in a directory that has been scheduled
> for deletion in the working copy, the directory is a tree conflict victim.

This would mean, of course, to mark the same conflict redundantly.

Sorry.

This should probably have been:

 If 'svn update' modifies a directory that has been scheduled for
 deletion in the working copy, the directory is a tree conflict victim.

How do we define modification of a directory?
Is a directory modified only when direct children are removed/added?
Or are there other cases?

> ==========
> USE CASE 4
> ==========
> 
> Files:
> If 'svn merge' tries to modify a file that does not exist in the
> target working copy, then the target file is a tree conflict victim.
> 
> Directories:
> If 'svn merge' tries to modify a file in a directory that does not exist
> in the target working copy, then the target directory is a tree conflict victim.

Same here:

 If 'svn merge' tries to modify a directory that does not exist in the
 target working copy, then the target directory is a tree conflict victim.

Thanks,
-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 03, 2008 at 01:06:42PM +0200, Stefan Sperling wrote:
> On Thu, Apr 03, 2008 at 08:40:27AM +0200, Nico Schellingerhout wrote:
> > Stefan Sperling <st...@elego.de> wrote on 04/02/2008 07:29:53 PM:
> > > Directories:
> > > If 'svn merge' deletes an existing directory, the directory is a tree conflict
> > > victim if its content is different from the corresponding directory
> > > on the left
> > > side of the merge source.
> > >
> > > Question here: What makes two directories "equal"? Do we need to consider
> > > all subdirectories of a directory? Or should we keep it simple and only
> > > consider direct children as in all the other cases?
> > > Warning: The former might perform quite badly!
> > I'm afraid the answer has to be: the former (see comment with use case 2).

> I agree that to be absolutely sure about the equality
> status of two directories, the subtrees of both directories need to
> be compared completely.
> 
> > Performance may be bad, but should scale with the size of the tree being
> > deleted, so that should be acceptable. (It doesn't happen every day that
> > you rename your entire source tree.)
> 
> I'm currently investigating how we could do this.
> I will try to provide more details later.

Here are some details:

It seems to me that it should be possible to implement this in a
relatively painless way. The proposal below requires a change to
the diff callback API. But since we're touching that already anyway,
that is not a problem.

When deleting a directory during merge, the merge_dir_deleted() callback
in libsvn_client/merge.c could trigger a depth-first traversal of the
tree in the working copy rooted at the directory to be deleted.

For each directory, it could retrieve the corresponding dir entry from
the repository as it existed in the merge-left source of the merge,
and compare the two for equality, i.e. check whether all fields in
the svn_dirent_t returned by the repo match their corresponding
attributes of the directory as found in the working copy.

The left-merge revision shall be a new additional parameter to
merge_dir_deleted(). The ra session needed to contact the repository
via the get_dir() method is already contained in the merge baton which
is passed to merge_dir_deleted().

As far as I can tell the performance overhead of this would be
N repository round trips for deleting a tree with N subdirectories.

Has anyone got serious objections, or should we go ahead and try
to implement it this way? If so, any ideas about alternative
approaches? I myself don't really feel competent enough to think
about what we could do on the server-side to help the client in
this case.

I'd especially like to get feedback from people who have experience
with the merge/diff editor, and/or the ra layer, and/or performance tuning.
(*hint* glasser *hint* -- I know it's not even 6AM in your timezone
yet, but I hope you get this message when you wake up -- Good Morning :)

Thanks,
-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Stefan Sperling <st...@elego.de>.
On Thu, Apr 03, 2008 at 08:40:27AM +0200, Nico Schellingerhout wrote:
> Stefan Sperling <st...@elego.de> wrote on 04/02/2008 07:29:53 PM:
> > ==========
> > USE CASE 5
> > ==========
> >
> > Files:
> > If 'svn merge' deletes an existing file, the file is a tree conflict
> > victim if its text is different from the corresponding file on the left
> > side of the merge source.
> >
> > Directories:
> > If 'svn merge' deletes an existing directory, the directory is a tree conflict
> > victim if its content is different from the corresponding directory
> > on the left
> > side of the merge source.
> >
> > Question here: What makes two directories "equal"? Do we need to consider
> > all subdirectories of a directory? Or should we keep it simple and only
> > consider direct children as in all the other cases?
> > Warning: The former might perform quite badly!
> I'm afraid the answer has to be: the former (see comment with use case 2).

Hi again Nico,

I think the questions regarding use case 2 and use case 5 are
different.

Use case 2: How do we define a modification of a directory?

Use case 5: How do we define equality between directories?

I've added further input to the use case 2 question in another mail
in this thread.

About the latter: I agree that to be absolutely sure about the equality
status of two directories, the subtrees of both directories need to
be compared completely.

> Performance may be bad, but should scale with the size of the tree being
> deleted, so that should be acceptable. (It doesn't happen every day that
> you rename your entire source tree.)

I'm currently investigating how we could do this.
I will try to provide more details later.

-- 
Stefan Sperling <st...@elego.de>                 Software Developer
elego Software Solutions GmbH                            HRB 77719
Gustav-Meyer-Allee 25, Gebaeude 12        Tel:  +49 30 23 45 86 96 
13355 Berlin                              Fax:  +49 30 23 45 86 95
http://www.elego.de                 Geschaeftsfuehrer: Olaf Wagner

Re: tree conflicts and directories discussion

Posted by Nico Schellingerhout <ni...@philips.com>.
Stefan Sperling <st...@elego.de> wrote on 04/02/2008 07:29:53 PM:

> Hello tree conflict fans,
> 
> I've started thinking a bit about tree conflicts and directories.
> 
> The problem: We want to be able to also mark directories as
> tree conflict victims. The current code, however, only considers
> files. See http://subversion.tigris.org/issues/show_bug.cgi?id=3150
> 
> I thought it might be a good idea for all of us to agree on
> how directories can become tree conflict victims before we
> start implementing anything :)
> 
> To kick off a discussion, I've translated the entries about
> each use case in notes/tree-conflicts/detection.txt from
> files to directories.
> 
> Here is the result, please note the open question in use case 5.
> 
> Comments and additions welcome, as usual.
> 
> -----------------------------------------------------------------------
> 
> In all the cases below (except possibly one exception, noted inline)
> we never descend into directories recursively. We only consider
> the direct children of the directory in question.
> 
> ==========
> USE CASE 1
> ==========
> 
> File:
> If 'svn update' modifies a file that has been scheduled for deletion
> in the working copy, the file is a tree conflict victim.
> 
> Directory:
> If 'svn update' modifies a file in a directory that has been scheduled
> for deletion in the working copy, the directory is a tree conflict 
victim.

See my reaction to your next post.

> 
> ==========
> USE CASE 2
> ==========
> 
> File:
> If 'svn update' deletes a file that has local modifications, the file
> is a tree conflict victim.
> 
> Directory:
> If 'svn update' deletes a directory that contains a file with local
> modifications, the directory is a tree conflict victim.

Yes. Where local modifications would mean: any addition, deletion,
property change, _or_ file modification anywhere in the subtree rooted
by the directory.

> 
> ==========
> USE CASE 3
> ==========
> 
> File:
> If 'svn update' deletes a file that has been scheduled for deletion in
> the working copy, the file is a tree conflict victim.
> 
> Directory:
> If 'svn update' deletes a directory that has been scheduled for deletion 
in
> the working copy, the directory is a tree conflict victim.

Yes.

> 
> ==========
> USE CASE 4
> ==========
> 
> Files:
> If 'svn merge' tries to modify a file that does not exist in the
> target working copy, then the target file is a tree conflict victim.
> 
> Directories:
> If 'svn merge' tries to modify a file in a directory that does not exist
> in the target working copy, then the target directory is a tree 
> conflict victim.
> 
> ==========
> USE CASE 5
> ==========
> 
> Files:
> If 'svn merge' deletes an existing file, the file is a tree conflict
> victim if its text is different from the corresponding file on the left
> side of the merge source.
> 
> Directories:
> If 'svn merge' deletes an existing directory, the directory is a tree 
conflict
> victim if its content is different from the corresponding directory 
> on the left
> side of the merge source.
> 
> Question here: What makes two directories "equal"? Do we need to 
consider
> all subdirectories of a directory? Or should we keep it simple and only
> consider direct children as in all the other cases?
> Warning: The former might perform quite badly!

I'm afraid the answer has to be: the former (see comment with use case 2).

Performance may be bad, but should scale with the size of the tree being
deleted, so that should be acceptable. (It doesn't happen every day that
you rename your entire source tree.)

Actually, once Subversion gets a better understanding of "moves"
the performance could be improved a lot for the move cases (which are
most common), because in most cases no conflict would arise.

> 
> ==========
> USE CASE 6
> ==========
> 
> File:
> If 'svn merge' tries to delete a file that does not exist in the
> target working copy, then the target file is a tree conflict victim.
> 
> Directory:
> If 'svn merge' tries to delete a directory that does not exist in the
> target working copy, then the target directory is a tree conflict 
victim.

Yes.

> 
> =========================
> OBSTRUCTIONS DURING MERGE
> =========================
> 
> File:
> If 'svn merge' fails to apply an operation to a file because the
> file is obstructed (i.e. an unversioned item of the same name is
> in the file's place), the obstructed file is a tree conflict victim.
> 
> Directory:
> If 'svn merge' fails to apply an operation to a directory because the
> directory is obstructed (i.e. an unversioned item of the same name is
> in the directory's place), the obstructed directory is a tree conflict
> victim.

Yes.

- Nico