You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Greg Stein <gs...@gmail.com> on 2010/03/02 21:39:44 UTC

Re: renames/moves in wc-ng (was: (Re: [PATCH] wc-ng - Remove use of entry_t from) calculate_target_mergeinfo

On Tue, Mar 2, 2010 at 08:48, Stefan Sperling <st...@elego.de> wrote:
> On Tue, Mar 02, 2010 at 07:53:30AM -0500, Greg Stein wrote:
>...
>> We do *not* record moves today.
>>
>> We *will* at some point in the future, which is most likely *after*
>> the 1.7 release.
>
> So the question of how exactly we'll trace moves is still up in the air?
>
> I've been pondering whether we should use a git-like approach of guessing
> moves at commit time based on content, rather than explicitly tracking
> what the user did. Recording tree refactorings explicitly is very
> complex and prone to failure. E.g. tracking directory replacements plus
> additional post-replace refactorings inside replaced directories is
> giving me headaches, and I have doubts that our current wc_db schema is
> up to this task. I'd rather like to avoid this complexity if possible.
> Since we'll have content checksums in the pristine store anyway, maybe
> using a lazy content-based guessing approach in addition to copyfrom
> info will be fine if we can figure out a way to make it track directory
> renames as well?

Oh no no no... the schema is fully set up to record moves properly,
without any guesswork. It is just that nobody will invoke the function
(in 1.7!) to do the recording.

I'm comfortable that the schema will work. Please feel free to analyze
it for holes. But you're looking at the 'moved_here' and 'moved_to'
columns. You're also going to want to learn about scan_addition() and
scan_deletion() in more detail.

Doing moves at commit time also implies you've updated the repository
to understand that, which is a completely separate problem. Not to
mention the editor interface between the WC and the RA layers.

Cheers,
-g

Re: holes in the WC-NG schema?

Posted by Greg Stein <gs...@gmail.com>.

On Fri, Mar 5, 2010 at 11:43, Bert Huijben <be...@qqmail.nl> wrote:
>> -----Original Message-----
>> From: Neels J Hofmeyr [mailto:neels@elego.de]
>> Sent: vrijdag 5 maart 2010 11:45
>> To: dev@subversion.apache.org
>> Subject: Re: holes in the WC-NG schema?
>>
>> Neels J Hofmeyr wrote:
>> > I remember someone talking of a hole. It went something like: If a
> folder is
>> > copied-here, all its children have locally added status, and I
> understand
>> > they refer to the op-root of the add to find out their history, i.e.
> that
>> > they are copied. Now what if I replace such a child node with a fresh,
> new
>> > node -- it will still think that it's part of the copy-here. Just vague
>> > memory, haven't verified. This one should be fixed if it turns out to be
> so.
>>
>> Turns out we are aware of this problem. My noob shot at a solution would
> be
>> to have an indicator whether a WORKING node is the op-root of an add.
>> Then
>> we can have an op-root of an add within another op-root of an add without
>> confusion. We might indicate on the inner op-root that they are not the
> root
>> of all locally added nodes, but just the root of an add operation. (Or
> have
>> scan_addition() find out by also scanning the parent of each add-op-root,
>> which it does anyway when asked to find the repository path of the add,
>> which it derives from the op-root's parent's BASE tree node ('s
> ancestors).)
>
> This would fix local adds, but it is not a complete solution for allowing
> all revert scenarios from nested add with history trees. (especially the
> cases where you replace some subtree of an add with history with a different
> tree)

I've thought on this, and I think the current schema, with the
addition the singular marker we talked about at the root of a (local)
add will be complete.

First, stop saying add-with-history. Those are copies or moves. Very
different from an add. We have three terms, each meaning something
different. Continuing to use the "with history" is simply confusing.

Now. The main thing here is that the local database records enough
information to construct a *commit* to the target repository. We do
not need to remember each and every operation that arrived at that
state. Only what is necessary to replicate the local state up to the
repository.

The WORKING_NODE tables models adds/copies/moves/deletes. The specific
columns in the schema, when not-null, define the roots of copies and
moves. These can be nested since each copy/move specifies a new root.
Deletes do not require this nesting of operations -- once you delete
an ancestor, then any prior deletes of descendants are irrelevant. Our
problem is that we do not model adds beyond "hey! there is a
WORKING_NODE!". Attaching a "fake" copyfrom_* can delimit these adds,
thus allowing complex nesting of what is present in the WORKING tree.

Thus, I state we *can* do a complete modeling, once we define a
singleton to mean "add".

> We need a more advanced schema to fix this category. The simple fix would
> fix this behavior we supported in 1.6, but we might want the better fix.

I disagree.

Cheers,
-g

Re: holes in the WC-NG schema?

Posted by Neels J Hofmeyr <ne...@elego.de>.

Bert Huijben wrote:
> 
>> -----Original Message-----
>> From: Neels J Hofmeyr [mailto:neels@elego.de]
>> Sent: vrijdag 5 maart 2010 11:45
>> To: dev@subversion.apache.org
>> Subject: Re: holes in the WC-NG schema?
>>
>> Neels J Hofmeyr wrote:
>>> I remember someone talking of a hole. It went something like: If a
> folder is
>>> copied-here, all its children have locally added status, and I
> understand
>>> they refer to the op-root of the add to find out their history, i.e.
> that
>>> they are copied. Now what if I replace such a child node with a fresh,
> new
>>> node -- it will still think that it's part of the copy-here. Just vague
>>> memory, haven't verified. This one should be fixed if it turns out to be
> so.
>> Turns out we are aware of this problem. My noob shot at a solution would
> be
>> to have an indicator whether a WORKING node is the op-root of an add.
>> Then
>> we can have an op-root of an add within another op-root of an add without
>> confusion. We might indicate on the inner op-root that they are not the
> root
>> of all locally added nodes, but just the root of an add operation. (Or
> have
>> scan_addition() find out by also scanning the parent of each add-op-root,
>> which it does anyway when asked to find the repository path of the add,
>> which it derives from the op-root's parent's BASE tree node ('s
> ancestors).)
> 
> This would fix local adds, but it is not a complete solution for allowing
> all revert scenarios from nested add with history trees. (especially the
> cases where you replace some subtree of an add with history with a different
> tree)

...are you talking about being able to revert layer-wise, like, only the
replace within, so that the originally copied tree is brought back? (I won't
 get too involved here on top of the other stuff, am just curious to know.)
I'm thinking, if it was allowed to contact the repos, revert could cope
using the current schema + op-root flag thing, I guess?

~Neels

RE: holes in the WC-NG schema?

Posted by Bert Huijben <be...@qqmail.nl>.


> -----Original Message-----
> From: Neels J Hofmeyr [mailto:neels@elego.de]
> Sent: vrijdag 5 maart 2010 11:45
> To: dev@subversion.apache.org
> Subject: Re: holes in the WC-NG schema?
> 
> Neels J Hofmeyr wrote:
> > I remember someone talking of a hole. It went something like: If a
folder is
> > copied-here, all its children have locally added status, and I
understand
> > they refer to the op-root of the add to find out their history, i.e.
that
> > they are copied. Now what if I replace such a child node with a fresh,
new
> > node -- it will still think that it's part of the copy-here. Just vague
> > memory, haven't verified. This one should be fixed if it turns out to be
so.
> 
> Turns out we are aware of this problem. My noob shot at a solution would
be
> to have an indicator whether a WORKING node is the op-root of an add.
> Then
> we can have an op-root of an add within another op-root of an add without
> confusion. We might indicate on the inner op-root that they are not the
root
> of all locally added nodes, but just the root of an add operation. (Or
have
> scan_addition() find out by also scanning the parent of each add-op-root,
> which it does anyway when asked to find the repository path of the add,
> which it derives from the op-root's parent's BASE tree node ('s
ancestors).)

This would fix local adds, but it is not a complete solution for allowing
all revert scenarios from nested add with history trees. (especially the
cases where you replace some subtree of an add with history with a different
tree)

We need a more advanced schema to fix this category. The simple fix would
fix this behavior we supported in 1.6, but we might want the better fix.

	Bert

Re: holes in the WC-NG schema?

Posted by Neels J Hofmeyr <ne...@elego.de>.

Neels J Hofmeyr wrote:
> I remember someone talking of a hole. It went something like: If a folder is
> copied-here, all its children have locally added status, and I understand
> they refer to the op-root of the add to find out their history, i.e. that
> they are copied. Now what if I replace such a child node with a fresh, new
> node -- it will still think that it's part of the copy-here. Just vague
> memory, haven't verified. This one should be fixed if it turns out to be so.

Turns out we are aware of this problem. My noob shot at a solution would be
to have an indicator whether a WORKING node is the op-root of an add. Then
we can have an op-root of an add within another op-root of an add without
confusion. We might indicate on the inner op-root that they are not the root
of all locally added nodes, but just the root of an add operation. (Or have
scan_addition() find out by also scanning the parent of each add-op-root,
which it does anyway when asked to find the repository path of the add,
which it derives from the op-root's parent's BASE tree node ('s ancestors).)

Because this indicator is only really needed for non-copy adds within a
copy, it is tempting to abuse some other table row that is only used for
copies. But I think we can do ourselves a favour by having an explicit
indicator instead. Overloading values with special-case meanings is Not Good
(tm). That case with svn_opt_revision_working I described in my original
post is a good example of the snake pit that overloading can become. But ok,
if we can hermetically seal off the special-case-overloading in internal
API, that would be fine with me.

(In a side note, with an explicit indicator we would not need to "fall off
the top of the NG-WORKING tree" to detect the op-root. Not that it matters /
whatever.)

[...]
> - The first place is svn_client__get_revision_number() with a revision of
> svn_opt_revision_working or _base. This function returns a (much overloaded)
> revision number, which is derived from the parent in some added-cases. In
> above case (switched-away parent from a locally added child), deriving from
> the parent is not possible anymore. Regardless, we do not want to look up
> parents at all.
> ... But chances are good that we can eliminate this case by refactoring a
> few callers, properly defining svn_opt_revision_*'s meaning for added nodes
> to be SVN_INVALID_REVNUM, and coining it an API error, or something.
[...]

~Neels

Re: holes in the WC-NG schema?

Posted by Greg Stein <gs...@gmail.com>.

On Thu, Mar 4, 2010 at 19:36, Neels J Hofmeyr <ne...@elego.de> wrote:
> Greg Stein wrote:
>> Oh no no no... the schema is fully set up to record moves properly,
>> without any guesswork. It is just that nobody will invoke the function
>> (in 1.7!) to do the recording.
>>
>> I'm comfortable that the schema will work. Please feel free to analyze
>> it for holes. But you're looking at the 'moved_here' and 'moved_to'
>> columns. You're also going to want to learn about scan_addition() and
>> scan_deletion() in more detail.
>
> I remember someone talking of a hole. It went something like: If a folder is
> copied-here, all its children have locally added status, and I understand
> they refer to the op-root of the add to find out their history, i.e. that
> they are copied. Now what if I replace such a child node with a fresh, new
> node -- it will still think that it's part of the copy-here. Just vague
> memory, haven't verified. This one should be fixed if it turns out to be so.

For an add, yes. The schema doesn't handle this yet. I'm unclear on
how a flag of "added" can't fully handle this. Bert posted something
about it, which I didn't understand. Need to think further on the
scenario.

(and, iirc, props to Philip for discovering this one)

> I also know of another potential "future hole": If I add a node, it
> remembers the URL at which it wants to be added, which is derived from the
> parent during the 'add' operation. If I then even switch the parent away
> --depth=empty, the added child node still remembers its original URL. That
> is pretty cool. But, we have two places where we also may want to know the
> *revision* that the parent had at the time of the 'add' (which should also
> step up when 'svn update's happen):

IMO, that is a bug of wc-1. Today, if you switch the parent, then the
add will go to the new repository location.

Added/copied/moved nodes do not have a repository location (intended
or not) until they are committed. Their target repository location is
identified by the parent.

And all that said, I seem to recall that you are not allowed to switch
a node if there are local changes. Could be wrong, but I do know we
don't "rememeber" the old URL any more.

>...

I'm not sure how your two scenarios apply, since the memory is not present.

>...
> I'm, thirdly, wondering if we can really express all sorts of madly
> cascaded/circular local copies/moves/deletes/replaces from/to subtrees that
> make sense to our users... a proper test suite will tell, right, stsp? ;)

We should be able to. I've thought about a lot of these scenarios...
witness the madhouse docstring of scan_addition and scan_deletion.
Some nasty scenarios.

But that's also why I encourage a review. I certainly missed the
add-within-a-copy/move.

Cheers,
-g

holes in the WC-NG schema?

Posted by Neels J Hofmeyr <ne...@elego.de>.

Greg Stein wrote:
> Oh no no no... the schema is fully set up to record moves properly,
> without any guesswork. It is just that nobody will invoke the function
> (in 1.7!) to do the recording.
> 
> I'm comfortable that the schema will work. Please feel free to analyze
> it for holes. But you're looking at the 'moved_here' and 'moved_to'
> columns. You're also going to want to learn about scan_addition() and
> scan_deletion() in more detail.

I remember someone talking of a hole. It went something like: If a folder is
copied-here, all its children have locally added status, and I understand
they refer to the op-root of the add to find out their history, i.e. that
they are copied. Now what if I replace such a child node with a fresh, new
node -- it will still think that it's part of the copy-here. Just vague
memory, haven't verified. This one should be fixed if it turns out to be so.


I also know of another potential "future hole": If I add a node, it
remembers the URL at which it wants to be added, which is derived from the
parent during the 'add' operation. If I then even switch the parent away
--depth=empty, the added child node still remembers its original URL. That
is pretty cool. But, we have two places where we also may want to know the
*revision* that the parent had at the time of the 'add' (which should also
step up when 'svn update's happen):

- The first place is svn_client__get_revision_number() with a revision of
svn_opt_revision_working or _base. This function returns a (much overloaded)
revision number, which is derived from the parent in some added-cases. In
above case (switched-away parent from a locally added child), deriving from
the parent is not possible anymore. Regardless, we do not want to look up
parents at all.
... But chances are good that we can eliminate this case by refactoring a
few callers, properly defining svn_opt_revision_*'s meaning for added nodes
to be SVN_INVALID_REVNUM, and coining it an API error, or something.

- The second place is tree-conflict info. (This is a slightly odd concept,
but it makes sense:) If a tree-conflict happened on a locally added node,
'svn info' would like to tell the user at which URL@REV this added node was
*known to not exist*. We can get the URL easily, today. But the revision is
missing. That's why 'info' currently just prints "(none)" in that case.
I'm loosely thinking of an "added-at-rev" column in the NG-WORKING table,
which is derived from the parent during 'svn add', and is updated during
'svn update' and 'svn switch' (here --depth=infinity).
But frankly, this info output is not that pressing really. Just noting this
to indicate that this particular useful information is currently not
available in the schema.
(There is another situation that is related -- saying this here because it
illustrates how "real code" can just "cheat" around not having an
added-onto-revision. Philosophical experiment: If we had this data of
'revision-onto-which-node-is-added', a locally added node may,
hypothetically, *be detected to be out-of-date* in the same way that locally
modified nodes are; that would force an 'svn up', which would cause a
tree-conflict if this path has also been added in the repos. Currently, we
don't detect that, try to commit, which fails with an error and *not* a
tree-conflict. In reality, this can simply be fixed by saying 'out-of-date'
instead of erroring when we hit a local-add vs. repos-add during commit; it
obviously doesn't really need the revision.)


I'm, thirdly, wondering if we can really express all sorts of madly
cascaded/circular local copies/moves/deletes/replaces from/to subtrees that
make sense to our users... a proper test suite will tell, right, stsp? ;)

~Neels