You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Daniel Näslund <da...@longitudo.com> on 2010/04/22 14:05:50 UTC

[PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Hi!

[[[
As part of WC-NG, replace entry calls for revisions from
svn_wc_status2_t related code.

The entries code set the revision of newly added nodes to 0 but the db
sets them to -1. Since too many tests needs to be changed and 'svn info'
also uses 0, I'll change those values in a follow-up patch instead.

* subversion/tests/cmdline/copy_tests.py
  (repos_to_wc): Change revision to 0 since the node is added. Don't
    check against entries since the behavior differs.

* subversion/tests/cmdline/stat_tests.py
  (status_with_tree_conflicts): An copied node should not have a
    changed_rev or author.

* subversion/tests/cmdline/svntest/actions.py
  (run_and_verify_status): Add new parameter check_entries. We only
    check the entries if check_entries is set to True.

* subversion/tests/cmdline/merge_tests.py
  (merge_into_missing): Add revision to status output for missing dir.
    Previously, missing dirs did not have a revision, only files had.
    Don't check entries since the behavior differs.

* subversion/svn/status.c
  (print_status): Replace checks for revisions using entries with the
    direct fields in svn_wc_status3_t. Set the revision for added and
    replaced nodes to 0. WC-NG sets those revisions to -1, but changing
    all the involved tests is for a follow-up.

* subversion/include/svn_wc.h
  (svn_wc_status2_t): Add fields revision, changed_rev and
    changed_author.

* subversion/libsvn_wc/status.c
  (assemble_status): Fill in the new fields with data from the wc db.
  (svn_wc_dup_status3): Copy the changed_author field.
]]]

Quirks
========

Checking entries
-------------------
I need to tweak the status output when doing a compare against the
entries in run_and_verify_status(). The code changes the behaviour for
revisions in three ways:

i) Missing dirs, now have a revision
ii) Copies from foreign repos to wc has the rev set to -1
iii) A copied node should not have a changed_author or changed_rev set.

I could change tweak_for_entries_compare() to fix i) but for the
other two I would have to write a func for tweaking the status returned
by 'entries-dump'. Can't I just use the check_entries parameter I've
added to run_and_verify_status() instead? I would only disable the
entries check for three tests.

Getting revisions of deleted nodes
-----------------------------------
I'm using base_get_info() for both nodes having the op_root from a plain
delete and those from a delete within an add. In the later case, the
revision, changed_rev and changed_author will be set to the default nil
values which match what I would get from read_info().

Cheers,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Greg Stein <gs...@gmail.com>.

On Fri, Apr 23, 2010 at 11:01, Daniel Näslund <da...@longitudo.com> wrote:
> On Fri, Apr 23, 2010 at 06:23:49AM -0400, Greg Stein wrote:
>> On Fri, Apr 23, 2010 at 05:38, Daniel Näslund <da...@longitudo.com> wrote:
>> > I wanted to keep the current behaviour since the status code is involved
>> > in so _many_ tests and rewriting all of those would cause me to loose
>> > momentum. Guess, what? I've already lost that momentum. :)
>>
>> I suspect you mean svn/status-cmd.c rather than "status code". The
>> real point is "how does 'svn status' present the values in
>> svn_wc_status3_t?". You can provide wc-ng semantics in that structure,
>> and then let status-cmd map those into old-style semantics and
>> display.
>>
>> Then, in a future step, we can discuss altering the output of
>> status-cmd to make more sense and to eliminate all the ugliness.
>
> Ok, how about creating something like this. To be called from inside the
> status callback for compatibility with the testsuite (a temporary thing
> of course).

Well... I think keeping the created_* values in there would be just
fine. It is the crazy semantics of entry->revision that you're trying
to duplicate which is bothering me. If you create a function to return
a revision with those semantics... sure. Go ahead. You're not
"infecting" svn_wc_status3_t with craziness in that case :-)

>...
> Ok, 'we want' is really 'I think we want' but you get the idea.

Whatever you think. My concerns lie around the definition/semantics of
svn_wc_status3_t.

>...

Cheers,
-g

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Daniel Näslund <da...@longitudo.com>.

On Fri, Apr 23, 2010 at 06:23:49AM -0400, Greg Stein wrote:
> On Fri, Apr 23, 2010 at 05:38, Daniel Näslund <da...@longitudo.com> wrote:
> > I wanted to keep the current behaviour since the status code is involved
> > in so _many_ tests and rewriting all of those would cause me to loose
> > momentum. Guess, what? I've already lost that momentum. :)
> 
> I suspect you mean svn/status-cmd.c rather than "status code". The
> real point is "how does 'svn status' present the values in
> svn_wc_status3_t?". You can provide wc-ng semantics in that structure,
> and then let status-cmd map those into old-style semantics and
> display.
> 
> Then, in a future step, we can discuss altering the output of
> status-cmd to make more sense and to eliminate all the ugliness.

Ok, how about creating something like this. To be called from inside the
status callback for compatibility with the testsuite (a temporary thing
of course). 

/* Get the working revision of @a local_abspath using @a wc_ctx. If @a
 * local_abspath is not in the working copy, return @c
 * SVN_ERR_WC_PATH_NOT_FOUND.
 *
 * This function is meant as a temporary solution for using the
 * old-style semantics of entries. It will handle any uncommitted
 * changes (delete, replace and/or copy-here/move-here).
 *
 * For a delete the @a revision is the BASE node of the operation root,
 * e.g the path that was deleted. But if the delete is  below an add,
 * the revision is set to SVN_INVALID_REVNUM. For an add, copy or move
 * we return SVN_INVALID_REVNUM. In case of a replacement, we return the
 * BASE revision.
 *
 * The @changed_rev is set to the latest committed change to @a
 * local_abspath before or equal to @a revision, unless the node is
 * copied-here or moved-here. The it is the revision of the latest
 * committed change before or equal to the copyfrom_rev. NOTE, that we
 * use SVN_INVALID_REVNUM for a scheduled copy.
 *
 * The @a changed_date and @a changed_author are the ones associated
 * with @a changed_rev.
 */
svn_wc__node_get_working_rev_info(svn_revnum_t *revision,
                                  svn_revnum_t *changed_rev,
                                  apr_time_t *changed_date,
                                  const char *changed_author,
                                  svn_wc_context_t *wc_ctx,
                                  const char *local_abspath,
                                  apr_pool_t *scratch_pool,
                                  apr_pool_t *result_pool);

It will consist of the code I've put inside assemble_status() in the
previous patch. While we want this behaviour for the status callback:

print_status(..., svn_wc_status3_t *status)
{
  if (status->replaced)
    Get the info from BASE
  if (status->add/copy-here/move-here)
    Use the info inside svn_wc_status3_t (as described for read_info())
  if (status->deleted)
    if (status->inside_an_add)
       /* I have no idea really */
    else
      not_yet_implemented_get_base_of_delete()
}

Ok, 'we want' is really 'I think we want' but you get the idea.

There is no way to detect those states from the callback.
status->text_status was _not_t the way to go. I didn't get any specific
reasons to why, but I'm assuming it maps poorly to the new states in
svn_wc__db_state_t. And the temporary svn_wc__node_is_status_{added,...}
had a couple of quirks with not detecting obstructed_add and so on.
Using the function I outlined above I can let status->revision be easily
defined while still beeing compatible with the testsuite.

Maybe we should create svn_wc_status_kind2_t to make the relevant db
states available to the upper layers.

[...]                                   

Cheers,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Daniel Näslund <da...@longitudo.com>.

On Fri, Apr 23, 2010 at 06:23:49AM -0400, Greg Stein wrote:
> On Fri, Apr 23, 2010 at 05:38, Daniel Näslund <da...@longitudo.com> wrote:
> > I wanted to keep the current behaviour since the status code is involved
> > in so _many_ tests and rewriting all of those would cause me to loose
> > momentum. Guess, what? I've already lost that momentum. :)
> 
> I suspect you mean svn/status-cmd.c rather than "status code". The
> real point is "how does 'svn status' present the values in
> svn_wc_status3_t?". You can provide wc-ng semantics in that structure,
> and then let status-cmd map those into old-style semantics and
> display.
> 
> Then, in a future step, we can discuss altering the output of
> status-cmd to make more sense and to eliminate all the ugliness.

Ok, how about creating something like this. To be called from inside the
status callback for compatibility with the testsuite (a temporary thing
of course). 

/* Get the working revision of @a local_abspath using @a wc_ctx. If @a
 * local_abspath is not in the working copy, return @c
 * SVN_ERR_WC_PATH_NOT_FOUND.
 *
 * This function is meant as a temporary solution for using the
 * old-style semantics of entries. It will handle any uncommitted
 * changes (delete, replace and/or copy-here/move-here).
 *
 * For a delete the @a revision is the BASE node of the operation root,
 * e.g the path that was deleted. But if the delete is  below an add,
 * the revision is set to SVN_INVALID_REVNUM. For an add, copy or move
 * we return SVN_INVALID_REVNUM. In case of a replacement, we return the
 * BASE revision.
 *
 * The @changed_rev is set to the latest committed change to @a
 * local_abspath before or equal to @a revision, unless the node is
 * copied-here or moved-here. The it is the revision of the latest
 * committed change before or equal to the copyfrom_rev. NOTE, that we
 * use SVN_INVALID_REVNUM for a scheduled copy.
 *
 * The @a changed_date and @a changed_author are the ones associated
 * with @a changed_rev.
 */
svn_wc__node_get_working_rev_info(svn_revnum_t *revision,
                                  svn_revnum_t *changed_rev,
                                  apr_time_t *changed_date,
                                  const char *changed_author,
                                  svn_wc_context_t *wc_ctx,
                                  const char *local_abspath,
                                  apr_pool_t *scratch_pool,
                                  apr_pool_t *result_pool);

It will consist of the code I've put inside assemble_status() in the
previous patch. While we want this behaviour for the status callback:

print_status(..., svn_wc_status3_t *status)
{
  if (status->replaced)
    Get the info from BASE
  if (status->add/copy-here/move-here)
    Use the info inside svn_wc_status3_t (as described for read_info())
  if (status->deleted)
    if (status->inside_an_add)
       /* I have no idea really */
    else
      not_yet_implemented_get_base_of_delete()
}

Ok, 'we want' is really 'I think we want' but you get the idea.

There is no way to detect those states from the callback.
status->text_status was _not_t the way to go. I didn't get any specific
reasons to why, but I'm assuming it maps poorly to the new states in
svn_wc__db_state_t. And the temporary svn_wc__node_is_status_{added,...}
had a couple of quirks with not detecting obstructed_add and so on.
Using the function I outlined above I can let status->revision be easily
defined while still beeing compatible with the testsuite.

Maybe we should create svn_wc_status_kind2_t to make the relevant db
states available to the upper layers.

[...]                                   

Cheers,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Greg Stein <gs...@gmail.com>.

On Fri, Apr 23, 2010 at 05:38, Daniel Näslund <da...@longitudo.com> wrote:
>...
> I set out to replace the entry field in svn_wc_statusX_t. I thought that
> ideally I could just keep the semantics from the entry but move out the
> fields to svn_wc_status2_t.

Yikes.

I would say, "if callers want entry_t semantics, then they should use
status->entry. new fields should use wc-ng semantics ONLY."

Migrate away from entry_t and its semantics. Creating new fields with
the same semantics doesn't get us very far. The status->entry can
linger around and hold its old semantics.

> I wanted to keep the current behaviour since the status code is involved
> in so _many_ tests and rewriting all of those would cause me to loose
> momentum. Guess, what? I've already lost that momentum. :)

I suspect you mean svn/status-cmd.c rather than "status code". The
real point is "how does 'svn status' present the values in
svn_wc_status3_t?". You can provide wc-ng semantics in that structure,
and then let status-cmd map those into old-style semantics and
display.

Then, in a future step, we can discuss altering the output of
status-cmd to make more sense and to eliminate all the ugliness.

> I guess I should do what you suggested: Make assemble_status() simpler
> to insure we that we can easily describe what it's returning. On the

Bingo. entry->revision is very difficult to describe, which means most
users get it wrong. They want one idea of "revision", but
entry->revision doesn't always match that idea.

Answer: have a very clear and simple definition of "status->revision",
and if they want something else, then they ask that clearly and
specifically via other APIs.

> other hand, I don't see the scanning process as something expensive
> comparing to 'text compares'.

Scanning up the tree will get to be expensive when you do it for EVERY
node in a large tree. But I'm not so worried about the time-cost, as I
am about the semantics of these things. As I mentioned, the code in
your patch had a long and complicated meaning behind "revision". When
it gets that complicated, it means the field cannot reliably be used.
"Is the node in the right status for me to use status->revision?
hrm....."

> And I question if the API we're exposing
> really should demand that the callers do any extra work for fetching
> something as fundamental as a revision.

Heh. "fundamental". Define "revision". read_info defines it as "what
is the revision of the node I'm looking at right now?" And that has a
simple answer "what you checked out" or "doesn't have one cuz you
haven't committed your changes".

> AFAIK, the only thing that is
> not clear is how we handle working changes below a deleted node. If we
> get that one settled, then why not return the _right_ revision at once?

Again, what the hell is the right revision? The child below that
deleted node has at least three revisions:

1) undefined. you haven't committed
2) a BASE node was here, but isn't now. provide that BASE node's revision
3) the deleted node was copied from somewhere, and has a corresponding
repository node. what is that revision.

So tell me... which of those three definitions do you want for your
"revision". And is this truly "fundamental" or a difficult and
scenario-specific question?

entry->revision blended many of these simple definitions into one,
making the field very hard to use. The code generally used it
*unaware* of the real semantics and what the value truly meant. I am
hoping to alter that code so it *knows* what it is looking at. And if
"BASE revision, or -1 cuz you haven't committed" is not right for that
code, then it can ask for exactly what it truly needs.

> I mean, for a VCS where the notion of globally recognized identifiers
> for change sets is a central part, there should be little vagueness as
> to which one we really mean for a path! The users of our API will hardly
> want to use another behaviour than what the CLI exposes.
>
> Since I'm easily persuaded, below is my idea of what to do to get to the
> point where assemble_status() only returns the most basic info about the
> revision of a node. (I really should skip step (i) and rewrite it to be
> part (ii) at once but I've put so much effort into that patch. I just
> want to commit it! Hardly a good argument, though.)
>
> Suggested work flow
> ====================
> i) Commit the current patch with a few mods to correct the behaviour for
> changed_rev, entries-checking and some '###' comments in
> assemble_status() clearly stating our intent to return a more
> well-defined revision (less ambiguity).

The work in that patch to define status->revision doesn't move us
forward from status->entry->revision. If current code wants the old
semantics, then status->entry is still there.

> ii) Create a svn_wc__node_get_revision_of_deleted() or similar to invoke
> when status->text_status == svn_wc_status_deleted. That one should be
> able to cope with changes below the base_del_abspath when returning the
> abspath. Remove relevant parts from assemble_status() and invoke the
> node function instead from svn/status.c.

First, "deleted" is a concept of the node, not the "text" or the props
(ie. prop_status). That is seriously bogus, and should be ripped out.
The status structure should have something similar to
svn_wc__db_status_t. And then text/props can have simple flags like
"have these been modified?"

> iii) Replace the status->entry->lock_{token, comment,...} with
> status->lock_{token, comment, ... }

Yes. An alternative would be "svn_boolean_t locked;" .. but provided
the lock information is easy/cheap, so we may as well. ie. we gotta
try and fetch the lock info to report it exists, so let's just provide
all the info.

> iv) Remove the entry field and use another way for checking if a path is
> versioned or not instead of just checking if status->entry is set.

svn_boolean_t versioned;

> v) Create the svn_status2_from_3() logic to create an entry for
> svn_wc_status2_t.

svn_wc__get_entry(&status2->entry, ...);

> vi) Change the output of 'svn status' to set uncommitted paths as having
> rev ' ' instead of '0'.

Do anything you like in status-cmd to retain compatibility. My concern
is around a cogent and understandable set of semantics for
svn_wc_status3_t.

> vii) Do the same for 'svn info' after removing the uses of entries in
> there. build_info_for_entry() is a rat nest of entry usages.

Eesh. That structure is nearly as bad, and my head aches from "sheesh.
MORE stuff to clean up?!?!"

Cheers,
-g

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Daniel Näslund <da...@longitudo.com>.

On Thu, Apr 22, 2010 at 05:07:56PM -0400, Greg Stein wrote:
> On Thu, Apr 22, 2010 at 15:48, Daniel Näslund <da...@longitudo.com> wrote:
> 
> >> read_info() returns changed_* information for copied nodes. A copied
> >> node has a corresponding node in the repository, and the changed_*
> >> information reflects that. (thus, your assemble_status should not be
> >> wiping the values out... if the caller wants to interpret added nodes
> >> that way, then it can)
>
> >> Personally, I would suggest that assemble_status() does NOTHING more
> >> than returning read_info's REVISION result. That is the "ideal" value
> >> for the revision of the node LOCAL_ABSPATH.
> >>
> >> If that revision is NOT what you want (ie. you have some other rules
> >> like "well... if it is added, then ..." or "if it is deleted, then
> >> ..."), then the caller should perform those calculations explicitly.
> >> The status code should not pre-suppose what rules the caller may want
> >> for revision handling. It should simply report the "ideal" revision
> >> for that node and be done with it.
> >
> > WHAT? You're saying that all the scanning should take place in the
> > client callback?
> 
> No. I said "don't create strange semantics".
> 
> The code you had in there had the following definition for status->revision:
> 
> "The revision of the BASE node, unless that has been deleted, then: if
> nothing was added, then still the same BASE revision, but if something
> was added, then SVN_INVALID_REVNUM. BUT! If you delete a child of that
> addition, then the BASE version MAY be returned again... we say "may"
> because if a BASE doesn't correspond to the deleted child of the add,
> then you'll get an error instead."
> 
> The definition that read_info uses is:
> 
> "The revision of the unmodified (BASE) node, or SVN_INVALID_REVNUM if
> any (structural) changes have been made to that node."
> 
> (where "structural" means an add/copy/move/delete; we don't mean
> content edits like propchanges or editing the text)
> 
> > That's a major change from what I've done so far. That
> > I should just return SVN_INVALID_REVNUM for cases where I've scanned for
> > deletion? Is that a behaviour that everyone agrees on? You're saying
> > personally so I'm assuming there hasn't been a discussion about it.
> 
> No, I don't think it should do a scan_add, nor a scan_deletion. Report
> the add/delete, and leave it at that.
> 
> repos_relpath, repos_root_url, and repos_uuid are all inherited
> values, so I believe a scan_base_repos makes sense: you're simply
> returning those values. You're not looking for further details about
> operations (which the caller may not be interested in anyways).
> 
> (and note that scan_base_repos doesn't even have to be used cuz
> assemble_status could be passed the parent repos_* values; if the BASE
> node hasn't been switched and (therefore) returns NULL for repos_*,
> then its repos_* values are easily derived within wc/status.c; but
> repos_* values will always be NULL for modified nodes (ie. if a
> WORKING node exists))
> 
> Scanning additional rows is not "free", so we shouldn't be looking up
> information from the ancestors unless/until we find that most/all of
> the callers need that information. And in some cases, we can even make
> status.c fill in the info, since it has seen the ancestors already.
> 
> In the future, we may return added/copied/moved_here distinctly so
> that a scan won't be needed to resolve that (tho you'd still have to
> scan to locate the root of the operation, tho we may be able to
> optimize that inside wc_db).

I set out to replace the entry field in svn_wc_statusX_t. I thought that
ideally I could just keep the semantics from the entry but move out the
fields to svn_wc_status2_t. 

I wanted to keep the current behaviour since the status code is involved
in so _many_ tests and rewriting all of those would cause me to loose
momentum. Guess, what? I've already lost that momentum. :)

I guess I should do what you suggested: Make assemble_status() simpler
to insure we that we can easily describe what it's returning. On the
other hand, I don't see the scanning process as something expensive
comparing to 'text compares'. And I question if the API we're exposing
really should demand that the callers do any extra work for fetching
something as fundamental as a revision. AFAIK, the only thing that is
not clear is how we handle working changes below a deleted node. If we
get that one settled, then why not return the _right_ revision at once?
I mean, for a VCS where the notion of globally recognized identifiers
for change sets is a central part, there should be little vagueness as
to which one we really mean for a path! The users of our API will hardly
want to use another behaviour than what the CLI exposes.

Since I'm easily persuaded, below is my idea of what to do to get to the
point where assemble_status() only returns the most basic info about the
revision of a node. (I really should skip step (i) and rewrite it to be
part (ii) at once but I've put so much effort into that patch. I just
want to commit it! Hardly a good argument, though.)

Suggested work flow
====================
i) Commit the current patch with a few mods to correct the behaviour for
changed_rev, entries-checking and some '###' comments in
assemble_status() clearly stating our intent to return a more
well-defined revision (less ambiguity).

ii) Create a svn_wc__node_get_revision_of_deleted() or similar to invoke
when status->text_status == svn_wc_status_deleted. That one should be
able to cope with changes below the base_del_abspath when returning the
abspath. Remove relevant parts from assemble_status() and invoke the
node function instead from svn/status.c.

iii) Replace the status->entry->lock_{token, comment,...} with
status->lock_{token, comment, ... }

iv) Remove the entry field and use another way for checking if a path is
versioned or not instead of just checking if status->entry is set.

v) Create the svn_status2_from_3() logic to create an entry for
svn_wc_status2_t.

vi) Change the output of 'svn status' to set uncommitted paths as having
rev ' ' instead of '0'.

vii) Do the same for 'svn info' after removing the uses of entries in
there. build_info_for_entry() is a rat nest of entry usages.

Thanks,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Daniel Näslund <da...@longitudo.com>.

On Thu, Apr 22, 2010 at 05:07:56PM -0400, Greg Stein wrote:
> On Thu, Apr 22, 2010 at 15:48, Daniel Näslund <da...@longitudo.com> wrote:
> 
> >> read_info() returns changed_* information for copied nodes. A copied
> >> node has a corresponding node in the repository, and the changed_*
> >> information reflects that. (thus, your assemble_status should not be
> >> wiping the values out... if the caller wants to interpret added nodes
> >> that way, then it can)
>
> >> Personally, I would suggest that assemble_status() does NOTHING more
> >> than returning read_info's REVISION result. That is the "ideal" value
> >> for the revision of the node LOCAL_ABSPATH.
> >>
> >> If that revision is NOT what you want (ie. you have some other rules
> >> like "well... if it is added, then ..." or "if it is deleted, then
> >> ..."), then the caller should perform those calculations explicitly.
> >> The status code should not pre-suppose what rules the caller may want
> >> for revision handling. It should simply report the "ideal" revision
> >> for that node and be done with it.
> >
> > WHAT? You're saying that all the scanning should take place in the
> > client callback?
> 
> No. I said "don't create strange semantics".
> 
> The code you had in there had the following definition for status->revision:
> 
> "The revision of the BASE node, unless that has been deleted, then: if
> nothing was added, then still the same BASE revision, but if something
> was added, then SVN_INVALID_REVNUM. BUT! If you delete a child of that
> addition, then the BASE version MAY be returned again... we say "may"
> because if a BASE doesn't correspond to the deleted child of the add,
> then you'll get an error instead."
> 
> The definition that read_info uses is:
> 
> "The revision of the unmodified (BASE) node, or SVN_INVALID_REVNUM if
> any (structural) changes have been made to that node."
> 
> (where "structural" means an add/copy/move/delete; we don't mean
> content edits like propchanges or editing the text)
> 
> > That's a major change from what I've done so far. That
> > I should just return SVN_INVALID_REVNUM for cases where I've scanned for
> > deletion? Is that a behaviour that everyone agrees on? You're saying
> > personally so I'm assuming there hasn't been a discussion about it.
> 
> No, I don't think it should do a scan_add, nor a scan_deletion. Report
> the add/delete, and leave it at that.
> 
> repos_relpath, repos_root_url, and repos_uuid are all inherited
> values, so I believe a scan_base_repos makes sense: you're simply
> returning those values. You're not looking for further details about
> operations (which the caller may not be interested in anyways).
> 
> (and note that scan_base_repos doesn't even have to be used cuz
> assemble_status could be passed the parent repos_* values; if the BASE
> node hasn't been switched and (therefore) returns NULL for repos_*,
> then its repos_* values are easily derived within wc/status.c; but
> repos_* values will always be NULL for modified nodes (ie. if a
> WORKING node exists))
> 
> Scanning additional rows is not "free", so we shouldn't be looking up
> information from the ancestors unless/until we find that most/all of
> the callers need that information. And in some cases, we can even make
> status.c fill in the info, since it has seen the ancestors already.
> 
> In the future, we may return added/copied/moved_here distinctly so
> that a scan won't be needed to resolve that (tho you'd still have to
> scan to locate the root of the operation, tho we may be able to
> optimize that inside wc_db).

I set out to replace the entry field in svn_wc_statusX_t. I thought that
ideally I could just keep the semantics from the entry but move out the
fields to svn_wc_status2_t. 

I wanted to keep the current behaviour since the status code is involved
in so _many_ tests and rewriting all of those would cause me to loose
momentum. Guess, what? I've already lost that momentum. :)

I guess I should do what you suggested: Make assemble_status() simpler
to insure we that we can easily describe what it's returning. On the
other hand, I don't see the scanning process as something expensive
comparing to 'text compares'. And I question if the API we're exposing
really should demand that the callers do any extra work for fetching
something as fundamental as a revision. AFAIK, the only thing that is
not clear is how we handle working changes below a deleted node. If we
get that one settled, then why not return the _right_ revision at once?
I mean, for a VCS where the notion of globally recognized identifiers
for change sets is a central part, there should be little vagueness as
to which one we really mean for a path! The users of our API will hardly
want to use another behaviour than what the CLI exposes.

Since I'm easily persuaded, below is my idea of what to do to get to the
point where assemble_status() only returns the most basic info about the
revision of a node. (I really should skip step (i) and rewrite it to be
part (ii) at once but I've put so much effort into that patch. I just
want to commit it! Hardly a good argument, though.)

Suggested work flow
====================
i) Commit the current patch with a few mods to correct the behaviour for
changed_rev, entries-checking and some '###' comments in
assemble_status() clearly stating our intent to return a more
well-defined revision (less ambiguity).

ii) Create a svn_wc__node_get_revision_of_deleted() or similar to invoke
when status->text_status == svn_wc_status_deleted. That one should be
able to cope with changes below the base_del_abspath when returning the
abspath. Remove relevant parts from assemble_status() and invoke the
node function instead from svn/status.c.

iii) Replace the status->entry->lock_{token, comment,...} with
status->lock_{token, comment, ... }

iv) Remove the entry field and use another way for checking if a path is
versioned or not instead of just checking if status->entry is set.

v) Create the svn_status2_from_3() logic to create an entry for
svn_wc_status2_t.

vi) Change the output of 'svn status' to set uncommitted paths as having
rev ' ' instead of '0'.

vii) Do the same for 'svn info' after removing the uses of entries in
there. build_info_for_entry() is a rat nest of entry usages.

Thanks,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Greg Stein <gs...@gmail.com>.

On Thu, Apr 22, 2010 at 15:48, Daniel Näslund <da...@longitudo.com> wrote:
> On Thu, Apr 22, 2010 at 02:45:08PM -0400, Greg Stein wrote:
>> On Thu, Apr 22, 2010 at 10:05, Daniel Näslund <da...@longitudo.com> wrote:
>...
>> > * subversion/tests/cmdline/stat_tests.py
>> >  (status_with_tree_conflicts): An copied node should not have a
>> >    changed_rev or author.
>>
>> Don't the changed_rev and changed_author correspond to the repository
>> node that was copied into the tree? Thus, when I'm looking at a diff,
>> it is effectively relative to what I copied in.
>
> I had my doubts about this one, although I didn't express them here.
> Since a copied or moved node has a previous line of history it makes
> sense to use that history for changed_rev and changed_author.

Right.

> My
> motivation for not setting those for copied/moved nodes was that I
> thought that changed_rev was defined as the closest rev less than or
> equal to the current revision. Since changed_author is closely related
> to changed_rev I set that one to it's nil value too.

changed_rev is "for the path@rev you're looking at, when was path last changed?"

so if you have a history like:

1
2  path changed
3
4
5  path changed
6
7

Then changed_rev=0 for path@0 and @1. changed_rev=2 for path@2..4. and
changed_rev=5 for path@5..7.

IOW, if I update my working to rev R, then changed_rev will indicate
when wc/some/path was last changed, and that will always be <= R.

Nodes that are copied from elsewhere will have copyfrom_* and
changed_* information, and changed_rev <= copyfrom_rev.

>> > * subversion/tests/cmdline/svntest/actions.py
>> >  (run_and_verify_status): Add new parameter check_entries. We only
>> >    check the entries if check_entries is set to True.
>>
>> No way. For the *one* item which needs a different revision... maybe
>> okay. But by eliminating the check_entries, you're also eliminating
>> the checks on all the *other* items in the status output.
>>
>> Instead, I recommend changing svntest/wc.StateItem. Add an "entry_rev"
>> attribute defaulting to None. Where we expect the entry's revision to
>> differ from that reported in a typical status output, then we can set
>> this extra attribute and specifically track the change in behavior.
>
> Ok, I of course prefer this suggestion since it offers the same test
> coverage but allows me to change the behaviour. But I would have _never_
> come up with it on my own.

Yeah, don't worry. I thought it was going to be pretty difficult to
monkey things around until you mentioned tweak_entries_for_compare
later in your message, and then I went "oh yeah. forgot about that.
this should be much easier than I expected"  :-P

>...
>> >  (svn_wc_status2_t): Add fields revision, changed_rev and
>> >    changed_author.
>>
>> If you have changed_rev and changed_author, then please include
>> changed_date. It is weird/inconsistent to have two of the three.
>
> Ok, the reason it was left out was due to me looking in svn/status.c to
> determine what fields were used. I didn't see it so it was not included.
> I took the idea of the status code just returning the most cruciual
> parts too literally. What I should have done was look at what
> performance impact there was to including different fields. If I include
> the others, there won't be much overhead (virtually zero) to including
> the changed_date as well. Will do.

Yup. And they group together "as a unit".

>...
>> And now that you've changed status3, you should alter the
>> svn_wc__status2_from_3() function.
>
> Since the old code uses the fields from the entry and the entry is still
> there, nothing needs to be changed in status2_from_3(), *yet*.

And since you added the fields at the end. But still... this line:

  *new_stat = (svn_wc_status2_t *)svn_wc_status_dup(old_stat);

Is kinda fishy. Yes, it still happens to work, but the two structures
are not "identical" any more.

I'm not too worried right now, but what if somebody else makes a
change, and doesn't realize this fact? If a field is removed, then the
2_from3 would break (if you had manual copying). If a field is added,
then it *may* work depending on where it was added. You can keep it
straight, but how aware are the other devs of this sneakiness?

>...
>> > ii) Copies from foreign repos to wc has the rev set to -1
>>
>> Hmm? "foreign repos" only makes sense for merges, I believe. I think
>> you mean "copy from (our) repository to wc".
>
> Nope, check out copy_tests.py:978.

Oh! Hunh. Learn something new every day.

I'm guessing they just turn into normal adds then? And if that's true,
then I wonder where the heck the 1 ever came from. Why is
entry->revision == 1 in that case?

>...
>> read_info() returns changed_* information for copied nodes. A copied
>> node has a corresponding node in the repository, and the changed_*
>> information reflects that. (thus, your assemble_status should not be
>> wiping the values out... if the caller wants to interpret added nodes
>> that way, then it can)
>
> Ok, for the reasons I stated earlier.
>...
>> Personally, I would suggest that assemble_status() does NOTHING more
>> than returning read_info's REVISION result. That is the "ideal" value
>> for the revision of the node LOCAL_ABSPATH.
>>
>> If that revision is NOT what you want (ie. you have some other rules
>> like "well... if it is added, then ..." or "if it is deleted, then
>> ..."), then the caller should perform those calculations explicitly.
>> The status code should not pre-suppose what rules the caller may want
>> for revision handling. It should simply report the "ideal" revision
>> for that node and be done with it.
>
> WHAT? You're saying that all the scanning should take place in the
> client callback?

No. I said "don't create strange semantics".

The code you had in there had the following definition for status->revision:

"The revision of the BASE node, unless that has been deleted, then: if
nothing was added, then still the same BASE revision, but if something
was added, then SVN_INVALID_REVNUM. BUT! If you delete a child of that
addition, then the BASE version MAY be returned again... we say "may"
because if a BASE doesn't correspond to the deleted child of the add,
then you'll get an error instead."

The definition that read_info uses is:

"The revision of the unmodified (BASE) node, or SVN_INVALID_REVNUM if
any (structural) changes have been made to that node."

(where "structural" means an add/copy/move/delete; we don't mean
content edits like propchanges or editing the text)

> That's a major change from what I've done so far. That
> I should just return SVN_INVALID_REVNUM for cases where I've scanned for
> deletion? Is that a behaviour that everyone agrees on? You're saying
> personally so I'm assuming there hasn't been a discussion about it.

No, I don't think it should do a scan_add, nor a scan_deletion. Report
the add/delete, and leave it at that.

repos_relpath, repos_root_url, and repos_uuid are all inherited
values, so I believe a scan_base_repos makes sense: you're simply
returning those values. You're not looking for further details about
operations (which the caller may not be interested in anyways).

(and note that scan_base_repos doesn't even have to be used cuz
assemble_status could be passed the parent repos_* values; if the BASE
node hasn't been switched and (therefore) returns NULL for repos_*,
then its repos_* values are easily derived within wc/status.c; but
repos_* values will always be NULL for modified nodes (ie. if a
WORKING node exists))

Scanning additional rows is not "free", so we shouldn't be looking up
information from the ancestors unless/until we find that most/all of
the callers need that information. And in some cases, we can even make
status.c fill in the info, since it has seen the ancestors already.

In the future, we may return added/copied/moved_here distinctly so
that a scan won't be needed to resolve that (tho you'd still have to
scan to locate the root of the operation, tho we may be able to
optimize that inside wc_db).

Cheers,
-g

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Daniel Näslund <da...@longitudo.com>.

On Thu, Apr 22, 2010 at 02:45:08PM -0400, Greg Stein wrote:
> On Thu, Apr 22, 2010 at 10:05, Daniel Näslund <da...@longitudo.com> wrote:
> > [[[
> > As part of WC-NG, replace entry calls for revisions from
> > svn_wc_status2_t related code.
> >
> > The entries code set the revision of newly added nodes to 0 but the db
> > sets them to -1. Since too many tests needs to be changed and 'svn info'
> > also uses 0, I'll change those values in a follow-up patch instead.
> >
> > * subversion/tests/cmdline/copy_tests.py
> >  (repos_to_wc): Change revision to 0 since the node is added. Don't
> >    check against entries since the behavior differs.
> >
> > * subversion/tests/cmdline/stat_tests.py
> >  (status_with_tree_conflicts): An copied node should not have a
> >    changed_rev or author.
> 
> Don't the changed_rev and changed_author correspond to the repository
> node that was copied into the tree? Thus, when I'm looking at a diff,
> it is effectively relative to what I copied in.

I had my doubts about this one, although I didn't express them here.
Since a copied or moved node has a previous line of history it makes
sense to use that history for changed_rev and changed_author. My
motivation for not setting those for copied/moved nodes was that I
thought that changed_rev was defined as the closest rev less than or
equal to the current revision. Since changed_author is closely related
to changed_rev I set that one to it's nil value too. 
> 
> > * subversion/tests/cmdline/svntest/actions.py
> >  (run_and_verify_status): Add new parameter check_entries. We only
> >    check the entries if check_entries is set to True.
> 
> No way. For the *one* item which needs a different revision... maybe
> okay. But by eliminating the check_entries, you're also eliminating
> the checks on all the *other* items in the status output.
> 
> Instead, I recommend changing svntest/wc.StateItem. Add an "entry_rev"
> attribute defaulting to None. Where we expect the entry's revision to
> differ from that reported in a typical status output, then we can set
> this extra attribute and specifically track the change in behavior.

Ok, I of course prefer this suggestion since it offers the same test
coverage but allows me to change the behaviour. But I would have _never_
come up with it on my own.
> 
> And that's what we're talking about here: a change in behavior from
> 1.6 to 1.7. The entries testing is in there to ensure that we don't
> (arbitrarily) change the revision information that we display.

I understand and totally agree on that we should be able to easily see
in what way our new code differs from the entries-based.
> >
> > * subversion/tests/cmdline/merge_tests.py
> >  (merge_into_missing): Add revision to status output for missing dir.
> >    Previously, missing dirs did not have a revision, only files had.
> >    Don't check entries since the behavior differs.
> >
> > * subversion/svn/status.c
> >  (print_status): Replace checks for revisions using entries with the
> >    direct fields in svn_wc_status3_t. Set the revision for added and
> >    replaced nodes to 0. WC-NG sets those revisions to -1, but changing
> >    all the involved tests is for a follow-up.
> >
> > * subversion/include/svn_wc.h
> >  (svn_wc_status2_t): Add fields revision, changed_rev and
> >    changed_author.
> 
> If you have changed_rev and changed_author, then please include
> changed_date. It is weird/inconsistent to have two of the three.

Ok, the reason it was left out was due to me looking in svn/status.c to
determine what fields were used. I didn't see it so it was not included.
I took the idea of the status code just returning the most cruciual
parts too literally. What I should have done was look at what
performance impact there was to including different fields. If I include
the others, there won't be much overhead (virtually zero) to including
the changed_date as well. Will do.
> 
> Also: your log message says svn_wc_status2_t, but you changed 3.

Oups. I thought I changed it but, well, no I didn't. :(

> And now that you've changed status3, you should alter the
> svn_wc__status2_from_3() function.

Since the old code uses the fields from the entry and the entry is still
there, nothing needs to be changed in status2_from_3(), *yet*.

> And you don't need @since tags in this part of the change, since the
> whole structure is new.
> 
> > * subversion/libsvn_wc/status.c
> >  (assemble_status): Fill in the new fields with data from the wc db.
> >  (svn_wc_dup_status3): Copy the changed_author field.
> > ]]]
> >
> > Quirks
> > ========
> >
> > Checking entries
> > -------------------
> > I need to tweak the status output when doing a compare against the
> > entries in run_and_verify_status(). The code changes the behaviour for
> > revisions in three ways:
> >
> > i) Missing dirs, now have a revision
> 
> This seems fine.
> 
> > ii) Copies from foreign repos to wc has the rev set to -1
> 
> Hmm? "foreign repos" only makes sense for merges, I believe. I think
> you mean "copy from (our) repository to wc".

Nope, check out copy_tests.py:978.
> 
> The original status code would show copyfrom_rev here, right? IOW, yes
> the status3_t struct should show revision=SVN_INVALID_REVNUM, but I
> think the status cmdline would show copyfrom_rev.
> 
> > iii) A copied node should not have a changed_author or changed_rev set.
> 
> I disagree, per above.
> 
> read_info() returns changed_* information for copied nodes. A copied
> node has a corresponding node in the repository, and the changed_*
> information reflects that. (thus, your assemble_status should not be
> wiping the values out... if the caller wants to interpret added nodes
> that way, then it can)

Ok, for the reasons I stated earlier.

> > I could change tweak_for_entries_compare() to fix i) but for the
> > other two I would have to write a func for tweaking the status returned
> > by 'entries-dump'. Can't I just use the check_entries parameter I've
> > added to run_and_verify_status() instead? I would only disable the
> > entries check for three tests.
> 
> For the whole *test*. Not just the different items.
> 
> Instead, alter tweak_for_entries_compare() to do something like:
> 
>   if item.entry_rev is not None:
>     item.wc_rev = item.entry_rev
> 
> > Getting revisions of deleted nodes
> > -----------------------------------
> > I'm using base_get_info() for both nodes having the op_root from a plain
> > delete and those from a delete within an add. In the later case, the
> > revision, changed_rev and changed_author will be set to the default nil
> > values which match what I would get from read_info().
> 
> Whoop. Careful here. Deleted nodes may not have an underlying BASE
> node that you can fetch a revision from. Consider:
> 
> $ svn cp A B
> $ svn rm B/foo
> 
> WORK_DEL_ABSPATH will be B/foo, and there it has no BASE node.
> 
> The question is: do you want to use the copyfrom_rev in this case? Or
> do you want to use B's parent's revision?

Yeah, I read the get_base_info() code as if it would return nil values
if there was no BASE and that all cases where there was no BASE node
would automatically mean; 'use the nil values'. I agree that it was a
dumb idea.
> 
> Personally, I would suggest that assemble_status() does NOTHING more
> than returning read_info's REVISION result. That is the "ideal" value
> for the revision of the node LOCAL_ABSPATH.
> 
> If that revision is NOT what you want (ie. you have some other rules
> like "well... if it is added, then ..." or "if it is deleted, then
> ..."), then the caller should perform those calculations explicitly.
> The status code should not pre-suppose what rules the caller may want
> for revision handling. It should simply report the "ideal" revision
> for that node and be done with it.

WHAT? You're saying that all the scanning should take place in the
client callback? That's a major change from what I've done so far. That
I should just return SVN_INVALID_REVNUM for cases where I've scanned for
deletion? Is that a behaviour that everyone agrees on? You're saying
personally so I'm assuming there hasn't been a discussion about it.

Cheers,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Daniel Näslund <da...@longitudo.com>.

On Thu, Apr 22, 2010 at 02:45:08PM -0400, Greg Stein wrote:
> On Thu, Apr 22, 2010 at 10:05, Daniel Näslund <da...@longitudo.com> wrote:
> > [[[
> > As part of WC-NG, replace entry calls for revisions from
> > svn_wc_status2_t related code.
> >
> > The entries code set the revision of newly added nodes to 0 but the db
> > sets them to -1. Since too many tests needs to be changed and 'svn info'
> > also uses 0, I'll change those values in a follow-up patch instead.
> >
> > * subversion/tests/cmdline/copy_tests.py
> >  (repos_to_wc): Change revision to 0 since the node is added. Don't
> >    check against entries since the behavior differs.
> >
> > * subversion/tests/cmdline/stat_tests.py
> >  (status_with_tree_conflicts): An copied node should not have a
> >    changed_rev or author.
> 
> Don't the changed_rev and changed_author correspond to the repository
> node that was copied into the tree? Thus, when I'm looking at a diff,
> it is effectively relative to what I copied in.

I had my doubts about this one, although I didn't express them here.
Since a copied or moved node has a previous line of history it makes
sense to use that history for changed_rev and changed_author. My
motivation for not setting those for copied/moved nodes was that I
thought that changed_rev was defined as the closest rev less than or
equal to the current revision. Since changed_author is closely related
to changed_rev I set that one to it's nil value too. 
> 
> > * subversion/tests/cmdline/svntest/actions.py
> >  (run_and_verify_status): Add new parameter check_entries. We only
> >    check the entries if check_entries is set to True.
> 
> No way. For the *one* item which needs a different revision... maybe
> okay. But by eliminating the check_entries, you're also eliminating
> the checks on all the *other* items in the status output.
> 
> Instead, I recommend changing svntest/wc.StateItem. Add an "entry_rev"
> attribute defaulting to None. Where we expect the entry's revision to
> differ from that reported in a typical status output, then we can set
> this extra attribute and specifically track the change in behavior.

Ok, I of course prefer this suggestion since it offers the same test
coverage but allows me to change the behaviour. But I would have _never_
come up with it on my own.
> 
> And that's what we're talking about here: a change in behavior from
> 1.6 to 1.7. The entries testing is in there to ensure that we don't
> (arbitrarily) change the revision information that we display.

I understand and totally agree on that we should be able to easily see
in what way our new code differs from the entries-based.
> >
> > * subversion/tests/cmdline/merge_tests.py
> >  (merge_into_missing): Add revision to status output for missing dir.
> >    Previously, missing dirs did not have a revision, only files had.
> >    Don't check entries since the behavior differs.
> >
> > * subversion/svn/status.c
> >  (print_status): Replace checks for revisions using entries with the
> >    direct fields in svn_wc_status3_t. Set the revision for added and
> >    replaced nodes to 0. WC-NG sets those revisions to -1, but changing
> >    all the involved tests is for a follow-up.
> >
> > * subversion/include/svn_wc.h
> >  (svn_wc_status2_t): Add fields revision, changed_rev and
> >    changed_author.
> 
> If you have changed_rev and changed_author, then please include
> changed_date. It is weird/inconsistent to have two of the three.

Ok, the reason it was left out was due to me looking in svn/status.c to
determine what fields were used. I didn't see it so it was not included.
I took the idea of the status code just returning the most cruciual
parts too literally. What I should have done was look at what
performance impact there was to including different fields. If I include
the others, there won't be much overhead (virtually zero) to including
the changed_date as well. Will do.
> 
> Also: your log message says svn_wc_status2_t, but you changed 3.

Oups. I thought I changed it but, well, no I didn't. :(

> And now that you've changed status3, you should alter the
> svn_wc__status2_from_3() function.

Since the old code uses the fields from the entry and the entry is still
there, nothing needs to be changed in status2_from_3(), *yet*.

> And you don't need @since tags in this part of the change, since the
> whole structure is new.
> 
> > * subversion/libsvn_wc/status.c
> >  (assemble_status): Fill in the new fields with data from the wc db.
> >  (svn_wc_dup_status3): Copy the changed_author field.
> > ]]]
> >
> > Quirks
> > ========
> >
> > Checking entries
> > -------------------
> > I need to tweak the status output when doing a compare against the
> > entries in run_and_verify_status(). The code changes the behaviour for
> > revisions in three ways:
> >
> > i) Missing dirs, now have a revision
> 
> This seems fine.
> 
> > ii) Copies from foreign repos to wc has the rev set to -1
> 
> Hmm? "foreign repos" only makes sense for merges, I believe. I think
> you mean "copy from (our) repository to wc".

Nope, check out copy_tests.py:978.
> 
> The original status code would show copyfrom_rev here, right? IOW, yes
> the status3_t struct should show revision=SVN_INVALID_REVNUM, but I
> think the status cmdline would show copyfrom_rev.
> 
> > iii) A copied node should not have a changed_author or changed_rev set.
> 
> I disagree, per above.
> 
> read_info() returns changed_* information for copied nodes. A copied
> node has a corresponding node in the repository, and the changed_*
> information reflects that. (thus, your assemble_status should not be
> wiping the values out... if the caller wants to interpret added nodes
> that way, then it can)

Ok, for the reasons I stated earlier.

> > I could change tweak_for_entries_compare() to fix i) but for the
> > other two I would have to write a func for tweaking the status returned
> > by 'entries-dump'. Can't I just use the check_entries parameter I've
> > added to run_and_verify_status() instead? I would only disable the
> > entries check for three tests.
> 
> For the whole *test*. Not just the different items.
> 
> Instead, alter tweak_for_entries_compare() to do something like:
> 
>   if item.entry_rev is not None:
>     item.wc_rev = item.entry_rev
> 
> > Getting revisions of deleted nodes
> > -----------------------------------
> > I'm using base_get_info() for both nodes having the op_root from a plain
> > delete and those from a delete within an add. In the later case, the
> > revision, changed_rev and changed_author will be set to the default nil
> > values which match what I would get from read_info().
> 
> Whoop. Careful here. Deleted nodes may not have an underlying BASE
> node that you can fetch a revision from. Consider:
> 
> $ svn cp A B
> $ svn rm B/foo
> 
> WORK_DEL_ABSPATH will be B/foo, and there it has no BASE node.
> 
> The question is: do you want to use the copyfrom_rev in this case? Or
> do you want to use B's parent's revision?

Yeah, I read the get_base_info() code as if it would return nil values
if there was no BASE and that all cases where there was no BASE node
would automatically mean; 'use the nil values'. I agree that it was a
dumb idea.
> 
> Personally, I would suggest that assemble_status() does NOTHING more
> than returning read_info's REVISION result. That is the "ideal" value
> for the revision of the node LOCAL_ABSPATH.
> 
> If that revision is NOT what you want (ie. you have some other rules
> like "well... if it is added, then ..." or "if it is deleted, then
> ..."), then the caller should perform those calculations explicitly.
> The status code should not pre-suppose what rules the caller may want
> for revision handling. It should simply report the "ideal" revision
> for that node and be done with it.

WHAT? You're saying that all the scanning should take place in the
client callback? That's a major change from what I've done so far. That
I should just return SVN_INVALID_REVNUM for cases where I've scanned for
deletion? Is that a behaviour that everyone agrees on? You're saying
personally so I'm assuming there hasn't been a discussion about it.

Cheers,
Daniel

Re: [PATCH] Don't use entries for checking status in svn_wc_status3_t. Was: Re: [WIP] Some quirky parts of libsvn_wc/status.c:assemble_status() for retrieving revisions

Posted by Greg Stein <gs...@gmail.com>.

On Thu, Apr 22, 2010 at 10:05, Daniel Näslund <da...@longitudo.com> wrote:
> Hi!
>
> [[[
> As part of WC-NG, replace entry calls for revisions from
> svn_wc_status2_t related code.
>
> The entries code set the revision of newly added nodes to 0 but the db
> sets them to -1. Since too many tests needs to be changed and 'svn info'
> also uses 0, I'll change those values in a follow-up patch instead.
>
> * subversion/tests/cmdline/copy_tests.py
>  (repos_to_wc): Change revision to 0 since the node is added. Don't
>    check against entries since the behavior differs.
>
> * subversion/tests/cmdline/stat_tests.py
>  (status_with_tree_conflicts): An copied node should not have a
>    changed_rev or author.

Don't the changed_rev and changed_author correspond to the repository
node that was copied into the tree? Thus, when I'm looking at a diff,
it is effectively relative to what I copied in.

> * subversion/tests/cmdline/svntest/actions.py
>  (run_and_verify_status): Add new parameter check_entries. We only
>    check the entries if check_entries is set to True.

No way. For the *one* item which needs a different revision... maybe
okay. But by eliminating the check_entries, you're also eliminating
the checks on all the *other* items in the status output.

Instead, I recommend changing svntest/wc.StateItem. Add an "entry_rev"
attribute defaulting to None. Where we expect the entry's revision to
differ from that reported in a typical status output, then we can set
this extra attribute and specifically track the change in behavior.

And that's what we're talking about here: a change in behavior from
1.6 to 1.7. The entries testing is in there to ensure that we don't
(arbitrarily) change the revision information that we display.

>
> * subversion/tests/cmdline/merge_tests.py
>  (merge_into_missing): Add revision to status output for missing dir.
>    Previously, missing dirs did not have a revision, only files had.
>    Don't check entries since the behavior differs.
>
> * subversion/svn/status.c
>  (print_status): Replace checks for revisions using entries with the
>    direct fields in svn_wc_status3_t. Set the revision for added and
>    replaced nodes to 0. WC-NG sets those revisions to -1, but changing
>    all the involved tests is for a follow-up.
>
> * subversion/include/svn_wc.h
>  (svn_wc_status2_t): Add fields revision, changed_rev and
>    changed_author.

If you have changed_rev and changed_author, then please include
changed_date. It is weird/inconsistent to have two of the three.

Also: your log message says svn_wc_status2_t, but you changed 3.

And now that you've changed status3, you should alter the
svn_wc__status2_from_3() function.

And you don't need @since tags in this part of the change, since the
whole structure is new.

> * subversion/libsvn_wc/status.c
>  (assemble_status): Fill in the new fields with data from the wc db.
>  (svn_wc_dup_status3): Copy the changed_author field.
> ]]]
>
> Quirks
> ========
>
> Checking entries
> -------------------
> I need to tweak the status output when doing a compare against the
> entries in run_and_verify_status(). The code changes the behaviour for
> revisions in three ways:
>
> i) Missing dirs, now have a revision

This seems fine.

> ii) Copies from foreign repos to wc has the rev set to -1

Hmm? "foreign repos" only makes sense for merges, I believe. I think
you mean "copy from (our) repository to wc".

The original status code would show copyfrom_rev here, right? IOW, yes
the status3_t struct should show revision=SVN_INVALID_REVNUM, but I
think the status cmdline would show copyfrom_rev.

> iii) A copied node should not have a changed_author or changed_rev set.

I disagree, per above.

read_info() returns changed_* information for copied nodes. A copied
node has a corresponding node in the repository, and the changed_*
information reflects that. (thus, your assemble_status should not be
wiping the values out... if the caller wants to interpret added nodes
that way, then it can)

> I could change tweak_for_entries_compare() to fix i) but for the
> other two I would have to write a func for tweaking the status returned
> by 'entries-dump'. Can't I just use the check_entries parameter I've
> added to run_and_verify_status() instead? I would only disable the
> entries check for three tests.

For the whole *test*. Not just the different items.

Instead, alter tweak_for_entries_compare() to do something like:

  if item.entry_rev is not None:
    item.wc_rev = item.entry_rev

> Getting revisions of deleted nodes
> -----------------------------------
> I'm using base_get_info() for both nodes having the op_root from a plain
> delete and those from a delete within an add. In the later case, the
> revision, changed_rev and changed_author will be set to the default nil
> values which match what I would get from read_info().

Whoop. Careful here. Deleted nodes may not have an underlying BASE
node that you can fetch a revision from. Consider:

$ svn cp A B
$ svn rm B/foo

WORK_DEL_ABSPATH will be B/foo, and there it has no BASE node.

The question is: do you want to use the copyfrom_rev in this case? Or
do you want to use B's parent's revision?

Personally, I would suggest that assemble_status() does NOTHING more
than returning read_info's REVISION result. That is the "ideal" value
for the revision of the node LOCAL_ABSPATH.

If that revision is NOT what you want (ie. you have some other rules
like "well... if it is added, then ..." or "if it is deleted, then
..."), then the caller should perform those calculations explicitly.
The status code should not pre-suppose what rules the caller may want
for revision handling. It should simply report the "ideal" revision
for that node and be done with it.

Cheers,
-g