You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Daniel Näslund <da...@longitudo.com> on 2010/02/21 23:02:27 UTC

[WIP] 'svn patch' should remove empty dirs

Hi Stefan!

This patch is growing and growing and I'm worried that I'm heading off
in the wrong direction. Perhaps you can skim through the patch and tell
me if this is the right approach. You've said so earlier but all these
lines of code makes me nervous. I like small fixes, and this is starting
to look bloated.

A preliminary log msg (it's a WIP)
[[[
Make 'svn patch' able to remove empty.

* subversion/tests/cmdline/patch_tests.py
  (patch_remove_empty_dir): New.

* subversion/libsvn_client/patch.c
  (is_dir_empty): Checks if a dir is empty, taking missing nodes into
    account.
  (sort_compare_paths_by_depth): Compare func to use with
    svn_sort__hash(). Compares nr of '/' separators. Since the paths are
    canonicalized, it should work just fine.
  (collect_deleted_targets): If there is targets to be deleted, collect
    those targets and the targets no involving deletes in two arrays.
    TODO: We should condense the list of deleted dirs and be able to
    only delete the top-most parent.
  (apply_patches): TODO: We should remove the targets in
    delete_all_of_this. Or perhaps concatenate the two list into one?
]]]

Just a quick.
[ ] Yes, using APR forces you to a lot of boilerplate. 
[ ] Hey there, you don't need to do all that, just ...

Hope I'm not asking to much,
Daniel

Re: [WIP] 'svn patch' should remove empty dirs

Posted by Neels J Hofmeyr <ne...@elego.de>.
Daniel Näslund wrote:
> On Mon, Feb 22, 2010 at 12:44:06PM +0100, Neels J Hofmeyr wrote:
>> Re is_dir_empty() comment: When I last created function comments starting
>> with "Helper for..." I was told that that's not really the desired way to
>> go. We should rather have descriptive names / concise docs, and we should
>> design functions to be re-used for a particular purpose, rather than
>> "hiding" code in specialised helper functions. I came to agree with that :)
> 
> Ok. But isn't this bordering to bike-shedding? :)

I thought the same at first, but this one really isn't. Thinking in terms of
helper-functions causes clunky code. Things go together in one function that
are not really that related. They get meaningless names and comments. etc.

Instead thinking "which stock function would do this, and does this stock
function already exist" hopefully saves us a lot of refactoring later.

Nice to see that there will be more versions of this patch :)
* neels waits & watches in anticipation

~Neels


Re: [WIP] 'svn patch' should remove empty dirs

Posted by Julian Foad <ju...@wandisco.com>.
Daniel Näslund wrote:
> > > Just a quick.
> > > [ ] Yes, using APR forces you to a lot of boilerplate.
> > 
> > You mean a lot of opaque APIs? With a (very) quick glance, I don't see any
> > unusual use of APR.
> 
> Ok. But could someone atleast tell me where to find a linked list in APR?

APR doesn't provide a simple linked list, although, now I look, I see
that there is a "ring" - a circular, doubly-linked list. We normally
just use the "array" and "hash" containers. There is also a "table"
which seems to be a cross between array and hash table. See
<http://dev.ariel-networks.com/apr/apr-tutorial/html/apr-tutorial-19.html> for an introduction to the four main container APIs.

- Julian


Re: [WIP] 'svn patch' should remove empty dirs

Posted by Daniel Näslund <da...@longitudo.com>.
On Mon, Feb 22, 2010 at 12:44:06PM +0100, Neels J Hofmeyr wrote:
> Hi Daniel,
> 
> let me comment as far as my knowledge goes. (probably more general remarks)
> 
> Daniel Näslund wrote:
> > Hi Stefan!
> > 
> > This patch is growing and growing and I'm worried that I'm heading off
> > in the wrong direction. Perhaps you can skim through the patch and tell
> > me if this is the right approach. You've said so earlier but all these
> > lines of code makes me nervous. I like small fixes, and this is starting
> > to look bloated.
> > 
> > A preliminary log msg (it's a WIP)
> > [[[
> > Make 'svn patch' able to remove empty.
> > 
> > * subversion/tests/cmdline/patch_tests.py
> >   (patch_remove_empty_dir): New.
> > 
> > * subversion/libsvn_client/patch.c
> >   (is_dir_empty): Checks if a dir is empty, taking missing nodes into
> >     account.
> 
> If you create a new function, you can simply write
> 
>     (function_name): New function.
> 
> The rationale is that you are supposed to write detailed function docs in
> the function declaration's comment. (Like you did with the test, though I
> personally would have written "New test." to be at least that clear). Of
> course, you can add anything that is really special about the function. But
> in general, that's not needed :)

Noted. Anything that eases the burden of writing doc comments are
welcomed! :).
 
> Re is_dir_empty() comment: When I last created function comments starting
> with "Helper for..." I was told that that's not really the desired way to
> go. We should rather have descriptive names / concise docs, and we should
> design functions to be re-used for a particular purpose, rather than
> "hiding" code in specialised helper functions. I came to agree with that :)

Ok. But isn't this bordering to bike-shedding? :)
 
> Re is_dir_empty() impl: I remember us chatting about that. Is it not
> possible to re-use some function that already exists?
> svn_client__can_delete() maybe?
> I vaguely expect is_dir_empty() to do some sort of crawl using an already
> existing function. svn_client_status5(), maybe?

I checked svn_client__can_delete() but decided to not use it threw some
errors I didn't want. As for rolling my own status_func and using
svn_client_status5(): Haven't thought that long. I was busy writing all
the code to read dirs and populate hash tables :). If I used
svn_client_status5() I could identify  svn_wc_status_missing and
svn_wc_status_unversioned withouth having to write all the dir and hash
table code. I could save the paths in the status_baton.  Doh, Why didn't
I do it like that?

> >   (sort_compare_paths_by_depth): Compare func to use with
> >     svn_sort__hash(). Compares nr of '/' separators. Since the paths are
> >     canonicalized, it should work just fine.
> 
> Should this function maybe be called something more specific, like
> "sort_compare_nr_of_path_elements()" or something?

Good idea, will do!
 
> > Just a quick.
> > [ ] Yes, using APR forces you to a lot of boilerplate.
> 
> You mean a lot of opaque APIs? With a (very) quick glance, I don't see any
> unusual use of APR.

Ok. But could someone atleast tell me where to find a linked list in APR?
 
> > [ ] Hey there, you don't need to do all that, just ...
> ...maybe use that svn_client_status5() function as said above, or find
> another function. But I'm not entirely sure.

See above

> > Hope I'm not asking to much,
> 
> I know how stupid one can feel trying to write a patch in the complex lands
> of Subversion... We're here to review. Thanks for asking for one!
> I hope I'm actually poking in the right directions, though.

And I appreciate you taking the time!
Daniel

Re: [WIP] 'svn patch' should remove empty dirs

Posted by Daniel Näslund <da...@longitudo.com>.
On Mon, Feb 22, 2010 at 12:44:06PM +0100, Neels J Hofmeyr wrote:
> Hi Daniel,
> 
> let me comment as far as my knowledge goes. (probably more general remarks)
> 
> Daniel Näslund wrote:
> > Hi Stefan!
> > 
> > This patch is growing and growing and I'm worried that I'm heading off
> > in the wrong direction. Perhaps you can skim through the patch and tell
> > me if this is the right approach. You've said so earlier but all these
> > lines of code makes me nervous. I like small fixes, and this is starting
> > to look bloated.
> > 
> > A preliminary log msg (it's a WIP)
> > [[[
> > Make 'svn patch' able to remove empty.
> > 
> > * subversion/tests/cmdline/patch_tests.py
> >   (patch_remove_empty_dir): New.
> > 
> > * subversion/libsvn_client/patch.c
> >   (is_dir_empty): Checks if a dir is empty, taking missing nodes into
> >     account.
> 
> If you create a new function, you can simply write
> 
>     (function_name): New function.
> 
> The rationale is that you are supposed to write detailed function docs in
> the function declaration's comment. (Like you did with the test, though I
> personally would have written "New test." to be at least that clear). Of
> course, you can add anything that is really special about the function. But
> in general, that's not needed :)

Noted. Anything that eases the burden of writing doc comments are
welcomed! :).
 
> Re is_dir_empty() comment: When I last created function comments starting
> with "Helper for..." I was told that that's not really the desired way to
> go. We should rather have descriptive names / concise docs, and we should
> design functions to be re-used for a particular purpose, rather than
> "hiding" code in specialised helper functions. I came to agree with that :)

Ok. But isn't this bordering to bike-shedding? :)
 
> Re is_dir_empty() impl: I remember us chatting about that. Is it not
> possible to re-use some function that already exists?
> svn_client__can_delete() maybe?
> I vaguely expect is_dir_empty() to do some sort of crawl using an already
> existing function. svn_client_status5(), maybe?

I checked svn_client__can_delete() but decided to not use it threw some
errors I didn't want. As for rolling my own status_func and using
svn_client_status5(): Haven't thought that long. I was busy writing all
the code to read dirs and populate hash tables :). If I used
svn_client_status5() I could identify  svn_wc_status_missing and
svn_wc_status_unversioned withouth having to write all the dir and hash
table code. I could save the paths in the status_baton.  Doh, Why didn't
I do it like that?

> >   (sort_compare_paths_by_depth): Compare func to use with
> >     svn_sort__hash(). Compares nr of '/' separators. Since the paths are
> >     canonicalized, it should work just fine.
> 
> Should this function maybe be called something more specific, like
> "sort_compare_nr_of_path_elements()" or something?

Good idea, will do!
 
> > Just a quick.
> > [ ] Yes, using APR forces you to a lot of boilerplate.
> 
> You mean a lot of opaque APIs? With a (very) quick glance, I don't see any
> unusual use of APR.

Ok. But could someone atleast tell me where to find a linked list in APR?
 
> > [ ] Hey there, you don't need to do all that, just ...
> ...maybe use that svn_client_status5() function as said above, or find
> another function. But I'm not entirely sure.

See above

> > Hope I'm not asking to much,
> 
> I know how stupid one can feel trying to write a patch in the complex lands
> of Subversion... We're here to review. Thanks for asking for one!
> I hope I'm actually poking in the right directions, though.

And I appreciate you taking the time!
Daniel

Re: [WIP] 'svn patch' should remove empty dirs

Posted by Julian Foad <ju...@wandisco.com>.
Neels J Hofmeyr wrote:
> Re is_dir_empty() comment: When I last created function comments starting
> with "Helper for..." I was told that that's not really the desired way to
> go. We should rather have descriptive names / concise docs, and we should
> design functions to be re-used for a particular purpose, rather than
> "hiding" code in specialised helper functions. I came to agree with that :)

It was probably me that said that. One of my pet theories is that it's
good to write a helper function, but any function is helpful to the
design and understanding of a structured system if and only if it has a
purpose and an interface that are understandable by itself, in terms of
some concepts about the domain it's working in. Thinking of a function
as "just a helper" can sometimes lead us to take a random portion of
code from the caller and forget about providing a clearly defined subset
of the caller's functionality which might later be useful to another
caller.

- Julian


Re: [WIP] 'svn patch' should remove empty dirs

Posted by Neels J Hofmeyr <ne...@elego.de>.
Hi Daniel,

let me comment as far as my knowledge goes. (probably more general remarks)

Daniel Näslund wrote:
> Hi Stefan!
> 
> This patch is growing and growing and I'm worried that I'm heading off
> in the wrong direction. Perhaps you can skim through the patch and tell
> me if this is the right approach. You've said so earlier but all these
> lines of code makes me nervous. I like small fixes, and this is starting
> to look bloated.
> 
> A preliminary log msg (it's a WIP)
> [[[
> Make 'svn patch' able to remove empty.
> 
> * subversion/tests/cmdline/patch_tests.py
>   (patch_remove_empty_dir): New.
> 
> * subversion/libsvn_client/patch.c
>   (is_dir_empty): Checks if a dir is empty, taking missing nodes into
>     account.

If you create a new function, you can simply write

    (function_name): New function.

The rationale is that you are supposed to write detailed function docs in
the function declaration's comment. (Like you did with the test, though I
personally would have written "New test." to be at least that clear). Of
course, you can add anything that is really special about the function. But
in general, that's not needed :)

Re is_dir_empty() comment: When I last created function comments starting
with "Helper for..." I was told that that's not really the desired way to
go. We should rather have descriptive names / concise docs, and we should
design functions to be re-used for a particular purpose, rather than
"hiding" code in specialised helper functions. I came to agree with that :)

Re is_dir_empty() impl: I remember us chatting about that. Is it not
possible to re-use some function that already exists?
svn_client__can_delete() maybe?
I vaguely expect is_dir_empty() to do some sort of crawl using an already
existing function. svn_client_status5(), maybe?

(disclaimer: I have no idea really, just poking to make sure.)


>   (sort_compare_paths_by_depth): Compare func to use with
>     svn_sort__hash(). Compares nr of '/' separators. Since the paths are
>     canonicalized, it should work just fine.

Should this function maybe be called something more specific, like
"sort_compare_nr_of_path_elements()" or something?

>   (collect_deleted_targets): If there is targets to be deleted, collect
>     those targets and the targets no involving deletes in two arrays.
>     TODO: We should condense the list of deleted dirs and be able to
>     only delete the top-most parent.
>   (apply_patches): TODO: We should remove the targets in
>     delete_all_of_this. Or perhaps concatenate the two list into one?

This comment lacks a description of what is being done in the code :)
TODOs are sometimes more appropriate in the code as a comment. (Just poking.)

> ]]]
> 
> Just a quick.
> [ ] Yes, using APR forces you to a lot of boilerplate.

You mean a lot of opaque APIs? With a (very) quick glance, I don't see any
unusual use of APR.

> [ ] Hey there, you don't need to do all that, just ...
...maybe use that svn_client_status5() function as said above, or find
another function. But I'm not entirely sure.

> 
> Hope I'm not asking to much,

I know how stupid one can feel trying to write a patch in the complex lands
of Subversion... We're here to review. Thanks for asking for one!
I hope I'm actually poking in the right directions, though.

~Neels



Re: [WIP] 'svn patch' should remove empty dirs

Posted by Daniel Näslund <da...@longitudo.com>.
On Mon, Feb 22, 2010 at 08:44:47AM +0100, Stefan Sperling wrote:
> On Sun, Feb 21, 2010 at 11:02:27PM +0100, Daniel Näslund wrote:

[..]

> > +  unidiff_patch = [
> > +    "Index: psi\n",
> > +    "===================================================================\n",
> > +    "--- A/D/H/psi\t(revision 0)\n",
> > +    "+++ A/D/H/psi\t(revision 0)\n",
> > +    "@@ -1 +0,0 @@\n",
> > +    "-This is the file 'psi'.\n",
> > +    "Index: omega\n",
> > +    "===================================================================\n",
> > +    "--- A/D/H/omega\t(revision 0)\n",
> > +    "+++ A/D/H/omega\t(revision 0)\n",
> > +    "@@ -1 +0,0 @@\n",
> > +    "-This is the file 'omega'.\n",
> > +  ]
> > +
> > +  svntest.main.file_write(patch_file_path, ''.join(unidiff_patch))
> > +
> > +  # We should be able to handle one path beeing missing.
> > +  os.remove(os.path.join(wc_dir, 'A', 'D', 'H', 'chi'))
> > +
> > +
> > +  expected_output = [
> > +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'chi'),
> > +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'psi'),
> > +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'omega'),
> > +  ]
> 
> I think we want a single 'D' notification for the entire directory.
> Notification for every single child is redundant.

Yeah, sorry. The test part is _very_ much a WIP. I just used it to test
for finding missing targets. It will of course be changed to check for
parent dirs to be removed.
 
[...]

> > +/* Helper for collect_deleted_targets(). Checks if PARENT_DIR_ABSPATH is
> > + * EMPTY when treating the targets in TARGETS_TO_BE_DELETED as already
> > + * deleted. */
> > +static svn_error_t *
> > +is_dir_empty(svn_boolean_t *empty, const char *parent_dir_abspath,
> > +             svn_wc_context_t *wc_ctx, 
> > +             apr_array_header_t *targets_to_be_deleted,
> > +             apr_pool_t *result_pool, apr_pool_t *scratch_pool)
> > +{
> > +  apr_dir_t *dir;
> > +  apr_finfo_t this_entry;
> > +  apr_int32_t flags = APR_FINFO_NAME;
> > +  apr_pool_t *iterpool;
> > +  svn_error_t *err;
> > +  apr_array_header_t *targets_found;
> > +  const apr_array_header_t *missing_targets;
> > +  const char *abspath;
> > +  int i;
> > +
> > +  SVN_DBG(("In is_dir_empty()\n"));
> > +  targets_found = apr_array_make(scratch_pool, 0, sizeof(const char *));
> > +
> > +  SVN_ERR(svn_io_dir_open(&dir, parent_dir_abspath, scratch_pool));
> > +
> > +  iterpool = svn_pool_create(scratch_pool);
> > +
> > +  /* First we collect the entries on-disk...*/
> > +  while (1)
> > +    {
> > +      svn_pool_clear(iterpool);
> > +      err = svn_io_dir_read(&this_entry, flags, dir, iterpool);
> 
> Hmmm... have you seen svn_io_get_dir_filenames()?

Nope. This part was one of the things I refered to as boilerplate.
Thanks!

[...]
 
> > +
> > +  /* .. Then we collect the nodes that the db knows of. We do that since a
> > +   * missing node will not be found by simply checking the on-disk. We need
> > +   * to compare what the db knows with what's actually there. If a node is
> > +   * missing we add it to our targets_found. */
> 
> We might want to check the DB first. The DB handle is already open,
> so may be cheaper to run the DB queries some of which can be answered
> without hitting the disk.
> And if the DB prevents us from deleting the directory, we don't bother
> checking the disk ourselves. Note that some DB query functions might
> do disk i/o, I'm not sure which and under what circumstances
> though.

A perfectly valid optimization. Will do that.
 
> > +  SVN_ERR(svn_wc__node_get_children(&missing_targets, wc_ctx, 
> > +                                    parent_dir_abspath, FALSE, result_pool,
> > +                                    scratch_pool));
> > +
> > +  /* Check for missing targets. */
> > +  for (i = 0; i < missing_targets->nelts; i++)
> > +    {
> > +      const char *target;
> > +      svn_node_kind_t kind;
> > +
> > +      target = APR_ARRAY_IDX(missing_targets, i, const char *);
> > +
> > +      SVN_ERR(svn_io_check_path(target, &kind, scratch_pool));
> 
> Using svn_io_* is not using the DB, really.
> What we want here is something like svn_wc__node_is_status_present() or
> maybe svn_wc__node_is_status_absent(). Not sure. The point is getting
> info from the DB, so we'll eventually be calling svn_wc__db_read_info().
> The DB can tell us if a node is known as missing, obstructed, or
> locally deleted. Maybe using just some svn_wc__node_* functions tells
> us all we need to know, even?

absent, excluded and not_present are the three states making up the
hidden concept of the entries code. They refer to conditions when the wc
has not gained access no a node for some reason. The entries code
suggests authz restrictions, setting svn_depth_excluded (1.5 clients
doesn't handle that) and deleting and commiting a node without updating
its parent. I've set the show_hidden parameter of
svn_wc__node_get_children() to FALSE and not included those. When
thinking about it, I should have included those. Even if those nodes
won't appear in our wc they may in other peoples!

We want the missing nodes, those that the db thinks is there but are ...
missing. The db doesn't know who they are. To find those we need to
check what's on-disk.

Greg talked me through the show_hiddden and missing concept. Hopefully I
got it right.

In the def of svn_wc__db_status_t:
[[[
/* This node was named by the server, but no information was provided. */
svn_wc__db_status_absent,

/* This node has been administratively excluded. */
svn_wc__db_status_excluded,

/* This node is not present in this revision. This typically happens
   when a node is deleted and committed without updating its parent.
   The parent revision indicates it should be present, but this node's
   revision states otherwise. */
svn_wc__db_status_not_present,
]]]

not_present, absent and (not mentioned above) excluded are terms
refering to 
> 
> > +      SVN_DBG(("Testing if '%s' is missing\n", target));
> > +
> > +      if (kind == svn_node_none)
> > +        {
> > +          SVN_DBG(("Found missing node '%s'\n", target));
> > +          APR_ARRAY_PUSH(targets_found, const char *) = target;
> > +        }
> > +    }
> > +
> > +  *empty = TRUE;
> > +
> > +  /* Finally, we check to see if we're going to delete all entries in the
> > +   * dir. */
> > +  for (i = 0; i < targets_found->nelts; i++)
> > +    {
> 
> Can we integrate the strcmp() loop into the loop that iterates over the dirents,
> skipping dirents which don't match a deletion target?
> Then we could simply compare the number of elements of the targets found
> and targets deleted arrays here, instead of looping again.

Probably. In any case, there _is_ a way to write it better than what
I've done now.

[...]
 
> > +/* Helper for collect_deleted_targets(). Compare A and B and return an
> > + * integer greater than, equal to, or less than 0, according to whether A
> > + * has more subdirs, just as many or less subdir than B. */
> > +static int
> > +sort_compare_paths_by_depth(const svn_sort__item_t *a, 
> > +                            const svn_sort__item_t *b)
> > +{
> > +  const char *astr, *bstr;
> > +  int n_a, n_b, i;
> > +  
> > +  astr = a->key;
> > +  bstr = b->key;
> > +
> > +  for (i = 0; i < a->klen; i++)
> > +    if (astr[i] == '/')
> > +      n_a++;
> > +
> > +  for (i = 0; i < b->klen; i++)
> > +    if (bstr[i] == '/')
> > +      n_b++;
> > +
> > +  if (n_a > n_b)
> > +    return 1;
> > +  else if (n_a == n_b)
> > +    return 0;
> > +  else
> > +    return -1;
> > +}
> > +
> > +/* Determine which TARGETS are to be deleted. Return those targets in
> > + * DELETE_ALL_OF_THIS, allocated in RESULT_POOL. If all targets in a
> > + * directory is to be deleted, the dir itself is set to be deleted. The
> > + * remaining targets will be returned in TARGETS, allocated in RESULT_POOL.
> > + * If no target will be deleted, set DELETE_ALL_OF_THIS to NULL and return
> > + * the original TARGETS. Do temporary allocations in SCRATCH_POOL. */
> > +static svn_error_t *
> > +collect_deleted_targets(apr_array_header_t **delete_all_of_this, 
> 
> :)
> Maybe s/delete_all_of_this/condensed_deletion_target_list/ or something?

Maybe. I was thinking of concanating the two arrays and feed it to
install_patched_targets(). The only reason for using two arrays would be
if the deleted targets should be run before the other targets but I
don't see why we would want that.
 
[...] 

> > +
> > +  /* We sort the keys here for allowing the paths with the greatest depth to
> > +   * be processed first. */
> > +  sorted_keys = svn_sort__hash(targets_to_be_deleted,
> > +                               sort_compare_paths_by_depth,
> > +                               scratch_pool);
> > +
> > +  /* The keys are sorted in ascending order, but we want them in descending
> > +   * order. */
> 
> You could also flip the logic of your compare function to sort entries
> in descending order.

Probably a good idea. It's better to use the for(i=0,...) idiom when
possible.

[...]

> >  /* Install a patched TARGET into the working copy at ABS_WC_PATH.
> >   * Use client context CTX to retrieve WC_CTX, and possibly doing
> >   * notifications. If DRY_RUN is TRUE, don't modify the working copy.
> > @@ -1371,6 +1671,7 @@
> >    const char *patch_eol_str;
> >    apr_file_t *patch_file;
> >    apr_array_header_t *targets;
> > +  apr_array_header_t *delete_all_of_this;
> >    int i;
> >    apply_patches_baton_t *btn;
> >    
> > @@ -1413,6 +1714,10 @@
> >      }
> >    while (patch);
> >  
> > +  SVN_ERR(collect_deleted_targets(&delete_all_of_this, &targets, 
> > +                                  btn->ctx->wc_ctx, scratch_pool, 
> > +                                  result_pool));
> > +
> >    /* Install patched targets into the working copy. */
> >    for (i = 0; i < targets->nelts; i++)
> >      {
> 
> Where is delete_all_of_this used?

Not there yet, it's the next step! :)

Thanks for the review. Appreciate it!
Daniel

Re: [WIP] 'svn patch' should remove empty dirs

Posted by Daniel Näslund <da...@longitudo.com>.
On Mon, Feb 22, 2010 at 08:44:47AM +0100, Stefan Sperling wrote:
> On Sun, Feb 21, 2010 at 11:02:27PM +0100, Daniel Näslund wrote:

[..]

> > +  unidiff_patch = [
> > +    "Index: psi\n",
> > +    "===================================================================\n",
> > +    "--- A/D/H/psi\t(revision 0)\n",
> > +    "+++ A/D/H/psi\t(revision 0)\n",
> > +    "@@ -1 +0,0 @@\n",
> > +    "-This is the file 'psi'.\n",
> > +    "Index: omega\n",
> > +    "===================================================================\n",
> > +    "--- A/D/H/omega\t(revision 0)\n",
> > +    "+++ A/D/H/omega\t(revision 0)\n",
> > +    "@@ -1 +0,0 @@\n",
> > +    "-This is the file 'omega'.\n",
> > +  ]
> > +
> > +  svntest.main.file_write(patch_file_path, ''.join(unidiff_patch))
> > +
> > +  # We should be able to handle one path beeing missing.
> > +  os.remove(os.path.join(wc_dir, 'A', 'D', 'H', 'chi'))
> > +
> > +
> > +  expected_output = [
> > +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'chi'),
> > +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'psi'),
> > +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'omega'),
> > +  ]
> 
> I think we want a single 'D' notification for the entire directory.
> Notification for every single child is redundant.

Yeah, sorry. The test part is _very_ much a WIP. I just used it to test
for finding missing targets. It will of course be changed to check for
parent dirs to be removed.
 
[...]

> > +/* Helper for collect_deleted_targets(). Checks if PARENT_DIR_ABSPATH is
> > + * EMPTY when treating the targets in TARGETS_TO_BE_DELETED as already
> > + * deleted. */
> > +static svn_error_t *
> > +is_dir_empty(svn_boolean_t *empty, const char *parent_dir_abspath,
> > +             svn_wc_context_t *wc_ctx, 
> > +             apr_array_header_t *targets_to_be_deleted,
> > +             apr_pool_t *result_pool, apr_pool_t *scratch_pool)
> > +{
> > +  apr_dir_t *dir;
> > +  apr_finfo_t this_entry;
> > +  apr_int32_t flags = APR_FINFO_NAME;
> > +  apr_pool_t *iterpool;
> > +  svn_error_t *err;
> > +  apr_array_header_t *targets_found;
> > +  const apr_array_header_t *missing_targets;
> > +  const char *abspath;
> > +  int i;
> > +
> > +  SVN_DBG(("In is_dir_empty()\n"));
> > +  targets_found = apr_array_make(scratch_pool, 0, sizeof(const char *));
> > +
> > +  SVN_ERR(svn_io_dir_open(&dir, parent_dir_abspath, scratch_pool));
> > +
> > +  iterpool = svn_pool_create(scratch_pool);
> > +
> > +  /* First we collect the entries on-disk...*/
> > +  while (1)
> > +    {
> > +      svn_pool_clear(iterpool);
> > +      err = svn_io_dir_read(&this_entry, flags, dir, iterpool);
> 
> Hmmm... have you seen svn_io_get_dir_filenames()?

Nope. This part was one of the things I refered to as boilerplate.
Thanks!

[...]
 
> > +
> > +  /* .. Then we collect the nodes that the db knows of. We do that since a
> > +   * missing node will not be found by simply checking the on-disk. We need
> > +   * to compare what the db knows with what's actually there. If a node is
> > +   * missing we add it to our targets_found. */
> 
> We might want to check the DB first. The DB handle is already open,
> so may be cheaper to run the DB queries some of which can be answered
> without hitting the disk.
> And if the DB prevents us from deleting the directory, we don't bother
> checking the disk ourselves. Note that some DB query functions might
> do disk i/o, I'm not sure which and under what circumstances
> though.

A perfectly valid optimization. Will do that.
 
> > +  SVN_ERR(svn_wc__node_get_children(&missing_targets, wc_ctx, 
> > +                                    parent_dir_abspath, FALSE, result_pool,
> > +                                    scratch_pool));
> > +
> > +  /* Check for missing targets. */
> > +  for (i = 0; i < missing_targets->nelts; i++)
> > +    {
> > +      const char *target;
> > +      svn_node_kind_t kind;
> > +
> > +      target = APR_ARRAY_IDX(missing_targets, i, const char *);
> > +
> > +      SVN_ERR(svn_io_check_path(target, &kind, scratch_pool));
> 
> Using svn_io_* is not using the DB, really.
> What we want here is something like svn_wc__node_is_status_present() or
> maybe svn_wc__node_is_status_absent(). Not sure. The point is getting
> info from the DB, so we'll eventually be calling svn_wc__db_read_info().
> The DB can tell us if a node is known as missing, obstructed, or
> locally deleted. Maybe using just some svn_wc__node_* functions tells
> us all we need to know, even?

absent, excluded and not_present are the three states making up the
hidden concept of the entries code. They refer to conditions when the wc
has not gained access no a node for some reason. The entries code
suggests authz restrictions, setting svn_depth_excluded (1.5 clients
doesn't handle that) and deleting and commiting a node without updating
its parent. I've set the show_hidden parameter of
svn_wc__node_get_children() to FALSE and not included those. When
thinking about it, I should have included those. Even if those nodes
won't appear in our wc they may in other peoples!

We want the missing nodes, those that the db thinks is there but are ...
missing. The db doesn't know who they are. To find those we need to
check what's on-disk.

Greg talked me through the show_hiddden and missing concept. Hopefully I
got it right.

In the def of svn_wc__db_status_t:
[[[
/* This node was named by the server, but no information was provided. */
svn_wc__db_status_absent,

/* This node has been administratively excluded. */
svn_wc__db_status_excluded,

/* This node is not present in this revision. This typically happens
   when a node is deleted and committed without updating its parent.
   The parent revision indicates it should be present, but this node's
   revision states otherwise. */
svn_wc__db_status_not_present,
]]]

not_present, absent and (not mentioned above) excluded are terms
refering to 
> 
> > +      SVN_DBG(("Testing if '%s' is missing\n", target));
> > +
> > +      if (kind == svn_node_none)
> > +        {
> > +          SVN_DBG(("Found missing node '%s'\n", target));
> > +          APR_ARRAY_PUSH(targets_found, const char *) = target;
> > +        }
> > +    }
> > +
> > +  *empty = TRUE;
> > +
> > +  /* Finally, we check to see if we're going to delete all entries in the
> > +   * dir. */
> > +  for (i = 0; i < targets_found->nelts; i++)
> > +    {
> 
> Can we integrate the strcmp() loop into the loop that iterates over the dirents,
> skipping dirents which don't match a deletion target?
> Then we could simply compare the number of elements of the targets found
> and targets deleted arrays here, instead of looping again.

Probably. In any case, there _is_ a way to write it better than what
I've done now.

[...]
 
> > +/* Helper for collect_deleted_targets(). Compare A and B and return an
> > + * integer greater than, equal to, or less than 0, according to whether A
> > + * has more subdirs, just as many or less subdir than B. */
> > +static int
> > +sort_compare_paths_by_depth(const svn_sort__item_t *a, 
> > +                            const svn_sort__item_t *b)
> > +{
> > +  const char *astr, *bstr;
> > +  int n_a, n_b, i;
> > +  
> > +  astr = a->key;
> > +  bstr = b->key;
> > +
> > +  for (i = 0; i < a->klen; i++)
> > +    if (astr[i] == '/')
> > +      n_a++;
> > +
> > +  for (i = 0; i < b->klen; i++)
> > +    if (bstr[i] == '/')
> > +      n_b++;
> > +
> > +  if (n_a > n_b)
> > +    return 1;
> > +  else if (n_a == n_b)
> > +    return 0;
> > +  else
> > +    return -1;
> > +}
> > +
> > +/* Determine which TARGETS are to be deleted. Return those targets in
> > + * DELETE_ALL_OF_THIS, allocated in RESULT_POOL. If all targets in a
> > + * directory is to be deleted, the dir itself is set to be deleted. The
> > + * remaining targets will be returned in TARGETS, allocated in RESULT_POOL.
> > + * If no target will be deleted, set DELETE_ALL_OF_THIS to NULL and return
> > + * the original TARGETS. Do temporary allocations in SCRATCH_POOL. */
> > +static svn_error_t *
> > +collect_deleted_targets(apr_array_header_t **delete_all_of_this, 
> 
> :)
> Maybe s/delete_all_of_this/condensed_deletion_target_list/ or something?

Maybe. I was thinking of concanating the two arrays and feed it to
install_patched_targets(). The only reason for using two arrays would be
if the deleted targets should be run before the other targets but I
don't see why we would want that.
 
[...] 

> > +
> > +  /* We sort the keys here for allowing the paths with the greatest depth to
> > +   * be processed first. */
> > +  sorted_keys = svn_sort__hash(targets_to_be_deleted,
> > +                               sort_compare_paths_by_depth,
> > +                               scratch_pool);
> > +
> > +  /* The keys are sorted in ascending order, but we want them in descending
> > +   * order. */
> 
> You could also flip the logic of your compare function to sort entries
> in descending order.

Probably a good idea. It's better to use the for(i=0,...) idiom when
possible.

[...]

> >  /* Install a patched TARGET into the working copy at ABS_WC_PATH.
> >   * Use client context CTX to retrieve WC_CTX, and possibly doing
> >   * notifications. If DRY_RUN is TRUE, don't modify the working copy.
> > @@ -1371,6 +1671,7 @@
> >    const char *patch_eol_str;
> >    apr_file_t *patch_file;
> >    apr_array_header_t *targets;
> > +  apr_array_header_t *delete_all_of_this;
> >    int i;
> >    apply_patches_baton_t *btn;
> >    
> > @@ -1413,6 +1714,10 @@
> >      }
> >    while (patch);
> >  
> > +  SVN_ERR(collect_deleted_targets(&delete_all_of_this, &targets, 
> > +                                  btn->ctx->wc_ctx, scratch_pool, 
> > +                                  result_pool));
> > +
> >    /* Install patched targets into the working copy. */
> >    for (i = 0; i < targets->nelts; i++)
> >      {
> 
> Where is delete_all_of_this used?

Not there yet, it's the next step! :)

Thanks for the review. Appreciate it!
Daniel

Re: [WIP] 'svn patch' should remove empty dirs

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Feb 21, 2010 at 11:02:27PM +0100, Daniel Näslund wrote:
> Hi Stefan!
> 
> This patch is growing and growing and I'm worried that I'm heading off
> in the wrong direction. Perhaps you can skim through the patch and tell
> me if this is the right approach. You've said so earlier but all these
> lines of code makes me nervous. I like small fixes, and this is starting
> to look bloated.

I gave it a quick review. I don't think you're off track. See below.

> 
> A preliminary log msg (it's a WIP)
> [[[
> Make 'svn patch' able to remove empty.
> 
> * subversion/tests/cmdline/patch_tests.py
>   (patch_remove_empty_dir): New.
> 
> * subversion/libsvn_client/patch.c
>   (is_dir_empty): Checks if a dir is empty, taking missing nodes into
>     account.
>   (sort_compare_paths_by_depth): Compare func to use with
>     svn_sort__hash(). Compares nr of '/' separators. Since the paths are
>     canonicalized, it should work just fine.
>   (collect_deleted_targets): If there is targets to be deleted, collect
>     those targets and the targets no involving deletes in two arrays.
>     TODO: We should condense the list of deleted dirs and be able to
>     only delete the top-most parent.
>   (apply_patches): TODO: We should remove the targets in
>     delete_all_of_this. Or perhaps concatenate the two list into one?
> ]]]
> 
> Just a quick.
> [ ] Yes, using APR forces you to a lot of boilerplate. 
> [ ] Hey there, you don't need to do all that, just ...
> 
> Hope I'm not asking to much,
> Daniel

> Index: subversion/tests/cmdline/patch_tests.py
> ===================================================================
> --- subversion/tests/cmdline/patch_tests.py	(revision 912338)
> +++ subversion/tests/cmdline/patch_tests.py	(arbetskopia)
> @@ -911,7 +911,63 @@
>                                         None, # expected err
>                                         1, # check-props
>                                         1) # dry-run
> +def patch_remove_empty_dir(sbox):
> +  "delete the dir if all children is deleted"
>  
> +  sbox.build()
> +  wc_dir = sbox.wc_dir
> +  
> +  patch_file_path = tempfile.mkstemp(dir=os.path.abspath(svntest.main.temp_dir))[1]
> +
> +  unidiff_patch = [
> +    "Index: psi\n",
> +    "===================================================================\n",
> +    "--- A/D/H/psi\t(revision 0)\n",
> +    "+++ A/D/H/psi\t(revision 0)\n",
> +    "@@ -1 +0,0 @@\n",
> +    "-This is the file 'psi'.\n",
> +    "Index: omega\n",
> +    "===================================================================\n",
> +    "--- A/D/H/omega\t(revision 0)\n",
> +    "+++ A/D/H/omega\t(revision 0)\n",
> +    "@@ -1 +0,0 @@\n",
> +    "-This is the file 'omega'.\n",
> +  ]
> +
> +  svntest.main.file_write(patch_file_path, ''.join(unidiff_patch))
> +
> +  # We should be able to handle one path beeing missing.
> +  os.remove(os.path.join(wc_dir, 'A', 'D', 'H', 'chi'))
> +
> +
> +  expected_output = [
> +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'chi'),
> +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'psi'),
> +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'omega'),
> +  ]

I think we want a single 'D' notification for the entire directory.
Notification for every single child is redundant.

> +
> +  expected_disk = svntest.main.greek_state.copy()
> +  expected_disk.remove('A/D/H/chi')
> +  expected_disk.remove('A/D/H/psi')
> +  expected_disk.remove('A/D/H/omega')
> +  expected_disk.remove('A/D/H')
> +
> +  expected_status = svntest.actions.get_virginal_state(wc_dir, 1)
> +  expected_status.add({'A/D/H' : Item(status='D ', wc_rev=1)})
> +
> +  expected_skip = wc.State('', { })
> +
> +  svntest.actions.run_and_verify_patch(wc_dir, 
> +                                       os.path.abspath(patch_file_path),
> +                                       expected_output,
> +                                       expected_disk,
> +                                       expected_status,
> +                                       expected_skip,
> +                                       None, # expected err
> +                                       1, # check-props
> +                                       1) # dry-run
> +
> +
>  def patch_reject(sbox):
>    "apply a patch which is rejected"
>  
> @@ -1363,6 +1419,7 @@
>                patch_chopped_leading_spaces,
>                patch_strip1,
>                patch_add_new_dir,
> +              patch_remove_empty_dir,
>                patch_reject,
>                patch_keywords,
>                patch_with_fuzz,
> Index: subversion/libsvn_client/patch.c
> ===================================================================
> --- subversion/libsvn_client/patch.c	(revision 912338)
> +++ subversion/libsvn_client/patch.c	(arbetskopia)
> @@ -34,6 +34,7 @@
>  #include "svn_path.h"
>  #include "svn_pools.h"
>  #include "svn_props.h"
> +#include "svn_sorts.h"
>  #include "svn_subst.h"
>  #include "svn_wc.h"
>  #include "client.h"
> @@ -1159,6 +1160,305 @@
>    return SVN_NO_ERROR;
>  }
>  
> +/* Helper for collect_deleted_targets(). Checks if PARENT_DIR_ABSPATH is
> + * EMPTY when treating the targets in TARGETS_TO_BE_DELETED as already
> + * deleted. */
> +static svn_error_t *
> +is_dir_empty(svn_boolean_t *empty, const char *parent_dir_abspath,
> +             svn_wc_context_t *wc_ctx, 
> +             apr_array_header_t *targets_to_be_deleted,
> +             apr_pool_t *result_pool, apr_pool_t *scratch_pool)
> +{
> +  apr_dir_t *dir;
> +  apr_finfo_t this_entry;
> +  apr_int32_t flags = APR_FINFO_NAME;
> +  apr_pool_t *iterpool;
> +  svn_error_t *err;
> +  apr_array_header_t *targets_found;
> +  const apr_array_header_t *missing_targets;
> +  const char *abspath;
> +  int i;
> +
> +  SVN_DBG(("In is_dir_empty()\n"));
> +  targets_found = apr_array_make(scratch_pool, 0, sizeof(const char *));
> +
> +  SVN_ERR(svn_io_dir_open(&dir, parent_dir_abspath, scratch_pool));
> +
> +  iterpool = svn_pool_create(scratch_pool);
> +
> +  /* First we collect the entries on-disk...*/
> +  while (1)
> +    {
> +      svn_pool_clear(iterpool);
> +      err = svn_io_dir_read(&this_entry, flags, dir, iterpool);

Hmmm... have you seen svn_io_get_dir_filenames()?

> +
> +      if (err)
> +        {
> +        /* Have we read all entries of the dir? */
> +        if (APR_STATUS_IS_ENOENT(err->apr_err))
> +          {
> +            apr_status_t apr_err;
> +
> +            SVN_DBG(("We have read all entries\n"));
> +            svn_error_clear(err);
> +            apr_err = apr_dir_close(dir);
> +            if (apr_err)
> +              return svn_error_wrap_apr(apr_err,
> +                                        _("Can't close directory '%s'"),
> +                                        svn_dirent_local_style(parent_dir_abspath,
> +                                                               result_pool));
> +            break;
> +          }
> +        }
> +      else
> +        {
> +          SVN_ERR(err);
> +        }
> +
> +      /* Skip entries for this dir and its parent. */
> +      if (this_entry.name[0] == '.'
> +          && (this_entry.name[1] == '\0'
> +              || (this_entry.name[1] == '.' && this_entry.name[2] == '\0')))
> +        continue;
> +
> +      /* Skip over SVN admin directories. */
> +      if (svn_wc_is_adm_dir(this_entry.name, iterpool))
> +        continue;
> +
> +      /* Construct the full path of the entry. */
> +      abspath = svn_dirent_join(parent_dir_abspath, this_entry.name, scratch_pool);
> +      SVN_DBG(("Pushing '%s' to targets_found\n", this_entry.name));
> +
> +      APR_ARRAY_PUSH(targets_found, const char *) = abspath;
> +    }
> +
> +  svn_pool_destroy(iterpool);
> +
> +  /* .. Then we collect the nodes that the db knows of. We do that since a
> +   * missing node will not be found by simply checking the on-disk. We need
> +   * to compare what the db knows with what's actually there. If a node is
> +   * missing we add it to our targets_found. */

We might want to check the DB first. The DB handle is already open,
so may be cheaper to run the DB queries some of which can be answered
without hitting the disk.
And if the DB prevents us from deleting the directory, we don't bother
checking the disk ourselves. Note that some DB query functions might
do disk i/o, I'm not sure which and under what circumstances
though.

> +  SVN_ERR(svn_wc__node_get_children(&missing_targets, wc_ctx, 
> +                                    parent_dir_abspath, FALSE, result_pool,
> +                                    scratch_pool));
> +
> +  /* Check for missing targets. */
> +  for (i = 0; i < missing_targets->nelts; i++)
> +    {
> +      const char *target;
> +      svn_node_kind_t kind;
> +
> +      target = APR_ARRAY_IDX(missing_targets, i, const char *);
> +
> +      SVN_ERR(svn_io_check_path(target, &kind, scratch_pool));

Using svn_io_* is not using the DB, really.
What we want here is something like svn_wc__node_is_status_present() or
maybe svn_wc__node_is_status_absent(). Not sure. The point is getting
info from the DB, so we'll eventually be calling svn_wc__db_read_info().
The DB can tell us if a node is known as missing, obstructed, or
locally deleted. Maybe using just some svn_wc__node_* functions tells
us all we need to know, even?

> +      SVN_DBG(("Testing if '%s' is missing\n", target));
> +
> +      if (kind == svn_node_none)
> +        {
> +          SVN_DBG(("Found missing node '%s'\n", target));
> +          APR_ARRAY_PUSH(targets_found, const char *) = target;
> +        }
> +    }
> +
> +  *empty = TRUE;
> +
> +  /* Finally, we check to see if we're going to delete all entries in the
> +   * dir. */
> +  for (i = 0; i < targets_found->nelts; i++)
> +    {

Can we integrate the strcmp() loop into the loop that iterates over the dirents,
skipping dirents which don't match a deletion target?
Then we could simply compare the number of elements of the targets found
and targets deleted arrays here, instead of looping again.

> +      int j;
> +      const char *found = APR_ARRAY_IDX(targets_found, i, const char *);
> +      svn_boolean_t deleted = FALSE;
> +
> +      SVN_DBG(("Checking if '%s' will be deleted\n", found));
> +
> +      for (j = 0; j < targets_to_be_deleted->nelts; j++)
> +        {
> +          patch_target_t *to_be_del;
> +          to_be_del = APR_ARRAY_IDX(targets_to_be_deleted, j, patch_target_t *);
> +          SVN_DBG(("Picking '%s' from to_be_del\n", to_be_del->abs_path));
> +          if (! strcmp(found, to_be_del->abs_path))
> +            deleted = TRUE;
> +        }
> +      if (! deleted)
> +        {
> +          SVN_DBG(("'%s' will not be deleted\n", found));
> +          *empty = FALSE;
> +          break;
> +        }
> +    }
> +  return SVN_NO_ERROR;
> +}
> +
> +/* Helper for collect_deleted_targets(). Compare A and B and return an
> + * integer greater than, equal to, or less than 0, according to whether A
> + * has more subdirs, just as many or less subdir than B. */
> +static int
> +sort_compare_paths_by_depth(const svn_sort__item_t *a, 
> +                            const svn_sort__item_t *b)
> +{
> +  const char *astr, *bstr;
> +  int n_a, n_b, i;
> +  
> +  astr = a->key;
> +  bstr = b->key;
> +
> +  for (i = 0; i < a->klen; i++)
> +    if (astr[i] == '/')
> +      n_a++;
> +
> +  for (i = 0; i < b->klen; i++)
> +    if (bstr[i] == '/')
> +      n_b++;
> +
> +  if (n_a > n_b)
> +    return 1;
> +  else if (n_a == n_b)
> +    return 0;
> +  else
> +    return -1;
> +}
> +
> +/* Determine which TARGETS are to be deleted. Return those targets in
> + * DELETE_ALL_OF_THIS, allocated in RESULT_POOL. If all targets in a
> + * directory is to be deleted, the dir itself is set to be deleted. The
> + * remaining targets will be returned in TARGETS, allocated in RESULT_POOL.
> + * If no target will be deleted, set DELETE_ALL_OF_THIS to NULL and return
> + * the original TARGETS. Do temporary allocations in SCRATCH_POOL. */
> +static svn_error_t *
> +collect_deleted_targets(apr_array_header_t **delete_all_of_this, 

:)
Maybe s/delete_all_of_this/condensed_deletion_target_list/ or something?

> +                         apr_array_header_t **targets, 
> +                         svn_wc_context_t *wc_ctx, apr_pool_t *result_pool, 
> +                         apr_pool_t *scratch_pool)
> +{
> +  int i;
> +  svn_boolean_t deleted_target_exists = FALSE;
> +  apr_array_header_t *new_targets;
> +  apr_array_header_t *deleted_targets;
> +  apr_array_header_t *sorted_keys;
> +  apr_hash_t *targets_to_be_deleted;
> +  apr_hash_index_t *hi;
> +
> +  for (i = 0; i < (*targets)->nelts; i++)
> +    {
> +      patch_target_t *target = APR_ARRAY_IDX(*targets, i, patch_target_t *);
> +
> +      if (target->deleted)
> +        {
> +          SVN_DBG(("deleted targets found\n"));
> +          deleted_target_exists = TRUE;
> +        }
> +    }
> +
> +  /* Just return if we have no deleted targets. */
> +  if (! deleted_target_exists)
> +    {
> +      *delete_all_of_this = NULL;
> +      SVN_DBG(("Returning since no deleted targets found\n"));
> +      return SVN_NO_ERROR;
> +    }
> +
> +  /* We have deleted targets, time to do some allocations. */
> +  new_targets = apr_array_make(result_pool, 0, sizeof(patch_target_t *));
> +  deleted_targets = apr_array_make(result_pool, 0, sizeof(patch_target_t *));
> +
> +  targets_to_be_deleted = apr_hash_make(result_pool);
> +
> +  for (i = 0; i < (*targets)->nelts; i++)
> +    {
> +      patch_target_t *target = APR_ARRAY_IDX(*targets, i, patch_target_t *);
> +
> +      if (! target->deleted)
> +        {
> +          APR_ARRAY_PUSH(new_targets, patch_target_t *) = target;
> +          SVN_DBG(("Adding non deleted target to new_target\n"));
> +        }
> +      else
> +        {
> +          apr_array_header_t * deleted_targets_in_dir;
> +          /* We're using the abs_path of the target. The abs_path is not
> +           * present if the path is a symlink pointing outside the wc but we
> +           * know, that's not the case. */

The comma after "know" can be removed.

> +          const char *dirname = svn_dirent_dirname(target->abs_path, 
> +                                                   result_pool);
> +
> +          SVN_DBG(("Found a target to be deleted\n"));
> +          deleted_targets_in_dir = apr_hash_get(targets_to_be_deleted, 
> +                                      dirname, APR_HASH_KEY_STRING);
> +
> +          if (deleted_targets_in_dir)
> +            {
> +              APR_ARRAY_PUSH(deleted_targets_in_dir, patch_target_t *) = target;
> +
> +              SVN_DBG(("Adding another deleted target to '%s'\n", dirname));
> +              apr_hash_set(targets_to_be_deleted, dirname,
> +                           APR_HASH_KEY_STRING, deleted_targets_in_dir);
> +            }
> +          else
> +            {
> +              apr_array_header_t *new_array;
> +
> +              SVN_DBG(("Adding first deleted target to '%s'\n", dirname));
> +              new_array = apr_array_make(result_pool, 0, sizeof(patch_target_t *));
> +              APR_ARRAY_PUSH(new_array, patch_target_t *) = target;
> +              apr_hash_set(targets_to_be_deleted, 
> +                           dirname, APR_HASH_KEY_STRING, new_array);
> +            }
> +        }
> +    }
> +
> +  /* We sort the keys here for allowing the paths with the greatest depth to
> +   * be processed first. */
> +  sorted_keys = svn_sort__hash(targets_to_be_deleted,
> +                               sort_compare_paths_by_depth,
> +                               scratch_pool);
> +
> +  /* The keys are sorted in ascending order, but we want them in descending
> +   * order. */

You could also flip the logic of your compare function to sort entries
in descending order.

> +  for (i = sorted_keys->nelts -1; i >= 0; i--)
> +    {
> +      svn_sort__item_t *item;
> +      svn_boolean_t empty;
> +      char *key;
> +      apr_array_header_t *targets_array;
> +      int j;
> +
> +      item = &APR_ARRAY_IDX(sorted_keys, i, svn_sort__item_t);
> +      key = (char *) item->key;
> +      targets_array = (apr_array_header_t *) item->value;
> +      SVN_DBG(("n keys = %d\n", sorted_keys->nelts));
> +      SVN_DBG(("A sorted key '%s', i = %d\n", key, i));
> +
> +      SVN_ERR(is_dir_empty(&empty, key, wc_ctx, targets_array, result_pool,
> +                           scratch_pool));
> +      SVN_DBG(("After is_dir_empty(), empty = %d\n", empty));
> +      if (empty)
> +        {
> +          patch_target_t *target = apr_palloc(result_pool, sizeof(*target));
> +          target->deleted = TRUE;
> +          target->abs_path = key;
> +          APR_ARRAY_PUSH(deleted_targets, patch_target_t *) = target;
> +
> +        /* create parent_dir target

Indentation is off above.

> +           add parent_dir to deleted_targets */
> +        }
> +      else

Please put braces around multi-line else clauses (even if they're
only a single statement).

> +        /* No empty dir, just add the targets to be deleted */
> +        for (j = 0; j < targets_array->nelts; j++)
> +          {
> +            patch_target_t *target = APR_ARRAY_IDX(targets_array, j, 
> +                                                   patch_target_t *);
> +            APR_ARRAY_PUSH(deleted_targets, patch_target_t *) = target;
> +          }
> +    }
> +  SVN_DBG(("Finished in collect..()\n"));
> +  *targets = new_targets;
> +  *delete_all_of_this = deleted_targets;
> +
> +
> +  return SVN_NO_ERROR;
> +}
> +
>  /* Install a patched TARGET into the working copy at ABS_WC_PATH.
>   * Use client context CTX to retrieve WC_CTX, and possibly doing
>   * notifications. If DRY_RUN is TRUE, don't modify the working copy.
> @@ -1371,6 +1671,7 @@
>    const char *patch_eol_str;
>    apr_file_t *patch_file;
>    apr_array_header_t *targets;
> +  apr_array_header_t *delete_all_of_this;
>    int i;
>    apply_patches_baton_t *btn;
>    
> @@ -1413,6 +1714,10 @@
>      }
>    while (patch);
>  
> +  SVN_ERR(collect_deleted_targets(&delete_all_of_this, &targets, 
> +                                  btn->ctx->wc_ctx, scratch_pool, 
> +                                  result_pool));
> +
>    /* Install patched targets into the working copy. */
>    for (i = 0; i < targets->nelts; i++)
>      {

Where is delete_all_of_this used?

Stefan

Re: [WIP] 'svn patch' should remove empty dirs

Posted by Stefan Sperling <st...@elego.de>.
On Sun, Feb 21, 2010 at 11:02:27PM +0100, Daniel Näslund wrote:
> Hi Stefan!
> 
> This patch is growing and growing and I'm worried that I'm heading off
> in the wrong direction. Perhaps you can skim through the patch and tell
> me if this is the right approach. You've said so earlier but all these
> lines of code makes me nervous. I like small fixes, and this is starting
> to look bloated.

I gave it a quick review. I don't think you're off track. See below.

> 
> A preliminary log msg (it's a WIP)
> [[[
> Make 'svn patch' able to remove empty.
> 
> * subversion/tests/cmdline/patch_tests.py
>   (patch_remove_empty_dir): New.
> 
> * subversion/libsvn_client/patch.c
>   (is_dir_empty): Checks if a dir is empty, taking missing nodes into
>     account.
>   (sort_compare_paths_by_depth): Compare func to use with
>     svn_sort__hash(). Compares nr of '/' separators. Since the paths are
>     canonicalized, it should work just fine.
>   (collect_deleted_targets): If there is targets to be deleted, collect
>     those targets and the targets no involving deletes in two arrays.
>     TODO: We should condense the list of deleted dirs and be able to
>     only delete the top-most parent.
>   (apply_patches): TODO: We should remove the targets in
>     delete_all_of_this. Or perhaps concatenate the two list into one?
> ]]]
> 
> Just a quick.
> [ ] Yes, using APR forces you to a lot of boilerplate. 
> [ ] Hey there, you don't need to do all that, just ...
> 
> Hope I'm not asking to much,
> Daniel

> Index: subversion/tests/cmdline/patch_tests.py
> ===================================================================
> --- subversion/tests/cmdline/patch_tests.py	(revision 912338)
> +++ subversion/tests/cmdline/patch_tests.py	(arbetskopia)
> @@ -911,7 +911,63 @@
>                                         None, # expected err
>                                         1, # check-props
>                                         1) # dry-run
> +def patch_remove_empty_dir(sbox):
> +  "delete the dir if all children is deleted"
>  
> +  sbox.build()
> +  wc_dir = sbox.wc_dir
> +  
> +  patch_file_path = tempfile.mkstemp(dir=os.path.abspath(svntest.main.temp_dir))[1]
> +
> +  unidiff_patch = [
> +    "Index: psi\n",
> +    "===================================================================\n",
> +    "--- A/D/H/psi\t(revision 0)\n",
> +    "+++ A/D/H/psi\t(revision 0)\n",
> +    "@@ -1 +0,0 @@\n",
> +    "-This is the file 'psi'.\n",
> +    "Index: omega\n",
> +    "===================================================================\n",
> +    "--- A/D/H/omega\t(revision 0)\n",
> +    "+++ A/D/H/omega\t(revision 0)\n",
> +    "@@ -1 +0,0 @@\n",
> +    "-This is the file 'omega'.\n",
> +  ]
> +
> +  svntest.main.file_write(patch_file_path, ''.join(unidiff_patch))
> +
> +  # We should be able to handle one path beeing missing.
> +  os.remove(os.path.join(wc_dir, 'A', 'D', 'H', 'chi'))
> +
> +
> +  expected_output = [
> +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'chi'),
> +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'psi'),
> +    'D         %s\n' % os.path.join(wc_dir, 'A', 'D', 'H', 'omega'),
> +  ]

I think we want a single 'D' notification for the entire directory.
Notification for every single child is redundant.

> +
> +  expected_disk = svntest.main.greek_state.copy()
> +  expected_disk.remove('A/D/H/chi')
> +  expected_disk.remove('A/D/H/psi')
> +  expected_disk.remove('A/D/H/omega')
> +  expected_disk.remove('A/D/H')
> +
> +  expected_status = svntest.actions.get_virginal_state(wc_dir, 1)
> +  expected_status.add({'A/D/H' : Item(status='D ', wc_rev=1)})
> +
> +  expected_skip = wc.State('', { })
> +
> +  svntest.actions.run_and_verify_patch(wc_dir, 
> +                                       os.path.abspath(patch_file_path),
> +                                       expected_output,
> +                                       expected_disk,
> +                                       expected_status,
> +                                       expected_skip,
> +                                       None, # expected err
> +                                       1, # check-props
> +                                       1) # dry-run
> +
> +
>  def patch_reject(sbox):
>    "apply a patch which is rejected"
>  
> @@ -1363,6 +1419,7 @@
>                patch_chopped_leading_spaces,
>                patch_strip1,
>                patch_add_new_dir,
> +              patch_remove_empty_dir,
>                patch_reject,
>                patch_keywords,
>                patch_with_fuzz,
> Index: subversion/libsvn_client/patch.c
> ===================================================================
> --- subversion/libsvn_client/patch.c	(revision 912338)
> +++ subversion/libsvn_client/patch.c	(arbetskopia)
> @@ -34,6 +34,7 @@
>  #include "svn_path.h"
>  #include "svn_pools.h"
>  #include "svn_props.h"
> +#include "svn_sorts.h"
>  #include "svn_subst.h"
>  #include "svn_wc.h"
>  #include "client.h"
> @@ -1159,6 +1160,305 @@
>    return SVN_NO_ERROR;
>  }
>  
> +/* Helper for collect_deleted_targets(). Checks if PARENT_DIR_ABSPATH is
> + * EMPTY when treating the targets in TARGETS_TO_BE_DELETED as already
> + * deleted. */
> +static svn_error_t *
> +is_dir_empty(svn_boolean_t *empty, const char *parent_dir_abspath,
> +             svn_wc_context_t *wc_ctx, 
> +             apr_array_header_t *targets_to_be_deleted,
> +             apr_pool_t *result_pool, apr_pool_t *scratch_pool)
> +{
> +  apr_dir_t *dir;
> +  apr_finfo_t this_entry;
> +  apr_int32_t flags = APR_FINFO_NAME;
> +  apr_pool_t *iterpool;
> +  svn_error_t *err;
> +  apr_array_header_t *targets_found;
> +  const apr_array_header_t *missing_targets;
> +  const char *abspath;
> +  int i;
> +
> +  SVN_DBG(("In is_dir_empty()\n"));
> +  targets_found = apr_array_make(scratch_pool, 0, sizeof(const char *));
> +
> +  SVN_ERR(svn_io_dir_open(&dir, parent_dir_abspath, scratch_pool));
> +
> +  iterpool = svn_pool_create(scratch_pool);
> +
> +  /* First we collect the entries on-disk...*/
> +  while (1)
> +    {
> +      svn_pool_clear(iterpool);
> +      err = svn_io_dir_read(&this_entry, flags, dir, iterpool);

Hmmm... have you seen svn_io_get_dir_filenames()?

> +
> +      if (err)
> +        {
> +        /* Have we read all entries of the dir? */
> +        if (APR_STATUS_IS_ENOENT(err->apr_err))
> +          {
> +            apr_status_t apr_err;
> +
> +            SVN_DBG(("We have read all entries\n"));
> +            svn_error_clear(err);
> +            apr_err = apr_dir_close(dir);
> +            if (apr_err)
> +              return svn_error_wrap_apr(apr_err,
> +                                        _("Can't close directory '%s'"),
> +                                        svn_dirent_local_style(parent_dir_abspath,
> +                                                               result_pool));
> +            break;
> +          }
> +        }
> +      else
> +        {
> +          SVN_ERR(err);
> +        }
> +
> +      /* Skip entries for this dir and its parent. */
> +      if (this_entry.name[0] == '.'
> +          && (this_entry.name[1] == '\0'
> +              || (this_entry.name[1] == '.' && this_entry.name[2] == '\0')))
> +        continue;
> +
> +      /* Skip over SVN admin directories. */
> +      if (svn_wc_is_adm_dir(this_entry.name, iterpool))
> +        continue;
> +
> +      /* Construct the full path of the entry. */
> +      abspath = svn_dirent_join(parent_dir_abspath, this_entry.name, scratch_pool);
> +      SVN_DBG(("Pushing '%s' to targets_found\n", this_entry.name));
> +
> +      APR_ARRAY_PUSH(targets_found, const char *) = abspath;
> +    }
> +
> +  svn_pool_destroy(iterpool);
> +
> +  /* .. Then we collect the nodes that the db knows of. We do that since a
> +   * missing node will not be found by simply checking the on-disk. We need
> +   * to compare what the db knows with what's actually there. If a node is
> +   * missing we add it to our targets_found. */

We might want to check the DB first. The DB handle is already open,
so may be cheaper to run the DB queries some of which can be answered
without hitting the disk.
And if the DB prevents us from deleting the directory, we don't bother
checking the disk ourselves. Note that some DB query functions might
do disk i/o, I'm not sure which and under what circumstances
though.

> +  SVN_ERR(svn_wc__node_get_children(&missing_targets, wc_ctx, 
> +                                    parent_dir_abspath, FALSE, result_pool,
> +                                    scratch_pool));
> +
> +  /* Check for missing targets. */
> +  for (i = 0; i < missing_targets->nelts; i++)
> +    {
> +      const char *target;
> +      svn_node_kind_t kind;
> +
> +      target = APR_ARRAY_IDX(missing_targets, i, const char *);
> +
> +      SVN_ERR(svn_io_check_path(target, &kind, scratch_pool));

Using svn_io_* is not using the DB, really.
What we want here is something like svn_wc__node_is_status_present() or
maybe svn_wc__node_is_status_absent(). Not sure. The point is getting
info from the DB, so we'll eventually be calling svn_wc__db_read_info().
The DB can tell us if a node is known as missing, obstructed, or
locally deleted. Maybe using just some svn_wc__node_* functions tells
us all we need to know, even?

> +      SVN_DBG(("Testing if '%s' is missing\n", target));
> +
> +      if (kind == svn_node_none)
> +        {
> +          SVN_DBG(("Found missing node '%s'\n", target));
> +          APR_ARRAY_PUSH(targets_found, const char *) = target;
> +        }
> +    }
> +
> +  *empty = TRUE;
> +
> +  /* Finally, we check to see if we're going to delete all entries in the
> +   * dir. */
> +  for (i = 0; i < targets_found->nelts; i++)
> +    {

Can we integrate the strcmp() loop into the loop that iterates over the dirents,
skipping dirents which don't match a deletion target?
Then we could simply compare the number of elements of the targets found
and targets deleted arrays here, instead of looping again.

> +      int j;
> +      const char *found = APR_ARRAY_IDX(targets_found, i, const char *);
> +      svn_boolean_t deleted = FALSE;
> +
> +      SVN_DBG(("Checking if '%s' will be deleted\n", found));
> +
> +      for (j = 0; j < targets_to_be_deleted->nelts; j++)
> +        {
> +          patch_target_t *to_be_del;
> +          to_be_del = APR_ARRAY_IDX(targets_to_be_deleted, j, patch_target_t *);
> +          SVN_DBG(("Picking '%s' from to_be_del\n", to_be_del->abs_path));
> +          if (! strcmp(found, to_be_del->abs_path))
> +            deleted = TRUE;
> +        }
> +      if (! deleted)
> +        {
> +          SVN_DBG(("'%s' will not be deleted\n", found));
> +          *empty = FALSE;
> +          break;
> +        }
> +    }
> +  return SVN_NO_ERROR;
> +}
> +
> +/* Helper for collect_deleted_targets(). Compare A and B and return an
> + * integer greater than, equal to, or less than 0, according to whether A
> + * has more subdirs, just as many or less subdir than B. */
> +static int
> +sort_compare_paths_by_depth(const svn_sort__item_t *a, 
> +                            const svn_sort__item_t *b)
> +{
> +  const char *astr, *bstr;
> +  int n_a, n_b, i;
> +  
> +  astr = a->key;
> +  bstr = b->key;
> +
> +  for (i = 0; i < a->klen; i++)
> +    if (astr[i] == '/')
> +      n_a++;
> +
> +  for (i = 0; i < b->klen; i++)
> +    if (bstr[i] == '/')
> +      n_b++;
> +
> +  if (n_a > n_b)
> +    return 1;
> +  else if (n_a == n_b)
> +    return 0;
> +  else
> +    return -1;
> +}
> +
> +/* Determine which TARGETS are to be deleted. Return those targets in
> + * DELETE_ALL_OF_THIS, allocated in RESULT_POOL. If all targets in a
> + * directory is to be deleted, the dir itself is set to be deleted. The
> + * remaining targets will be returned in TARGETS, allocated in RESULT_POOL.
> + * If no target will be deleted, set DELETE_ALL_OF_THIS to NULL and return
> + * the original TARGETS. Do temporary allocations in SCRATCH_POOL. */
> +static svn_error_t *
> +collect_deleted_targets(apr_array_header_t **delete_all_of_this, 

:)
Maybe s/delete_all_of_this/condensed_deletion_target_list/ or something?

> +                         apr_array_header_t **targets, 
> +                         svn_wc_context_t *wc_ctx, apr_pool_t *result_pool, 
> +                         apr_pool_t *scratch_pool)
> +{
> +  int i;
> +  svn_boolean_t deleted_target_exists = FALSE;
> +  apr_array_header_t *new_targets;
> +  apr_array_header_t *deleted_targets;
> +  apr_array_header_t *sorted_keys;
> +  apr_hash_t *targets_to_be_deleted;
> +  apr_hash_index_t *hi;
> +
> +  for (i = 0; i < (*targets)->nelts; i++)
> +    {
> +      patch_target_t *target = APR_ARRAY_IDX(*targets, i, patch_target_t *);
> +
> +      if (target->deleted)
> +        {
> +          SVN_DBG(("deleted targets found\n"));
> +          deleted_target_exists = TRUE;
> +        }
> +    }
> +
> +  /* Just return if we have no deleted targets. */
> +  if (! deleted_target_exists)
> +    {
> +      *delete_all_of_this = NULL;
> +      SVN_DBG(("Returning since no deleted targets found\n"));
> +      return SVN_NO_ERROR;
> +    }
> +
> +  /* We have deleted targets, time to do some allocations. */
> +  new_targets = apr_array_make(result_pool, 0, sizeof(patch_target_t *));
> +  deleted_targets = apr_array_make(result_pool, 0, sizeof(patch_target_t *));
> +
> +  targets_to_be_deleted = apr_hash_make(result_pool);
> +
> +  for (i = 0; i < (*targets)->nelts; i++)
> +    {
> +      patch_target_t *target = APR_ARRAY_IDX(*targets, i, patch_target_t *);
> +
> +      if (! target->deleted)
> +        {
> +          APR_ARRAY_PUSH(new_targets, patch_target_t *) = target;
> +          SVN_DBG(("Adding non deleted target to new_target\n"));
> +        }
> +      else
> +        {
> +          apr_array_header_t * deleted_targets_in_dir;
> +          /* We're using the abs_path of the target. The abs_path is not
> +           * present if the path is a symlink pointing outside the wc but we
> +           * know, that's not the case. */

The comma after "know" can be removed.

> +          const char *dirname = svn_dirent_dirname(target->abs_path, 
> +                                                   result_pool);
> +
> +          SVN_DBG(("Found a target to be deleted\n"));
> +          deleted_targets_in_dir = apr_hash_get(targets_to_be_deleted, 
> +                                      dirname, APR_HASH_KEY_STRING);
> +
> +          if (deleted_targets_in_dir)
> +            {
> +              APR_ARRAY_PUSH(deleted_targets_in_dir, patch_target_t *) = target;
> +
> +              SVN_DBG(("Adding another deleted target to '%s'\n", dirname));
> +              apr_hash_set(targets_to_be_deleted, dirname,
> +                           APR_HASH_KEY_STRING, deleted_targets_in_dir);
> +            }
> +          else
> +            {
> +              apr_array_header_t *new_array;
> +
> +              SVN_DBG(("Adding first deleted target to '%s'\n", dirname));
> +              new_array = apr_array_make(result_pool, 0, sizeof(patch_target_t *));
> +              APR_ARRAY_PUSH(new_array, patch_target_t *) = target;
> +              apr_hash_set(targets_to_be_deleted, 
> +                           dirname, APR_HASH_KEY_STRING, new_array);
> +            }
> +        }
> +    }
> +
> +  /* We sort the keys here for allowing the paths with the greatest depth to
> +   * be processed first. */
> +  sorted_keys = svn_sort__hash(targets_to_be_deleted,
> +                               sort_compare_paths_by_depth,
> +                               scratch_pool);
> +
> +  /* The keys are sorted in ascending order, but we want them in descending
> +   * order. */

You could also flip the logic of your compare function to sort entries
in descending order.

> +  for (i = sorted_keys->nelts -1; i >= 0; i--)
> +    {
> +      svn_sort__item_t *item;
> +      svn_boolean_t empty;
> +      char *key;
> +      apr_array_header_t *targets_array;
> +      int j;
> +
> +      item = &APR_ARRAY_IDX(sorted_keys, i, svn_sort__item_t);
> +      key = (char *) item->key;
> +      targets_array = (apr_array_header_t *) item->value;
> +      SVN_DBG(("n keys = %d\n", sorted_keys->nelts));
> +      SVN_DBG(("A sorted key '%s', i = %d\n", key, i));
> +
> +      SVN_ERR(is_dir_empty(&empty, key, wc_ctx, targets_array, result_pool,
> +                           scratch_pool));
> +      SVN_DBG(("After is_dir_empty(), empty = %d\n", empty));
> +      if (empty)
> +        {
> +          patch_target_t *target = apr_palloc(result_pool, sizeof(*target));
> +          target->deleted = TRUE;
> +          target->abs_path = key;
> +          APR_ARRAY_PUSH(deleted_targets, patch_target_t *) = target;
> +
> +        /* create parent_dir target

Indentation is off above.

> +           add parent_dir to deleted_targets */
> +        }
> +      else

Please put braces around multi-line else clauses (even if they're
only a single statement).

> +        /* No empty dir, just add the targets to be deleted */
> +        for (j = 0; j < targets_array->nelts; j++)
> +          {
> +            patch_target_t *target = APR_ARRAY_IDX(targets_array, j, 
> +                                                   patch_target_t *);
> +            APR_ARRAY_PUSH(deleted_targets, patch_target_t *) = target;
> +          }
> +    }
> +  SVN_DBG(("Finished in collect..()\n"));
> +  *targets = new_targets;
> +  *delete_all_of_this = deleted_targets;
> +
> +
> +  return SVN_NO_ERROR;
> +}
> +
>  /* Install a patched TARGET into the working copy at ABS_WC_PATH.
>   * Use client context CTX to retrieve WC_CTX, and possibly doing
>   * notifications. If DRY_RUN is TRUE, don't modify the working copy.
> @@ -1371,6 +1671,7 @@
>    const char *patch_eol_str;
>    apr_file_t *patch_file;
>    apr_array_header_t *targets;
> +  apr_array_header_t *delete_all_of_this;
>    int i;
>    apply_patches_baton_t *btn;
>    
> @@ -1413,6 +1714,10 @@
>      }
>    while (patch);
>  
> +  SVN_ERR(collect_deleted_targets(&delete_all_of_this, &targets, 
> +                                  btn->ctx->wc_ctx, scratch_pool, 
> +                                  result_pool));
> +
>    /* Install patched targets into the working copy. */
>    for (i = 0; i < targets->nelts; i++)
>      {

Where is delete_all_of_this used?

Stefan