You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apr.apache.org by Ben Collins-Sussman <su...@collab.net> on 2001/05/22 15:44:19 UTC

[PATCH] apr_dir_remove_recursively

OK, it's not actually a patch, because I'm not sure where it ought to
live.  I've attached it below.

The function apr_dir_remove() won't work on a non-empty directory, so
this routine fills the gap.  It's equivalent to 'rm -rf'.

This routine currently lives in a Subversion library.  I'd like to
move it into apr_file_io.h, but I'm not sure where the actual code
should live.  It's implemented using nothing but other apr file
routines;  does this mean duplicating the code into file_io/unix,
file_io/win32, file_io/os2?

---------------

apr_status_t
apr_dir_remove_recursively (const char *path, apr_pool_t *pool)
{
  apr_status_t status;
  apr_dir_t *this_dir;
  apr_finfo_t this_entry;
  apr_pool_t *subpool;
  apr_int32_t flags = APR_FINFO_TYPE | APR_FINFO_NAME;

  status = apr_pool_create (&subpool, pool);
  if (! (APR_STATUS_IS_SUCCESS (status))) return status;

  status = apr_dir_open (&this_dir, path, subpool);
  if (! (APR_STATUS_IS_SUCCESS (status))) return status;

  for (status = apr_dir_read (&this_entry, flags, this_dir);
       APR_STATUS_IS_SUCCESS (status);
       status = apr_dir_read (&this_entry, flags, this_dir))
    {
      char *fullpath = apr_pstrcat (subpool, path, "/", this_entry.name, NULL);

      if (this_entry.filetype == APR_DIR)
        {
          if ((strcmp (this_entry.name, ".") == 0)
              || (strcmp (this_entry.name, "..") == 0))
            continue;

          status = apr_dir_remove_recursively (fullpath, subpool);
          if (! (APR_STATUS_IS_SUCCESS (status))) return status;
        }
      else if (this_entry.filetype == APR_REG)
        {
          status = apr_file_remove (fullpath, subpool);
          if (! (APR_STATUS_IS_SUCCESS (status))) return status;
        }
    }

  if (! (APR_STATUS_IS_ENOENT (status)))
    return status;

  else
    {
      status = apr_dir_close (this_dir);
      if (! (APR_STATUS_IS_SUCCESS (status))) return status;
    }

  status = apr_dir_remove (path, subpool);
  if (! (APR_STATUS_IS_SUCCESS (status))) return status;

  apr_pool_destroy (subpool);

  return APR_SUCCESS;
}

Re: [PATCH] apr_dir_remove_recursively

Posted by Greg Marr <gr...@alum.wpi.edu>.

At 06:21 PM 05/23/2001, Greg Stein wrote:
>On Wed, May 23, 2001 at 09:17:57AM -0500, Ben Collins-Sussman wrote:
> > "William A. Rowe, Jr." <ad...@rowe-clan.net> writes:
> > > Does it make sense to apply some lstat check against the tree, 
> such that
> > > symlinks' targets aren't blown away?  I'm not clear if your patch
> > > protects that or not.
> >
> > Good idea.
>
>remove() on a symlink does *not* toss the target. It only removes 
>the symlink.

If there are symlinks to directories, a recursive delete that doesn't 
check for symlinks could end up walking out of the tree being 
deleted.

>Think about it: remove tosses the directory entry. The symlink handles
>interaction with the file. We aren't touching the file, so we don't 
>get hit
>by the symlink.
>
>So... don't do an lstat. That is just wasteful and unneeded.

If this were a normal delete, that would be the case.  Since this is 
a recursive delete, it needs to be handled.

-- 
Greg Marr
gregm@alum.wpi.edu
"We thought you were dead."
"I was, but I'm better now." - Sheridan, "The Summoning"

Re: [PATCH] apr_dir_remove_recursively

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

From: "Greg Stein" <gs...@lyra.org>
Sent: Wednesday, May 23, 2001 6:25 PM

> On Wed, May 23, 2001 at 05:46:07PM -0500, William A. Rowe, Jr. wrote:
> > From: "Greg Stein" <gs...@lyra.org>
> > Sent: Wednesday, May 23, 2001 5:21 PM
> > 
> >...
> > > remove() on a symlink does *not* toss the target. It only removes the
> > > symlink.
> > 
> > BUT... this is a recursive call, he walks to the end of the chain, and starts
> > walking back up the tree.  That means all the files IN the symlinked directory
> > would be cleared out, and the symlink removed, and the original parent directory
> > [now empty] would persist.
> 
> Ah, right. As Greg Marr pointed out, I'm quite wrong in this regard. I
> forgot about the recursion down into directories.
> 
> I would suggest doing the lstat() IFF a directory is found. e.g. don't
> bother with it on files.
> >...
> > > So... don't do an lstat. That is just wasteful and unneeded.
> > 
> > No lstat, we should be able to get that information in the apr_dir_read call.
> 
> I'd say "no" on that. You only want to do the lstat() conditionally.

with a caviat (that we may have to work out the details of...) _if_ the usual
apr_dir_read call indicates (through the valid bits) that the APR_FINFO_TYPE bit 
is toggled provided, there is no reason to call lstat.

But if APR_FINFO_TYPE isn't set, we don't know if it's a directory, either.

Note this little bit of code in apr/file_io/unix/dir.c

    if (wanted)
    {
[snip]
        /* ??? Or lstat below?  What is it we really want? */
        ret = apr_stat(finfo, fspec, wanted, thedir->cntxt);
    }

Seems we want lstat, doesn't it?  

Which leads to an interesting question, do many platforms offer the d_type member
in the DIR struct?  It appears in FreeBSD 4.2, and would certainly optimize this
scenario significantly for UNIX.  Of course, we need to know that a link is going
to map to DT_LNK, and doesn't return some other file type.  I'd trust that's true.

The obvious question, who knows the spell to identify if DIR d.d_type is valid,
and that (at least) DT_DIR, DT_LNK, DT_REG, etc are defined as well?  Then we
can begin crafting this sucker to perform very efficiently on any platform that
returns the type as part of the readdir call.  By the way, apr_dir_read on win32 
_will_ identify a junction (symlink) as APR_LNK.

Bill

Re: [PATCH] apr_dir_remove_recursively

Posted by Greg Stein <gs...@lyra.org>.

On Wed, May 23, 2001 at 05:46:07PM -0500, William A. Rowe, Jr. wrote:
> From: "Greg Stein" <gs...@lyra.org>
> Sent: Wednesday, May 23, 2001 5:21 PM
> 
>...
> > remove() on a symlink does *not* toss the target. It only removes the
> > symlink.
> 
> BUT... this is a recursive call, he walks to the end of the chain, and starts
> walking back up the tree.  That means all the files IN the symlinked directory
> would be cleared out, and the symlink removed, and the original parent directory
> [now empty] would persist.

Ah, right. As Greg Marr pointed out, I'm quite wrong in this regard. I
forgot about the recursion down into directories.

I would suggest doing the lstat() IFF a directory is found. e.g. don't
bother with it on files.

>...
> > So... don't do an lstat. That is just wasteful and unneeded.
> 
> No lstat, we should be able to get that information in the apr_dir_read call.

I'd say "no" on that. You only want to do the lstat() conditionally.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] apr_dir_remove_recursively

Posted by Greg Stein <gs...@lyra.org>.

On Wed, May 23, 2001 at 09:17:57AM -0500, Ben Collins-Sussman wrote:
> "William A. Rowe, Jr." <ad...@rowe-clan.net> writes:
>...
> > Does it make sense to apply some lstat check against the tree, such that

Nope.

> > symlinks' targets aren't blown away?  I'm not clear if your patch protects 
> > that or not.
> 
> Good idea.

remove() on a symlink does *not* toss the target. It only removes the
symlink.

Think about it: remove tosses the directory entry. The symlink handles
interaction with the file. We aren't touching the file, so we don't get hit
by the symlink.

So... don't do an lstat. That is just wasteful and unneeded.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] apr_dir_remove_recursively

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

From: "Ben Collins-Sussman" <su...@collab.net>
Sent: Wednesday, May 23, 2001 11:35 AM


> "William A. Rowe, Jr." <wr...@rowe-clan.net> writes:
> 
> > > What if 3 different files in the tree (scattered about) are
> > > un-removable for different reasons?  This function still removes
> > > everything else and finishes its recursion.  Which of the three status
> > > codes should it return?
> > 
> > The first it encountered.
> > 
> > Here's the scenario...
> > 
> > path/to/some/where/file
> > 
> > If we remove 'some', and fail at file due to EACCESS, we will fail
> > at 'where' as well since it contains files, and at 'some' since it
> > contains a directory.  We don't care that we failed because files
> > remain, we care why that file remains in the first place.
> 
> Sorry, I'm still not following you.  Your explanation makes me think
> that as *soon* as any object returns a remove-error, all recursion
> should immediately return.  
> 
> This contradicts the earlier goal of "delete everything it can anyway"
> -- which implies finishing the whole depth-first search and just
> caching any remove-errors along the way.
> 
> So which design do we want?  Return-on-first-error, or
> Return-with-a-list-of-errors?

Stow the first error encountered, continue to attempt to remove everything
we can, then return that first error.

Re: [PATCH] apr_dir_remove_recursively

Posted by Ben Collins-Sussman <su...@collab.net>.

"William A. Rowe, Jr." <wr...@rowe-clan.net> writes:

> > What if 3 different files in the tree (scattered about) are
> > un-removable for different reasons?  This function still removes
> > everything else and finishes its recursion.  Which of the three status
> > codes should it return?
> 
> The first it encountered.
> 
> Here's the scenario...
> 
> path/to/some/where/file
> 
> If we remove 'some', and fail at file due to EACCESS, we will fail
> at 'where' as well since it contains files, and at 'some' since it
> contains a directory.  We don't care that we failed because files
> remain, we care why that file remains in the first place.

Sorry, I'm still not following you.  Your explanation makes me think
that as *soon* as any object returns a remove-error, all recursion
should immediately return.  

This contradicts the earlier goal of "delete everything it can anyway"
-- which implies finishing the whole depth-first search and just
caching any remove-errors along the way.

So which design do we want?  Return-on-first-error, or
Return-with-a-list-of-errors?

Re: [PATCH] apr_dir_remove_recursively

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

From: "Ben Collins-Sussman" <su...@collab.net>
Sent: Wednesday, May 23, 2001 10:31 AM


> "William A. Rowe, Jr." <ad...@rowe-clan.net> writes:
> 
> > > I agree.  I'm thinking of rewriting it to leave files behind.  I
> > > suspect the routine could return either APR_SUCCESS (if nothing but
> > > was left behind), or some not-yet-invented error code that
> > > specifically means it wasn't able to remove everything.
> > 
> > should simply return the error that the _remove returned, so if there is
> > an error, there was some reason it wasn't completed.  But it still completes
> > all that it is allowed.
> 
> What if 3 different files in the tree (scattered about) are
> un-removable for different reasons?  This function still removes
> everything else and finishes its recursion.  Which of the three status
> codes should it return?

The first it encountered.

Here's the scenario...

path/to/some/where/file

If we remove 'some', and fail at file due to EACCESS, we will fail at 'where' 
as well since it contains files, and at 'some' since it contains a directory.
We don't care that we failed because files remain, we care why that file 
remains in the first place.

Bill

Re: [PATCH] apr_dir_remove_recursively

Posted by Ben Collins-Sussman <su...@collab.net>.

"William A. Rowe, Jr." <ad...@rowe-clan.net> writes:

> > I agree.  I'm thinking of rewriting it to leave files behind.  I
> > suspect the routine could return either APR_SUCCESS (if nothing but
> > was left behind), or some not-yet-invented error code that
> > specifically means it wasn't able to remove everything.
> 
> should simply return the error that the _remove returned, so if there is
> an error, there was some reason it wasn't completed.  But it still completes
> all that it is allowed.

What if 3 different files in the tree (scattered about) are
un-removable for different reasons?  This function still removes
everything else and finishes its recursion.  Which of the three status
codes should it return?

Re: [PATCH] apr_dir_remove_recursively

Posted by "William A. Rowe, Jr." <ad...@rowe-clan.net>.

From: "Ben Collins-Sussman" <su...@collab.net>
Sent: Wednesday, May 23, 2001 9:17 AM


> "William A. Rowe, Jr." <ad...@rowe-clan.net> writes:
>  
> > My personal preference is that the remove continues to run, leaving only
> > droppings it can't remove.  Nothing irks me more than fixing the 'one file'
> > that couldn't be removed, only to trip over the next one on the next pass.
> > And save the first failure result for return to the caller.
> 
> I agree.  I'm thinking of rewriting it to leave files behind.  I
> suspect the routine could return either APR_SUCCESS (if nothing but
> was left behind), or some not-yet-invented error code that
> specifically means it wasn't able to remove everything.

should simply return the error that the _remove returned, so if there is
an error, there was some reason it wasn't completed.  But it still completes
all that it is allowed.

> > Does it make sense to apply some lstat check against the tree, such that
> > symlinks' targets aren't blown away?  I'm not clear if your patch protects 
> > that or not.
> 
> Good idea.

Don't go lstating until you've checked the returned flags, the apr dir stuff
can, on some platforms, tell if it's a symlink.  Dunno about all of them.

Bill

Re: [PATCH] apr_dir_remove_recursively

Posted by Ben Collins-Sussman <su...@collab.net>.

"William A. Rowe, Jr." <ad...@rowe-clan.net> writes:

> My personal preference is that the remove continues to run, leaving only
> droppings it can't remove.  Nothing irks me more than fixing the 'one file'
> that couldn't be removed, only to trip over the next one on the next pass.
> And save the first failure result for return to the caller.

I agree.  I'm thinking of rewriting it to leave files behind.  I
suspect the routine could return either APR_SUCCESS (if nothing but
was left behind), or some not-yet-invented error code that
specifically means it wasn't able to remove everything.

> Does it make sense to apply some lstat check against the tree, such that
> symlinks' targets aren't blown away?  I'm not clear if your patch protects 
> that or not.

Good idea.

Re: [PATCH] apr_dir_remove_recursively

Posted by "William A. Rowe, Jr." <ad...@rowe-clan.net>.

From: "Ben Collins-Sussman" <su...@collab.net>
Sent: Tuesday, May 22, 2001 8:44 AM


> OK, it's not actually a patch, because I'm not sure where it ought to
> live.  I've attached it below.
> 
> The function apr_dir_remove() won't work on a non-empty directory, so
> this routine fills the gap.  It's equivalent to 'rm -rf'.

My personal preference is that the remove continues to run, leaving only
droppings it can't remove.  Nothing irks me more than fixing the 'one file'
that couldn't be removed, only to trip over the next one on the next pass.
And save the first failure result for return to the caller.

Does it make sense to apply some lstat check against the tree, such that
symlinks' targets aren't blown away?  I'm not clear if your patch protects 
that or not.

Bill

Re: [PATCH] apr_dir_remove_recursively

Posted by Greg Stein <gs...@lyra.org>.

On Wed, May 23, 2001 at 09:15:20AM -0500, Ben Collins-Sussman wrote:
> Greg Stein <gs...@lyra.org> writes:
>...
> > 1) APR_STATUS_IS_SUCCESS is kind of a bogus macro; by definition,
> >    APR_SUCCESS is zero. Testing it can be done more clearly than using the
> >    macro.
> 
> Really?  I prefer just to check for zeroness myself, but I swear I
> remember being scolded for not using this macro.  Maybe it's just a
> holdover from Karl's paranoia.

Yes, really. APR_STATUS_IS_* for *other* errors is "required". But for
success, it isn't and if you look at all the code around, we don't use that
macro.

I prefer to check against APR_SUCCESS, but there is also a lot of "if"
statements that assume zero-ness for APR_SUCCESS. And that is valid, as we
*semantically* defined APR_SUCCESS to be zero.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] apr_dir_remove_recursively

Posted by Ben Collins-Sussman <su...@collab.net>.

Greg Stein <gs...@lyra.org> writes:

> This should be in the Apache coding style before committing. Notably: no
> space between symbol and opening paren, braces on the if/else lines, and the
> return status lines on a separate line.

Oh, you mean the exact *opposite* of GNU / Subversion coding style. ;)

> 1) APR_STATUS_IS_SUCCESS is kind of a bogus macro; by definition,
>    APR_SUCCESS is zero. Testing it can be done more clearly than using the
>    macro.

Really?  I prefer just to check for zeroness myself, but I swear I
remember being scolded for not using this macro.  Maybe it's just a
holdover from Karl's paranoia.

> 2) the strcmp() calls can be tossed:

Check.

Re: [PATCH] apr_dir_remove_recursively

Posted by Greg Stein <gs...@lyra.org>.

On Tue, May 22, 2001 at 08:44:19AM -0500, Ben Collins-Sussman wrote:
>...
> This routine currently lives in a Subversion library.  I'd like to
> move it into apr_file_io.h, but I'm not sure where the actual code
> should live.  It's implemented using nothing but other apr file
> routines;  does this mean duplicating the code into file_io/unix,
> file_io/win32, file_io/os2?

Nope. Put the file into file_io/unix/something.c. The other platforms will
just use/refer to it. See unix/fullrw.c and os2/fullrw.c for an example.
(you'll also see that apr/libapr.dsp refers to files in the unix/ dirs)

> apr_status_t
> apr_dir_remove_recursively (const char *path, apr_pool_t *pool)
> {
>   apr_status_t status;
>   apr_dir_t *this_dir;
>   apr_finfo_t this_entry;
>   apr_pool_t *subpool;
>   apr_int32_t flags = APR_FINFO_TYPE | APR_FINFO_NAME;
> 
>   status = apr_pool_create (&subpool, pool);
>   if (! (APR_STATUS_IS_SUCCESS (status))) return status;
> 
>   status = apr_dir_open (&this_dir, path, subpool);
>   if (! (APR_STATUS_IS_SUCCESS (status))) return status;
> 
>   for (status = apr_dir_read (&this_entry, flags, this_dir);
>        APR_STATUS_IS_SUCCESS (status);
>        status = apr_dir_read (&this_entry, flags, this_dir))
>     {
>       char *fullpath = apr_pstrcat (subpool, path, "/", this_entry.name, NULL);
> 
>       if (this_entry.filetype == APR_DIR)
>         {
>           if ((strcmp (this_entry.name, ".") == 0)
>               || (strcmp (this_entry.name, "..") == 0))
>             continue;
> 
>           status = apr_dir_remove_recursively (fullpath, subpool);
>           if (! (APR_STATUS_IS_SUCCESS (status))) return status;
>         }
>       else if (this_entry.filetype == APR_REG)
>         {
>           status = apr_file_remove (fullpath, subpool);
>           if (! (APR_STATUS_IS_SUCCESS (status))) return status;
>         }
>     }
> 
>   if (! (APR_STATUS_IS_ENOENT (status)))
>     return status;
> 
>   else
>     {
>       status = apr_dir_close (this_dir);
>       if (! (APR_STATUS_IS_SUCCESS (status))) return status;
>     }
> 
>   status = apr_dir_remove (path, subpool);
>   if (! (APR_STATUS_IS_SUCCESS (status))) return status;
> 
>   apr_pool_destroy (subpool);
> 
>   return APR_SUCCESS;
> }

This should be in the Apache coding style before committing. Notably: no
space between symbol and opening paren, braces on the if/else lines, and the
return status lines on a separate line.

Two code comments:

1) APR_STATUS_IS_SUCCESS is kind of a bogus macro; by definition,
   APR_SUCCESS is zero. Testing it can be done more clearly than using the
   macro.

2) the strcmp() calls can be tossed:

    if (this_entry.name[0] == '.'
        && (this_entry.name[1] == '\0'
	    || (this_entry.name[1] == '.' && this_entry.name[2] == '\0')))
        continue;


Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

Re: [PATCH] apr_dir_remove_recursively

Posted by "William A. Rowe, Jr." <ad...@rowe-clan.net>.

No, don't duplicate.  Please create a file_io/unix/fileutil.c source for
functions that are common to all platforms (since they are implemented
in terms of apr itself.)

Should do the trick.

Bill

----- Original Message ----- 
From: "Ben Collins-Sussman" <su...@collab.net>
To: <de...@apr.apache.org>
Sent: Tuesday, May 22, 2001 8:44 AM
Subject: [PATCH] apr_dir_remove_recursively


> 
> OK, it's not actually a patch, because I'm not sure where it ought to
> live.  I've attached it below.
> 
> The function apr_dir_remove() won't work on a non-empty directory, so
> this routine fills the gap.  It's equivalent to 'rm -rf'.
> 
> This routine currently lives in a Subversion library.  I'd like to
> move it into apr_file_io.h, but I'm not sure where the actual code
> should live.  It's implemented using nothing but other apr file
> routines;  does this mean duplicating the code into file_io/unix,
> file_io/win32, file_io/os2?
> 
> ---------------
> 
> apr_status_t
> apr_dir_remove_recursively (const char *path, apr_pool_t *pool)
> {
>   apr_status_t status;
>   apr_dir_t *this_dir;
>   apr_finfo_t this_entry;
>   apr_pool_t *subpool;
>   apr_int32_t flags = APR_FINFO_TYPE | APR_FINFO_NAME;
> 
>   status = apr_pool_create (&subpool, pool);
>   if (! (APR_STATUS_IS_SUCCESS (status))) return status;
> 
>   status = apr_dir_open (&this_dir, path, subpool);
>   if (! (APR_STATUS_IS_SUCCESS (status))) return status;
> 
>   for (status = apr_dir_read (&this_entry, flags, this_dir);
>        APR_STATUS_IS_SUCCESS (status);
>        status = apr_dir_read (&this_entry, flags, this_dir))
>     {
>       char *fullpath = apr_pstrcat (subpool, path, "/", this_entry.name, NULL);
> 
>       if (this_entry.filetype == APR_DIR)
>         {
>           if ((strcmp (this_entry.name, ".") == 0)
>               || (strcmp (this_entry.name, "..") == 0))
>             continue;
> 
>           status = apr_dir_remove_recursively (fullpath, subpool);
>           if (! (APR_STATUS_IS_SUCCESS (status))) return status;
>         }
>       else if (this_entry.filetype == APR_REG)
>         {
>           status = apr_file_remove (fullpath, subpool);
>           if (! (APR_STATUS_IS_SUCCESS (status))) return status;
>         }
>     }
> 
>   if (! (APR_STATUS_IS_ENOENT (status)))
>     return status;
> 
>   else
>     {
>       status = apr_dir_close (this_dir);
>       if (! (APR_STATUS_IS_SUCCESS (status))) return status;
>     }
> 
>   status = apr_dir_remove (path, subpool);
>   if (! (APR_STATUS_IS_SUCCESS (status))) return status;
> 
>   apr_pool_destroy (subpool);
> 
>   return APR_SUCCESS;
> }
>