You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Sander Striker <st...@apache.org> on 2001/09/01 00:27:58 UTC

RE: CVS update: MODIFIED: libsvn_wc ...

> On Fri, Aug 31, 2001 at 09:19:42AM -0500, Ben Collins-Sussman wrote:
> >...
> > The working copy commit-crawler needs to commit N targets spread out
> > randomly over a large tree.  Because commits are atomic, it describes
> > the commit as *one* large edit to the tree;  this means starting the
> > commit from a common "parent" directory and then traversing
> > directories in an intelligent order.  The rule we want to follow is:
> > after we leave a directory (go "up"), we want to close the directory.
> > We don't want to have to randomly re-open it again.
>
> That's the idea...
>
> > Thus, the crawler
> > is *already* sorting the targets.  As Mike said, right now the targets
> > are being qsorted alphabetically, which guarantees that all children
> > in the same directory will be examined as a group.
>
> Nope. It *isn't* doing that, which is why I posted the question
> in the first
> place. If we were doing a proper traversal, then we wouldn't need to check
> whether a lock had been taken out already. Thus, Mike's change to look for
> an existing lock is merely covering up a deeper issue (that was my worry).

Right.  I'd like to take on this one tomorrow (Amsterdam time), if you are
alright with it.  It is only a small patch that will get the behaviour of
visiting each directory only once (by adjusting the sorting).

Sander

> Cheers,
> -g
>
> --
> Greg Stein, http://www.lyra.org/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: CVS update: MODIFIED: libsvn_wc ...

Posted by Greg Stein <gs...@lyra.org>.
On Sun, Sep 02, 2001 at 09:45:46PM +0200, Sander Striker wrote:
>...
> > Right.  I'd like to take on this one tomorrow (Amsterdam time), if you are
> > alright with it.  It is only a small patch that will get the behaviour of
> > visiting each directory only once (by adjusting the sorting).
> 
> I've cooked up a patch that I think will do the trick.  Ofcourse it needs
> to be tested.
> 
> It has (IMO) one downside: it assumes that strings are NUL terminated.

Why would you think they aren't? By definition, anything in an
svn_string(buf)_t is null-terminated. (see svn_strings.h)

> OTOH this isn't going to be a big problem when I look at the task
> list and issue #406.

It isn't a problem now :-)

> Oh, I also removed 'else' statements where there was only one path
> for the code to take.  The ones like these IOW:

No problem. I do the same.

>...
> I don't know if this patch is really needed, but I do think that
> visiting a dir only once justifies the extra cost of sorting like
> this.

Yah. Makes sense.


The patch looks reasonable. I'll let the WC guys investigate more closely,
tho...

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: CVS update: MODIFIED: libsvn_wc ...

Posted by cm...@collab.net.
"Sander Striker" <st...@apache.org> writes:

> Just ignore the previous comments, thanks for explaining.

No problem.

> PS. The reason for saying the code was visiting a dir twice is that
>     'changing the current dir' to a dir counts as a visit.  In the scenario
>     above you drop out of the recursion and 'change the current dir' to
>     the parent again (== revisit).  Terminology, language barriers, all
>     very nice :)

Ah.  Gotcha.  I see where you were coming from on that now.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: CVS update: MODIFIED: libsvn_wc ...

Posted by Sander Striker <st...@apache.org>.
> "Sander Striker" <st...@apache.org> writes:
>
> > I don't know if this patch is really needed, but I do think that
> > visiting a dir only once justifies the extra cost of sorting like
> > this.
>
> --
>
> Greg Stein <gs...@lyra.org> writes:
>
> > > Thus, the crawler
> > > is *already* sorting the targets.  As Mike said, right now the targets
> > > are being qsorted alphabetically, which guarantees that all children
> > > in the same directory will be examined as a group.
> >
> > Nope. It *isn't* doing that, which is why I posted the question
> in the first
> > place. If we were doing a proper traversal, then we wouldn't
> need to check
> > whether a lock had been taken out already. Thus, Mike's change
> to look for
> > an existing lock is merely covering up a deeper issue (that was
> my worry).
>
> --
>
> Statements like this make it pretty obvious that people do NOT
> understand what's currently going on.
>
> 1.  We are not 'visiting a dir' more than once.  We visit once, we do
>     some things to children of that dir, and then we're done.  Sure,
>     some of the children of that dir are directories themselves and
>     require some recursion, but that's irrelevant.

*blink* *blink* I'm starting to see the light now.

> 2.  Yes, the paths are being sorted alphabetically, with attention
>     given to path separators so that each child is only visited once,
>     but NOT such that directory children are visited before
>     non-directory children.  In fact, such determination *could not*
>     be made without polling the disk, which is outside the domain of
>     knowledge this function requires, or should require.
>
>     Take this example:
>
>        svn ci foo baz/bie baz/bell bar
>
>     Now, our sorted targets will be:
>
>        bar baz/bell baz/bie foo
>
>     Are they files?  Are they dirs?  We can safely assume that baz is
>     a directory, but can't make any assumptions about bar or foo.
>     *Who cares*.  We visit the parent directory of bar, baz, and foo,
>     and crawl bar, baz/bell, baz/bie, and foo.  When we finish with
>     baz/bie, we exit that recursion and yes, we've forgetting that our
>     parent directory is already locked -- which is why we now check
>     that fact before locking again.

So, why is it trying to lock the parent again?  Is this just an artifact
from the way svn sets the 'current directory' to edit?

> Will someone who actually understands the code please tell me what the
> problem with this scenario is?

No problem with it.  I was just trying to prevent the 'current dir' from
changing* if there were items to do that didn't need recursion.  Ofcourse,
like you said above, it isn't possible to tell from the path names.  You
could defer the recursive items until the non-recursive ones are exhausted,
but it isn't needed.

*) And thus preventing the code from trying to lock again.

Just ignore the previous comments, thanks for explaining.
Btw, the last lines in the patch still hold and save a call to strncmp().

Sander

PS. The reason for saying the code was visiting a dir twice is that
    'changing the current dir' to a dir counts as a visit.  In the scenario
    above you drop out of the recursion and 'change the current dir' to
    the parent again (== revisit).  Terminology, language barriers, all
    very nice :)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: CVS update: MODIFIED: libsvn_wc ...

Posted by cm...@collab.net.
"Sander Striker" <st...@apache.org> writes:

> I don't know if this patch is really needed, but I do think that
> visiting a dir only once justifies the extra cost of sorting like
> this.

--

Greg Stein <gs...@lyra.org> writes:

> > Thus, the crawler
> > is *already* sorting the targets.  As Mike said, right now the targets
> > are being qsorted alphabetically, which guarantees that all children
> > in the same directory will be examined as a group.  
> 
> Nope. It *isn't* doing that, which is why I posted the question in the first
> place. If we were doing a proper traversal, then we wouldn't need to check
> whether a lock had been taken out already. Thus, Mike's change to look for
> an existing lock is merely covering up a deeper issue (that was my worry).

--

Statements like this make it pretty obvious that people do NOT
understand what's currently going on.

1.  We are not 'visiting a dir' more than once.  We visit once, we do
    some things to children of that dir, and then we're done.  Sure,
    some of the children of that dir are directories themselves and
    require some recursion, but that's irrelevant.

2.  Yes, the paths are being sorted alphabetically, with attention
    given to path separators so that each child is only visited once,
    but NOT such that directory children are visited before
    non-directory children.  In fact, such determination *could not*
    be made without polling the disk, which is outside the domain of
    knowledge this function requires, or should require.  

    Take this example:

       svn ci foo baz/bie baz/bell bar

    Now, our sorted targets will be:

       bar baz/bell baz/bie foo

    Are they files?  Are they dirs?  We can safely assume that baz is
    a directory, but can't make any assumptions about bar or foo.
    *Who cares*.  We visit the parent directory of bar, baz, and foo,
    and crawl bar, baz/bell, baz/bie, and foo.  When we finish with
    baz/bie, we exit that recursion and yes, we've forgetting that our
    parent directory is already locked -- which is why we now check
    that fact before locking again.

Will someone who actually understands the code please tell me what the
problem with this scenario is?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: CVS update: MODIFIED: libsvn_wc ...

Posted by Sander Striker <st...@apache.org>.
Hi again,

>> On Fri, Aug 31, 2001 at 09:19:42AM -0500, Ben Collins-Sussman wrote:
>>>...
>>> The working copy commit-crawler needs to commit N targets spread out
>>> randomly over a large tree.  Because commits are atomic, it describes
>>> the commit as *one* large edit to the tree;  this means starting the
>>> commit from a common "parent" directory and then traversing
>>> directories in an intelligent order.  The rule we want to follow is:
>>> after we leave a directory (go "up"), we want to close the directory.
>>> We don't want to have to randomly re-open it again.
>>
>> That's the idea...
>>
>>> Thus, the crawler
>>> is *already* sorting the targets.  As Mike said, right now the targets
>>> are being qsorted alphabetically, which guarantees that all children
>>> in the same directory will be examined as a group.
>>
>> Nope. It *isn't* doing that, which is why I posted the question in the
first
>> place. If we were doing a proper traversal, then we wouldn't need to
check
>> whether a lock had been taken out already. Thus, Mike's change to look
for
>> an existing lock is merely covering up a deeper issue (that was my
worry).
>
> Right.  I'd like to take on this one tomorrow (Amsterdam time), if you are
> alright with it.  It is only a small patch that will get the behaviour of
> visiting each directory only once (by adjusting the sorting).

I've cooked up a patch that I think will do the trick.  Ofcourse it needs
to be tested.

It has (IMO) one downside: it assumes that strings are NUL terminated.
OTOH this isn't going to be a big problem when I look at the task
list and issue #406.

Oh, I also removed 'else' statements where there was only one path
for the code to take.  The ones like these IOW:

if (a)
  return 1;
else
  return 0;

  ==

if (a)
  return 1;

return 0;


I don't know if this patch is really needed, but I do think that
visiting a dir only once justifies the extra cost of sorting like
this.

Sander


The patch:
--- libsvn_subr/path.c  Fri Aug 31 00:31:20 2001
+++ libsvn_subr/path.c~ Sun Sep  2 21:33:50 2001
@@ -216,19 +216,50 @@
   size_t min_len = ((path1->len) < (path2->len)) ? path1->len : path2->len;
   size_t i;
   char dirsep = get_separator_from_style (style);
+  char *sep1, *sep2;

   /* Skip past common prefix. */
   for (i = 0; (i < min_len) && (path1->data[i] == path2->data[i]); i++)
     ;

-  if ((path1->len == path2->len) && (i >= min_len))
-    return 0;     /* the paths are the same */
-  else if (path1->data[i] == dirsep)
+  if (i >= min_len) {
+    if (path1->len == path2->len)
+      return 0;   /* the paths are the same */
+
+    if (path1->len < path2->len)
+      return -1;
+
+    return 1;
+  }
+
+  if (path1->data[i] == dirsep)
     return 1;     /* path1 child of path2, parent always comes before child
*/
-  else if (path2->data[i] == dirsep)
+
+  if (path2->data[i] == dirsep)
     return -1;    /* path2 child of path1, parent always comes before child
*/
-  else
-    return strncmp (path1->data + i, path2->data + i, (min_len - i));
+
+  /* 'count' directory seperators
+   * This does assume that strings are NUL terminated.
+   */
+  sep1 = &path1[i];
+  sep2 = &path2[i];
+  do {
+    sep1++;
+    sep1 = strchr(sep1, dirsep);
+    sep2++;
+    sep2 = strchr(sep2, dirsep);
+  } while (sep1 && sep2);
+
+  if (sep1)
+      return 1;   /* path1 is deeper in the tree than path2 */
+
+  if (sep2)
+      return -1;  /* path2 is deeper in the tree than path1 */
+
+  /* Since we already checked the common prefix, we only need to
+   * check if the next character is bigger or smaller
+   */
+  return path1->data[i + 1] < path2->data[i + 1] ? -1 : 1;
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org