You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Josh Pieper <jj...@pobox.com> on 2004/05/22 13:11:28 UTC

Improving the performance of libsvn_wc for checkouts/updates/switches

Since ghudson is working on turning a O(n^2) import into O(n), I
thought I would look at what in the working copy library was causing
O(n^2) behavior for checkouts and updates.  This mail is an attempt to
show my solution to the problem and make sure that I am not violating
any of the guarantees we want the working copy library to make.  Let
me know if something isn't right, or can be improved.

First, why the n^2 behavior is there: During a checkout, the update
editor receives each new file addition one by one.  For each addition,
it writes out a short log file that describes the changes necessary to
incorporate the one file, then it runs that short log file.  The log
file causes the entire entries file to be read in, changes to be made,
and then the entire entries file to be written back out.  Since
reading and writing the entries file takes time proportional to the
number of files in the directory, the cost to add a new file is also
proportional to the number of files already in the directory, thus
causing O(n^2).

My solution:

Step 1:
The first step was to make the update editor create one big log file
for each directory, running this log file when the directory is closed
(deletes have the log file run immediately, so that updates that
delete then add work).  This involved opening a new log file when the
directory is opened and removing all the open and close log file code
from the beginning and end of each individual operation.  This way
when the log file is eventually run, all the information needed to
construct the final entries file is available in one place, but all
the non-written to disk changes are still in the log file in the event
of failure.

Step 2:
Make the log file execution code read the entries file once at the
beginning of the log file, then make all subsequent changes to an in
memory hash.  The entries file is written out only once, immediately
before the log file is completed.  This involved making new versions
of svn_wc__entry_modify and svn_wc__get_keywords that would use the
in-memory entries hash.


Pros:
 * O(n) behavior when adding/modifying files using the update editor.

Cons:
 * A lot more temporary disk space is used as the log file for an
   entire directory is created.
   
 * An update of all deletes will still be n^2.

 * If log file execution is interrupted, none of the entries will have
   been written to disk.  When cleanup happens, the entire log-file
   will have to be re-read and executed.


Performance:
There are still O(n^2) bits in libsvn_fs right now, so checkouts
aren't linear yet.  However, this does still improve performance right
now by about a factor of 4.  Below are some preliminary speed numbers
for a checkout of a large directory.

BDB @ trunk
# of files, wall clock time, (cpu time)
-----------------------------------------
100: 1.299 (0.244)
200: 1.824 (0.698)
400: 4.271 (2.317)
800: 10.761 (8.217)
1600: 37.004 (31.030)
3200: 2m22.65 (2m2.11)


BDB w/ O(n) changes
----------------------
100: 0.716 (0.113)
200: 1.112 (0.250)
400: 1.526 (0.618)
800: 4.013 (1.815)
1600: 8.273 (5.741)
3200: 24.638 (20.838)


-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Josh Pieper <jj...@pobox.com>.
Josh Pieper wrote:
> Philip Martin wrote:
> > The problem is that ra_dav is the only layer that uses wcprops, if you
> > only test over ra_local you won't see wcprops related failures.
> > 
> > The original problem generated some dev list discussion between me and
> > Karl, and some code from Karl.  As I recall the problem was that an
> > interrupted update caused the wcprops and the text-base/revision to be
> > out-of-sync (I can't remember which one was wrong).  I don't know
> > whether the wcprops code is still fragile, or whether Karl's change
> > fixed it.
> 
> Ok, I did some testing over ra_dav and found no problems.  It passes
> the whole test-suite, plus I did quite a bit of checkouts/updating and
> interrupting without any issues.  I also looked at issue #1136, which
> I think is what you were referring to here.  I was unable to reproduce
> that failure with the new code either.
> 
> One last concern.  Do we need to be worried about compatability?  This
> change would prevent trunk from running the log files from older
> working copies since their names have changed.  Should I hack in
> something to try "log" before trying "log.1" or can we just accept
> that amount of change?

It turns out that keeping compatability wasn't too hard, it in fact
reduced the number of files I had to touch.  Therefore, I went ahead
and committed this optimization in r9868.

There is one other optimization I want to make to the working copy
code, which should get it finally to O(n) for very large directory
sizes.  I'll post that in a new thread later on, this thread is
starting to get pretty deep.

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by kf...@collab.net.
Josh Pieper <jj...@pobox.com> writes:
> Philip Martin wrote:
> > The original problem generated some dev list discussion between me and
> > Karl, and some code from Karl.  As I recall the problem was that an
> > interrupted update caused the wcprops and the text-base/revision to be
> > out-of-sync (I can't remember which one was wrong).  I don't know
> > whether the wcprops code is still fragile, or whether Karl's change
> > fixed it.
> 
> Ok, I did some testing over ra_dav and found no problems.  It passes
> the whole test-suite, plus I did quite a bit of checkouts/updating and
> interrupting without any issues.  I also looked at issue #1136, which
> I think is what you were referring to here.  I was unable to reproduce
> that failure with the new code either.

You wouldn't generally see the problem in normal testing, you'd only
see it in interrupted operations over ra_dav and other extraordinary
circumstances.

Let me dredge up what I can from the musty caverns of memory...

Well, hmm, this might be resolved issue #1136, in which case what I
said above isn't true -- you *can* get a bug in normal operations.
The patch in that issue helps explain why wcprops are "special"
sometimes.

> One last concern.  Do we need to be worried about compatability?  This
> change would prevent trunk from running the log files from older
> working copies since their names have changed.  Should I hack in
> something to try "log" before trying "log.1" or can we just accept
> that amount of change?

The only operation this could affect would be 'svn cleanup', since log
files shouldn't be present when any other operation starts.  So the
scenario here is that someone

   1. does some operation that leaves the working copy in need of cleanup
   2. upgrades their subversion :-)
   3. runs the next operation, is told they must run 'svn cleanup' first
   4. tries 'svn cleanup'

Step 2 seems unlikely, though I suppose it could happen to someone.
They can always downgrade temporarily to solve it, though, so there's
a workaround.  So I don't think we really need to consider log
behavior to be part of the "working copy format"; no need to worry
about compatibility here, IMHO.

-Karl

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> The problem is that ra_dav is the only layer that uses wcprops, if you
> only test over ra_local you won't see wcprops related failures.
> 
> The original problem generated some dev list discussion between me and
> Karl, and some code from Karl.  As I recall the problem was that an
> interrupted update caused the wcprops and the text-base/revision to be
> out-of-sync (I can't remember which one was wrong).  I don't know
> whether the wcprops code is still fragile, or whether Karl's change
> fixed it.

Ok, I did some testing over ra_dav and found no problems.  It passes
the whole test-suite, plus I did quite a bit of checkouts/updating and
interrupting without any issues.  I also looked at issue #1136, which
I think is what you were referring to here.  I was unable to reproduce
that failure with the new code either.

One last concern.  Do we need to be worried about compatability?  This
change would prevent trunk from running the log files from older
working copies since their names have changed.  Should I hack in
something to try "log" before trying "log.1" or can we just accept
that amount of change?

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Philip Martin wrote:
>> > @@ -2083,6 +2153,19 @@
>> >                /* If a conflict happens, then the entry will be
>> >                   marked "Conflicted" and will track either 2 or 3 new
>> >                   temporary fulltext files that resulted. */
>> 
>> Is the above comment now redundant?
>
> Well, my new comment doesn't really state it, but I suppose it can go.

Your comment doesn't need to state it, it's information that applies
to the old code.

>> > +
>> > +              /* Run a dry-run of the merge to see if a conflict will
>> > +                 occur.  This is needed so we can report back to the
>> > +                 client as the changes come in. */
>> > +              base = svn_wc_adm_access_path (adm_access);
>> > +
>> > +              SVN_ERR (svn_wc_merge (svn_path_join (base, txtb, pool),
>> > +                                     svn_path_join (base, tmp_txtb, pool),
>> > +                                     svn_path_join (base, base_name, pool),
>> > +                                     adm_access,
>> > +                                     oldrev_str, newrev_str, ".mine",
>> > +                                     TRUE, &merge_outcome, diff3_cmd,
>> > +                                     pool));
>> 
>> This is a bit unfortunate.  I think I see what you are doing, the old
>> code could examine the wc state to get this information but in the new
>> code the notification callback is called before the log file is run.
>> A dry-run merge can be time consuming when dealing with large files,
>> although since only files with local mods are affected I guess it may
>> not be too bad in practice.  I don't know if there are any
>> alternatives, I guess it's difficult to postpone the notification
>> callback.
>
> Yeah, I thought about it some too and decided that a constant factor
> extra merge per file with local mods was probably worth it to drop the
> O(n^2) behavior.  I imagine more people will run into the checkout of
> a large directory than would run into having many large files with
> local mods.  And at worst the latter half would be only twice as slow.
> The former could easily be many orders of magnitude slower.

I tend to agreee, but I still think it's unfortunate that we have to
do a potentially expensive operation twice.

>> Have you tested over ra_dav?  I recall we had problems in the past
>> with wcprops during an interrupted update, although I can't remember
>> the exact details.  It's possible we made the wcprops code robust
>> enough for this change to work.
>
> No I haven't.  Is the editor somehow driven differently via ra_dav
> that would cause different problems?  Was there an issue, and can you
> remember it?

The problem is that ra_dav is the only layer that uses wcprops, if you
only test over ra_local you won't see wcprops related failures.

The original problem generated some dev list discussion between me and
Karl, and some code from Karl.  As I recall the problem was that an
interrupted update caused the wcprops and the text-base/revision to be
out-of-sync (I can't remember which one was wrong).  I don't know
whether the wcprops code is still fragile, or whether Karl's change
fixed it.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> > @@ -2083,6 +2153,19 @@
> >                /* If a conflict happens, then the entry will be
> >                   marked "Conflicted" and will track either 2 or 3 new
> >                   temporary fulltext files that resulted. */
> 
> Is the above comment now redundant?

Well, my new comment doesn't really state it, but I suppose it can go.

> > +
> > +              /* Run a dry-run of the merge to see if a conflict will
> > +                 occur.  This is needed so we can report back to the
> > +                 client as the changes come in. */
> > +              base = svn_wc_adm_access_path (adm_access);
> > +
> > +              SVN_ERR (svn_wc_merge (svn_path_join (base, txtb, pool),
> > +                                     svn_path_join (base, tmp_txtb, pool),
> > +                                     svn_path_join (base, base_name, pool),
> > +                                     adm_access,
> > +                                     oldrev_str, newrev_str, ".mine",
> > +                                     TRUE, &merge_outcome, diff3_cmd,
> > +                                     pool));
> 
> This is a bit unfortunate.  I think I see what you are doing, the old
> code could examine the wc state to get this information but in the new
> code the notification callback is called before the log file is run.
> A dry-run merge can be time consuming when dealing with large files,
> although since only files with local mods are affected I guess it may
> not be too bad in practice.  I don't know if there are any
> alternatives, I guess it's difficult to postpone the notification
> callback.

Yeah, I thought about it some too and decided that a constant factor
extra merge per file with local mods was probably worth it to drop the
O(n^2) behavior.  I imagine more people will run into the checkout of
a large directory than would run into having many large files with
local mods.  And at worst the latter half would be only twice as slow.
The former could easily be many orders of magnitude slower.

> Have you tested over ra_dav?  I recall we had problems in the past
> with wcprops during an interrupted update, although I can't remember
> the exact details.  It's possible we made the wcprops code robust
> enough for this change to work.

No I haven't.  Is the editor somehow driven differently via ra_dav
that would cause different problems?  Was there an issue, and can you
remember it?

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Philip Martin wrote:
>> Probably.  I think we could also delete log files as we go provided
>> cleanup did apr_dir_read to identify the lowest numbered log file.

Well done!  You ignored my thinko about deleting log files as we go;
the log files can only be deleted once the entries file has been
written.

> @@ -2083,6 +2153,19 @@
>                /* If a conflict happens, then the entry will be
>                   marked "Conflicted" and will track either 2 or 3 new
>                   temporary fulltext files that resulted. */

Is the above comment now redundant?

> +
> +              /* Run a dry-run of the merge to see if a conflict will
> +                 occur.  This is needed so we can report back to the
> +                 client as the changes come in. */
> +              base = svn_wc_adm_access_path (adm_access);
> +
> +              SVN_ERR (svn_wc_merge (svn_path_join (base, txtb, pool),
> +                                     svn_path_join (base, tmp_txtb, pool),
> +                                     svn_path_join (base, base_name, pool),
> +                                     adm_access,
> +                                     oldrev_str, newrev_str, ".mine",
> +                                     TRUE, &merge_outcome, diff3_cmd,
> +                                     pool));

This is a bit unfortunate.  I think I see what you are doing, the old
code could examine the wc state to get this information but in the new
code the notification callback is called before the log file is run.
A dry-run merge can be time consuming when dealing with large files,
although since only files with local mods are affected I guess it may
not be too bad in practice.  I don't know if there are any
alternatives, I guess it's difficult to postpone the notification
callback.

Have you tested over ra_dav?  I recall we had problems in the past
with wcprops during an interrupted update, although I can't remember
the exact details.  It's possible we made the wcprops code robust
enough for this change to work.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Branko Čibej <br...@xbc.nu>.
Josh Pieper wrote:

>@@ -1293,29 +1296,45 @@
>      a ghost open tag. */
>   SVN_ERR (svn_xml_parse (parser, log_start, strlen (log_start), 0));
> 
>-  /* Parse the log file's contents. */
>-  SVN_ERR_W (svn_wc__open_adm_file (&f, svn_wc_adm_access_path (adm_access),
>-                                    SVN_WC__ADM_LOG, APR_READ, pool),
>-             _("Couldn't open log"));
>-  
>-  do {
>-    buf_len = sizeof (buf);
>+  for (log_number = 1; ; log_number++)
>+    {
>+      svn_pool_clear (iterpool);
>+      logfile_path = apr_psprintf (iterpool, "log.%d", log_number);
>  
>
Eek. I think this should be SVN_WC__ADM_LOG ".%d", not "log.%d", to be 
consistent with usage elsewhere.

>@@ -1336,9 +1355,15 @@
>     }
>   else
>     {
>-      /* No 'killme'?  Remove the logfile;  its commands have been executed. */
>-      SVN_ERR (svn_wc__remove_adm_file (svn_wc_adm_access_path (adm_access),
>-                                        pool, SVN_WC__ADM_LOG, NULL));
>+      for (log_number--; log_number > 0; log_number--)
>+        {
>+          svn_pool_clear (iterpool);
>+          logfile_path = apr_psprintf (iterpool, "log.%d", log_number);
>  
>
Same here.

>@@ -811,7 +869,8 @@
> 
>   SVN_ERR (svn_wc__open_adm_file (&log_fp,
>                                   parent_path,
>-                                  SVN_WC__ADM_LOG,
>+                                  apr_psprintf (pool, SVN_WC__ADM_LOG
>+                                                ".%d", *log_number),
>  
>
Yes, exactly like this. :-)




---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

[PATCH] Execute groups of log-files, was (Re: Improving the performance of libsvn_wc for checkouts/updates/switches)

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Josh Pieper <jj...@pobox.com> writes:
> 
> > So, how about we execute all the log files in order, then erase them
> > in reverse order?  That should take care of any unforseen crashes,
> > yes?
> 
> Probably.  I think we could also delete log files as we go provided
> cleanup did apr_dir_read to identify the lowest numbered log file.

Ok Philip, since you're the resident libsvn_wc expert, think you could
look over this patch and make sure everything looks OK before I
commit?  It performs as well as anything I've done before and should
behave identically to the current trunk as far as resumed updates.

-Josh

-----------------------

Improve the performance of the libsvn_wc update editor by letting it
run multiple log files at the same time.  When running a group of log
files, the resulting entries file is written out only once at the
end.  Previously, the entries file was read and written at the
beginning and end of every log file.

As a result of this change, all logs in the working copy directory
have a numerical suffix that indicates which number they are in a
sequence.  During cleanup, the log files are executed in order, but
they are deleted in reverse order so that stale log files cannot be
left behind.

* subversion/libsvn_wc/props.c
  (svn_wc_merge_prop_diffs): Append a ".1" to the log files created
    here since they are not part of the update editor.

* subversion/libsvn_wc/adm_ops.c
  (svn_wc_process_committed): Append a ".1" to the log files here too.

* subversion/libsvn_wc/lock.c
  (svn_wc__adm_is_cleanup_required): Once again append a ".1" to the
    checked for log file.
    
* subversion/libsvn_wc/log.c
  (svn_wc__run_log): Loop through all files named "log.1", "log.2",
    etc. until a non-existant file is found.  After executing this log
    file group, erase them in reverse order.

* subversion/libsvn_wc/update_editor.c
  (struct dir_baton): Add a field to keep track of the next available
    log file number for this directory.
  (cleanup_dir_baton, cleanup_dir_baton_child): New, called when a
    dir_baton is destroyed.  They run any left over log files so that
    updates are easily resumable if a soft-interrupt caused the editor
    to end.
  (make_dir_baton): Initialize the new log file number field.
  (struct file_baton): Add a field to record the directory that this
    file is located in.
  (make_file_baton): Initialize the dir_baton field.
  (do_entry_deletion): Add a parameter that accepts a pointer to an
    integer that contains the next log file number to use.  Reset this
    log number after running the log file group.
  (delete_entry): Pass the log file number to do_entry_deletion.
  (close_directory): Always run the current log file group.
  (install_file): Accept a log file number, use it, and also perform a
    merge dry run if necessary so that the correct conflict status can
    be reported to the caller.  Previously running the log file would
    generate this information.
  (close_edit): Pass a log number of 1 to do_entry_deletion.
  (svn_wc_add_repos_file): Pass a log number of 1 to install file and
    run the resulting log file.

Index: subversion/libsvn_wc/props.c
===================================================================
--- subversion/libsvn_wc/props.c	(revision 9860)
+++ subversion/libsvn_wc/props.c	(working copy)
@@ -299,7 +299,7 @@
 
   if (! dry_run)
     {
-      SVN_ERR (svn_wc__open_adm_file (&log_fp, parent, SVN_WC__ADM_LOG,
+      SVN_ERR (svn_wc__open_adm_file (&log_fp, parent, SVN_WC__ADM_LOG ".1",
                                       (APR_WRITE | APR_CREATE), /* not excl */
                                       pool));
       log_accum = svn_stringbuf_create ("", pool);
@@ -317,7 +317,7 @@
                                          log_accum->len, NULL, pool),
                  apr_psprintf (pool, _("Error writing log for '%s'"), path));
 
-      SVN_ERR (svn_wc__close_adm_file (log_fp, parent, SVN_WC__ADM_LOG,
+      SVN_ERR (svn_wc__close_adm_file (log_fp, parent, SVN_WC__ADM_LOG ".1",
                                        1, /* sync */ pool));
       SVN_ERR (svn_wc__run_log (adm_access, NULL, pool));
     }
Index: subversion/libsvn_wc/log.c
===================================================================
--- subversion/libsvn_wc/log.c	(revision 9860)
+++ subversion/libsvn_wc/log.c	(working copy)
@@ -1274,6 +1274,9 @@
   char buf[BUFSIZ];
   apr_size_t buf_len;
   apr_file_t *f = NULL;
+  const char *logfile_path;
+  int log_number;
+  apr_pool_t *iterpool = svn_pool_create (pool);
 
   /* kff todo: use the tag-making functions here, now. */
   const char *log_start
@@ -1293,29 +1296,45 @@
      a ghost open tag. */
   SVN_ERR (svn_xml_parse (parser, log_start, strlen (log_start), 0));
 
-  /* Parse the log file's contents. */
-  SVN_ERR_W (svn_wc__open_adm_file (&f, svn_wc_adm_access_path (adm_access),
-                                    SVN_WC__ADM_LOG, APR_READ, pool),
-             _("Couldn't open log"));
-  
-  do {
-    buf_len = sizeof (buf);
+  for (log_number = 1; ; log_number++)
+    {
+      svn_pool_clear (iterpool);
+      logfile_path = apr_psprintf (iterpool, "log.%d", log_number);
+      /* Parse the log file's contents. */
+      err = svn_wc__open_adm_file (&f, svn_wc_adm_access_path (adm_access),
+                                   logfile_path, APR_READ, iterpool);
+      if (err)
+        {
+          if (APR_STATUS_IS_ENOENT (err->apr_err))
+            {
+              svn_error_clear (err);
+              break;
+            }
+          else
+            {
+              SVN_ERR_W (err, _("Couldn't open log"));
+            }
+        }
+      
+      do {
+        buf_len = sizeof (buf);
+        
+        err = svn_io_file_read (f, buf, &buf_len, iterpool);
+        if (err && !APR_STATUS_IS_EOF(err->apr_err))
+          return svn_error_createf
+            (err->apr_err, err,
+             _("Error reading administrative log file in '%s'"),
+             svn_wc_adm_access_path (adm_access));
+        
+        SVN_ERR (svn_xml_parse (parser, buf, buf_len, 0));
+        
+      } while (! err);
+      
+      svn_error_clear (err);
+      SVN_ERR (svn_io_file_close (f, iterpool));
+    }
 
-    err = svn_io_file_read (f, buf, &buf_len, pool);
-    if (err && !APR_STATUS_IS_EOF(err->apr_err))
-      return svn_error_createf
-        (err->apr_err, err,
-         _("Error reading administrative log file in '%s'"),
-         svn_wc_adm_access_path (adm_access));
 
-    SVN_ERR (svn_xml_parse (parser, buf, buf_len, 0));
-
-  } while (! err);
-
-  svn_error_clear (err);
-  SVN_ERR (svn_io_file_close (f, pool));
-
-
   /* Pacify Expat with a pointless closing element tag. */
   SVN_ERR (svn_xml_parse (parser, log_end, strlen (log_end), 1));
 
@@ -1336,9 +1355,15 @@
     }
   else
     {
-      /* No 'killme'?  Remove the logfile;  its commands have been executed. */
-      SVN_ERR (svn_wc__remove_adm_file (svn_wc_adm_access_path (adm_access),
-                                        pool, SVN_WC__ADM_LOG, NULL));
+      for (log_number--; log_number > 0; log_number--)
+        {
+          svn_pool_clear (iterpool);
+          logfile_path = apr_psprintf (iterpool, "log.%d", log_number);
+          
+          /* No 'killme'?  Remove the logfile;  its commands have been executed. */
+          SVN_ERR (svn_wc__remove_adm_file (svn_wc_adm_access_path (adm_access),
+                                            iterpool, logfile_path, NULL));
+        }
     }
 
   return SVN_NO_ERROR;
Index: subversion/libsvn_wc/adm_ops.c
===================================================================
--- subversion/libsvn_wc/adm_ops.c	(revision 9860)
+++ subversion/libsvn_wc/adm_ops.c	(working copy)
@@ -242,7 +242,7 @@
   /* Open a log file in the administrative directory */
   SVN_ERR (svn_wc__open_adm_file (&log_fp,
                                   svn_wc_adm_access_path (adm_access),
-                                  SVN_WC__ADM_LOG,
+                                  SVN_WC__ADM_LOG ".1",
                                   (APR_WRITE | APR_CREATE),
                                   pool));
 
@@ -378,7 +378,7 @@
              apr_psprintf (pool, _("Error writing log file for '%s'"), path));
       
   SVN_ERR (svn_wc__close_adm_file (log_fp, svn_wc_adm_access_path (adm_access),
-                                   SVN_WC__ADM_LOG,
+                                   SVN_WC__ADM_LOG ".1",
                                    TRUE, /* sync */
                                    pool));
 
Index: subversion/libsvn_wc/lock.c
===================================================================
--- subversion/libsvn_wc/lock.c	(revision 9860)
+++ subversion/libsvn_wc/lock.c	(working copy)
@@ -893,7 +893,8 @@
 {
   svn_node_kind_t kind;
   const char *log_path = svn_wc__adm_path (svn_wc_adm_access_path (adm_access),
-                                           FALSE, pool, SVN_WC__ADM_LOG, NULL);
+                                           FALSE, pool, SVN_WC__ADM_LOG ".1",
+                                           NULL);
 
   /* The presence of a log file demands cleanup */
   SVN_ERR (svn_io_check_path (log_path, &kind, pool));
Index: subversion/libsvn_wc/update_editor.c
===================================================================
--- subversion/libsvn_wc/update_editor.c	(revision 9860)
+++ subversion/libsvn_wc/update_editor.c	(working copy)
@@ -141,6 +141,9 @@
   /* The bump information for this directory. */
   struct bump_dir_info *bump_info;
 
+  /* The current log file number. */
+  int log_number;
+
   /* The pool in which this baton itself is allocated. */
   apr_pool_t *pool;
 };
@@ -211,7 +214,53 @@
   return entry->url;
 }
 
+/* An APR pool cleanup handler.  This runs the log file for a
+   directory baton. */
+static apr_status_t
+cleanup_dir_baton (void *dir_baton)
+{
+  struct dir_baton *db = dir_baton;
+  svn_error_t *err;
+  apr_status_t apr_err;
+  svn_wc_adm_access_t *adm_access;
 
+  /* If there are no log files to write, return immediately. */
+  if (db->log_number == 1)
+    return APR_SUCCESS;
+
+  err = svn_wc_adm_retrieve (&adm_access, db->edit_baton->adm_access,
+                             db->path, apr_pool_parent_get (db->pool));
+
+  if (err)
+    {
+      apr_err = err->apr_err;
+      svn_error_clear (err);
+      return apr_err;
+    }
+
+  err = svn_wc__run_log (adm_access, NULL, apr_pool_parent_get (db->pool));
+
+  if (err)
+    {
+      apr_err = err->apr_err;
+      svn_error_clear (err);
+      return apr_err;
+    }
+
+  return APR_SUCCESS;
+}
+
+/* An APR pool cleanup handler.  This is a child handler, it removes
+   the mail pool handler. */
+static apr_status_t
+cleanup_dir_baton_child (void *dir_baton)
+{
+  struct dir_baton *db = dir_baton;
+  apr_pool_cleanup_kill (db->pool, db, cleanup_dir_baton);
+  return APR_SUCCESS;
+}    
+
+
 /* Return a new dir_baton to represent NAME (a subdirectory of
    PARENT_BATON).  If PATH is NULL, this is the root directory of the
    edit. */
@@ -291,11 +340,15 @@
 
   d->edit_baton   = eb;
   d->parent_baton = pb;
-  d->pool         = pool;
+  d->pool         = svn_pool_create (pool);
   d->propchanges  = apr_array_make (pool, 1, sizeof (svn_prop_t));
   d->added        = added;
   d->bump_info    = bdi;
+  d->log_number   = 1;
 
+  apr_pool_cleanup_register (d->pool, d, cleanup_dir_baton,
+                             cleanup_dir_baton_child);
+  
   return d;
 }
 
@@ -443,6 +496,9 @@
   /* The global edit baton. */
   struct edit_baton *edit_baton;
 
+  /* The parent directory of this file. */
+  struct dir_baton *dir_baton;
+
   /* Pool specific to this file_baton. */
   apr_pool_t *pool;
 
@@ -520,6 +576,7 @@
   f->propchanges  = apr_array_make (pool, 1, sizeof (svn_prop_t));
   f->bump_info    = pb->bump_info;
   f->added        = adding;
+  f->dir_baton    = pb;
 
   /* No need to initialize f->digest, since we used pcalloc(). */
 
@@ -789,7 +846,8 @@
 static svn_error_t *
 do_entry_deletion (struct edit_baton *eb,
                    const char *parent_path,
-                   const char *path, 
+                   const char *path,
+                   int *log_number,
                    apr_pool_t *pool)
 {
   apr_file_t *log_fp = NULL;
@@ -811,7 +869,8 @@
 
   SVN_ERR (svn_wc__open_adm_file (&log_fp,
                                   parent_path,
-                                  SVN_WC__ADM_LOG,
+                                  apr_psprintf (pool, SVN_WC__ADM_LOG
+                                                ".%d", *log_number),
                                   (APR_WRITE | APR_CREATE), /* not excl */
                                   pool));
 
@@ -862,7 +921,8 @@
 
   SVN_ERR (svn_wc__close_adm_file (log_fp,
                                    parent_path,
-                                   SVN_WC__ADM_LOG,
+                                   apr_psprintf (pool, SVN_WC__ADM_LOG
+                                                 ".%d", (*log_number)++),
                                    TRUE, /* sync */
                                    pool));
     
@@ -906,6 +966,7 @@
 
   SVN_ERR (leftmod_error_chain (svn_wc__run_log (adm_access, NULL, pool),
                                 logfile_path, parent_path, pool));
+  *log_number = 1;
 
   /* The passed-in `path' is relative to the anchor of the edit, so if
    * the operation was invoked on something other than ".", then
@@ -934,7 +995,8 @@
               apr_pool_t *pool)
 {
   struct dir_baton *pb = parent_baton;
-  return do_entry_deletion (pb->edit_baton, pb->path, path, pool);
+  return do_entry_deletion (pb->edit_baton, pb->path, path, &pb->log_number,
+                            pool);
 }
 
 
@@ -1125,27 +1187,28 @@
   struct dir_baton *db = dir_baton;
   svn_wc_notify_state_t prop_state = svn_wc_notify_state_unknown;
   apr_array_header_t *entry_props, *wc_props, *regular_props;
-
+  svn_wc_adm_access_t *adm_access;
+      
   SVN_ERR (svn_categorize_props (db->propchanges, &entry_props, &wc_props,
                                  &regular_props, pool));
 
+  SVN_ERR (svn_wc_adm_retrieve (&adm_access, db->edit_baton->adm_access,
+                                db->path, db->pool));
+      
   /* If this directory has property changes stored up, now is the time
      to deal with them. */
   if (regular_props->nelts || entry_props->nelts || wc_props->nelts)
     {
-      svn_wc_adm_access_t *adm_access;
       apr_file_t *log_fp = NULL;
 
       /* to hold log messages: */
       svn_stringbuf_t *entry_accum = svn_stringbuf_create ("", db->pool);
 
-      SVN_ERR (svn_wc_adm_retrieve (&adm_access, db->edit_baton->adm_access,
-                                    db->path, db->pool));
-      
       /* Open log file */
       SVN_ERR (svn_wc__open_adm_file (&log_fp,
                                       db->path,
-                                      SVN_WC__ADM_LOG,
+                                      apr_psprintf (db->pool, SVN_WC__ADM_LOG
+                                                    ".%d", db->log_number),
                                       (APR_WRITE | APR_CREATE), /* not excl */
                                       db->pool));
 
@@ -1244,14 +1307,17 @@
       /* The log is ready to run, close it. */
       SVN_ERR (svn_wc__close_adm_file (log_fp,
                                        db->path,
-                                       SVN_WC__ADM_LOG,
+                                       apr_psprintf (db->pool, SVN_WC__ADM_LOG
+                                                     ".%d", db->log_number),
                                        TRUE, /* sync */
                                        db->pool));
 
-      /* Run the log. */
-      SVN_ERR (svn_wc__run_log (adm_access, NULL, db->pool));
     }
 
+  /* Run the log. */
+  SVN_ERR (svn_wc__run_log (adm_access, NULL, db->pool));
+  db->log_number = 1;
+  
   /* We're done with this directory, so remove one reference from the
      bump information. This may trigger a number of actions. See
      maybe_bump_dir_info() for more information.  */
@@ -1710,6 +1776,7 @@
 install_file (svn_wc_notify_state_t *content_state,
               svn_wc_notify_state_t *prop_state,
               svn_wc_adm_access_t *adm_access,
+              int *log_number,
               const char *file_path,
               svn_revnum_t new_revision,
               const char *new_text_path,
@@ -1731,6 +1798,7 @@
   svn_boolean_t magic_props_changed = FALSE;
   apr_array_header_t *regular_props = NULL, *wc_props = NULL,
     *entry_props = NULL;
+  enum svn_wc_merge_outcome_t merge_outcome;
 
   /* The code flow does not depend upon these being set to NULL, but
      it removes a gcc 3.1 `might be used uninitialized in this
@@ -1765,7 +1833,8 @@
      right now. */
   SVN_ERR (svn_wc__open_adm_file (&log_fp,
                                   parent_dir,
-                                  SVN_WC__ADM_LOG,
+                                  apr_psprintf (pool, SVN_WC__ADM_LOG ".%d",
+                                                *log_number),
                                   (APR_WRITE | APR_CREATE), /* not excl */
                                   pool));
 
@@ -2050,6 +2119,7 @@
                  textual changes into the working file. */
               const char *oldrev_str, *newrev_str;
               const svn_wc_entry_t *e;
+              const char *base;
               
               /* Create strings representing the revisions of the
                  old and new text-bases. */
@@ -2083,6 +2153,19 @@
               /* If a conflict happens, then the entry will be
                  marked "Conflicted" and will track either 2 or 3 new
                  temporary fulltext files that resulted. */
+
+              /* Run a dry-run of the merge to see if a conflict will
+                 occur.  This is needed so we can report back to the
+                 client as the changes come in. */
+              base = svn_wc_adm_access_path (adm_access);
+
+              SVN_ERR (svn_wc_merge (svn_path_join (base, txtb, pool),
+                                     svn_path_join (base, tmp_txtb, pool),
+                                     svn_path_join (base, base_name, pool),
+                                     adm_access,
+                                     oldrev_str, newrev_str, ".mine",
+                                     TRUE, &merge_outcome, diff3_cmd,
+                                     pool));
               
             } /* end: working file exists and has mods */
         } /* end: working file has mods */
@@ -2190,27 +2273,17 @@
                                     log_accum->len, NULL, pool),
              apr_psprintf (pool, _("Error writing log for '%s'"), file_path));
 
-  /* The log is ready to run.  Close it and run it! */
-  SVN_ERR (svn_wc__close_adm_file (log_fp, parent_dir, SVN_WC__ADM_LOG,
+  /* The log is done, close it. */
+  SVN_ERR (svn_wc__close_adm_file (log_fp, parent_dir,
+                                   apr_psprintf (pool, SVN_WC__ADM_LOG ".%d",
+                                                 (*log_number)++),
                                    TRUE, /* sync */ pool));
-  SVN_ERR (svn_wc__run_log (adm_access, diff3_cmd, pool));
 
   if (content_state)
     {
-      const svn_wc_entry_t *entry;
-      svn_boolean_t tc, pc;
-
       /* Initialize the state of our returned value. */
       *content_state = svn_wc_notify_state_unknown;
       
-      /* ### There should be a more efficient way of finding out whether
-         or not the file is modified|merged|conflicted.  If the
-         svn_wc__run_log() call above could return a special error code
-         in case of a conflict or something, that would work. */
-
-      SVN_ERR (svn_wc_entry (&entry, file_path, adm_access, TRUE, pool));
-      SVN_ERR (svn_wc_conflicted_p (&tc, &pc, parent_dir, entry, pool));
-      
       /* This is kind of interesting.  Even if no new text was
          installed (i.e., new_text_path was null), we could still
          report a pre-existing conflict state.  Say a file, already
@@ -2218,7 +2291,7 @@
          update.  Then we'll notify that it has text conflicts.  This
          seems okay to me.  I guess.  I dunno.  You? */
 
-      if (tc)
+      if (merge_outcome == svn_wc_merge_conflict)
         *content_state = svn_wc_notify_state_conflicted;
       else if (new_text_path)
         {
@@ -2277,6 +2350,7 @@
   SVN_ERR (install_file (&content_state,
                          &prop_state,
                          adm_access,
+                         &fb->dir_baton->log_number,
                          fb->path,
                          (*eb->target_revision),
                          new_text_path,
@@ -2317,13 +2391,15 @@
 {
   struct edit_baton *eb = edit_baton;
   const char *target_path = svn_path_join (eb->anchor, eb->target, pool);
+  int log_number = 1;
 
   /* If there is a target and that target is missing, then it
      apparently wasn't re-added by the update process, so we'll
      pretend that the editor deleted the entry.  The helper function
      do_entry_deletion() will take care of the necessary steps.  */
   if ((*eb->target) && (svn_wc__adm_missing (eb->adm_access, target_path)))
-    SVN_ERR (do_entry_deletion (eb, eb->anchor, eb->target, pool));
+    SVN_ERR (do_entry_deletion (eb, eb->anchor, eb->target, &log_number,
+                                pool));
 
   /* The editor didn't even open the root; we have to take care of
      some cleanup stuffs. */
@@ -2737,6 +2813,7 @@
 {
   const char *new_URL;
   apr_array_header_t *propchanges;
+  int log_number = 1;
 
   /* Fabricate the anticipated new URL of the target. */
   {
@@ -2757,6 +2834,7 @@
   SVN_ERR (install_file (NULL,
                          NULL,
                          adm_access,
+                         &log_number,
                          dst_path,
                          0,
                          new_text_path,
@@ -2770,5 +2848,7 @@
                          NULL,
                          pool));
 
+  SVN_ERR (svn_wc__run_log (adm_access, NULL, pool));
+
   return SVN_NO_ERROR;
 }
    

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> So, how about we execute all the log files in order, then erase them
> in reverse order?  That should take care of any unforseen crashes,
> yes?

Probably.  I think we could also delete log files as we go provided
cleanup did apr_dir_read to identify the lowest numbered log file.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Naming them log.1, log.2, etc. and then running them in that order and
> stopping when at the first not-found file will probably do.  Don't
> rely on anything stored in the access baton since that won't work when
> running "svn cleanup".

Yeah, I was actually just going to use that as an optimization when
writing, and the dir_baton makes more sense now that I think about it.

> The file handling needs to be considered carefully--it's important to
> make sure that the lock is never removed while leaving a log.N file
> behind, since it could inadvertantly become live if an update creates
> log files up to log.N-1.

So, how about we execute all the log files in order, then erase them
in reverse order?  That should take care of any unforseen crashes,
yes?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Josh Pieper wrote:
> Philip Martin wrote:
> > Josh Pieper <jj...@pobox.com> writes:
> > 
> > > Is there a big reason to expect that if you 'kill -9' a checkout it
> > > will remember anything it did?
> > 
> > It does now, so I would expect it to do so in the future.
> 
> Do you see any correctness or data loss concerns with the approach
> other than this?  If not, then I'll do some benchmarking and see if
> it is worth arguing this point. ;)

Results of the benchmarking are in.  I did a checkout of a freshly
imported mozilla source tree as something of a real world test, plus
the same old synthetic things I had before.

The improvement on mozilla was 10% for BDB, 16% for FSFS.

         BDB       FSFS
       ------------------
Trunk  | 21m5s  |  18m37s
       ------------------
O(n)WC | 18m49s |  15m34s
       ------------------

The synthetic ones are the same as I showed before, about a 4x speedup
once you get over 100 files in a directory.

Does anyone think this scale of performance improvement is worth
having the client have to re-download information if it is 'kill -9'd
or has a system crash?  Note that it will not have to re-download
anything if CTRL-C or 'kill -15' is used.

If the vote is no, I'll shelve the patch and this performance
improvement will still be there for the taking by using the
entries+changes concept.

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> This sounds workable.  Would it be acceptable to use just a postfixed
> number (log.1, log.2, etc...), and store the last used one in the
> adm_access baton?  When it comes time to run them, just loop from 1 to
> the last used, then reset the last used.  Sorting would be O(n log n),
> which while not bad, isn't strictly necessary since we have a lock on
> the directory.

Naming them log.1, log.2, etc. and then running them in that order and
stopping when at the first not-found file will probably do.  Don't
rely on anything stored in the access baton since that won't work when
running "svn cleanup".

The file handling needs to be considered carefully--it's important to
make sure that the lock is never removed while leaving a log.N file
behind, since it could inadvertantly become live if an update creates
log files up to log.N-1.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Josh Pieper <jj...@pobox.com> writes:
> 
> > Do you see any correctness or data loss concerns with the approach
> > other than this?
> 
> No, I don't, but I have a variation to propose.
> 
> Instead of accumulating the log in a single .svn/tmp/log write a
> series of separate log files with distinct, ordered, names.  As each
> file is written it is moved from .svn/tmp to .svn but not run.
> Finally have the code that runs the log file loop over all the log
> files and sync the entries file once at the end.  Now while writing
> multiple log files is probably not quite as efficient as writing a
> single log file, it should still be O(N) rather than O(N^2) since the
> entries file only gets written once.  I'm not quite sure what scheme
> one would use to order the log files, perhaps simply choosing suitable
> filenames and sorting the apr_dir_read output would do.

This sounds workable.  Would it be acceptable to use just a postfixed
number (log.1, log.2, etc...), and store the last used one in the
adm_access baton?  When it comes time to run them, just loop from 1 to
the last used, then reset the last used.  Sorting would be O(n log n),
which while not bad, isn't strictly necessary since we have a lock on
the directory.

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Do you see any correctness or data loss concerns with the approach
> other than this?

No, I don't, but I have a variation to propose.

Instead of accumulating the log in a single .svn/tmp/log write a
series of separate log files with distinct, ordered, names.  As each
file is written it is moved from .svn/tmp to .svn but not run.
Finally have the code that runs the log file loop over all the log
files and sync the entries file once at the end.  Now while writing
multiple log files is probably not quite as efficient as writing a
single log file, it should still be O(N) rather than O(N^2) since the
entries file only gets written once.  I'm not quite sure what scheme
one would use to order the log files, perhaps simply choosing suitable
filenames and sorting the apr_dir_read output would do.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Josh Pieper <jj...@pobox.com> writes:
> 
> > Is there a big reason to expect that if you 'kill -9' a checkout it
> > will remember anything it did?
> 
> It does now, so I would expect it to do so in the future.

Do you see any correctness or data loss concerns with the approach
other than this?  If not, then I'll do some benchmarking and see if
it is worth arguing this point. ;)

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Is there a big reason to expect that if you 'kill -9' a checkout it
> will remember anything it did?

It does now, so I would expect it to do so in the future.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Josh Pieper <jj...@pobox.com> writes:
> 
> > Philip Martin wrote:
> >> >> What I observe with the current code is typically
> >> >> 
> >> >>          A   dir
> >> >>          A   dir/file
> >> >>          A   dir/subdir
> >> >>          A   dir/subdir/file
> >> 
> >> Interrupt here. Is dir/subdir versioned?  If not then I don't think
> >> cleanup will find subdir's log file.  I suppose one might be able to
> >> run cleanup repeatedly.
> >
> > If the interrupt is hard, i.e. kill -9, the log files will not have
> > been moved into the live position yet.  That could be a problem as
> > there will now be unversioned obstructions lying around, but no data
> > should be lost.
> >
> > If the log file for the inner directory has been made live, but not
> > the one for the parent directory, cleanup may have a hard time finding
> > it.  If this is a big problem, we could run the parent's log file
> > before recursing into subdirectories, and would still gain performance
> > if there were many text files in a directory.
> 
> But how do we run the log file if it is in .svn/tmp/log?

What I meant is that cleanup would have a hard time finding the
sub-directory, since there is no log-file for the parent directory to
run.  However, as I mention below, this cannot happen.

> At present each log file contains a set of several operations and, in
> general, all the operations must be completed before the wc is
> consistent.  We cannot simply run arbitrary log files from .svn/tmp/
> since such a log file may contain an incomplete set of operations, we
> don't know where the interrupt occurred.

Log files are still only run if they have been made live, nothing from
.svn/tmp would ever be run.

> > If both log-files were made live before the interrupt, I believe
> > cleanup would run the parent directory's logfile first, then use the
> > new state of its entries to recurse and would thus correctly recurse
> > into subdir.
> 
> Not at present, cleanup explicitly runs children first.

Ok, well I looked into this a bit further, and this case could never
occur.  1: The child log file would be made live and run before the
parent log file were made live.  2: The operation that enters the
subdir into the parent entries file (update_editor.c:add_directory)
doesn't use the log file mechanism, but instead modifies the entry
directly.

> > The pending logs are kept in .svn/tmp/log the same as they are
> > currently.  When either
> >    a) a delete operation occurs
> >    b) the editor closes the directory or
> >    c) the cancellation function returns an error
> > the log files are moved into the live position in a depth first
> > fashion and run one at a time.
> 
> That's a step backwards for restartable checkouts.  All the files
> downloaded will be stored somewhere (in .svn/tmp/ perhaps?) until the
> log file is run.  If the log file is in .svn/tmp/log it will not get
> run by cleanup, so there will be no way to promote the downloaded
> files into versioned items.  Restarting a checkout will need to
> download those files again :-(

It is only a step backward if you 'kill -9' the process.  If you use
CTRL-C, the cancellation function route kicks in and the log-files are
run before exiting.  I have restarted checkouts on my machine with no
problem, no having to re-download large portions.  Is there a big
reason to expect that if you 'kill -9' a checkout it will remember
anything it did?

> Remember that the log file itself is not the performance bottleneck,
> (every log file operation gets written once) it's the entries file
> that is the problem.  You are combining log files to solve the entries
> file problem, you must be careful not to break the log file atomic
> guarantees in the process.

I know... but putting all the information into one log file makes the
entries problem pretty trivial, thus I'll try and solve the problems
there first.  So far I believe it still meets all the criteria you've
stated, except for not remembering previous work after a hard kill or
system crash.

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Philip Martin wrote:
>> >> What I observe with the current code is typically
>> >> 
>> >>          A   dir
>> >>          A   dir/file
>> >>          A   dir/subdir
>> >>          A   dir/subdir/file
>> 
>> Interrupt here. Is dir/subdir versioned?  If not then I don't think
>> cleanup will find subdir's log file.  I suppose one might be able to
>> run cleanup repeatedly.
>
> If the interrupt is hard, i.e. kill -9, the log files will not have
> been moved into the live position yet.  That could be a problem as
> there will now be unversioned obstructions lying around, but no data
> should be lost.
>
> If the log file for the inner directory has been made live, but not
> the one for the parent directory, cleanup may have a hard time finding
> it.  If this is a big problem, we could run the parent's log file
> before recursing into subdirectories, and would still gain performance
> if there were many text files in a directory.

But how do we run the log file if it is in .svn/tmp/log?

At present each log file contains a set of several operations and, in
general, all the operations must be completed before the wc is
consistent.  We cannot simply run arbitrary log files from .svn/tmp/
since such a log file may contain an incomplete set of operations, we
don't know where the interrupt occurred.

> If both log-files were made live before the interrupt, I believe
> cleanup would run the parent directory's logfile first, then use the
> new state of its entries to recurse and would thus correctly recurse
> into subdir.

Not at present, cleanup explicitly runs children first.

>
>> >>          A   dir/another_file
>> >
>> > Well, it does pass all the tests. :)
>> 
>> We don't really have much in the way of cleanup regression tests.
>> 
>> I'm not really sure how this "pending" log file will interact with
>> cleanup.  At present the log file is written in .svn/tmp/log and moved
>> to .svn/log just before it is run.  Moving to .svn/log is what makes
>> the log file "live", and then it is visible to cleanup.  Where does
>> your log file accumulate?  When does it become live?  How does it work
>> when there are multiple log files being accumulated?
>
> The pending logs are kept in .svn/tmp/log the same as they are
> currently.  When either
>    a) a delete operation occurs
>    b) the editor closes the directory or
>    c) the cancellation function returns an error
> the log files are moved into the live position in a depth first
> fashion and run one at a time.

That's a step backwards for restartable checkouts.  All the files
downloaded will be stored somewhere (in .svn/tmp/ perhaps?) until the
log file is run.  If the log file is in .svn/tmp/log it will not get
run by cleanup, so there will be no way to promote the downloaded
files into versioned items.  Restarting a checkout will need to
download those files again :-(

Remember that the log file itself is not the performance bottleneck,
(every log file operation gets written once) it's the entries file
that is the problem.  You are combining log files to solve the entries
file problem, you must be careful not to break the log file atomic
guarantees in the process.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> >> What I observe with the current code is typically
> >> 
> >>          A   dir
> >>          A   dir/file
> >>          A   dir/subdir
> >>          A   dir/subdir/file
> 
> Interrupt here. Is dir/subdir versioned?  If not then I don't think
> cleanup will find subdir's log file.  I suppose one might be able to
> run cleanup repeatedly.

If the interrupt is hard, i.e. kill -9, the log files will not have
been moved into the live position yet.  That could be a problem as
there will now be unversioned obstructions lying around, but no data
should be lost.

If the log file for the inner directory has been made live, but not
the one for the parent directory, cleanup may have a hard time finding
it.  If this is a big problem, we could run the parent's log file
before recursing into subdirectories, and would still gain performance
if there were many text files in a directory.

If both log-files were made live before the interrupt, I believe
cleanup would run the parent directory's logfile first, then use the
new state of its entries to recurse and would thus correctly recurse
into subdir.

> >>          A   dir/another_file
> >
> > Well, it does pass all the tests. :)
> 
> We don't really have much in the way of cleanup regression tests.
> 
> I'm not really sure how this "pending" log file will interact with
> cleanup.  At present the log file is written in .svn/tmp/log and moved
> to .svn/log just before it is run.  Moving to .svn/log is what makes
> the log file "live", and then it is visible to cleanup.  Where does
> your log file accumulate?  When does it become live?  How does it work
> when there are multiple log files being accumulated?

The pending logs are kept in .svn/tmp/log the same as they are
currently.  When either
   a) a delete operation occurs
   b) the editor closes the directory or
   c) the cancellation function returns an error
the log files are moved into the live position in a depth first
fashion and run one at a time.

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Philip Martin wrote:
>> What I observe with the current code is typically
>> 
>>          A   dir
>>          A   dir/file
>>          A   dir/subdir
>>          A   dir/subdir/file

Interrupt here. Is dir/subdir versioned?  If not then I don't think
cleanup will find subdir's log file.  I suppose one might be able to
run cleanup repeatedly.

>>          A   dir/another_file
>
> Well, it does pass all the tests. :)

We don't really have much in the way of cleanup regression tests.

I'm not really sure how this "pending" log file will interact with
cleanup.  At present the log file is written in .svn/tmp/log and moved
to .svn/log just before it is run.  Moving to .svn/log is what makes
the log file "live", and then it is visible to cleanup.  Where does
your log file accumulate?  When does it become live?  How does it work
when there are multiple log files being accumulated?

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> What I observe with the current code is typically
> 
>          A   dir
>          A   dir/file
>          A   dir/subdir
>          A   dir/subdir/file
>          A   dir/another_file
> 
> It's true that once subdir is added it gets all its contents before
> any more items are added to dir, however that happens during the dir
> update so "pending" log files will accumulate all the way down the
> heirarchy.  Is that going to be a problem?  Does the dir/subdir
> creation get accumulated in the dir log file?  Can dir/subdir sensibly
> have a log file before the dir log file has been run?

Well, it does pass all the tests. :) Nothing seems to much care that
subdir's log file is run before dir's (which sets up subdir's entry)
is.  From what I can tell, the log files can only have operations that
depend on files within the directory itself, not in it's parent
directories.

-Josh


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Philip Martin wrote:
>> 
>> I don't know.  How does it handle sub-dirs?  Are you going to
>> accumulate "pending" log files for the entire directory hierarchy?
>
> There is one log-file per directory.  They are only kept around as
> long as the editor has the directory open.  Since the editor is
> depth-first, it should at most be the tip directory log file and those
> of the parent paths down to the root.

What I observe with the current code is typically

         A   dir
         A   dir/file
         A   dir/subdir
         A   dir/subdir/file
         A   dir/another_file

It's true that once subdir is added it gets all its contents before
any more items are added to dir, however that happens during the dir
update so "pending" log files will accumulate all the way down the
heirarchy.  Is that going to be a problem?  Does the dir/subdir
creation get accumulated in the dir log file?  Can dir/subdir sensibly
have a log file before the dir log file has been run?

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Josh Pieper <jj...@pobox.com> writes:
> 
> > Ok, so instead of the suggested entries+changes approach mentioned in
> > 'entries-caching', I essentially just saved up the log-file until the
> > entire directory modification is done.  In order to let updates be
> > resumable, I had to add a pool cleanup handler that runs the
> > directory's log files when its pool is destroyed prematurely before
> > the entire update has completed.  Is this OK?
> 
> I don't know.  How does it handle sub-dirs?  Are you going to
> accumulate "pending" log files for the entire directory hierarchy?

There is one log-file per directory.  They are only kept around as
long as the editor has the directory open.  Since the editor is
depth-first, it should at most be the tip directory log file and those
of the parent paths down to the root.

-Josh


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Ok, so instead of the suggested entries+changes approach mentioned in
> 'entries-caching', I essentially just saved up the log-file until the
> entire directory modification is done.  In order to let updates be
> resumable, I had to add a pool cleanup handler that runs the
> directory's log files when its pool is destroyed prematurely before
> the entire update has completed.  Is this OK?

I don't know.  How does it handle sub-dirs?  Are you going to
accumulate "pending" log files for the entire directory hierarchy?

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Josh Pieper <jj...@pobox.com>.
Philip Martin wrote:
> Josh Pieper <jj...@pobox.com> writes:
> 
> > Since ghudson is working on turning a O(n^2) import into O(n), I
> > thought I would look at what in the working copy library was causing
> > O(n^2) behavior for checkouts and updates.
> 
> You may like to read notes/entries-caching which contains some of my
> observations about the entries file code.
> 
> 
> > Step 2:
> > Make the log file execution code read the entries file once at the
> > beginning of the log file, then make all subsequent changes to an in
> > memory hash.  The entries file is written out only once, immediately
> > before the log file is completed.  This involved making new versions
> > of svn_wc__entry_modify and svn_wc__get_keywords that would use the
> > in-memory entries hash.
> 
> The current log file execution code already does most (perhaps all) of
> that, take a look at do_sync in svn_wc__entry_modify.  Also the
> entries caching code should ensure that the entries file is read only
> once per access baton.  I believe the checkout/update slow down is due
> to repeated writing of the entries file, at present it is written once
> per log file.

Wow, thanks Philip!  You're right, only my step 1 above is necessary
to realize the performance improvement since entries caching is
already implemented.

Ok, so instead of the suggested entries+changes approach mentioned in
'entries-caching', I essentially just saved up the log-file until the
entire directory modification is done.  In order to let updates be
resumable, I had to add a pool cleanup handler that runs the
directory's log files when its pool is destroyed prematurely before
the entire update has completed.  Is this OK?

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by Philip Martin <ph...@codematters.co.uk>.
Josh Pieper <jj...@pobox.com> writes:

> Since ghudson is working on turning a O(n^2) import into O(n), I
> thought I would look at what in the working copy library was causing
> O(n^2) behavior for checkouts and updates.

You may like to read notes/entries-caching which contains some of my
observations about the entries file code.


> Step 2:
> Make the log file execution code read the entries file once at the
> beginning of the log file, then make all subsequent changes to an in
> memory hash.  The entries file is written out only once, immediately
> before the log file is completed.  This involved making new versions
> of svn_wc__entry_modify and svn_wc__get_keywords that would use the
> in-memory entries hash.

The current log file execution code already does most (perhaps all) of
that, take a look at do_sync in svn_wc__entry_modify.  Also the
entries caching code should ensure that the entries file is read only
once per access baton.  I believe the checkout/update slow down is due
to repeated writing of the entries file, at present it is written once
per log file.

-- 
Philip Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Improving the performance of libsvn_wc for checkouts/updates/switches

Posted by "C. Michael Pilato" <cm...@collab.net>.
Josh Pieper <jj...@pobox.com> writes:

>  * If log file execution is interrupted, none of the entries will have
>    been written to disk.  When cleanup happens, the entire log-file
>    will have to be re-read and executed.

As long as there are no operations in the log files that aren't
completely safe to repeat, this is fine.

> Performance:
> There are still O(n^2) bits in libsvn_fs right now, so checkouts
> aren't linear yet.  However, this does still improve performance right
> now by about a factor of 4.  

I can't think of a good reason not to be happy about a speed increase
of 4x.  And I'm thinking really hard (for a Saturday morning,
anyway). :-)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org