You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2002/07/26 06:34:43 UTC

Skip-deltas

Since Branko has a working delta combiner now, I decided to see how
skip-deltas would affect performance on the SVN repository.  (My
theory was that the ability to paste together small deltas quickly
before applying them is nice, but having many fewer of them to paste
together for a heavily modified file is even nicer.)

Unfortunately, skip-deltas don't seem very practical at the moment
because it takes too long to walk the predecessor list of a node
revision.  If I could get a quick count of the number of predecessors
of a node-revision, then I think I could produce something practical.
Perhaps I can figure out how to store that information in the
node-revision information in the database.

Repeating Branko's tests, I got (all times in seconds, all
tests run against the repository created by that version of the code
base using a dump of rev 2656 of the SVN repository):

		Trunk	Iss531	Skip1	Skip2	Skip3
		-----	------	-----	-----	-----
svnadmin load	  853	  949	 1492	 1096	 1441
svn co -r500	   44	   42	   38	   37	   41
svn up -r2500	  321	  345	  328	  355	  337
svn up -r1500	  120	  110	  108	  115	  128
Repos db size	55324K	38712K	38744K	51048K	38740K

where Trunk is revision 2714 of the trunk, Iss531 is revision 2714 of
branches/issue-531-dev, and Skip1, Skip2, and Skip3 are various tweaks
on the skip-deltas concept; specifically:

  Skip1: Straight deterministic skip-deltas.  If 2^N divides the
  number of predecessors, we redeltify the 2^Nth predecessor so that
  it is represnted as a delta against the head.

  Skip2: Non-deterministic skip-deltas.  We randomly determine the
  number of predecessors to redeltify (always redeltify the immediate
  parent; 50% of the time redeltify the one two back, 25% of the time
  redeltify the one four back, etc.).  This saves us from doing a full
  walk of the predecessor list for each new node-revision.  Using
  random numbers in libsvn_fs is kind of icky from a code maintenance
  standpoint, though.

  Skip3: Deterministic skip-deltas with a minimum threshold.  If the
  number of predecessors is less than 32 (semi-arbitrary number), we
  don't deltify more than the immediate parent.  This is an attempt to
  avoid the space and re-deltification time penalty of skip deltas
  when the benefit would be small.  Unfortunately, it doesn't seem to
  help much.

Observations:

  * In my tests, none of the implementations yields particularly
    better checkout or update times than the trunk.  (Branko reported
    a factor of two gain from his delta combiner on checking out
    revision 500, but I didn't see that.  Not sure why.)

  * Deterministic skip-deltas don't seem to impose much of a space
    penalty in practice.  Non-deterministic ones do, though.

  * Skip-deltas impose a significant penalty on checkin time (a big
    increase in the time spent in user space, if you go see my raw
    data at the end).  Most of this penalty appears to come from
    having to do a complete walk of the predecessor list for each new
    node-revision.

I've included my implementation of "skip3" for the record.  Since I
don't plan for it to be checked in, no log message.  After the patch,
I've included the raw data for my table above.

Index: ./tree.c
===================================================================
--- ./tree.c
+++ ./tree.c	Fri Jul 26 01:39:47 2002
@@ -1129,6 +1129,75 @@
 };
 
 
+static svn_error_t *
+collect_predecessors (void *baton, dag_node_t *this_node, int *done,
+                      trail_t *trail)
+{
+  apr_array_header_t *predecessors = baton;
+  dag_node_t **slot;
+
+  if (this_node)
+    {
+      slot = apr_array_push (predecessors);
+      *slot = this_node;
+    }
+
+  return SVN_NO_ERROR;
+}
+
+
+/* Redeltify predecessor node-revisions of the one we added.  The idea
+   is to require at most 2*lg(N) deltas to be applied to get to any
+   node-revision in a chain of N predecessors.  We do this using a
+   technique derived from skip lists:
+
+     100% of the time we redeltify our parent
+     50% of the time we also redeltify two predecessors back
+     25% of the time we also redeltify four predecessors back
+     etc.
+
+   This implementation is deterministic, which guarantees the 2*lg(N)
+   upper bound but requires scanning the entire predecessor chain for
+   each commit to count the number of predecessors.  We could,
+   alternatively, choose nondeterministically and avoid the count. */
+static svn_error_t *
+txn_skip_deltify (dag_node_t *node, int props_only, trail_t *trail)
+{
+  apr_array_header_t *predecessors;
+  int stop, i;
+  dag_node_t *prednode;
+
+  predecessors = apr_array_make (trail->pool, 0, sizeof(dag_node_t *));
+  SVN_ERR (svn_fs__dag_walk_predecessors (node, collect_predecessors,
+					  predecessors, trail));
+
+  /* Find the first power of two which doesn't evenly divide the
+     number of predecessors.  This is the marker we will stop
+     just prior to. */
+  for (stop = 2; predecessors->nelts % stop == 0; stop *= 2)
+    ;
+
+  /* Tradeoff: until the number of predecessors is at least 32, don't
+     do skip deltas.  Otherwise we get the cost of skip-deltas up
+     front, whereas the scaling benefits don't show up until the
+     number of predecessors is large. */
+  if (predecessors->nelts < 32)
+    stop = 2;
+
+  /* Now deltify the predecessors one back, two back, four back, and
+     so on, until we hit stop. */
+  for (i = 1; i < stop; i *= 2)
+    {
+      /* elts[0] is the first predecessor back, so we have to subtract
+         one from the offset. */
+      prednode = ((dag_node_t **) predecessors->elts)[i - 1];
+      SVN_ERR (svn_fs__dag_deltify (prednode, node, props_only, trail));
+    }
+
+  return SVN_NO_ERROR;
+}
+
+
 /* Deltify ID's predecessor iff ID is mutable under TXN_ID in FS.  If
    ID is a mutable directory, recurse.  Do this as part of TRAIL. */
 static svn_error_t *
@@ -1141,7 +1210,6 @@
   dag_node_t *node;
   apr_hash_t *entries;
   int is_dir;
-  const svn_fs_id_t *pred_id;
 
   /* Not mutable?  Go no further.  This is safe to do because for
      items in the tree to be mutable, their parent dirs must also be
@@ -1180,14 +1248,8 @@
         }
     }
 
-  /* If we have have a predecessor, deltify it against the current
-     node revision. */
-  if ((pred_id = noderev->predecessor_id))
-    {
-      dag_node_t *pred_node;
-      SVN_ERR (svn_fs__dag_get_node (&pred_node, fs, pred_id, trail));
-      SVN_ERR (svn_fs__dag_deltify (pred_node, node, is_dir, trail));
-    }
+  if (noderev->predecessor_id)
+    SVN_ERR (txn_skip_deltify (node, is_dir, trail));
 
   return SVN_NO_ERROR;
 }


Trunk
-----

time /var/ghudson/svn/subversion/svnadmin/svnadmin load /var/ghudson/svn-test < /var/ghudson/svn-repos-2656.dumpfile
108.710u 20.020s 14:13.29 15.0% 0+0k 0+0io 453pf+0w

time /var/ghudson/svn/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svn-test /var/ghudson/svn-test-wc
12.880u 2.550s 0:43.69 35.3%    0+0k 0+0io 568pf+0w

time /var/ghudson/svn/subversion/clients/cmdline/svn up -r2500
119.560u 18.710s 5:21.21 43.0%  0+0k 0+0io 582pf+0w

time /var/ghudson/svn/subversion/clients/cmdline/svn up -r1500
23.200u 4.560s 1:59.79 23.1%    0+0k 0+0io 581pf+0w

total 55324


Delta combiner branch, without skip-deltas
------------------------------------------

time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc2-test < /var/ghudson/svn-repos-2656.dumpfile
112.420u 22.340s 15:48.58 14.2% 0+0k 0+0io 478pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc2-test /var/ghudson/svnc2-test-wc
13.420u 2.450s 0:42.49 37.3%    0+0k 0+0io 568pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
122.450u 19.620s 5:44.68 41.2%  0+0k 0+0io 580pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
23.940u 4.160s 1:50.00 25.5%    0+0k 0+0io 584pf+0w

total 38712


Delta combiner branch, with skip-deltas
---------------------------------------

time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc-test < /var/ghudson/svn-repos-2656.dumpfile
722.380u 32.530s 24:52.44 50.5% 0+0k 0+0io 491pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc-test /var/ghudson/svnc-test-wc
13.290u 2.360s 0:38.11 41.0%    0+0k 0+0io 581pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
119.040u 18.220s 5:27.78 41.8%  0+0k 0+0io 593pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
23.370u 4.160s 1:48.47 25.3%    0+0k 0+0io 596pf+0w

total 38744


Delta combiner branch, with non-deterministic skip-deltas
---------------------------------------------------------

time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc3-test < /var/ghudson/svn-repos-2656.dumpfile
187.790u 23.870s 18:15.69 19.3% 0+0k 0+0io 484pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc3-test /var/ghudson/svnc3-test-wc
12.580u 2.280s 0:37.09 40.0%    0+0k 0+0io 575pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
118.930u 18.970s 5:55.42 38.7%  0+0k 0+0io 586pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
22.760u 4.360s 1:55.07 23.5%    0+0k 0+0io 586pf+0w

total 51048


Delta combiner branch, with deterministic limited skip-deltas
-------------------------------------------------------------

time /var/ghudson/svnc/subversion/svnadmin/svnadmin load /var/ghudson/svnc4-test < /var/ghudson/svn-repos-2656.dumpfile
691.360u 30.850s 24:01.12 50.1% 0+0k 0+0io 483pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn co -r500 file:///var/ghudson/svnc4-test /var/ghudson/svnc4-test-wc
13.740u 2.350s 0:41.14 39.1%    0+0k 0+0io 573pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r2500
119.800u 19.490s 5:37.15 41.3%  0+0k 0+0io 584pf+0w

time /var/ghudson/svnc/subversion/clients/cmdline/svn up -r1500
23.250u 4.630s 2:07.66 21.8%    0+0k 0+0io 588pf+0w

total 38740

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org