You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2004/03/28 05:58:50 UTC

[PATCH] Optionally use deltas in dumpfiles

The problem: the output of "svnadmin dump" is useful as a
platform-independent backup format, as a way of transporting new
revisions across the net, and for other purposes.  But for some
workloads it's too big, because it dumps all the properties and all
the plaintext for each changed node-revision.

The following patch--another thousand-line ghudson weekend code bomb,
because I know you all love them so much--implements a new "svnadmin
dump --deltas" option to cause dumps to be made using deltas.
Although the dump format number has to be bumped to make this happen,
dumps made without the new option will continue to use the old format
number so that they can be read with Subversion 1.0.x.

Non-incremental dumps which start at a rev other than 0 will not use
deltas for the first revision.

After making this change, I'm not a big fan of the way the dumpfile
parser exports the parser interface.  I doubt anyone will ever use
that functionality, so it just makes it harder to add new features to
the dump format.  (Also, what's the point of exporting
svn_repos_get_fs_build_parser, when the only thing you can do with it
is call svn_repos_parse_dumpstream on it, and doing that is exactly
the same as calling svn_repos_load_fs?)

I'm soliciting comments on this patch before I commit it because it's
a significant change, so let me know what you think.  (I'd also like
to look into adding some test cases for the new code, and to get some
performance numbers on a dump of the Subversion repository.)  I've
prepared the patch with -uw for clarity, since some code moved due to
changing function names or the introduction of conditionals.

* notes/fs_dumprestore.txt: Document the version 3 dump format.
* subversion/svnadmin/main.c
  (svnadmin__deltas, options, cmd_table, svnadmin_opt_state, main):
   Add new "deltas" option.
  (subcommand_dump): Use new svn_repos_dump_fs2 function; pass true
   for new use_deltas parameter if --deltas specified.
* subversion/include/svn_repos.h
  (SVN_REPOS_DUMPFILE_FORMAT_VERSION): Bump.  However, we will
   continue to use the previous version except when doing a delta
   dump.
  (SVN_REPOS_DUMPFILE_PROP_DELTA, SVN_REPOS_DUMPFILE_TEXT_DELTA);
   New headers to signify whether the property and text contents are
   deltas.
  (svn_repos_dump_fs2): New variant of svn_repos_dump_fs with a new
   parameter use_deltas.
  (svn_repos_parse_fns2_t): New variant of svn_repos_parse_fns_t
   with added callbacks delete_node_property and apply_textdelta,
   to handle dumpstreams using deltas.
  (svn_repos_parse_dumpstream2): New variant of
   svn_repos_parse_dumpstream, accepts an svn_repos_parse_fns2_t.
  (svn_repos_get_fs_build_parser2): New variant of
   svn_repos_get_fs_build_parser, yields an svn_repos_parse_fns2_t.
  (svn_repos_dump_fs, svn_repos_parse_fns_t,
   svn_repos_parse_dumpstream, svn_repos_get_fs_build_parser):
   Deprecate.
* subversion/libsvn_repos/load.c
  (fns2_from_fs): Utility function to convert old-style callbacks to
   new-style callbacks, for API transition.
  (parse_property_block): Accept new-style callbacks.  For node
   property blocks, recognize "D <keylen>\n<key>\n" to mean a deleted
   property.
  (parse_text_block): Accept new-style callbacks and new parameter
   is_delta.  If is_delta is true, treat the text block as svndiff
   data.
  (svn_repos_parse_dumpstream2): Renamed from
   svn_repos_parse_dumpstream.  Accept new-style callbacks.  If given
   old-style callbacks via deprecated function, reject current-format
   dumpstreams.  Recognize delta property and text blocks.
  (svn_repos_parse_dumpstream): Implement deprecated function using
   fns2_from_fns and svn_repos_parse_dumpstream2.
  (delete_node_property, apply_textdelta): New callbacks to handle
   delta property and text blocks.
  (svn_repos_get_fs_build_parser2): Implement using deprecated
   svn_repos_get_fs_build_parser and fns2_from_fns.  This is sorta
   backwards, but this way we can convert in the same direction as
   svn_repos_parse_dumpstream2 does.
  (svn_repos_load_fs): Use new parsing functions so that we can load
   dumps which use deltas.
* subversion/libsvn_repos/dump.c
  (write_hash_to_stringbuf): Accept an oldhash parameter; if given,
   don't write unchanged props, and do write deleted props.
  (store_delta): New helper function to compute a delta, store it
   to a temporary file, and then rewind and return the file.
  (edit_baton): Add use_deltas field.
  (dump_node): If use_deltas is set, output delta prop and text
   blocks.  Also simplify the determination of whether we must dump
   props and text a bit.
  (get_dump_editor): Add use_deltas flag and store it in the editor
   baton.
  (write_revision_record): Adjust call to write_hash_to_stringbuf.
  (svn_repos_dump_fs2): Renamed from svn_repos_dump_fs.  Accept new
   use_deltas flag.  If not set, use the old format version so that
   dumps can still be read by svn 1.0.x.  Pass use_deltas flag to
   get_dump_editor, except for the first rev of a non-incremental
   dump.
  (svn_repos_dump_fs): Implement deprecated function in terms of
   svn_repos_dump_fs.

Index: notes/fs_dumprestore.txt
===================================================================
--- notes/fs_dumprestore.txt	(revision 9220)
+++ notes/fs_dumprestore.txt	(working copy)
@@ -199,4 +199,36 @@
 
     This should be used to indicate the UUID of the originating repository.
 
+-------------------------------------
 
+SVN DUMPFILE VERSION 3 FORMAT
+
+This format is equivalent to the VERSION 2 format except for the
+following:
+
+1.) The format starts with the new version number of the dump format
+    ("SVN-fs-dump-format-version: 3\n").
+
+2.) There are two new optional headers for node changes:
+
+[Text-delta: true|false]
+[Prop-delta: true|false]
+
+    The default value for these headers is "false".  If the value is
+    set to "true", then the text and property contents will be treated
+    as deltas against the previous contents of the node (as determined
+    by copy history for adds with history, or by the value in the
+    previous revision for changes--just as with commits).
+
+Property deltas have the same format as regular property lists except
+that (1) properties with the same value as in the previous contents of
+the node are not printed, and (2) deleted properties will be written
+out as
+
+D <name length>
+<name>
+
+just as a regular property is printed, but with the "K " changed to a
+"D " and with no value part.
+
+Text deltas are written out as a series of svndiff windows.
Index: subversion/svnadmin/main.c
===================================================================
--- subversion/svnadmin/main.c	(revision 9220)
+++ subversion/svnadmin/main.c	(working copy)
@@ -154,6 +154,7 @@
   { 
     svnadmin__version = SVN_OPT_FIRST_LONGOPT_ID,
     svnadmin__incremental,
+    svnadmin__deltas,
     svnadmin__ignore_uuid,
     svnadmin__force_uuid,
     svnadmin__parent_dir,
@@ -188,6 +189,9 @@
     {"incremental",   svnadmin__incremental, 0,
      "dump incrementally"},
 
+    {"deltas",        svnadmin__deltas, 0,
+     "use deltas in dump output"},
+
     {"bypass-hooks",  svnadmin__bypass_hooks, 0,
      "bypass the repository hook system"},
 
@@ -247,7 +251,7 @@
      "revision trees.  If only LOWER is given, dump that one revision tree.\n"
      "If --incremental is passed, then the first revision dumped will be\n"
      "a diff against the previous revision, instead of the usual fulltext.\n",
-     {'r', svnadmin__incremental, 'q'} },
+     {'r', svnadmin__incremental, svnadmin__deltas, 'q'} },
 
     {"help", subcommand_help, {"?", "h"},
      "usage: svnadmin help [SUBCOMMAND...]\n\n"
@@ -329,6 +333,7 @@
   svn_boolean_t help;                               /* --help or -? */
   svn_boolean_t version;                            /* --version */
   svn_boolean_t incremental;                        /* --incremental */
+  svn_boolean_t use_deltas;                         /* --deltas */
   svn_boolean_t quiet;                              /* --quiet */
   svn_boolean_t bdb_txn_nosync;                     /* --bdb-txn-nosync */
   svn_boolean_t bdb_log_keep;                       /* --bdb-log-keep */
@@ -492,9 +497,10 @@
     SVN_ERR (create_stdio_stream (&stderr_stream,
                                   apr_file_open_stderr, pool));
 
-  SVN_ERR (svn_repos_dump_fs (repos, stdout_stream, stderr_stream,
+  SVN_ERR (svn_repos_dump_fs2 (repos, stdout_stream, stderr_stream,
                               lower, upper, opt_state->incremental,
-                              check_cancel, NULL, pool));
+                               opt_state->use_deltas, check_cancel, NULL,
+                               pool));
 
   return SVN_NO_ERROR;
 }
@@ -919,6 +925,9 @@
       case svnadmin__incremental:
         opt_state.incremental = TRUE;
         break;
+      case svnadmin__deltas:
+        opt_state.use_deltas = TRUE;
+        break;
       case svnadmin__ignore_uuid:
         opt_state.uuid_action = svn_repos_load_uuid_ignore;
         break;
Index: subversion/include/svn_repos.h
===================================================================
--- subversion/include/svn_repos.h	(revision 9220)
+++ subversion/include/svn_repos.h	(working copy)
@@ -823,7 +823,7 @@
 
 /* The RFC822-style headers in our dumpfile format. */
 #define SVN_REPOS_DUMPFILE_MAGIC_HEADER            "SVN-fs-dump-format-version"
-#define SVN_REPOS_DUMPFILE_FORMAT_VERSION           2
+#define SVN_REPOS_DUMPFILE_FORMAT_VERSION           3
 #define SVN_REPOS_DUMPFILE_UUID                      "UUID"
 #define SVN_REPOS_DUMPFILE_CONTENT_LENGTH            "Content-length"
 
@@ -839,6 +839,8 @@
 
 #define SVN_REPOS_DUMPFILE_PROP_CONTENT_LENGTH       "Prop-content-length"
 #define SVN_REPOS_DUMPFILE_TEXT_CONTENT_LENGTH       "Text-content-length"
+#define SVN_REPOS_DUMPFILE_PROP_DELTA                "Prop-delta"
+#define SVN_REPOS_DUMPFILE_TEXT_DELTA                "Text-delta"
 
 /** The different "actions" attached to nodes in the dumpfile. */
 enum svn_node_action
@@ -871,10 +873,35 @@
  * against the previous revision (usually it looks like a full dump of
  * the tree).
  *
+ * If @a use_deltas is @c TRUE, output only node properties which have
+ * changed relative to the previous contents, and output text contents
+ * as svndiff data against the previous contents.  Regardless of how
+ * this flag is set, the first revision of a non-incremental dump will
+ * be done with full plain text.  A dump with @a use_deltas set cannot
+ * be loaded by Subversion 1.0.x.
+ *
  * If @a cancel_func is not @c NULL, it is called periodically with
  * @a cancel_baton as argument to see if the client wishes to cancel
  * the dump.
  */
+svn_error_t *svn_repos_dump_fs2 (svn_repos_t *repos,
+                                 svn_stream_t *dumpstream,
+                                 svn_stream_t *feedback_stream,
+                                 svn_revnum_t start_rev,
+                                 svn_revnum_t end_rev,
+                                 svn_boolean_t incremental,
+                                 svn_boolean_t use_deltas,
+                                 svn_cancel_func_t cancel_func,
+                                 void *cancel_baton,
+                                 apr_pool_t *pool);
+
+
+/**
+ * @deprecated Provided for backward compatibility with the 1.0.0 API.
+ *
+ * Similar to svn_repos_dump_fs2(), but with the @a use_deltas
+ * parameter always set to @c FALSE.
+ */
 svn_error_t *svn_repos_dump_fs (svn_repos_t *repos,
                                 svn_stream_t *dumpstream,
                                 svn_stream_t *feedback_stream,
@@ -924,8 +951,8 @@
                                 apr_pool_t *pool);
 
 
-/** A vtable that is driven by @c svn_repos_parse_dumpstream. */
-typedef struct svn_repos_parse_fns_t
+/** A vtable that is driven by @c svn_repos_parse_dumpstream2. */
+typedef struct svn_repos_parse_fns2_t
 {
   /** The parser has discovered a new revision record within the
    * parsing session represented by @a parse_baton.  All the headers are
@@ -968,6 +995,9 @@
                                      const char *name,
                                      const svn_string_t *value);
 
+  /** For a given @a node_baton, delete property @a name. */
+  svn_error_t *(*delete_node_property) (void *node_baton, const char *name);
+
   /** For a given @a node_baton, remove all properties. */
   svn_error_t *(*remove_node_props) (void *node_baton);
 
@@ -982,6 +1012,19 @@
   svn_error_t *(*set_fulltext) (svn_stream_t **stream,
                                 void *node_baton);
 
+  /** For a given @a node_baton, set @a handler and @a handler_baton
+   * to a window handler and baton capable of receiving a delta
+   * against the node's previous contents.  A NULL window will be
+   * sent to the handler after all the windows are sent.
+   *
+   * If a @c NULL is returned instead of a handler, the vtable is
+   * indicating that no delta is desired, and the parser will not
+   * attempt to send it.
+   */
+  svn_error_t *(*apply_textdelta) (svn_txdelta_window_handler_t *handler,
+                                   void **handler_baton,
+                                   void *node_baton);
+
   /** The parser has reached the end of the current node represented by
    * @a node_baton, it can be freed.
    */
@@ -993,7 +1036,7 @@
    */
   svn_error_t *(*close_revision) (void *revision_baton);
 
-} svn_repos_parser_fns_t;
+} svn_repos_parser_fns2_t;
 
 
 
@@ -1020,8 +1063,8 @@
  * but still allow expansion of the format:  most headers are ignored.
  */
 svn_error_t *
-svn_repos_parse_dumpstream (svn_stream_t *stream,
-                            const svn_repos_parser_fns_t *parse_fns,
+svn_repos_parse_dumpstream2 (svn_stream_t *stream,
+                             const svn_repos_parser_fns2_t *parse_fns,
                             void *parse_baton,
                             svn_cancel_func_t cancel_func,
                             void *cancel_baton,
@@ -1045,6 +1088,71 @@
  *
  */
 svn_error_t *
+svn_repos_get_fs_build_parser2 (const svn_repos_parser_fns2_t **parser,
+                                void **parse_baton,
+                                svn_repos_t *repos,
+                                svn_boolean_t use_history,
+                                enum svn_repos_load_uuid uuid_action,
+                                svn_stream_t *outstream,
+                                const char *parent_dir,
+                                apr_pool_t *pool);
+
+
+/**
+ * @deprecated Provided for backward compatibility with the 1.0.0 API.
+ *
+ * A vtable that is driven by @c svn_repos_parse_dumpstream.  Lacks
+ * the delete_node_property and apply_textdelta callbacks.
+ */
+typedef struct svn_repos_parse_fns_t
+{
+  svn_error_t *(*new_revision_record) (void **revision_baton,
+                                       apr_hash_t *headers,
+                                       void *parse_baton,
+                                       apr_pool_t *pool);
+  svn_error_t *(*uuid_record) (const char *uuid,
+                               void *parse_baton,
+                               apr_pool_t *pool);
+  svn_error_t *(*new_node_record) (void **node_baton,
+                                   apr_hash_t *headers,
+                                   void *revision_baton,
+                                   apr_pool_t *pool);
+  svn_error_t *(*set_revision_property) (void *revision_baton,
+                                         const char *name,
+                                         const svn_string_t *value);
+  svn_error_t *(*set_node_property) (void *node_baton,
+                                     const char *name,
+                                     const svn_string_t *value);
+  svn_error_t *(*remove_node_props) (void *node_baton);
+  svn_error_t *(*set_fulltext) (svn_stream_t **stream,
+                                void *node_baton);
+  svn_error_t *(*close_node) (void *node_baton);
+  svn_error_t *(*close_revision) (void *revision_baton);
+} svn_repos_parser_fns_t;
+
+
+/**
+ * @deprecated Provided for backward compatibility with the 1.0.0 API.
+ *
+ * Similar to svn_repos_parse_dumpstream2, but uses the more limited
+ * svn_repos_parser_fns_t vtable type.
+ */
+svn_error_t *
+svn_repos_parse_dumpstream (svn_stream_t *stream,
+                            const svn_repos_parser_fns_t *parse_fns,
+                            void *parse_baton,
+                            svn_cancel_func_t cancel_func,
+                            void *cancel_baton,
+                            apr_pool_t *pool);
+
+
+/**
+ * @deprecated Provided for backward compatibility with the 1.0.0 API.
+ *
+ * Similar to svn_repos_get_fs_build_parser2, but yields the more
+ * limited svn_repos_parser_fns_t vtable type.
+ */
+svn_error_t *
 svn_repos_get_fs_build_parser (const svn_repos_parser_fns_t **parser,
                                void **parse_baton,
                                svn_repos_t *repos,
@@ -1053,6 +1161,8 @@
                                svn_stream_t *outstream,
                                const char *parent_dir,
                                apr_pool_t *pool);
+
+
 /** @} */
 
 #ifdef __cplusplus
Index: subversion/libsvn_repos/load.c
===================================================================
--- subversion/libsvn_repos/load.c	(revision 9220)
+++ subversion/libsvn_repos/load.c	(working copy)
@@ -76,6 +76,31 @@
 
 /*----------------------------------------------------------------------*/
 
+/** A conversion function between the two vtable types. **/
+static
+svn_repos_parser_fns2_t *fns2_from_fns (const svn_repos_parser_fns_t *fns,
+                                        apr_pool_t *pool)
+{
+  svn_repos_parser_fns2_t *fns2;
+
+  fns2 = apr_palloc (pool, sizeof (*fns2));
+  fns2->new_revision_record = fns->new_revision_record;
+  fns2->uuid_record = fns->uuid_record;
+  fns2->new_node_record = fns->new_node_record;
+  fns2->set_revision_property = fns->set_revision_property;
+  fns2->set_node_property = fns->set_node_property;
+  fns2->remove_node_props = fns->remove_node_props;
+  fns2->set_fulltext = fns->set_fulltext;
+  fns2->close_node = fns->close_node;
+  fns2->close_revision = fns->close_revision;
+  fns2->delete_node_property = NULL;
+  fns2->apply_textdelta = NULL;
+  return fns2;
+}
+
+
+/*----------------------------------------------------------------------*/
+
 /** The parser and related helper funcs **/
 
 
@@ -172,7 +197,7 @@
 static svn_error_t *
 parse_property_block (svn_stream_t *stream,
                       svn_filesize_t content_length,
-                      const svn_repos_parser_fns_t *parse_fns,
+                      const svn_repos_parser_fns2_t *parse_fns,
                       void *record_baton,
                       svn_boolean_t is_node,
                       apr_pool_t *pool)
@@ -276,6 +301,41 @@
           else
             return stream_malformed (); /* didn't find expected 'V' line */
         }
+      else if ((buf[0] == 'D') && (buf[1] == ' '))
+        {
+          apr_size_t numread;
+          char *keybuf;
+          char c;
+          
+          /* Get the length of the key */
+          apr_size_t keylen = (apr_size_t) atoi (buf + 2);
+
+          /* Now read that much into a buffer, + 1 byte for null terminator */
+          keybuf = apr_pcalloc (pool, keylen + 1);
+          numread = keylen;
+          SVN_ERR (svn_stream_read (stream, keybuf, &numread));
+          content_length -= numread;
+          if (numread != keylen)
+            return stream_ran_dry ();
+          keybuf[keylen] = '\0';
+
+          /* Suck up extra newline after key data */
+          numread = 1;
+          SVN_ERR (svn_stream_read (stream, &c, &numread));
+          content_length -= numread;
+          if (numread != 1)
+            return stream_ran_dry ();
+          if (c != '\n') 
+            return stream_malformed ();
+
+          /* We don't expect these in revision properties, and if we see
+             one when we don't have a delete_node_property callback,
+             then we're seeing a v3 feature in a v2 dump. */
+          if (!is_node || !parse_fns->delete_node_property)
+            return stream_malformed ();
+
+          SVN_ERR (parse_fns->delete_node_property (record_baton, keybuf));
+        }
       else
         return stream_malformed (); /* didn't find expected 'K' line */
       
@@ -293,7 +353,8 @@
 static svn_error_t *
 parse_text_block (svn_stream_t *stream,
                   svn_filesize_t content_length,
-                  const svn_repos_parser_fns_t *parse_fns,
+                  svn_boolean_t is_delta,
+                  const svn_repos_parser_fns2_t *parse_fns,
                   void *record_baton,
                   char *buffer,
                   apr_size_t buflen,
@@ -302,8 +363,20 @@
   svn_stream_t *text_stream = NULL;
   apr_size_t num_to_read, rlen, wlen;
   
+  if (is_delta)
+    {
+      svn_txdelta_window_handler_t wh;
+      void *whb;
+
+      SVN_ERR (parse_fns->apply_textdelta (&wh, &whb, record_baton));
+      if (wh)
+        text_stream = svn_txdelta_parse_svndiff (wh, whb, TRUE, pool);
+    }
+  else
+    {
   /* Get a stream to which we can push the data. */
   SVN_ERR (parse_fns->set_fulltext (&text_stream, record_baton));
+    }
 
   /* If there are no contents to read, just write an empty buffer
      through our callback. */
@@ -384,8 +457,8 @@
 
 /* The Main Parser Logic */
 svn_error_t *
-svn_repos_parse_dumpstream (svn_stream_t *stream,
-                            const svn_repos_parser_fns_t *parse_fns,
+svn_repos_parse_dumpstream2 (svn_stream_t *stream,
+                             const svn_repos_parser_fns2_t *parse_fns,
                             void *parse_baton,
                             svn_cancel_func_t cancel_func,
                             void *cancel_baton,
@@ -409,6 +482,14 @@
      number, and a blank line. */
   SVN_ERR (parse_format_version (linebuf->data, &version));
 
+  /* If we were called from svn_repos_parse_dumpstream(), the
+     callbacks to handle delta contents will be NULL, so we have to
+     reject dumpfiles with the current version. */
+  if (version == SVN_REPOS_DUMPFILE_FORMAT_VERSION
+      && (!parse_fns->delete_node_property || !parse_fns->apply_textdelta))
+    return svn_error_createf (SVN_ERR_STREAM_MALFORMED_DATA, NULL,
+                              "Unsupported dumpfile version: %d", version);
+
   /* A dumpfile "record" is defined to be a header-block of
      rfc822-style headers, possibly followed by a content-block.
 
@@ -509,8 +590,14 @@
                                   SVN_REPOS_DUMPFILE_PROP_CONTENT_LENGTH,
                                   APR_HASH_KEY_STRING)))
         {
-          /* First, remove all node properties. */
-          if (found_node)
+          const char *delta = apr_hash_get (headers,
+                                            SVN_REPOS_DUMPFILE_PROP_DELTA,
+                                            APR_HASH_KEY_STRING);
+          svn_boolean_t is_delta = (delta && strcmp (delta, "true") == 0);
+
+          /* First, remove all node properties, unless this is a delta
+             property block. */
+          if (found_node && !is_delta)
             SVN_ERR (parse_fns->remove_node_props (node_baton));
 
           SVN_ERR (parse_property_block (stream,
@@ -526,8 +613,14 @@
                                   SVN_REPOS_DUMPFILE_TEXT_CONTENT_LENGTH,
                                   APR_HASH_KEY_STRING)))
         {
+          const char *delta = apr_hash_get (headers,
+                                            SVN_REPOS_DUMPFILE_TEXT_DELTA,
+                                            APR_HASH_KEY_STRING);
+          svn_boolean_t is_delta = (delta && strcmp (delta, "true") == 0);
+
           SVN_ERR (parse_text_block (stream, 
                                      svn__atoui64 (valstr),
+                                     is_delta,
                                      parse_fns,
                                      found_node ? node_baton : rev_baton,
                                      buffer, 
@@ -558,6 +651,19 @@
 }
 
 
+svn_error_t *
+svn_repos_parse_dumpstream (svn_stream_t *stream,
+                            const svn_repos_parser_fns_t *parse_fns,
+                            void *parse_baton,
+                            svn_cancel_func_t cancel_func,
+                            void *cancel_baton,
+                            apr_pool_t *pool)
+{
+  svn_repos_parser_fns2_t *fns2 = fns2_from_fns (parse_fns, pool);
+
+  return svn_repos_parse_dumpstream2 (stream, fns2, parse_baton,
+                                      cancel_func, cancel_baton, pool);
+}
 
 /*----------------------------------------------------------------------*/
 
@@ -873,6 +979,20 @@
 
 
 static svn_error_t *
+delete_node_property (void *baton,
+                      const char *name)
+{
+  struct node_baton *nb = baton;
+  struct revision_baton *rb = nb->rb;
+
+  SVN_ERR (svn_fs_change_node_prop (rb->txn_root, nb->path,
+                                    name, NULL, nb->pool));
+  
+  return SVN_NO_ERROR;
+}
+
+
+static svn_error_t *
 remove_node_props (void *baton)
 {
   struct node_baton *nb = baton;
@@ -901,6 +1021,21 @@
 
 
 static svn_error_t *
+apply_textdelta (svn_txdelta_window_handler_t *handler,
+                 void **handler_baton,
+                 void *node_baton)
+{
+  struct node_baton *nb = node_baton;
+  struct revision_baton *rb = nb->rb;
+
+  return svn_fs_apply_textdelta (handler, handler_baton,
+                                 rb->txn_root, nb->path,
+                                 NULL, nb->md5_checksum,
+                                 nb->pool);
+}
+
+
+static svn_error_t *
 set_fulltext (svn_stream_t **stream,
               void *node_baton)
 {
@@ -995,6 +1130,33 @@
 
 
 svn_error_t *
+svn_repos_get_fs_build_parser2 (const svn_repos_parser_fns2_t **callbacks,
+                                void **parse_baton,
+                                svn_repos_t *repos,
+                                svn_boolean_t use_history,
+                                enum svn_repos_load_uuid uuid_action,
+                                svn_stream_t *outstream,
+                                const char *parent_dir,
+                                apr_pool_t *pool)
+{
+  const svn_repos_parser_fns_t *fns;
+  svn_repos_parser_fns2_t *parser;
+
+  /* Fetch the old-style vtable and baton, convert the vtable to a
+   * new-style vtable, and set the new callbacks. */
+  SVN_ERR (svn_repos_get_fs_build_parser (&fns, parse_baton, repos,
+                                          use_history, uuid_action, outstream,
+                                          parent_dir, pool));
+  parser = fns2_from_fns (fns, pool);
+  parser->delete_node_property = delete_node_property;
+  parser->apply_textdelta = apply_textdelta;
+  *callbacks = parser;
+  return SVN_NO_ERROR;
+}
+
+
+
+svn_error_t *
 svn_repos_get_fs_build_parser (const svn_repos_parser_fns_t **parser_callbacks,
                                void **parse_baton,
                                svn_repos_t *repos,
@@ -1041,12 +1203,12 @@
                    void *cancel_baton,
                    apr_pool_t *pool)
 {
-  const svn_repos_parser_fns_t *parser;
+  const svn_repos_parser_fns2_t *parser;
   void *parse_baton;
   
   /* This is really simple. */  
 
-  SVN_ERR (svn_repos_get_fs_build_parser (&parser, &parse_baton,
+  SVN_ERR (svn_repos_get_fs_build_parser2 (&parser, &parse_baton,
                                           repos,
                                           TRUE, /* look for copyfrom revs */
                                           uuid_action,
@@ -1054,7 +1216,7 @@
                                           parent_dir,
                                           pool));
 
-  SVN_ERR (svn_repos_parse_dumpstream (dumpstream, parser, parse_baton,
+  SVN_ERR (svn_repos_parse_dumpstream2 (dumpstream, parser, parse_baton,
                                        cancel_func, cancel_baton, pool));
 
   return SVN_NO_ERROR;
Index: subversion/libsvn_repos/dump.c
===================================================================
--- subversion/libsvn_repos/dump.c	(revision 9220)
+++ subversion/libsvn_repos/dump.c	(working copy)
@@ -33,10 +33,14 @@
 
 /** A variant of our hash-writing routine in libsvn_subr;  this one
     writes to a stringbuf instead of a file, and outputs PROPS-END
-    instead of END. **/
+    instead of END.  If OLDHASH is not NULL, then only properties
+    which vary from OLDHASH will be written, and properties which
+    exist only in OLDHASH will be written out with "D" entries
+    (like "K" entries but with no corresponding value). **/
 
 static void
 write_hash_to_stringbuf (apr_hash_t *hash, 
+                         apr_hash_t *oldhash,
                          svn_stringbuf_t **strbuf,
                          apr_pool_t *pool)
 {
@@ -55,6 +59,16 @@
 
       /* Get this key and val. */
       apr_hash_this (this, &key, &keylen, &val);
+      value = val;
+
+      /* Don't output properties equal to the ones in oldhash, if present. */
+      if (oldhash)
+        {
+          svn_string_t *oldvalue = apr_hash_get (oldhash, key, keylen);
+
+          if (oldvalue && svn_string_compare (value, oldvalue))
+            continue;
+        }
 
       /* Output name length, then name. */
 
@@ -68,7 +82,6 @@
       svn_stringbuf_appendbytes (*strbuf, "\n", 1);
 
       /* Output value length, then value. */
-      value = val;
 
       svn_stringbuf_appendbytes (*strbuf, "V ", 2);
 
@@ -80,10 +93,77 @@
       svn_stringbuf_appendbytes (*strbuf, "\n", 1);
     }
 
+  if (oldhash)
+    {
+      /* Output a "D " entry for each property in oldhash but not hash. */
+      for (this = apr_hash_first (pool, oldhash); this;
+           this = apr_hash_next (this))
+        {
+          const void *key;
+          void *val;
+          apr_ssize_t keylen;
+          int bytes_used;
+
+          /* Get this key and val. */
+          apr_hash_this (this, &key, &keylen, &val);
+
+          /* Only output values deleted in hash. */
+          if (apr_hash_get (hash, key, keylen))
+            continue;
+
+          /* Output name length, then name. */
+
+          svn_stringbuf_appendbytes (*strbuf, "D ", 2);
+
+          sprintf (buf, "%" APR_SSIZE_T_FMT "%n", keylen, &bytes_used);
+          svn_stringbuf_appendbytes (*strbuf, buf, bytes_used);
+          svn_stringbuf_appendbytes (*strbuf, "\n", 1);
+
+          svn_stringbuf_appendbytes (*strbuf, (const char *) key, keylen);
+          svn_stringbuf_appendbytes (*strbuf, "\n", 1);
+        }
+    }
   svn_stringbuf_appendbytes (*strbuf, "PROPS-END\n", 10);
 }
 
 
+/* Compute the delta between OLDROOT/OLDPATH and NEWROOT/NEWPATH and
+   store it into a new temporary file *TEMPFILE.  Record the length of
+   the temporary file in *LEN, and rewind the file before returning. */
+static svn_error_t *
+store_delta (apr_file_t **tempfile, svn_filesize_t *len,
+             svn_fs_root_t *oldroot, const char *oldpath,
+             svn_fs_root_t *newroot, const char *newpath, apr_pool_t *pool)
+{
+  const char *tempdir, *name;
+  svn_stream_t *temp_stream;
+  apr_off_t offset = 0;
+  svn_txdelta_stream_t *delta_stream;
+  svn_txdelta_window_handler_t wh;
+  void *whb;
+
+  /* Create a temporary file and open a stream to it. */
+  SVN_ERR (svn_io_temp_dir (&tempdir, pool));
+  SVN_ERR (svn_io_open_unique_file (tempfile, &name,
+                                    apr_psprintf (pool, "%s/dump", tempdir),
+                                    ".tmp", TRUE, pool));
+  temp_stream = svn_stream_from_aprfile (*tempfile, pool);
+
+  /* Compute the delta and send it to the temporary file. */
+  SVN_ERR (svn_fs_get_file_delta_stream (&delta_stream, oldroot, oldpath,
+                                         newroot, newpath, pool));
+  svn_txdelta_to_svndiff (temp_stream, pool, &wh, &whb);
+  SVN_ERR (svn_txdelta_send_txstream (delta_stream, wh, whb, pool));
+
+  /* Get the length of the temporary file and rewind it. */
+  SVN_ERR (svn_io_file_seek (*tempfile, APR_CUR, &offset, pool));
+  *len = offset;
+  offset = 0;
+  SVN_ERR (svn_io_file_seek (*tempfile, APR_SET, &offset, pool));
+  return SVN_NO_ERROR;
+}
+
+
 /*----------------------------------------------------------------------*/
 
 /** An editor which dumps node-data in 'dumpfile format' to a file. **/
@@ -106,6 +186,9 @@
   svn_fs_root_t *fs_root;
   svn_revnum_t current_rev;
 
+  /* True if dumped nodes should output deltas instead of full text. */
+  svn_boolean_t use_deltas;
+
   /* The first revision dumped in this dumpstream. */
   svn_revnum_t oldest_dumped_rev;
 
@@ -223,12 +306,13 @@
            apr_pool_t *pool)
 {
   svn_stringbuf_t *propstring;
-  apr_hash_t *prophash;
-  svn_filesize_t textlen = 0, content_length = 0;
-  apr_size_t proplen = 0, len;
+  svn_filesize_t content_length = 0;
+  apr_size_t len;
   svn_boolean_t must_dump_text = FALSE, must_dump_props = FALSE;
   const char *compare_path = path;
   svn_revnum_t compare_rev = eb->current_rev - 1;
+  svn_fs_root_t *compare_root = NULL;
+  apr_file_t *delta_file = NULL;
 
   /* Write out metadata headers for this file node. */
   if (eb->stream)
@@ -257,9 +341,6 @@
 
   if (action == svn_node_action_change)
     {
-      svn_fs_root_t *compare_root;
-      svn_boolean_t text_changed = FALSE, props_changed = FALSE;
-
       if (eb->stream)
         SVN_ERR (svn_stream_printf (eb->stream, pool,
                                     SVN_REPOS_DUMPFILE_NODE_ACTION
@@ -270,17 +351,13 @@
                                      svn_fs_root_fs (eb->fs_root),
                                      compare_rev, pool));
       
-      SVN_ERR (svn_fs_props_changed (&props_changed,
+      SVN_ERR (svn_fs_props_changed (&must_dump_props,
                                      compare_root, compare_path,
                                      eb->fs_root, path, pool));
       if (kind == svn_node_file)
-        SVN_ERR (svn_fs_contents_changed (&text_changed,
+        SVN_ERR (svn_fs_contents_changed (&must_dump_text,
                                           compare_root, compare_path,
                                           eb->fs_root, path, pool));
-      if (props_changed)
-        must_dump_props = TRUE;
-      if (text_changed)
-        must_dump_text = TRUE;        
     }
   else if (action == svn_node_action_replace)
     {
@@ -345,9 +422,6 @@
         }
       else
         {
-          svn_fs_root_t *src_root;
-          svn_boolean_t text_changed = FALSE, props_changed = FALSE;
-
           if ((cmp_rev < eb->oldest_dumped_rev)
               && eb->feedback_stream)
             svn_stream_printf 
@@ -366,23 +440,19 @@
                                         ": %s\n",
                                         cmp_rev, cmp_path));
 
-          SVN_ERR (svn_fs_revision_root (&src_root, 
+          SVN_ERR (svn_fs_revision_root (&compare_root, 
                                          svn_fs_root_fs (eb->fs_root),
                                          compare_rev, pool));
 
           /* Need to decide if the copied node had any extra textual or
              property mods as well.  */
-          SVN_ERR (svn_fs_props_changed (&props_changed,
-                                         src_root, compare_path,
+          SVN_ERR (svn_fs_props_changed (&must_dump_props,
+                                         compare_root, compare_path,
                                          eb->fs_root, path, pool));
           if (kind == svn_node_file)
-            SVN_ERR (svn_fs_contents_changed (&text_changed,
-                                              src_root, compare_path,
+            SVN_ERR (svn_fs_contents_changed (&must_dump_text,
+                                              compare_root, compare_path,
                                               eb->fs_root, path, pool));
-          if (props_changed)
-            must_dump_props = TRUE;
-          if (text_changed)
-            must_dump_text = TRUE;
           
           /* ### someday write a node-copyfrom-source-checksum. */
         }
@@ -410,8 +480,21 @@
      property values here. */
   if (must_dump_props)
     {
+      apr_hash_t *prophash, *oldhash = NULL;
+      apr_size_t proplen;
+
       SVN_ERR (svn_fs_node_proplist (&prophash, eb->fs_root, path, pool));
-      write_hash_to_stringbuf (prophash, &propstring, pool);
+      if (eb->use_deltas && compare_root)
+        {
+          /* Fetch the old property hash to diff against and output a header
+             saying that our property contents are a delta. */
+          SVN_ERR (svn_fs_node_proplist (&oldhash, compare_root, compare_path,
+                                         pool));
+          if (eb->stream)
+            SVN_ERR (svn_stream_printf (eb->stream, pool,
+                                        SVN_REPOS_DUMPFILE_PROP_DELTA ": true\n"));
+        }
+      write_hash_to_stringbuf (prophash, oldhash, &propstring, pool);
       proplen = propstring->len;
       content_length += proplen;
       if (eb->stream)
@@ -426,15 +509,34 @@
     {
       unsigned char md5_digest[APR_MD5_DIGESTSIZE];
       const char *hex_digest;
+      svn_filesize_t textlen;
 
+      if (eb->use_deltas && compare_root)
+        {
+          /* Compute the text delta now and write it into a temporary
+             file, so that we can find its length.  Output a header
+             saying our text contents are a delta. */
+          SVN_ERR (store_delta (&delta_file, &textlen, compare_root,
+                                compare_path, eb->fs_root, path, pool));
+          if (eb->stream)
+            SVN_ERR (svn_stream_printf (eb->stream, pool,
+                                        SVN_REPOS_DUMPFILE_TEXT_DELTA
+                                        ": true\n"));
+        }
+      else
+        {
+          /* Just fetch the length of the file. */
       SVN_ERR (svn_fs_file_length (&textlen, eb->fs_root, path, pool));
+        }
+
       content_length += textlen;
-      SVN_ERR (svn_fs_file_md5_checksum (md5_digest, eb->fs_root, path, pool));
-      hex_digest = svn_md5_digest_to_cstring (md5_digest, pool);
       if (eb->stream)
         SVN_ERR (svn_stream_printf (eb->stream, pool,
                                     SVN_REPOS_DUMPFILE_TEXT_CONTENT_LENGTH 
                                     ": %" SVN_FILESIZE_T_FMT "\n", textlen));
+
+      SVN_ERR (svn_fs_file_md5_checksum (md5_digest, eb->fs_root, path, pool));
+      hex_digest = svn_md5_digest_to_cstring (md5_digest, pool);
       if (hex_digest && eb->stream)
         SVN_ERR (svn_stream_printf (eb->stream, pool,
                                     SVN_REPOS_DUMPFILE_TEXT_CONTENT_CHECKSUM 
@@ -463,7 +565,11 @@
     {
       svn_stream_t *contents;
 
+      if (delta_file)
+        contents = svn_stream_from_aprfile (delta_file, pool);
+      else
       SVN_ERR (svn_fs_file_contents (&contents, eb->fs_root, path, pool));
+
       SVN_ERR (svn_stream_copy (contents, eb->stream, pool));
     }
   
@@ -707,6 +813,7 @@
                  svn_stream_t *stream,
                  svn_stream_t *feedback_stream,
                  svn_revnum_t oldest_dumped_rev,
+                 svn_boolean_t use_deltas,
                  apr_pool_t *pool)
 {
   /* Allocate an edit baton to be stored in every directory baton.
@@ -723,6 +830,7 @@
   eb->path = apr_pstrdup (pool, root_path);
   SVN_ERR (svn_fs_revision_root (&(eb->fs_root), fs, to_rev, pool));
   eb->current_rev = to_rev;
+  eb->use_deltas = use_deltas;
 
   /* Set up the editor. */
   dump_editor->open_root = open_root;
@@ -784,7 +892,7 @@
                     datevalue);
     }
 
-  write_hash_to_stringbuf (props, &encoded_prophash, pool);
+  write_hash_to_stringbuf (props, NULL, &encoded_prophash, pool);
 
   /* ### someday write a revision-content-checksum */
 
@@ -816,12 +924,13 @@
 
 /* The main dumper. */
 svn_error_t *
-svn_repos_dump_fs (svn_repos_t *repos,
+svn_repos_dump_fs2 (svn_repos_t *repos,
                    svn_stream_t *stream,
                    svn_stream_t *feedback_stream,
                    svn_revnum_t start_rev,
                    svn_revnum_t end_rev,
                    svn_boolean_t incremental,
+                    svn_boolean_t use_deltas,
                    svn_cancel_func_t cancel_func,
                    void *cancel_baton,
                    apr_pool_t *pool)
@@ -867,9 +976,15 @@
      magic header followed by a dumpfile format version. */
   if (stream)
     {
+      int version = SVN_REPOS_DUMPFILE_FORMAT_VERSION;
+
+      /* If we're not using deltas, use the previous version, for
+         compatibility with svn 1.0.x. */
+      if (!use_deltas)
+        version--;
       SVN_ERR (svn_stream_printf (stream, pool, 
                                   SVN_REPOS_DUMPFILE_MAGIC_HEADER ": %d\n\n", 
-                                  SVN_REPOS_DUMPFILE_FORMAT_VERSION));
+                                  version));
       SVN_ERR (svn_stream_printf (stream, pool, SVN_REPOS_DUMPFILE_UUID
                                   ": %s\n\n", uuid));
     }
@@ -879,6 +994,7 @@
     {
       svn_revnum_t from_rev, to_rev;
       svn_fs_root_t *to_root;
+      svn_boolean_t use_deltas_for_rev;
 
       /* Check for cancellation. */
       if (cancel_func)
@@ -914,10 +1030,13 @@
       /* Write the revision record. */
       SVN_ERR (write_revision_record (stream, fs, to_rev, subpool));
 
-      /* The editor which dumps nodes to a file. */
-      SVN_ERR (get_dump_editor (&dump_editor, &dump_edit_baton, 
-                                fs, to_rev, "/", stream, feedback_stream,
-                                start_rev, subpool));
+      /* Fetch the editor which dumps nodes to a file.  Regardless of
+         what we've been told, don't use deltas for the first rev of a
+         non-incremental dump. */
+      use_deltas_for_rev = use_deltas && (incremental || i != start_rev);
+      SVN_ERR (get_dump_editor (&dump_editor, &dump_edit_baton, fs, to_rev,
+                                "/", stream, feedback_stream, start_rev,
+                                use_deltas_for_rev, subpool));
 
       /* Drive the editor in one way or another. */
       SVN_ERR (svn_fs_revision_root (&to_root, fs, to_rev, subpool));
@@ -959,3 +1078,19 @@
 
   return SVN_NO_ERROR;
 }
+
+svn_error_t *
+svn_repos_dump_fs (svn_repos_t *repos,
+                   svn_stream_t *stream,
+                   svn_stream_t *feedback_stream,
+                   svn_revnum_t start_rev,
+                   svn_revnum_t end_rev,
+                   svn_boolean_t incremental,
+                   svn_cancel_func_t cancel_func,
+                   void *cancel_baton,
+                   apr_pool_t *pool)
+{
+  return svn_repos_dump_fs2 (repos, stream, feedback_stream, start_rev,
+                             end_rev, incremental, FALSE, cancel_func,
+                             cancel_baton, pool);
+}

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Martin Furter <mf...@rola.ch>.
On 30 Mar 2004 kfogel@collab.net wrote:

> Mark Benedetto King <mb...@lowlatency.com> writes:
> > We might want to provide a richer interface, then; I know of one person
> > who had an itch it couldn't help scratch: parsing two files at once in
> > a single thread (in order to merge dumpfiles).

That was me.

> Sure -- it's just a simple matter of defining "richer" and coming up
> with that interface, then.  Far be it from me to stand in the way... :-).
> 
> (I don't mean to sound flip; it's just that I'm approaching it from a
> "Is there a path from idea to action?" viewpoint.  For any interface
> in Subversion, there are probably an infinite number of itches it
> doesn't scratch -- the question is how important are those itches, how
> hard are they to scratch, and is this the best way to scratch them?)

The current implementation seems to be as simple as possible and doing
it's job. So don't waste time for a feature only one person requested
(except you have other reasons to do so). 

I've implemented my own code for reading and writing dump files and a
the merge tool.

It's implemented in 3 nice python classes and they are pretty
fast. Merging 2 dump files was much faster than loading them
into a repository over file://.
You can find them at http://queen.borg.ch:81/subversion/

Sorry for the advertisement and noise.
Martin


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by kf...@collab.net.
Mark Benedetto King <mb...@lowlatency.com> writes:
> We might want to provide a richer interface, then; I know of one person
> who had an itch it couldn't help scratch: parsing two files at once in
> a single thread (in order to merge dumpfiles).

Sure -- it's just a simple matter of defining "richer" and coming up
with that interface, then.  Far be it from me to stand in the way... :-).

(I don't mean to sound flip; it's just that I'm approaching it from a
"Is there a path from idea to action?" viewpoint.  For any interface
in Subversion, there are probably an infinite number of itches it
doesn't scratch -- the question is how important are those itches, how
hard are they to scratch, and is this the best way to scratch them?)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Mark Benedetto King <mb...@lowlatency.com>.
On Mon, Mar 29, 2004 at 10:06:07AM -0600, kfogel@collab.net wrote:
> 
> Well, you probably knew all this -- I'm really responding so others
> see the answer to your question.  I think someday we (or someone) will
> be very glad we exported this interface, though the only evidence I
> can offer for this conjecture is our experience with cvs2svn.
> 

We might want to provide a richer interface, then; I know of one person
who had an itch it couldn't help scratch: parsing two files at once in
a single thread (in order to merge dumpfiles).

--ben


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by kf...@collab.net.
Greg Hudson <gh...@MIT.EDU> writes:
> After making this change, I'm not a big fan of the way the dumpfile
> parser exports the parser interface.  I doubt anyone will ever use
> that functionality, so it just makes it harder to add new features to
> the dump format.  

[Responding to this bit separately from the patch itself.]

The point of defining the parser interface was so that svn dumpfile
format could be easily used as an export format to other systems.
Offering a programmatic export format is even better than offering a
static one.  CVS essentially offers neither, and anyone working on
cvs2svn can testify to the pain :-).

One could respond: "It's easy to whip up a parser for our simple
dumpfile format, so why do we need to publicize the interface?"  But
it's the sort of thing where people are likely to make little
mistakes.  We shouldn't ask them to reinvent the wheel, when we can
just offer the wheel we had to implement for ourselves anyway.

Well, you probably knew all this -- I'm really responding so others
see the answer to your question.  I think someday we (or someone) will
be very glad we exported this interface, though the only evidence I
can offer for this conjecture is our experience with cvs2svn.

> (Also, what's the point of exporting
> svn_repos_get_fs_build_parser, when the only thing you can do with it
> is call svn_repos_parse_dumpstream on it, and doing that is exactly
> the same as calling svn_repos_load_fs?)

That I can't answer :-(.

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Sunday, March 28, 2004 12:58 AM -0500 Greg Hudson <gh...@MIT.EDU> 
wrote:

> The problem: the output of "svnadmin dump" is useful as a
> platform-independent backup format, as a way of transporting new
> revisions across the net, and for other purposes.  But for some
> workloads it's too big, because it dumps all the properties and all
> the plaintext for each changed node-revision.

Oh, dude, you rock.  ;-)

I have one repository that has a 1M binary file that has minor changes 
every revision and we're at around 1k revisions.  The dumps are, well, 
about 1,000 times bigger than the repository itself.  =)  Woot.  -- justin

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-03-28 at 03:02, Edmund Horner wrote:
> 1. If --deltas is specified and a path is being added without history,
> will it be stored as a delta against empty, or just as fulltext?  (I
> personally would prefer fulltext...)

As fulltext, though the loader should be able to handle a delta against
any.  It wouldn't be too hard to make the dumper do a delta against any,
but it didn't come naturally.  (Also, the binary DB stores plaintext for
the head revision of files, so it doesn't seem like it should hurt too
much space-wise to use plaintext for the first rev of each added file. 
Though I imagine it makes a difference for big imports....)

> 2. Does --deltas imply --incremental?  Because if it doesn't, it's going
> to be emitting do-nothing deltas for things that haven't changed in that
> first revision, right?  Sounds as if it might as well leave them out.

I can't quite figure out where you were coming from here, but you seem
to have been answered successfully in other mail.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Edmund Horner <ch...@chrysophylax.cjb.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Vincent Lefevre wrote:

| On 2004-03-28 20:02:28 +1200, Edmund Horner wrote:
|
|>2. Does --deltas imply --incremental?  Because if it doesn't, it's going
|>to be emitting do-nothing deltas for things that haven't changed in that
|>first revision, right?  Sounds as if it might as well leave them out.
|
|
| Greg said: "Non-incremental dumps which start at a rev other than 0
| will not use deltas for the first revision."
|
| Doesn't this answer your question?

To be honest, I'm not sure.

.
.
.

Actually having read the whole thing several times I think I understand now.

Thanks.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAZpqlEbvImmpUq7gRAuR1AKCAJbtmjRHfPItIi/W7oXE7HSqfaACgyH+U
I8C4yE12klfykbH2PwATf3g=
=3kyu
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2004-03-28 20:02:28 +1200, Edmund Horner wrote:
> 2. Does --deltas imply --incremental?  Because if it doesn't, it's going
> to be emitting do-nothing deltas for things that haven't changed in that
> first revision, right?  Sounds as if it might as well leave them out.

Greg said: "Non-incremental dumps which start at a rev other than 0
will not use deltas for the first revision."

Doesn't this answer your question?

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/> - 100%
validated (X)HTML - Acorn Risc PC, Yellow Pig 17, Championnat International
des Jeux Mathématiques et Logiques, TETRHEX, etc.
Work: CR INRIA - computer arithmetic / SPACES project at LORIA

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Edmund Horner <ch...@chrysophylax.cjb.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Greg Hudson wrote:
|>1. If --deltas is specified and a path is being added without
|>history, will it be stored as a delta against empty, or just as
|>fulltext?  (I personally would prefer fulltext...)
|
|
| Hm, why would you prefer fulltext for adds-without-history?  I just
| discovered that it's a one-line change to do adds as deltas against
| empty, and it saves another 25% on the diffy dumpfile size for the
| Subversion project repository.  (The resulting diffy dumpfile isn't as
| compressible, but compressability wasn't really my goal.)  It also
| seems more elegant to me, since we transmit commits of added files
| over the wire as deltas against empty.

Actually it's due to my ignorance.  I've somehow never absorbed the fact
that deltas compress.  I just preferred plaintext wherever possible
because (no offense) deltas look ugly (meaning they're hard to read).

Since they save that much I say forget what I was talking about.

Edmund.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAZ3zCEbvImmpUq7gRAg6LAKDZmx1YMhz3M+5pGLDBZhaFjNHbYwCgmAgQ
NqZ+IoNFLOwVxmsxyXHhSr8=
=tj1R
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
> 1. If --deltas is specified and a path is being added without
> history, will it be stored as a delta against empty, or just as
> fulltext?  (I personally would prefer fulltext...)

Hm, why would you prefer fulltext for adds-without-history?  I just
discovered that it's a one-line change to do adds as deltas against
empty, and it saves another 25% on the diffy dumpfile size for the
Subversion project repository.  (The resulting diffy dumpfile isn't as
compressible, but compressability wasn't really my goal.)  It also
seems more elegant to me, since we transmit commits of added files
over the wire as deltas against empty.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Edmund Horner <ch...@chrysophylax.cjb.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Patch sounds very useful, considering lots of people (like me) use dump
files for backups.

Two questions:

1. If --deltas is specified and a path is being added without history,
will it be stored as a delta against empty, or just as fulltext?  (I
personally would prefer fulltext...)

2. Does --deltas imply --incremental?  Because if it doesn't, it's going
to be emitting do-nothing deltas for things that haven't changed in that
first revision, right?  Sounds as if it might as well leave them out.

I could think of lots more questions, but I thought that was enough to
be going on with :-)

Edmund.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (MingW32)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFAZoaUEbvImmpUq7gRAphCAJwNkEInSajoljjgiXB57pblVPBpvACgr5ZY
Retop3gyd6PN4VDYM4GYilA=
=yIr5
-----END PGP SIGNATURE-----

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by "C. Michael Pilato" <cm...@collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:

> Answer #2:
> 
>   * For an add with history, the copy history.

Okay, so an interesting side-effect of this change is that it will
further highlight the brokenness of svndumpfilter w.r.t. copies.  I
don't see this as a showstopper -- a) the dump filter was a good idea
impossible to implement as planned anyway, and b) folks can always
*not* pass --deltas.

I'm fine with the addition of this feature so long as its fragility
(insofar as dumpfile editing goes) is well-documented.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-03-28 at 21:09, C. Michael Pilato wrote:
> I don't have time to read/review the patch right now, so I'll stick to
> asking questions that I feel you owe us as a penalty for dropping
> another code bomb. :-)

>   - Against what are the deltas for a given node?  The previous node?

Answer #1: against the same thing we compare against to decide if text
and properties have changed.  That had better be the right thing, or
there would be situations where a dump/load doesn't change a node which
ought to change.  If you look at the changes to dump_node, you can see
that I just expanded the scope of compare_root and used
compare_root/compare_path as the delta source.

(I actually went to significant effort to try to conduct a case where
the code compares against the wrong thing--by constructing a rev where a
directory is replaced with copy history and then a file within it is
modified to have the same contents as the previous rev had at that
path--but you guys were too clever, and had already taken care of it. 
So no torturous ghudson bugs to report on that front.)

Answer #2:

  * For an add with history, the copy history.
  * For an add without history, the empty source.
  * For a change, the previous contents of the node.

Answer #3: the same thing the delta would be against for a commit.

>   - In what situations are deltas used:  adds-with-history?
>     adds-without-history?  modifications?

In all three situations.  (In the patch I sent, adds-without-history are
dumped as plain text rather than as deltas against empty, but it's a
trivial one-line change to fix that.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by "C. Michael Pilato" <cm...@collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:

> The problem: the output of "svnadmin dump" is useful as a
> platform-independent backup format, as a way of transporting new
> revisions across the net, and for other purposes.  But for some
> workloads it's too big, because it dumps all the properties and all
> the plaintext for each changed node-revision.
> 
> The following patch--another thousand-line ghudson weekend code bomb,
> because I know you all love them so much--implements a new "svnadmin
> dump --deltas" option to cause dumps to be made using deltas.

I don't have time to read/review the patch right now, so I'll stick to
asking questions that I feel you owe us as a penalty for dropping
another code bomb. :-)

  - Against what are the deltas for a given node?  The previous node?

  - In what situations are deltas used:  adds-with-history?
    adds-without-history?  modifications?

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Mon, 2004-03-29 at 08:53, Tobias Ringstrom wrote:
> It would be very interesting to hear what the size is of a dump of the 
> subversion repository for the following cases, and also how long the 
> dump (including any compression) takes:
> 
> 1. fulltext dump
> 2. fulltext dump + "gzip -9"
> 3. fulltext dump + "bzip2 -9"
> 4. delta dump
> 5. delta dump + "gzip -9"
> 6. delta dump + "bzip2 -9"

I realize you retracted this question, but I do have some new numbers.  

On the time issue, I just calculated the times for the dumps themselves,
without any compression, because I wanted to run five tests and average
and doing that six times is a lot.  I got 104.904u 12.344s 2:09.36 for a
standard dump and 292.826u 26.162s 5:20.84 for a delta dump, or about
two and a half times as long for a delta dump.  Not too surprising.

For the sizes:

  1. 648479605
  2. 144751988
  3. 99624568
  4. 24649260 (3.8% as big as (1))
  5. 14154777 (9.8% as big as (2))
  6. 12709484 (12.8% as big as (3))

These numbers are different from the ones I sent earlier because I've
enabled deltas against the empty source for adds without history.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Tobias Ringstrom <to...@ringstrom.mine.nu>.
Tobias Ringstrom wrote:

> I'm planning to test this myself, but I'm thinking that you might have 
> some numbers already.

Gah.  I just saw your numbers in another mail in this thread, so just go 
ahead and ignore my post.

/Tobias


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Tobias Ringstrom <to...@ringstrom.mine.nu>.
Greg Hudson wrote:

> The problem: the output of "svnadmin dump" is useful as a
> platform-independent backup format, as a way of transporting new
> revisions across the net, and for other purposes.  But for some
> workloads it's too big, because it dumps all the properties and all
> the plaintext for each changed node-revision.

It would be very interesting to hear what the size is of a dump of the 
subversion repository for the following cases, and also how long the 
dump (including any compression) takes:

1. fulltext dump
2. fulltext dump + "gzip -9"
3. fulltext dump + "bzip2 -9"
4. delta dump
5. delta dump + "gzip -9"
6. delta dump + "bzip2 -9"

I'm planning to test this myself, but I'm thinking that you might have 
some numbers already.

/Tobias


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Ben Reser <be...@reser.org>.
On Thu, Apr 01, 2004 at 08:37:00PM -0500, Garrett Rooney wrote:
> Actually, if the values for the bitfield are named well, I think it can 
> be more readable, as opposed to remembering that the third boolean 
> argument to the function represents one thing, and the fourth something 
> else...

It certainly would be more readable in the perl bindings.  Right now I
have a bunch of function calls like this:
$ctx->status(getcwd(), 'WORKING', sub { status_func($ctx, @_) }, 1, 0,
0, 0);

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On Apr 1, 2004, at 8:32 PM, Ben Reser wrote:

>>   * It makes all of the APIs which take boolean flags (including the
>> majority which we'll never extend) more heavyweight, in terms of the
>> verbiage necessary to describe and use them.
>
> True.  Bitfields are more difficult to use.  It's a tradeoff and 
> depends
> on if you think the benefits are worth the cost.

Actually, if the values for the bitfield are named well, I think it can 
be more readable, as opposed to remembering that the third boolean 
argument to the function represents one thing, and the fourth something 
else...

It all depends on how you look at it I suppose.

-garrett


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Ben Reser <be...@reser.org>.
On Thu, Apr 01, 2004 at 08:26:14PM -0500, Greg Hudson wrote:
> I guess I'm not terribly sanguine about this idea because:
> 
>   * Boolean flags are only one type of argument, so this only allows one
> type of compatible extension, covering maybe 25% of API extensions (just
> kind of estimating based on the number of extensions we've made so far).

Yes and no.  It would help us avoid spewing about new functions.  If we
could simply add a bit flag for some new behavior it would save us from
adding a lot of extraneous functions.

A lot of times we're adding changes that change the way a function
behaves where it would be nice to be able to do that without adding a
function or changing the functions signature.

>   * It makes all of the APIs which take boolean flags (including the
> majority which we'll never extend) more heavyweight, in terms of the
> verbiage necessary to describe and use them.

True.  Bitfields are more difficult to use.  It's a tradeoff and depends
on if you think the benefits are worth the cost.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Thu, 2004-04-01 at 20:16, Ben Reser wrote:
> We had discussed this on IRC in the past.  Several people (though I
> don't remember who all) thought it was a good idea.  I think it's a good
> idea.  The intention was to take and start making boolean's into bit
> flags on new APIs and when we get around to 2.0 replace existing boolean
> API's with bit flag APIs.

I guess I'm not terribly sanguine about this idea because:

  * Boolean flags are only one type of argument, so this only allows one
type of compatible extension, covering maybe 25% of API extensions (just
kind of estimating based on the number of extensions we've made so far).

  * It makes all of the APIs which take boolean flags (including the
majority which we'll never extend) more heavyweight, in terms of the
verbiage necessary to describe and use them.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Ben Reser <be...@reser.org>.
On Sun, Mar 28, 2004 at 01:34:50PM -0500, Greg Hudson wrote:
> On Sun, 2004-03-28 at 13:31, Garrett Rooney wrote:
> > Since this adds a second boolean flag to this function, would it make 
> > sense to switch it to an integer with several bit flags so we can add 
> > further such options in the future without changing the prototype?
> 
> That might not be a bad idea, but it feels a little out of character
> relative to the rest of the API.  Do other people have opinions?
> 
> (This is where it would be really nice to have a language with named
> optional function parameters.)

We had discussed this on IRC in the past.  Several people (though I
don't remember who all) thought it was a good idea.  I think it's a good
idea.  The intention was to take and start making boolean's into bit
flags on new APIs and when we get around to 2.0 replace existing boolean
API's with bit flag APIs.

-- 
Ben Reser <be...@reser.org>
http://ben.reser.org

"Conscience is the inner voice which warns us somebody may be looking."
- H.L. Mencken

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-03-28 at 13:31, Garrett Rooney wrote:
> Since this adds a second boolean flag to this function, would it make 
> sense to switch it to an integer with several bit flags so we can add 
> further such options in the future without changing the prototype?

That might not be a bad idea, but it feels a little out of character
relative to the rest of the API.  Do other people have opinions?

(This is where it would be really nice to have a language with named
optional function parameters.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On Mar 28, 2004, at 12:58 AM, Greg Hudson wrote:

> The problem: the output of "svnadmin dump" is useful as a
> platform-independent backup format, as a way of transporting new
> revisions across the net, and for other purposes.  But for some
> workloads it's too big, because it dumps all the properties and all
> the plaintext for each changed node-revision.

I like this idea a lot.

> +svn_error_t *svn_repos_dump_fs2 (svn_repos_t *repos,
> +                                 svn_stream_t *dumpstream,
> +                                 svn_stream_t *feedback_stream,
> +                                 svn_revnum_t start_rev,
> +                                 svn_revnum_t end_rev,
> +                                 svn_boolean_t incremental,
> +                                 svn_boolean_t use_deltas,
> +                                 svn_cancel_func_t cancel_func,
> +                                 void *cancel_baton,
> +                                 apr_pool_t *pool);

Since this adds a second boolean flag to this function, would it make 
sense to switch it to an integer with several bit flags so we can add 
further such options in the future without changing the prototype?

-garrett


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-03-28 at 12:46, Martin Furter wrote:
> > They can already do that by limiting themselves to format version 2.
> 
> Yes, i also thought about that.
> But what if the dump file format changes again?
> For example if another node type like symbolic link is added?

We could add such a header when the format changes again, if it still
seems to be needed.  It seems to be a little speculative to add it now,
though I'm not totally opposed to it.

As an aside, I now have performance numbers.  Using the Subversion
repository as of r9219 (thanks to Ben for providing me with a dumpfile):

  Standard dump size:			648479605
  Compressed standard dump size:	144751988
  Repository size:			170291200
  Diffy dump size:			33113142
  Compressed diffy dump size:		13196422

(Ben says the repository on svn.collab.net is only 140MB, so there's
obviously some variation there.  I thought I was using BDB 4.2, but I
had to delete unused logfiles to get the repository size down from
1.65GB to the 162MB listed above, so maybe I'm using BDB 4.0 on account
of the APR configury bug.)

Anyway, the upshot is that diffy dumps are 20 times smaller than regular
dumps, or 10 times smaller when compressed.  Also, the repository seems
to be 4-5 times bigger than a diffy dump of the repository, which seems
like a lot of overhead.

I haven't collected time data, but subjectively, diffy dumps and loads
seem to be roughly as fast as regular dumps and loads.

On the testing front, we don't appear to have any dump-load test cases
(we have some very basic "dump" test cases which don't validate the
output), so I'm not sure if I'll write any test cases before checking in
my code.  It appears to dump and load the Subversion repository just
fine, though.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Martin Furter <mf...@rola.ch>.

On Sun, 28 Mar 2004, Greg Hudson wrote:

> On Sun, 2004-03-28 at 11:26, Martin Furter wrote:
> > SVN-fs-dump-format-version: 3
> > SVN-fs-dump-deltas: true
> 
> > This flag could then be evaluated by other tools and if they don't know
> > how to handle deltas wouldn't have to read a few kB to find out that they
> > can't process the file.
> 
> They can already do that by limiting themselves to format version 2.

Yes, i also thought about that.
But what if the dump file format changes again?
For example if another node type like symbolic link is added?

Martin


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-03-28 at 11:26, Martin Furter wrote:
> SVN-fs-dump-format-version: 3
> SVN-fs-dump-deltas: true

> This flag could then be evaluated by other tools and if they don't know
> how to handle deltas wouldn't have to read a few kB to find out that they
> can't process the file.

They can already do that by limiting themselves to format version 2.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Optionally use deltas in dumpfiles

Posted by Martin Furter <mf...@rola.ch>.
Hi

I propose adding a flag right after the format version line which says if
the dump can contain deltas or not.
The first lines of a dump would then look like that:

SVN-fs-dump-format-version: 3
SVN-fs-dump-deltas: true

UUID: ...

This flag could then be evaluated by other tools and if they don't know
how to handle deltas wouldn't have to read a few kB to find out that they
can't process the file.

Martin


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org