You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by cm...@apache.org on 2012/05/04 15:13:00 UTC

svn commit: r1333936 - in /subversion/trunk/subversion: include/private/svn_wc_private.h include/svn_ra.h libsvn_client/ra.c libsvn_ra_serf/update.c libsvn_wc/adm_ops.c

Author: cmpilato
Date: Fri May  4 13:13:00 2012
New Revision: 1333936

URL: http://svn.apache.org/viewvc?rev=1333936&view=rev
Log:
Teach libsvn_ra_serf to make use of the server-provided SHA1 checksums
I introduced in 1.7 (iff they are provided, of course) to avoid
downloading server content that's already locally available.

* subversion/include/private/svn_wc_private.h,
* subversion/libsvn_wc/adm_ops.c
  (svn_wc__get_pristine_contents_by_checksum): New function.

* subversion/include/svn_ra.h
  (svn_ra_get_wc_contents_func_t): New callback type.
  (svn_ra_callbacks2_t): Add get_wc_contents vtable member.

* subversion/libsvn_client/ra.c
  (callback_baton_t): Add 'wcroot_abspath' member.
  (get_wc_contents): New function.
  (svn_client__open_ra_session_internal): Initialize new
    'get_wc_contents' baton member.

* subversion/libsvn_ra_serf/update.c
  (report_ctx_t): Add 'cached_contents' member.
  (local_fetch, handle_local_fetch): New functions.
  (fetch_file): Use the 'get_wc_contents' RA callback to check for a
    local copy of the file contents whose SHA1 checksum we are about to
    fetch from the server.  If we've got those contents already, read
    them (via the callback-returned stream) instead of from the network.

Modified:
    subversion/trunk/subversion/include/private/svn_wc_private.h
    subversion/trunk/subversion/include/svn_ra.h
    subversion/trunk/subversion/libsvn_client/ra.c
    subversion/trunk/subversion/libsvn_ra_serf/update.c
    subversion/trunk/subversion/libsvn_wc/adm_ops.c

Modified: subversion/trunk/subversion/include/private/svn_wc_private.h
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/include/private/svn_wc_private.h?rev=1333936&r1=1333935&r2=1333936&view=diff
==============================================================================
--- subversion/trunk/subversion/include/private/svn_wc_private.h (original)
+++ subversion/trunk/subversion/include/private/svn_wc_private.h Fri May  4 13:13:00 2012
@@ -1076,6 +1076,18 @@ svn_wc__node_get_md5_from_sha1(const svn
                                apr_pool_t *result_pool,
                                apr_pool_t *scratch_pool);
 
+/* Like svn_wc_get_pristine_contents2(), but keyed on the
+   SHA1_CHECKSUM rather than on the local absolute path of the working
+   file.  WCROOT_ABSPATH is the absolute path of the root of the
+   working copy in whose pristine database we'll be looking for these
+   contents.  */
+svn_error_t *
+svn_wc__get_pristine_contents_by_checksum(svn_stream_t **contents,
+                                          svn_wc_context_t *wc_ctx,
+                                          const char *wcroot_abspath,
+                                          const svn_checksum_t *sha1_checksum,
+                                          apr_pool_t *result_pool,
+                                          apr_pool_t *scratch_pool);
 
 /* Gets an array of const char *repos_relpaths of descendants of LOCAL_ABSPATH,
  * which must be the op root of an addition, copy or move. The descendants

Modified: subversion/trunk/subversion/include/svn_ra.h
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_ra.h?rev=1333936&r1=1333935&r2=1333936&view=diff
==============================================================================
--- subversion/trunk/subversion/include/svn_ra.h (original)
+++ subversion/trunk/subversion/include/svn_ra.h Fri May  4 13:13:00 2012
@@ -120,6 +120,16 @@ typedef svn_error_t *(*svn_ra_invalidate
                                                           const char *name,
                                                           apr_pool_t *pool);
 
+/** This is a function type which allows the RA layer to fetch the
+ * cached pristine file contents whose SHA1 checksum is @a
+ * sha1_checksum, if any.  @a *contents will be a read stream
+ * containing those contents if they are found; NULL otherwise.
+ */
+typedef svn_error_t *(*svn_ra_get_wc_contents_func_t)(void *baton,
+                                                      svn_stream_t **contents,
+                                                      svn_checksum_t *sha1_checksum,
+                                                      apr_pool_t *pool);
+
 
 /** A function type for retrieving the youngest revision from a repos. */
 typedef svn_error_t *(*svn_ra_get_latest_revnum_func_t)(
@@ -138,6 +148,7 @@ typedef svn_error_t *(*svn_ra_get_client
                                                         apr_pool_t *pool);
 
 
+
 /**
  * A callback function type for use in @c get_file_revs.
  * @a baton is provided by the caller, @a path is the pathname of the file
@@ -516,6 +527,11 @@ typedef struct svn_ra_callbacks2_t
    */
   svn_ra_get_client_string_func_t get_client_string;
 
+  /** Working copy file content fetching function.
+   * @since New in 1.8.
+   */
+  svn_ra_get_wc_contents_func_t get_wc_contents;
+
 } svn_ra_callbacks2_t;
 
 /** Similar to svn_ra_callbacks2_t, except that the progress

Modified: subversion/trunk/subversion/libsvn_client/ra.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_client/ra.c?rev=1333936&r1=1333935&r2=1333936&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_client/ra.c (original)
+++ subversion/trunk/subversion/libsvn_client/ra.c Fri May  4 13:13:00 2012
@@ -52,6 +52,10 @@ typedef struct callback_baton_t
      this base directory. */
   const char *base_dir_abspath;
 
+  /* Holds the absolute path of the working copy root for the working
+     copy in which BASE_DIR_ABSPATH is found. */
+  const char *wcroot_abspath;
+
   /* An array of svn_client_commit_item3_t * structures, present only
      during working copy commits. */
   const apr_array_header_t *commit_items;
@@ -234,6 +238,30 @@ invalidate_wc_props(void *baton,
 }
 
 
+/* This implements the `svn_ra_get_wc_contents_func_t' interface. */
+static svn_error_t *
+get_wc_contents(void *baton,
+                svn_stream_t **contents,
+                svn_checksum_t *sha1_checksum,
+                apr_pool_t *pool)
+{
+  callback_baton_t *cb = baton;
+
+  if (! cb->wcroot_abspath)
+    {
+      *contents = NULL;
+      return SVN_NO_ERROR;
+    }
+
+  return svn_error_trace(
+             svn_wc__get_pristine_contents_by_checksum(contents,
+                                                       cb->ctx->wc_ctx,
+                                                       cb->wcroot_abspath,
+                                                       sha1_checksum,
+                                                       pool, pool));
+}
+
+
 static svn_error_t *
 cancel_callback(void *baton)
 {
@@ -284,6 +312,7 @@ svn_client__open_ra_session_internal(svn
   cbtable->progress_baton = ctx->progress_baton;
   cbtable->cancel_func = ctx->cancel_func ? cancel_callback : NULL;
   cbtable->get_client_string = get_client_string;
+  cbtable->get_wc_contents = get_wc_contents;
 
   cb->base_dir_abspath = base_dir_abspath;
   cb->commit_items = commit_items;
@@ -291,6 +320,7 @@ svn_client__open_ra_session_internal(svn
 
   if (base_dir_abspath)
     {
+      const char *wcroot_abspath;
       svn_error_t *err = svn_wc__node_get_repos_info(NULL, &uuid, ctx->wc_ctx,
                                                      base_dir_abspath,
                                                      pool, pool);
@@ -303,7 +333,13 @@ svn_client__open_ra_session_internal(svn
           uuid = NULL;
         }
       else
-        SVN_ERR(err);
+        {
+          SVN_ERR(err);
+        }
+
+      SVN_ERR(svn_wc__get_wc_root(&wcroot_abspath, ctx->wc_ctx,
+                                  base_dir_abspath, pool, pool));
+      cb->wcroot_abspath = wcroot_abspath;
     }
 
   /* If the caller allows for auto-following redirections, and the

Modified: subversion/trunk/subversion/libsvn_ra_serf/update.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_ra_serf/update.c?rev=1333936&r1=1333935&r2=1333936&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_ra_serf/update.c (original)
+++ subversion/trunk/subversion/libsvn_ra_serf/update.c Fri May  4 13:13:00 2012
@@ -237,6 +237,10 @@ typedef struct report_info_t
   /* Checksum for close_file */
   const char *final_checksum;
 
+  /* Stream containing file contents already cached in the working
+     copy (which may be used to avoid a GET request for the same). */
+  svn_stream_t *cached_contents;
+
   /* temporary property for this file which is currently being parsed
    * It will eventually be stored in our parent directory's property hash.
    */
@@ -1229,6 +1233,179 @@ handle_propchange_only(report_info_t *in
   return SVN_NO_ERROR;
 }
 
+/* "Fetch" a file whose contents were made available via the
+   get_wc_contents() callback (as opposed to requiring a GET to the
+   server), and feed the information through the associated update
+   editor. */
+static svn_error_t *
+local_fetch(report_info_t *info)
+{
+  const svn_delta_editor_t *update_editor = info->dir->update_editor;
+  svn_txdelta_window_t delta_window = { 0 };
+  svn_txdelta_op_t delta_op;
+  svn_string_t window_data;
+  char read_buf[SVN__STREAM_CHUNK_SIZE + 1];
+
+  SVN_ERR(open_dir(info->dir));
+  info->editor_pool = svn_pool_create(info->dir->dir_baton_pool);
+
+  /* Expand our full name now if we haven't done so yet. */
+  if (!info->name)
+    {
+      info->name = svn_relpath_join(info->dir->name, info->base_name,
+                                    info->editor_pool);
+    }
+
+  if (SVN_IS_VALID_REVNUM(info->base_rev))
+    {
+      SVN_ERR(update_editor->open_file(info->name,
+                                       info->dir->dir_baton,
+                                       info->base_rev,
+                                       info->editor_pool,
+                                       &info->file_baton));
+    }
+  else
+    {
+      SVN_ERR(update_editor->add_file(info->name,
+                                      info->dir->dir_baton,
+                                      info->copyfrom_path,
+                                      info->copyfrom_rev,
+                                      info->editor_pool,
+                                      &info->file_baton));
+    }
+  
+  SVN_ERR(update_editor->apply_textdelta(info->file_baton,
+                                         info->base_checksum,
+                                         info->editor_pool,
+                                         &info->textdelta,
+                                         &info->textdelta_baton));
+  
+  while (1)
+    {
+      apr_size_t read_len = SVN__STREAM_CHUNK_SIZE;
+      
+      SVN_ERR(svn_stream_read(info->cached_contents, read_buf, &read_len));
+      if (read_len == 0)
+        break;
+      
+      window_data.data = read_buf;
+      window_data.len = read_len;
+      
+      delta_op.action_code = svn_txdelta_new;
+      delta_op.offset = 0;
+      delta_op.length = read_len;
+      
+      delta_window.tview_len = read_len;
+      delta_window.num_ops = 1;
+      delta_window.ops = &delta_op;
+      delta_window.new_data = &window_data;
+      
+      SVN_ERR(info->textdelta(&delta_window, info->textdelta_baton));
+      
+      if (read_len < SVN__STREAM_CHUNK_SIZE)
+        break;
+    }
+  
+  SVN_ERR(info->textdelta(NULL, info->textdelta_baton));
+
+  SVN_ERR(svn_stream_close(info->cached_contents));
+  info->cached_contents = NULL;
+
+  if (info->lock_token)
+    check_lock(info);
+
+  /* set all of the properties we received */
+  SVN_ERR(svn_ra_serf__walk_all_props(info->props,
+                                      info->base_name,
+                                      info->base_rev,
+                                      set_file_props, info,
+                                      info->pool));
+  
+  SVN_ERR(svn_ra_serf__walk_all_props(info->dir->removed_props,
+                                      info->base_name,
+                                      info->base_rev,
+                                      remove_file_props, info,
+                                      info->pool));
+  if (info->fetch_props)
+    {
+      SVN_ERR(svn_ra_serf__walk_all_props(info->props,
+                                          info->url,
+                                          info->target_rev,
+                                          set_file_props, info,
+                                          info->pool));
+    }
+  
+  SVN_ERR(info->dir->update_editor->close_file(info->file_baton,
+                                               info->final_checksum,
+                                               info->pool));
+  
+  /* We're done with our pools. */
+  svn_pool_destroy(info->editor_pool);
+  svn_pool_destroy(info->pool);
+
+  return SVN_NO_ERROR;
+}
+
+/* Implements svn_ra_serf__response_handler_t */
+static svn_error_t *
+handle_local_fetch(serf_request_t *request,
+                   serf_bucket_t *response,
+                   void *handler_baton,
+                   apr_pool_t *pool)
+{
+  report_fetch_t *fetch_ctx = handler_baton;
+  apr_status_t status;
+  serf_status_line sl;
+  svn_error_t *err;
+  const char *data;
+  apr_size_t len;
+
+  /* If the error code wasn't 200, something went wrong. Don't use the returned
+     data as its probably an error message. Just bail out instead. */
+  status = serf_bucket_response_status(response, &sl);
+  if (SERF_BUCKET_READ_ERROR(status))
+    {
+      return svn_error_wrap_apr(status, NULL);
+    }
+  if (sl.code != 200)
+    {
+      err = svn_error_createf(SVN_ERR_RA_DAV_REQUEST_FAILED, NULL,
+                              _("GET request failed: %d %s"),
+                              sl.code, sl.reason);
+      return error_fetch(request, fetch_ctx, err);
+    }
+
+  while (1)
+    {
+      status = serf_bucket_read(response, 8000, &data, &len);
+      if (SERF_BUCKET_READ_ERROR(status))
+        {
+          return svn_error_wrap_apr(status, NULL);
+        }
+      if (APR_STATUS_IS_EOF(status))
+        {
+          err = local_fetch(fetch_ctx->info);
+          if (err)
+            {
+              return error_fetch(request, fetch_ctx, err);
+            }
+
+          fetch_ctx->done = TRUE;
+          fetch_ctx->done_item.data = fetch_ctx;
+          fetch_ctx->done_item.next = *fetch_ctx->done_list;
+          *fetch_ctx->done_list = &fetch_ctx->done_item;
+          return svn_error_wrap_apr(status, NULL);
+        }
+      if (APR_STATUS_IS_EAGAIN(status))
+        {
+          return svn_error_wrap_apr(status, NULL);
+        }
+    }
+  /* not reached */
+}
+
+/* --------------------------------------------------------- */
+
 static svn_error_t *
 fetch_file(report_context_t *ctx, report_info_t *info)
 {
@@ -1239,9 +1416,8 @@ fetch_file(report_context_t *ctx, report
   conn = ctx->sess->conns[ctx->sess->cur_conn];
 
   /* go fetch info->name from DAV:checked-in */
-  info->url =
-      svn_ra_serf__get_ver_prop(info->props, info->base_name,
-                                info->base_rev, "DAV:", "checked-in");
+  info->url = svn_ra_serf__get_ver_prop(info->props, info->base_name,
+                                        info->base_rev, "DAV:", "checked-in");
 
   if (!info->url)
     {
@@ -1269,34 +1445,98 @@ fetch_file(report_context_t *ctx, report
    */
   if (info->fetch_file && ctx->text_deltas)
     {
-      report_fetch_t *fetch_ctx;
+      svn_stream_t *contents = NULL;
 
-      fetch_ctx = apr_pcalloc(info->dir->pool, sizeof(*fetch_ctx));
-      fetch_ctx->info = info;
-      fetch_ctx->done_list = &ctx->done_fetches;
-      fetch_ctx->sess = ctx->sess;
-      fetch_ctx->conn = conn;
+      if (ctx->sess->wc_callbacks->get_wc_contents
+          && info->final_sha1_checksum)
+        {
+          svn_error_t *err;
+          svn_checksum_t *sha1_checksum;
 
-      handler = apr_pcalloc(info->dir->pool, sizeof(*handler));
+          err = svn_checksum_parse_hex(&sha1_checksum, svn_checksum_sha1,
+                                       info->final_sha1_checksum, info->pool);
+          if (!err)
+            {
+              err = ctx->sess->wc_callbacks->get_wc_contents(
+                        ctx->sess->wc_callback_baton, &contents,
+                        sha1_checksum, info->pool);
+            }
 
-      handler->method = "GET";
-      handler->path = fetch_ctx->info->url;
+          if (err)
+            {
+              /* Meh.  Maybe we'll care one day why we're in an
+                 errorful state, but this codepath is optional.  */
+              svn_error_clear(err);
+            }
+          else
+            {
+              info->cached_contents = contents;
+            }
+        }          
 
-      handler->conn = conn;
-      handler->session = ctx->sess;
+      /* If the working copy can provided cached contents for this
+         file, we'll send a simple HEAD request (which I'll claim is
+         to verify readability, but really is just so I can provide a
+         Serf-queued-request-compliant way of processing the contents
+         after the PROPFIND for the file's properties ... ugh).
 
-      handler->header_delegate = headers_fetch;
-      handler->header_delegate_baton = fetch_ctx;
+         Otherwise, we use a GET request for the file's contents. */
+      if (info->cached_contents)
+        {
+          report_fetch_t *fetch_ctx;
 
-      handler->response_handler = handle_fetch;
-      handler->response_baton = fetch_ctx;
+          fetch_ctx = apr_pcalloc(info->dir->pool, sizeof(*fetch_ctx));
+          fetch_ctx->info = info;
+          fetch_ctx->done_list = &ctx->done_fetches;
+          fetch_ctx->sess = ctx->sess;
+          fetch_ctx->conn = conn;
 
-      handler->response_error = cancel_fetch;
-      handler->response_error_baton = fetch_ctx;
+          handler = apr_pcalloc(info->dir->pool, sizeof(*handler));
 
-      svn_ra_serf__request_create(handler);
+          handler->method = "HEAD";
+          handler->path = fetch_ctx->info->url;
 
-      ctx->active_fetches++;
+          handler->conn = conn;
+          handler->session = ctx->sess;
+
+          handler->response_handler = handle_local_fetch;
+          handler->response_baton = fetch_ctx;
+
+          svn_ra_serf__request_create(handler);
+
+          ctx->active_fetches++;
+        }
+      else
+        {
+          report_fetch_t *fetch_ctx;
+
+          fetch_ctx = apr_pcalloc(info->dir->pool, sizeof(*fetch_ctx));
+          fetch_ctx->info = info;
+          fetch_ctx->done_list = &ctx->done_fetches;
+          fetch_ctx->sess = ctx->sess;
+          fetch_ctx->conn = conn;
+
+          handler = apr_pcalloc(info->dir->pool, sizeof(*handler));
+
+          handler->method = "GET";
+          handler->path = fetch_ctx->info->url;
+
+          handler->conn = conn;
+          handler->session = ctx->sess;
+
+          handler->header_delegate = headers_fetch;
+          handler->header_delegate_baton = fetch_ctx;
+
+          handler->response_handler = handle_fetch;
+          handler->response_baton = fetch_ctx;
+
+          handler->response_error = cancel_fetch;
+          handler->response_error_baton = fetch_ctx;
+
+          svn_ra_serf__request_create(handler);
+
+          ctx->active_fetches++;
+        }
     }
   else if (info->propfind)
     {

Modified: subversion/trunk/subversion/libsvn_wc/adm_ops.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_wc/adm_ops.c?rev=1333936&r1=1333935&r2=1333936&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_wc/adm_ops.c (original)
+++ subversion/trunk/subversion/libsvn_wc/adm_ops.c Fri May  4 13:13:00 2012
@@ -2250,6 +2250,18 @@ svn_wc_get_pristine_contents2(svn_stream
                                                        scratch_pool));
 }
 
+svn_error_t *
+svn_wc__get_pristine_contents_by_checksum(svn_stream_t **contents,
+                                          svn_wc_context_t *wc_ctx,
+                                          const char *wcroot_abspath,
+                                          const svn_checksum_t *sha1_checksum,
+                                          apr_pool_t *result_pool,
+                                          apr_pool_t *scratch_pool)
+{
+  return svn_error_trace(svn_wc__db_pristine_read(contents, NULL, wc_ctx->db,
+                                                  wcroot_abspath, sha1_checksum,
+                                                  result_pool, scratch_pool));
+}
 
 svn_error_t *
 svn_wc__internal_remove_from_revision_control(svn_wc__db_t *db,



Re: svn commit: r1333936 - in /subversion/trunk/subversion: include/private/svn_wc_private.h include/svn_ra.h libsvn_client/ra.c libsvn_ra_serf/update.c libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/04/2012 05:16 PM, Greg Stein wrote:
> On Fri, May 4, 2012 at 9:13 AM,  <cm...@apache.org> wrote:
>> Author: cmpilato
>> Date: Fri May  4 13:13:00 2012
>> New Revision: 1333936
>>
>> URL: http://svn.apache.org/viewvc?rev=1333936&view=rev
>> Log:
>> Teach libsvn_ra_serf to make use of the server-provided SHA1 checksums
>> I introduced in 1.7 (iff they are provided, of course) to avoid
>> downloading server content that's already locally available.

[code review snipped]

Great feedback, Greg -- thanks.  I'm going to need to process it in waves.
Please continue watching the commits and offering your input!  :-)

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1333936 - in /subversion/trunk/subversion: include/private/svn_wc_private.h include/svn_ra.h libsvn_client/ra.c libsvn_ra_serf/update.c libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Mon, May 7, 2012 at 5:22 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/04/2012 05:16 PM, Greg Stein wrote:
>> On Fri, May 4, 2012 at 9:13 AM,  <cm...@apache.org> wrote:
>>> Author: cmpilato
>>> Date: Fri May  4 13:13:00 2012
>>> New Revision: 1333936
>
> Okay.  I think I've digested and acted open all of the feedback I've cropped
> out of this response.  There's just this one task remaining now:

Nice. Thanks!

>...
>> As a recommendation, I would suggest creating a "lazy open" stream
>> which takes a callback that opens the stream upon first read. We are
>> going to need this lazy-open stream during commits (the RA layer may
>> choose not to deliver contents to the server, so we shouldn't bother
>> opening them).
>
> To be clear, you mean that we need something like:
>
> {{{
> typedef svn_error_t (*svn_stream_lazyopen_func_t)(svn_stream_t **stream,
>                                                  void *baton,
>                                                  apr_pool_t *result_pool,
>                                                  apr_pool_t *scratch_pool);
> svn_error_t *
> svn_stream_lazyopen_create(svn_stream_t **stream,
>                           svn_stream_lazyopen_func_t open_func,
>                           void *open_baton,
>                           apr_pool_t *result_pool,
>                           apr_pool_t *scratch_pool);
> }}}

Bingo.

Tho you can likely lose the scratch_pool on _create(), and likely just
return the bare stream. I can't foresee any complexity inside that
function.

> In my specific situation, svn_wc__get_pristine_contents_by_checksum()  would
> call svn_stream_lazyopen_create(), passing an open_func callback which is
> written to invoke svn_wc__db_pristine_read().

That was my thinking, yes.

> One downside of this is that svn_wc__db_pristine_read() promises that "even
> if the pristine text is removed from the store while it is being read, the
> stream will remain valid and readable until it is closed."  That promise
> becomes more valuable the earlier you open the stream to the pristine
> contents; in this delayed open scenario, I anticipate that a "move"
> operation from the server could see reduced benefit, even if the "add" side
> of the move is processed before the "delete").  While parsing the REPORT,
> the RA callback might say, "Yep, I've got those contents!"  But by the time
> the serf connection pipelining magic runs to actually read the contents,
> prior delete handling might have blown them away.

I believe that we take out an administrative lock on the working copy
during the update. Pristines will not be cleaned when those are
present (or when workqueue items are present).

In the total edge case of paranoia, the lazyopen could grab its own
administrative lock.

I'd just verify the admin lock during update, which means that the
callback you provide to the update process can work with that
assumption.

(and the ugly fact that we don't automagically clean them out anyways;
but we may figure that out one day, so can't be relied upon)

> Oh well, *if* that's even really what would happen, I suppose it still
> doesn't outweight the current badness of having Mark running out of file
> handles.  :-P

Sure thing, boss.

Cheers,
-g

Re: svn commit: r1333936 - in /subversion/trunk/subversion: include/private/svn_wc_private.h include/svn_ra.h libsvn_client/ra.c libsvn_ra_serf/update.c libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/04/2012 05:16 PM, Greg Stein wrote:
> On Fri, May 4, 2012 at 9:13 AM,  <cm...@apache.org> wrote:
>> Author: cmpilato
>> Date: Fri May  4 13:13:00 2012
>> New Revision: 1333936

Okay.  I think I've digested and acted open all of the feedback I've cropped
out of this response.  There's just this one task remaining now:

> Note: we need to make the callback smarter. In your implementation, it
> opens the pristine contents and *holds it open* until the HEAD
> completes. If ra_serf queues 3000 HEAD requests... you now require
> 3000 file handles. Not a good idea.
> 
> Also note: Mark already ran into the file handle problem.
> 
> The solution is a custom stream.
> svn_wc__get_pristine_contents_by_checksum() should use
> svn_wc__db_pristine_check() to see if a pristine is present. If it
> *does*, then it returns a stream with a custom read function. On the
> first read, it will use svn_wc__db_pristine_read() to open the
> pristine content stream (and the underlying file handle). Then it just
> delegates reads to that inner stream.
> 
> As a recommendation, I would suggest creating a "lazy open" stream
> which takes a callback that opens the stream upon first read. We are
> going to need this lazy-open stream during commits (the RA layer may
> choose not to deliver contents to the server, so we shouldn't bother
> opening them).

To be clear, you mean that we need something like:

{{{
typedef svn_error_t (*svn_stream_lazyopen_func_t)(svn_stream_t **stream,
                                                  void *baton,
                                                  apr_pool_t *result_pool,
                                                  apr_pool_t *scratch_pool);
svn_error_t *
svn_stream_lazyopen_create(svn_stream_t **stream,
                           svn_stream_lazyopen_func_t open_func,
                           void *open_baton,
                           apr_pool_t *result_pool,
                           apr_pool_t *scratch_pool);
}}}

In my specific situation, svn_wc__get_pristine_contents_by_checksum()  would
call svn_stream_lazyopen_create(), passing an open_func callback which is
written to invoke svn_wc__db_pristine_read().

One downside of this is that svn_wc__db_pristine_read() promises that "even
if the pristine text is removed from the store while it is being read, the
stream will remain valid and readable until it is closed."  That promise
becomes more valuable the earlier you open the stream to the pristine
contents; in this delayed open scenario, I anticipate that a "move"
operation from the server could see reduced benefit, even if the "add" side
of the move is processed before the "delete").  While parsing the REPORT,
the RA callback might say, "Yep, I've got those contents!"  But by the time
the serf connection pipelining magic runs to actually read the contents,
prior delete handling might have blown them away.

Oh well, *if* that's even really what would happen, I suppose it still
doesn't outweight the current badness of having Mark running out of file
handles.  :-P

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development

Re: svn commit: r1333936 - in /subversion/trunk/subversion: include/private/svn_wc_private.h include/svn_ra.h libsvn_client/ra.c libsvn_ra_serf/update.c libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Fri, May 4, 2012 at 9:13 AM,  <cm...@apache.org> wrote:
> Author: cmpilato
> Date: Fri May  4 13:13:00 2012
> New Revision: 1333936
>
> URL: http://svn.apache.org/viewvc?rev=1333936&view=rev
> Log:
> Teach libsvn_ra_serf to make use of the server-provided SHA1 checksums
> I introduced in 1.7 (iff they are provided, of course) to avoid
> downloading server content that's already locally available.

Nice!

>...
> +++ subversion/trunk/subversion/include/private/svn_wc_private.h Fri May  4 13:13:00 2012
> @@ -1076,6 +1076,18 @@ svn_wc__node_get_md5_from_sha1(const svn
>                                apr_pool_t *result_pool,
>                                apr_pool_t *scratch_pool);
>
> +/* Like svn_wc_get_pristine_contents2(), but keyed on the
> +   SHA1_CHECKSUM rather than on the local absolute path of the working
> +   file.  WCROOT_ABSPATH is the absolute path of the root of the
> +   working copy in whose pristine database we'll be looking for these
> +   contents.  */
> +svn_error_t *
> +svn_wc__get_pristine_contents_by_checksum(svn_stream_t **contents,
> +                                          svn_wc_context_t *wc_ctx,
> +                                          const char *wcroot_abspath,
> +                                          const svn_checksum_t *sha1_checksum,
> +                                          apr_pool_t *result_pool,
> +                                          apr_pool_t *scratch_pool);

That should be a WRI_ABSPATH, aka "Working copy Root Indicator". Any
path in the working copy is just fine. It doesn't have to be the
wcroot.

(and yes, I saw your and Bert's discussion about choosing this path,
but that is unrelated to *which* path you use from a working copy)

See the bottom of this email for more concerns about this function.

>
>  /* Gets an array of const char *repos_relpaths of descendants of LOCAL_ABSPATH,
>  * which must be the op root of an addition, copy or move. The descendants
>
> Modified: subversion/trunk/subversion/include/svn_ra.h
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/include/svn_ra.h?rev=1333936&r1=1333935&r2=1333936&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/include/svn_ra.h (original)
> +++ subversion/trunk/subversion/include/svn_ra.h Fri May  4 13:13:00 2012
> @@ -120,6 +120,16 @@ typedef svn_error_t *(*svn_ra_invalidate
>                                                           const char *name,
>                                                           apr_pool_t *pool);
>
> +/** This is a function type which allows the RA layer to fetch the
> + * cached pristine file contents whose SHA1 checksum is @a
> + * sha1_checksum, if any.  @a *contents will be a read stream
> + * containing those contents if they are found; NULL otherwise.
> + */
> +typedef svn_error_t *(*svn_ra_get_wc_contents_func_t)(void *baton,
> +                                                      svn_stream_t **contents,
> +                                                      svn_checksum_t *sha1_checksum,
> +                                                      apr_pool_t *pool);

I understand this is "pattern", but we could simply declare the
callback in the structure without using a typedef. Maybe time to
start?

And SHA1_CHECKSUM should be const.

>...
> +++ subversion/trunk/subversion/libsvn_client/ra.c Fri May  4 13:13:00 2012
>...
> +/* This implements the `svn_ra_get_wc_contents_func_t' interface. */

Tho I guess this is a reasonable point in support of a typedef.

>...
> @@ -291,6 +320,7 @@ svn_client__open_ra_session_internal(svn
>
>   if (base_dir_abspath)
>     {
> +      const char *wcroot_abspath;
>       svn_error_t *err = svn_wc__node_get_repos_info(NULL, &uuid, ctx->wc_ctx,
>                                                      base_dir_abspath,
>                                                      pool, pool);
> @@ -303,7 +333,13 @@ svn_client__open_ra_session_internal(svn
>           uuid = NULL;
>         }
>       else
> -        SVN_ERR(err);
> +        {
> +          SVN_ERR(err);
> +        }
> +
> +      SVN_ERR(svn_wc__get_wc_root(&wcroot_abspath, ctx->wc_ctx,
> +                                  base_dir_abspath, pool, pool));
> +      cb->wcroot_abspath = wcroot_abspath;

That get_wc_root() can be avoided by using a WRI_ABSPATH.

>...
> +++ subversion/trunk/subversion/libsvn_ra_serf/update.c Fri May  4 13:13:00 2012
> @@ -237,6 +237,10 @@ typedef struct report_info_t
>   /* Checksum for close_file */
>   const char *final_checksum;
>
> +  /* Stream containing file contents already cached in the working
> +     copy (which may be used to avoid a GET request for the same). */
> +  svn_stream_t *cached_contents;
> +
>   /* temporary property for this file which is currently being parsed
>    * It will eventually be stored in our parent directory's property hash.
>    */
> @@ -1229,6 +1233,179 @@ handle_propchange_only(report_info_t *in
>   return SVN_NO_ERROR;
>  }
>
> +/* "Fetch" a file whose contents were made available via the
> +   get_wc_contents() callback (as opposed to requiring a GET to the
> +   server), and feed the information through the associated update
> +   editor. */
> +static svn_error_t *
> +local_fetch(report_info_t *info)
> +{
> +  const svn_delta_editor_t *update_editor = info->dir->update_editor;
> +  svn_txdelta_window_t delta_window = { 0 };
> +  svn_txdelta_op_t delta_op;
> +  svn_string_t window_data;
> +  char read_buf[SVN__STREAM_CHUNK_SIZE + 1];
> +
> +  SVN_ERR(open_dir(info->dir));
> +  info->editor_pool = svn_pool_create(info->dir->dir_baton_pool);
> +
> +  /* Expand our full name now if we haven't done so yet. */
> +  if (!info->name)
> +    {
> +      info->name = svn_relpath_join(info->dir->name, info->base_name,
> +                                    info->editor_pool);
> +    }
> +
> +  if (SVN_IS_VALID_REVNUM(info->base_rev))
> +    {
> +      SVN_ERR(update_editor->open_file(info->name,
> +                                       info->dir->dir_baton,
> +                                       info->base_rev,
> +                                       info->editor_pool,
> +                                       &info->file_baton));
> +    }
> +  else
> +    {
> +      SVN_ERR(update_editor->add_file(info->name,
> +                                      info->dir->dir_baton,
> +                                      info->copyfrom_path,
> +                                      info->copyfrom_rev,
> +                                      info->editor_pool,
> +                                      &info->file_baton));
> +    }

There is a lot of code duplication between the above and handle_fetch().

> +
> +  SVN_ERR(update_editor->apply_textdelta(info->file_baton,
> +                                         info->base_checksum,
> +                                         info->editor_pool,
> +                                         &info->textdelta,
> +                                         &info->textdelta_baton));
> +
> +  while (1)
> +    {
> +      apr_size_t read_len = SVN__STREAM_CHUNK_SIZE;
> +
> +      SVN_ERR(svn_stream_read(info->cached_contents, read_buf, &read_len));
> +      if (read_len == 0)
> +        break;
> +
> +      window_data.data = read_buf;
> +      window_data.len = read_len;
> +
> +      delta_op.action_code = svn_txdelta_new;
> +      delta_op.offset = 0;
> +      delta_op.length = read_len;
> +
> +      delta_window.tview_len = read_len;
> +      delta_window.num_ops = 1;
> +      delta_window.ops = &delta_op;
> +      delta_window.new_data = &window_data;
> +
> +      SVN_ERR(info->textdelta(&delta_window, info->textdelta_baton));
> +
> +      if (read_len < SVN__STREAM_CHUNK_SIZE)
> +        break;
> +    }
> +
> +  SVN_ERR(info->textdelta(NULL, info->textdelta_baton));

Replace the while() thru the NULL window to:

  SVN_ERR(svn_txdelta_send_stream(info->cached_contents,
info->textdelta, info->textdelta_baton, NULL, scratch_pool));

(I guess info->pool is the scratch_pool)

Even better: please take the code above and replace the contents of
svn_txdelta_send_stream(). That will benefit all callers.

> +
> +  SVN_ERR(svn_stream_close(info->cached_contents));
> +  info->cached_contents = NULL;
> +
> +  if (info->lock_token)
> +    check_lock(info);
> +
> +  /* set all of the properties we received */
> +  SVN_ERR(svn_ra_serf__walk_all_props(info->props,
> +                                      info->base_name,
> +                                      info->base_rev,
> +                                      set_file_props, info,
> +                                      info->pool));
> +
> +  SVN_ERR(svn_ra_serf__walk_all_props(info->dir->removed_props,
> +                                      info->base_name,
> +                                      info->base_rev,
> +                                      remove_file_props, info,
> +                                      info->pool));
> +  if (info->fetch_props)
> +    {
> +      SVN_ERR(svn_ra_serf__walk_all_props(info->props,
> +                                          info->url,
> +                                          info->target_rev,
> +                                          set_file_props, info,
> +                                          info->pool));
> +    }
> +
> +  SVN_ERR(info->dir->update_editor->close_file(info->file_baton,
> +                                               info->final_checksum,
> +                                               info->pool));
> +
> +  /* We're done with our pools. */
> +  svn_pool_destroy(info->editor_pool);
> +  svn_pool_destroy(info->pool);

And again duplication with the lock and property handling, with handle_fetch().

It would have been nice to see these factored out.

> +
> +  return SVN_NO_ERROR;
> +}
> +
> +/* Implements svn_ra_serf__response_handler_t */
> +static svn_error_t *
> +handle_local_fetch(serf_request_t *request,
> +                   serf_bucket_t *response,
> +                   void *handler_baton,
> +                   apr_pool_t *pool)
> +{
> +  report_fetch_t *fetch_ctx = handler_baton;
> +  apr_status_t status;
> +  serf_status_line sl;
> +  svn_error_t *err;
> +  const char *data;
> +  apr_size_t len;
> +
> +  /* If the error code wasn't 200, something went wrong. Don't use the returned
> +     data as its probably an error message. Just bail out instead. */
> +  status = serf_bucket_response_status(response, &sl);
> +  if (SERF_BUCKET_READ_ERROR(status))
> +    {
> +      return svn_error_wrap_apr(status, NULL);
> +    }
> +  if (sl.code != 200)
> +    {
> +      err = svn_error_createf(SVN_ERR_RA_DAV_REQUEST_FAILED, NULL,
> +                              _("GET request failed: %d %s"),
> +                              sl.code, sl.reason);

You sent a HEAD request.

>...

Note: we need to make the callback smarter. In your implementation, it
opens the pristine contents and *holds it open* until the HEAD
completes. If ra_serf queues 3000 HEAD requests... you now require
3000 file handles. Not a good idea.

Also note: Mark already ran into the file handle problem.

The solution is a custom stream.
svn_wc__get_pristine_contents_by_checksum() should use
svn_wc__db_pristine_check() to see if a pristine is present. If it
*does*, then it returns a stream with a custom read function. On the
first read, it will use svn_wc__db_pristine_read() to open the
pristine content stream (and the underlying file handle). Then it just
delegates reads to that inner stream.

As a recommendation, I would suggest creating a "lazy open" stream
which takes a callback that opens the stream upon first read. We are
going to need this lazy-open stream during commits (the RA layer may
choose not to deliver contents to the server, so we shouldn't bother
opening them).

Cheers,
-g