You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by cm...@apache.org on 2012/11/30 22:54:35 UTC

svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Author: cmpilato
Date: Fri Nov 30 21:54:35 2012
New Revision: 1415864

URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
Log:
Implement in ra_serf "send-all" mode support for update-style REPORTs
and their responses.  (Currently disabled by compile-time conditionals.)

(This one goes out to Ivan Zhakov.)

* subversion/libsvn_ra_serf/update.c
  (report_state_e): New state 'TXDELTA'.
  (report_info_t): Add 'svndiff_decoder' and 'base64_decoder' members.
  (report_context_t): Add 'send_all_mode' flag.
  (fetch_file): Don't populate the report_info_t's 'url' member here;
    we'll do that in start_report() instead.
  (start_report): Notice the "send-all" attribute on the
    <update-report> tag, and set the 'send_all_mode' flag accordingly.
    Only expect to fetch file contents for added files when not
    in "send-all" mode.  Handle the <txdelta> tag in "send-all" mode,
    pushing a new TXDELTA state and cranking up the svndiff window
    handling stuff.
  (end_report): When closing opened and added files, calculate the
    report_info_t's 'url' field, and handle "send-all" mode where
    applicable.  Handle the </txdelta> tag by closing up the
    base64-decoding stream. 
  (cdata_report): Handle CDATA for <txdelta> blocks.
  (make_update_reporter): Add (disabled) support for asking the server
    to use "send-all" mode for update REPORT responses.  Also, drop
    the <text-deltas> element into the REPORT request when we don't
    wish to get real text-deltas.

Modified:
    subversion/trunk/subversion/libsvn_ra_serf/update.c

Modified: subversion/trunk/subversion/libsvn_ra_serf/update.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_ra_serf/update.c?rev=1415864&r1=1415863&r2=1415864&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_ra_serf/update.c (original)
+++ subversion/trunk/subversion/libsvn_ra_serf/update.c Fri Nov 30 21:54:35 2012
@@ -74,7 +74,8 @@ typedef enum report_state_e {
     ABSENT_FILE,
     PROP,
     IGNORE_PROP_NAME,
-    NEED_PROP_NAME
+    NEED_PROP_NAME,
+    TXDELTA
 } report_state_e;
 
 
@@ -228,6 +229,8 @@ typedef struct report_info_t
   const char *final_sha1_checksum;
   svn_txdelta_window_handler_t textdelta;
   void *textdelta_baton;
+  svn_stream_t *svndiff_decoder;
+  svn_stream_t *base64_decoder;
 
   /* Checksum for close_file */
   const char *final_checksum;
@@ -318,6 +321,9 @@ struct report_context_t {
   /* Do we want the server to send copyfrom args or not? */
   svn_boolean_t send_copyfrom_args;
 
+  /* Is the server sending everything in one response? */
+  svn_boolean_t send_all_mode;
+
   /* Is the server including properties inline for newly added
      files/dirs? */
   svn_boolean_t add_props_included;
@@ -1448,17 +1454,6 @@ fetch_file(report_context_t *ctx, report
   /* What connection should we go on? */
   conn = get_best_connection(ctx);
 
-  /* go fetch info->name from DAV:checked-in */
-  info->url = svn_ra_serf__get_ver_prop(info->props, info->base_name,
-                                        info->base_rev, "DAV:", "checked-in");
-
-  if (!info->url)
-    {
-      return svn_error_create(SVN_ERR_RA_DAV_MALFORMED_DATA, NULL,
-                              _("The REPORT or PROPFIND response did not "
-                                "include the requested checked-in value"));
-    }
-
   /* If needed, create the PROPFIND to retrieve the file's properties. */
   info->propfind_handler = NULL;
   if (info->fetch_props)
@@ -1625,9 +1620,15 @@ start_report(svn_ra_serf__xml_parser_t *
 
   if (state == NONE && strcmp(name.name, "update-report") == 0)
     {
-      const char *val = svn_xml_get_attr_value("inline-props", attrs);
+      const char *val;
+
+      val = svn_xml_get_attr_value("inline-props", attrs);
       if (val && (strcmp(val, "true") == 0))
         ctx->add_props_included = TRUE;
+
+      val = svn_xml_get_attr_value("send-all", attrs);
+      if (val && (strcmp(val, "true") == 0))
+        ctx->send_all_mode = TRUE;
     }
   else if (state == NONE && strcmp(name.name, "target-revision") == 0)
     {
@@ -1830,7 +1831,11 @@ start_report(svn_ra_serf__xml_parser_t *
       info = push_state(parser, ctx, ADD_FILE);
 
       info->base_rev = SVN_INVALID_REVNUM;
-      info->fetch_file = TRUE;
+
+      /* If the server isn't in "send-all" mode, we should expect to
+         fetch contents for added files. */
+      if (! ctx->send_all_mode)
+        info->fetch_file = TRUE;
 
       /* If the server isn't included properties for added items,
          we'll need to fetch them ourselves. */
@@ -2070,7 +2075,31 @@ start_report(svn_ra_serf__xml_parser_t *
              addition to <fetch-file>s and such) when *not* in
              "send-all" mode.  As a client, we're smart enough to know
              that's wrong, so we'll just ignore these tags. */
-          ;
+          if (ctx->send_all_mode)
+            {
+              const svn_delta_editor_t *update_editor = ctx->update_editor;
+
+              info = push_state(parser, ctx, TXDELTA);
+
+              if (! info->file_baton)
+                {
+                  SVN_ERR(open_updated_file(info, FALSE, info->pool));
+                }
+
+              info->base_checksum = svn_xml_get_attr_value("base-checksum",
+                                                           attrs);
+              SVN_ERR(update_editor->apply_textdelta(info->file_baton,
+                                                     info->base_checksum,
+                                                     info->editor_pool,
+                                                     &info->textdelta,
+                                                     &info->textdelta_baton));
+              info->svndiff_decoder = svn_txdelta_parse_svndiff(
+                                          info->textdelta,
+                                          info->textdelta_baton,
+                                          TRUE, info->pool);
+              info->base64_decoder = svn_base64_decode(info->svndiff_decoder,
+                                                       info->pool);
+            }
         }
       else
         {
@@ -2264,13 +2293,89 @@ end_report(svn_ra_serf__xml_parser_t *pa
           info->delta_base = value ? value->data : NULL;
         }
 
-      SVN_ERR(fetch_file(ctx, info));
+      /* go fetch info->name from DAV:checked-in */
+      info->url = svn_ra_serf__get_ver_prop(info->props, info->base_name,
+                                            info->base_rev, "DAV:", "checked-in");
+      if (!info->url)
+        {
+          return svn_error_create(SVN_ERR_RA_DAV_MALFORMED_DATA, NULL,
+                                  _("The REPORT or PROPFIND response did not "
+                                    "include the requested checked-in value"));
+        }
+
+      /* If the server is in "send-all" mode, we might have opened the
+         file when we started seeing content for it.  If we didn't get
+         any content for it, we still need to open the file.  But in
+         any case, we can then immediately close it.  */
+      if (ctx->send_all_mode)
+        {
+          if (! info->file_baton)
+            {
+              SVN_ERR(open_updated_file(info, FALSE, info->pool));
+            }
+          SVN_ERR(close_updated_file(info, info->pool));
+          info->dir->ref_count--;
+        }
+      /* Otherwise, if the server is *not* in "send-all" mode, we
+         should be at a point where we can queue up any auxiliary
+         content-fetching requests.  */
+      else
+        {
+          SVN_ERR(fetch_file(ctx, info));
+        }
+
       svn_ra_serf__xml_pop_state(parser);
     }
   else if (state == ADD_FILE && strcmp(name.name, "add-file") == 0)
     {
-      /* We should have everything we need to fetch the file. */
-      SVN_ERR(fetch_file(ctx, parser->state->private));
+      report_info_t *info = parser->state->private;
+
+      /* go fetch info->name from DAV:checked-in */
+      info->url = svn_ra_serf__get_ver_prop(info->props, info->base_name,
+                                            info->base_rev, "DAV:", "checked-in");
+      if (!info->url)
+        {
+          return svn_error_create(SVN_ERR_RA_DAV_MALFORMED_DATA, NULL,
+                                  _("The REPORT or PROPFIND response did not "
+                                    "include the requested checked-in value"));
+        }
+
+      /* If the server is in "send-all" mode, we might have opened the
+         file when we started seeing content for it.  If we didn't get
+         any content for it, we still need to open the file.  But in
+         any case, we can then immediately close it.  */
+      if (ctx->send_all_mode)
+        {
+          if (! info->file_baton)
+            {
+              SVN_ERR(open_updated_file(info, FALSE, info->pool));
+            }
+          SVN_ERR(close_updated_file(info, info->pool));
+          info->dir->ref_count--;
+        }
+      /* Otherwise, if the server is *not* in "send-all" mode, we
+         should be at a point where we can queue up any auxiliary
+         content-fetching requests.  */
+      else
+        {
+          SVN_ERR(fetch_file(ctx, info));
+        }
+
+      svn_ra_serf__xml_pop_state(parser);
+    }
+  else if (state == TXDELTA && strcmp(name.name, "txdelta") == 0)
+    {
+      report_info_t *info = parser->state->private;
+
+      /* Pre 1.2, mod_dav_svn was using <txdelta> tags (in addition to
+         <fetch-file>s and such) when *not* in "send-all" mode.  As a
+         client, we're smart enough to know that's wrong, so when not
+         in "receiving-all" mode, we'll ignore these tags. */
+      if (ctx->send_all_mode)
+        {
+          SVN_ERR(svn_stream_close(info->base64_decoder));
+        }
+
       svn_ra_serf__xml_pop_state(parser);
     }
   else if (state == PROP)
@@ -2397,6 +2502,27 @@ cdata_report(svn_ra_serf__xml_parser_t *
 
       svn_stringbuf_appendbytes(info->prop_value, data, len);
     }
+  else if (parser->state->current_state == TXDELTA)
+    {
+      /* Pre 1.2, mod_dav_svn was using <txdelta> tags (in addition to
+         <fetch-file>s and such) when *not* in "send-all" mode.  As a
+         client, we're smart enough to know that's wrong, so when not
+         in "receiving-all" mode, we'll ignore these tags. */
+      if (ctx->send_all_mode)
+        {
+          apr_size_t nlen = len;
+          report_info_t *info = parser->state->private;
+
+          SVN_ERR(svn_stream_write(info->base64_decoder, data, &nlen));
+          if (nlen != len)
+            {
+              /* Short write without associated error?  "Can't happen." */
+              return svn_error_createf(SVN_ERR_STREAM_UNEXPECTED_EOF, NULL,
+                                       _("Error writing to '%s': unexpected EOF"),
+                                       info->name);
+            }
+        }
+    }
 
   return SVN_NO_ERROR;
 }
@@ -3035,9 +3161,19 @@ make_update_reporter(svn_ra_session_t *r
                                    svn_io_file_del_on_pool_cleanup,
                                    report->pool, scratch_pool));
 
+#ifdef SVN_RA_SERF__UPDATES_SEND_ALL
+  svn_xml_make_open_tag(&buf, scratch_pool, svn_xml_normal, "S:update-report",
+                        "xmlns:S", SVN_XML_NAMESPACE, "send-all", "true",
+                        NULL);
+#else
   svn_xml_make_open_tag(&buf, scratch_pool, svn_xml_normal, "S:update-report",
                         "xmlns:S", SVN_XML_NAMESPACE,
                         NULL);
+  /* Subversion 1.8+ servers can be told to send properties for newly
+     added items inline even when doing a skelta response. */
+  make_simple_xml_tag(&buf, "S:include-props", "yes", scratch_pool);
+#endif
+
 
   make_simple_xml_tag(&buf, "S:src-path", report->source, scratch_pool);
 
@@ -3076,9 +3212,18 @@ make_update_reporter(svn_ra_session_t *r
       make_simple_xml_tag(&buf, "S:recursive", "no", scratch_pool);
     }
 
-  /* Subversion 1.8+ servers can be told to send properties for newly
-     added items inline even when doing a skelta response. */
-  make_simple_xml_tag(&buf, "S:include-props", "yes", scratch_pool);
+  /* When in 'send-all' mode, mod_dav_svn will assume that it should
+     calculate and transmit real text-deltas (instead of empty windows
+     that merely indicate "text is changed") unless it finds this
+     element.  When not in 'send-all' mode, mod_dav_svn will never
+     send text-deltas at all.
+
+     NOTE: Do NOT count on servers actually obeying this, as some exist
+     which obey send-all, but do not check for this directive at all! */
+  if (! text_deltas)
+    {
+      make_simple_xml_tag(&buf, "S:text-deltas", "no", scratch_pool);
+    }
 
   make_simple_xml_tag(&buf, "S:depth", svn_depth_to_word(depth), scratch_pool);
 



Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sat, Dec 1, 2012 at 12:00 PM, Mark Phippard <ma...@gmail.com> wrote:

> I feel pretty strongly that we should at minimum use the send-all
> approach when talking to pre-1.8 servers.  Even though in some
> situations it could still offer good performance.  I just think it
> would be more respectful to our users (server admins in this case) to
> not change this behavior in a way that could surprise them.  Maybe we
> could come up with exceptions, such as older servers that are using
> the SVNAllowBulkUpdates off directive.  In that situation we should
> use the new behavior since that is basically what that directive is
> asking for.
>

Without a lot of concrete feedback that parallel updates should be removed
by default, I strongly believe that we should not be conservative on this
issue.  The issue here is not one of compatibility - ra_serf has been
around for years and can talk just fine to older servers (way back to prior
to 1.0 servers actually).  The only argument against altering the default
behavior is that there might be an admin of a high-traffic site somewhere
that might suddenly be shocked by more HTTP requests coming in.  I honestly
have little to no sympathy for such an admin who doesn't properly
understand how to manage a large installation - they likely have other
issues that they are not paying attention to.  Until we have hoards of
users coming in and complaining about this, I think it's silly to be
conservative here.

I'm definitely not against giving knobs to the client or to the admin in
weird corner cases (provided someone cares enough to write that up), but I
strongly believe that for now we should do the right thing out of the box
in 1.8 - which is to utilize parallel updates.  -- justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Mark Phippard <ma...@gmail.com>.
On Sat, Dec 1, 2012 at 12:36 AM, Justin Erenkrantz
<ju...@erenkrantz.com> wrote:
> On Fri, Nov 30, 2012 at 4:54 PM, <cm...@apache.org> wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
> able to shove the non-send-all approach into ra_neon.  =)

Just to be clear, I do not believe anyone is suggesting we completely
abandon the non-send-all approach.  I like that this approach can
offer good performance on a well-configured server as well as enable
new features/ideas such as not even fetching the full-texts that we
already have locally.  I think the question is simply what is the best
way to deliver this.

> Here's my suggestion for consideration - let's experiment with this setting
> in the beta release process with the setting as-is - that is we always do
> the parallel updates unconditionally (except perhaps when svnrdump is being
> stupid).  If we get real users complaining about the update during that
> cycle, we can then figure out either switching the default and/or adding a
> config-option or even allowing some control via capabilities exchange.

I feel pretty strongly that we should at minimum use the send-all
approach when talking to pre-1.8 servers.  Even though in some
situations it could still offer good performance.  I just think it
would be more respectful to our users (server admins in this case) to
not change this behavior in a way that could surprise them.  Maybe we
could come up with exceptions, such as older servers that are using
the SVNAllowBulkUpdates off directive.  In that situation we should
use the new behavior since that is basically what that directive is
asking for.

As I said in another thread, I think we should treat a 1.8 server the
same way and require someone that was upgrading to add some new
directive to enable the new feature.  This would allow a server admin
to setup his server correctly, including using things like
mod_deflate, and turn on the new behavior rather than get it
automatically simply because they upgraded their binaries.

This seems like it satisfies everyone.  Existing users, especially
those running older server versions, would not be surprised by new and
unwanted client behavior, and it would still be easy to configure a
new server properly to support the non-send-all mode when it was
desired.  I just do not see what the downside would be to approaching
it this way.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Daniel Shahaf <da...@apache.org>.
On Sat, Dec 01, 2012 at 12:36:30AM -0500, Justin Erenkrantz wrote:
> Here's my suggestion for consideration - let's experiment with this setting
> in the beta release process with the setting as-is - that is we always do
> the parallel updates unconditionally (except perhaps when svnrdump is being
> stupid).  If we get real users complaining about the update during that
> cycle, we can then figure out either switching the default and/or adding a
> config-option or even allowing some control via capabilities exchange.
> 
> Does that sound reasonable?  -- justin

Why wait until beta?  Can't we get such feedback by asking people to test the
nightly releases, or by tagging r1415926 and releasing it as pre-alpha?

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Lieven Govaerts <lg...@mobsol.be>.
Hi Markus,

On Mon, Dec 3, 2012 at 10:43 AM, Markus Schaber <m....@codesys.com> wrote:
> Hi,
>
> Just another crazy idea:
>
> The main problem with the parallelization in ra_serf seems to be the number of http requests (which potentially causes a high number of authentications and tcp connections).
>
> Maybe we could add some partitioned send-all request:
>
> When the client decides to use 4 connections, it could send 4 requests, with some parameter like send-all(1/4), send-all(2/4), ..., send-all(4/4).
>
> Then the server can send one quarter of the complete response on each connection.
>
> An advanced server could even share the common state of those 4 requests through some shared memory / caching scheme, to avoid doing the same work multiple times.
>
> Years ago, I implemented a similar scheme between caching GIS web frontend servers, and the rendering backend server, in the protocol for fetching and rendering the map tiles. It gave a nearly linear speedup with the number of connections, up to the point where the CPUs were saturated.
>

the concept implemented in ra_serf is to parallelize individual GET
requests, so that these can be cached by proxies either on the client
or on the server side. So we want to avoid using send-all as much as
possible, as this creates always one large uncacheable response.

I've made a mental note of your idea though, if we need to stick with
send-all and further improve it, your suggestion is one way to do
this.

>
> Best regards
>
> Markus Schaber
>

thanks,

Lieven
[..]

AW: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Markus Schaber <m....@codesys.com>.
Hi,

Just another crazy idea:

The main problem with the parallelization in ra_serf seems to be the number of http requests (which potentially causes a high number of authentications and tcp connections).

Maybe we could add some partitioned send-all request:

When the client decides to use 4 connections, it could send 4 requests, with some parameter like send-all(1/4), send-all(2/4), ..., send-all(4/4).

Then the server can send one quarter of the complete response on each connection.

An advanced server could even share the common state of those 4 requests through some shared memory / caching scheme, to avoid doing the same work multiple times.

Years ago, I implemented a similar scheme between caching GIS web frontend servers, and the rendering backend server, in the protocol for fetching and rendering the map tiles. It gave a nearly linear speedup with the number of connections, up to the point where the CPUs were saturated.


Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH 

Inspiring Automation Solutions 
________________________________________
3S-Smart Software Solutions GmbH 
Dipl.-Inf. Markus Schaber | Product Development Core Technology 
Memminger Str. 151 | 87439 Kempten | Germany 
Tel. +49-831-54031-979 | Fax +49-831-54031-50 

E-Mail: m.schaber@codesys.com | Web: codesys.com 
CODESYS internet forum: forum.codesys.com 

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915 
Von: justin.erenkrantz@gmail.com [mailto:justin.erenkrantz@gmail.com] Im Auftrag von Justin Erenkrantz
Gesendet: Samstag, 1. Dezember 2012 15:15
An: Lieven Govaerts
Cc: Johan Corveleyn; dev@subversion.apache.org
Betreff: Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

On Sat, Dec 1, 2012 at 9:01 AM, Lieven Govaerts <sv...@mobsol.be> wrote:
There are some scenario's where either the server admin or the user
can decide if parallel requests make sense or not.

I'm specifically thinking of the use Kerberos per request
authentication. These responses can't be cached on the client side,
and require the authorization header to be sent for each request.
Assuming 2 step handshake of which serf can bypass the first, this
means an overhead per request of 1-10KB, with a 3 step handshake each
request has to be sent twice further increasing the overhead.
IMHO in this scenario the server admin should be able to veto the use
of parallel requests.

And the same is true for https connections, where it's also the server
admin who can decide if the necessary caches have been put in place to
enable the benefits of parallel requests.

Totally agreed.  I'd favor a three-value httpd directive option on the server-side that is advertised in the capabilities exchange:

- default (client defaults to parallel if ra_serf, serial if older ra_neon client; or if client overrides ra_serf via their local servers options)
- serial (server suggests to client that it should be serial; but permit parallel when client wants it)
- force-serial (same capability advertisement, but always trigger send-all responses regardless of what client asks for)

I'm 95% sure we have code in ra_serf that handles the case where the server sends us inline responses anyway as older (prior to 1.2, IIRC) always sent inline responses no matter what we send...so, it should be fairly straightforward decision tree with minimal code changes.

My $.02...which is still not enough for me to write the patch.  =)  -- justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sat, Dec 1, 2012 at 9:01 AM, Lieven Govaerts <sv...@mobsol.be> wrote:

> There are some scenario's where either the server admin or the user
> can decide if parallel requests make sense or not.
>
> I'm specifically thinking of the use Kerberos per request
> authentication. These responses can't be cached on the client side,
> and require the authorization header to be sent for each request.
> Assuming 2 step handshake of which serf can bypass the first, this
> means an overhead per request of 1-10KB, with a 3 step handshake each
> request has to be sent twice further increasing the overhead.
> IMHO in this scenario the server admin should be able to veto the use
> of parallel requests.
>
> And the same is true for https connections, where it's also the server
> admin who can decide if the necessary caches have been put in place to
> enable the benefits of parallel requests.
>

Totally agreed.  I'd favor a three-value httpd directive option on the
server-side that is advertised in the capabilities exchange:

- default (client defaults to parallel if ra_serf, serial if older ra_neon
client; or if client overrides ra_serf via their local servers options)
- serial (server suggests to client that it should be serial; but permit
parallel when client wants it)
- force-serial (same capability advertisement, but always trigger send-all
responses regardless of what client asks for)

I'm 95% sure we have code in ra_serf that handles the case where the server
sends us inline responses anyway as older (prior to 1.2, IIRC) always sent
inline responses no matter what we send...so, it should be fairly
straightforward decision tree with minimal code changes.

My $.02...which is still not enough for me to write the patch.  =)  --
justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Lieven Govaerts <sv...@mobsol.be>.
On Sat, Dec 1, 2012 at 2:31 PM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
> On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn <jc...@gmail.com> wrote:
>>
>> I'm wondering whether your concerns apply to both internet-wide
>> deployments and local (all on the same LAN) ones.
>
>
> That line is certainly a fair one to draw in the sand.  That said, I think
> the internal use case cries out even *more* for the parallel updates as the
> internal server in that environment is often wildly over-provisioned on the
> CPU side - with a fairly low-traffic environment, you want to take advantage
> of the parallel cores of a CPU to drive the updates.
>
> Generally speaking, what I discovered years ago back in 2006 (yikes) and I
> believe is still true as we near 2013 (shudder), if everything else is
> perfectly optimized (disk, latency, bandwidth, etc.), you're going to
> eventually bottleneck on the checksumming on both client and server - which
> is entirely CPU-bound and is expensive.  You can solve that by splitting out
> the work across multiple cores - for a server, you need to utilize multiple
> parallel requests in-flight; and for a client, you then need to parallelize
> the editor drive.
>
> The reason that disk isn't such a bottleneck as you might first expect is
> due to the OS's buffer cache - for reads on the server-side, common data is
> already going to be in RAM so hot spots in the fsfs repos will already be in
> memory, for writes on the client-side, modern client OSes won't necessarily
> block you until everything is sync'd to disk.  But, once you exhaust the
> capabilities of RAM, your underlying disk architecture matters a lot and one
> that might not be intuitive to those that haven't spent a lot of time
> closely with them.  (Hi Brane!)  If you are using direct-attached storage
> locally on either server or client, then you will probably be bottlenecked
> right there.  However, if your corporate environment has an NFS filer or SAN
> (a la NetApp/EMC) backing the FSFS repository or as NFS working copies (oh
> so common), those large disk subsystems are geared towards parallel I/Os -
> not single-threaded I/O performance - Isilon/BlueArc-class storage is
> however; but I've yet to see anyone obsessed enough about SVN I/O perf to
> place either their repository or working copies on a BlueArc-class storage
> system!  So, if you are not using direct-attached storage and are using NFS
> today in a corporate environment on either client or server, then you want
> to parallelize everything so that you can take advantage of the disk/network
> I/O architecture preferred by NetApp/EMC.  Throwing more cores against a
> NetApp/EMC storage system in a high-available bandwidth environment allows
> for linear performance returns (i.e., reading/writing one I/O is 1X, two
> threads is 2X, three threads is 3X, etc, etc.).
>
> To that end, I'd eventually love to see ra_serf drive the update editor
> across multiple threads so that the checksum and disk I/O bottleneck can be
> distributed across cores on the client-side as well.  Compared to where we
> were in 2006, that's the biggest inefficiency we have yet to solve and take
> advantage of.  And, I'm sure this'll break all sorts of promises in the Ev1
> and perhaps Ev2 world and drive C-Mike even crazier.  =)  But, if you want
> to put a rocket pack on our HTTP performance, that's exactly what we should
> do.  I'm reasonably certain that serf itself could be finely tuned to handle
> network I/O in a single thread at or close to wire-speed even on a 10G
> connection with a modern processor/OS - it's what we do with the file
> contents/textdeltas that needs to be shoved to a different set of worker
> threads and remove all of that libsvn_wc processing from blocking network
> traffic processing and get it all distributed and thread-safe.  If we do
> that, woah, I'd bet that we are we going to make things way faster across
> the board and completely blow everything else out of the water when our
> available bandwidth is high - which is the case in an internal network.
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit.  =P
>
> So, that's my long-winded answer of saying that, yah, even in an internal
> LAN environment, you still want to parallelize.
>
> However, I'm definitely not going to veto a patch that would add an httpd
> directive that allows the server to steer the client - unless overridden by
> the client's local config - to using parallel updates or not.  -- justin

There are some scenario's where either the server admin or the user
can decide if parallel requests make sense or not.

I'm specifically thinking of the use Kerberos per request
authentication. These responses can't be cached on the client side,
and require the authorization header to be sent for each request.
Assuming 2 step handshake of which serf can bypass the first, this
means an overhead per request of 1-10KB, with a 3 step handshake each
request has to be sent twice further increasing the overhead.
IMHO in this scenario the server admin should be able to veto the use
of parallel requests.

And the same is true for https connections, where it's also the server
admin who can decide if the necessary caches have been put in place to
enable the benefits of parallel requests.

Lieven

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Justin Erenkrantz wrote on Sat, Dec 01, 2012 at 09:50:29 -0500:
> On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej <br...@wandisco.com> wrote:
> 
> > On 01.12.2012 14:31, Justin Erenkrantz wrote:
> > > And, yes, that clearly could all be done in time for 1.8 without
> > > jeopardizing the timelines one tiny bit. =P
> >
> > Eep ... :)
> >
> >
> > Another thing I've been thinking about is this: Why are we using SHA1
> > checksums on the server and on the wire for consistency checks when a
> > 64-bit CRC would do the job just as well, and 15 times cheaper? And
> > banging my head against the wall for not thinking of this 10 years ago.
> >
> > I can sort of understand the use of SHA1 as a content index for
> > client-side pristine files. On the server, however ... dunno. Maybe we
> > could design something akin to what the rsync protocol does, but for
> > repository-wide data storage. Could be quite tricky to achieve locality,
> > however.
> >
> 
> The one thing that's nice with using SHA checksums is we're using it
> everywhere.  It makes protocol debugging a *lot* easier - since we also
> used SHA checksums as the content index, that makes it easier to compare
> what we recorded in libsvn_wc to what was sent by the server.  If we
> diverged the checksums algorithms, it'd be hard to do a quick comparison
> visually (do the checksums match?) without actually running the checksum
> yourself!

If that's the problem, have the server send the recorded-in-fs sha1
checksum as an attribute that the client ignores. (in SVN_DEBUG builds
only)

> 
> So, I think we optimized for humans here...and I'm okay with that.  We can
> always build faster processors...and take advantage of parallelism.  =)
> 
> There I go off on a tangent again.
> >
> 
> *grin*  -- justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sat, Dec 1, 2012 at 9:41 AM, Branko Čibej <br...@wandisco.com> wrote:

> On 01.12.2012 14:31, Justin Erenkrantz wrote:
> > And, yes, that clearly could all be done in time for 1.8 without
> > jeopardizing the timelines one tiny bit. =P
>
> Eep ... :)
>
>
> Another thing I've been thinking about is this: Why are we using SHA1
> checksums on the server and on the wire for consistency checks when a
> 64-bit CRC would do the job just as well, and 15 times cheaper? And
> banging my head against the wall for not thinking of this 10 years ago.
>
> I can sort of understand the use of SHA1 as a content index for
> client-side pristine files. On the server, however ... dunno. Maybe we
> could design something akin to what the rsync protocol does, but for
> repository-wide data storage. Could be quite tricky to achieve locality,
> however.
>

The one thing that's nice with using SHA checksums is we're using it
everywhere.  It makes protocol debugging a *lot* easier - since we also
used SHA checksums as the content index, that makes it easier to compare
what we recorded in libsvn_wc to what was sent by the server.  If we
diverged the checksums algorithms, it'd be hard to do a quick comparison
visually (do the checksums match?) without actually running the checksum
yourself!

So, I think we optimized for humans here...and I'm okay with that.  We can
always build faster processors...and take advantage of parallelism.  =)

There I go off on a tangent again.
>

*grin*  -- justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Branko Čibej <br...@wandisco.com>.
On 01.12.2012 14:31, Justin Erenkrantz wrote:
> And, yes, that clearly could all be done in time for 1.8 without
> jeopardizing the timelines one tiny bit. =P

Eep ... :)


Another thing I've been thinking about is this: Why are we using SHA1
checksums on the server and on the wire for consistency checks when a
64-bit CRC would do the job just as well, and 15 times cheaper? And
banging my head against the wall for not thinking of this 10 years ago.

I can sort of understand the use of SHA1 as a content index for
client-side pristine files. On the server, however ... dunno. Maybe we
could design something akin to what the rsync protocol does, but for
repository-wide data storage. Could be quite tricky to achieve locality,
however.

There I go off on a tangent again.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Sat, Dec 1, 2012 at 5:59 AM, Johan Corveleyn <jc...@gmail.com> wrote:

> I'm wondering whether your concerns apply to both internet-wide
> deployments and local (all on the same LAN) ones.
>

That line is certainly a fair one to draw in the sand.  That said, I think
the internal use case cries out even *more* for the parallel updates as the
internal server in that environment is often wildly over-provisioned on the
CPU side - with a fairly low-traffic environment, you want to take
advantage of the parallel cores of a CPU to drive the updates.

Generally speaking, what I discovered years ago back in 2006 (yikes) and I
believe is still true as we near 2013 (shudder), if everything else is
perfectly optimized (disk, latency, bandwidth, etc.), you're going to
eventually bottleneck on the checksumming on both client and server - which
is entirely CPU-bound and is expensive.  You can solve that by splitting
out the work across multiple cores - for a server, you need to utilize
multiple parallel requests in-flight; and for a client, you then need to
parallelize the editor drive.

The reason that disk isn't such a bottleneck as you might first expect is
due to the OS's buffer cache - for reads on the server-side, common data is
already going to be in RAM so hot spots in the fsfs repos will already be
in memory, for writes on the client-side, modern client OSes won't
necessarily block you until everything is sync'd to disk.  But, once you
exhaust the capabilities of RAM, your underlying disk architecture matters
a lot and one that might not be intuitive to those that haven't spent a lot
of time closely with them.  (Hi Brane!)  If you are using direct-attached
storage locally on either server or client, then you will probably be
bottlenecked right there.  However, if your corporate environment has an
NFS filer or SAN (a la NetApp/EMC) backing the FSFS repository or as NFS
working copies (oh so common), those large disk subsystems are geared
towards parallel I/Os - not single-threaded I/O performance -
Isilon/BlueArc-class storage is however; but I've yet to see anyone
obsessed enough about SVN I/O perf to place either their repository or
working copies on a BlueArc-class storage system!  So, if you are not using
direct-attached storage and are using NFS today in a corporate environment
on either client or server, then you want to parallelize everything so that
you can take advantage of the disk/network I/O architecture preferred by
NetApp/EMC.  Throwing more cores against a NetApp/EMC storage system in a
high-available bandwidth environment allows for linear performance returns
(i.e., reading/writing one I/O is 1X, two threads is 2X, three threads is
3X, etc, etc.).

To that end, I'd eventually love to see ra_serf drive the update editor
across multiple threads so that the checksum and disk I/O bottleneck can be
distributed across cores on the client-side as well.  Compared to where we
were in 2006, that's the biggest inefficiency we have yet to solve and take
advantage of.  And, I'm sure this'll break all sorts of promises in the Ev1
and perhaps Ev2 world and drive C-Mike even crazier.  =)  But, if you want
to put a rocket pack on our HTTP performance, that's exactly what we should
do.  I'm reasonably certain that serf itself could be finely tuned to
handle network I/O in a single thread at or close to wire-speed even on a
10G connection with a modern processor/OS - it's what we do with the file
contents/textdeltas that needs to be shoved to a different set of worker
threads and remove all of that libsvn_wc processing from blocking network
traffic processing and get it all distributed and thread-safe.  If we do
that, woah, I'd bet that we are we going to make things way faster across
the board and completely blow everything else out of the water when our
available bandwidth is high - which is the case in an internal network.
 And, yes, that clearly could all be done in time for 1.8 without
jeopardizing the timelines one tiny bit.  =P

So, that's my long-winded answer of saying that, yah, even in an internal
LAN environment, you still want to parallelize.

However, I'm definitely not going to veto a patch that would add an httpd
directive that allows the server to steer the client - unless overridden by
the client's local config - to using parallel updates or not.  -- justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Johan Corveleyn <jc...@gmail.com>.
On Sat, Dec 1, 2012 at 6:36 AM, Justin Erenkrantz <ju...@erenkrantz.com> wrote:
> On Fri, Nov 30, 2012 at 4:54 PM, <cm...@apache.org> wrote:
>>
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> (This one goes out to Ivan Zhakov.)
>
>
> I've stated for a long time that I think the send-all mode is a huge mistake
> architecturally because it is too prone to double-compression and TCP
> pipeline stalls and is a tremendous burden on a properly-configured httpd
> (by not taking advantage of server-side parallelism), it's nice to see it's
> not *too* hard to shoehorn this bad idea back into ra_serf.  We'd never be
> able to shove the non-send-all approach into ra_neon.  =)

I'm wondering whether your concerns apply to both internet-wide
deployments and local (all on the same LAN) ones.

It seems to me that SVN has two sets of audiences when it comes to
networking: some have to support users over the internet with
sometimes slow and high-latency, perhaps flaky connections; and others
have all their users on a local (or almost-local) network, and want to
make optimal use of their infrastructure, which offers an absolutely
rock-solid low-latency connection ... they'd like to shove the content
through that (wide, short) pipe as quickly as possible.

I'm no expert, but I suppose it's possible that those two audiences
need two different networking configurations to make optimal use of
their environment. If that's the case, it would be great if we could
offer some (clear, simple to use) configuration directives for those
admins to tune things ...

Just my 2 cents ...
-- 
Johan

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Fri, Nov 30, 2012 at 4:54 PM, <cm...@apache.org> wrote:

> Author: cmpilato
> Date: Fri Nov 30 21:54:35 2012
> New Revision: 1415864
>
> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
> Log:
> Implement in ra_serf "send-all" mode support for update-style REPORTs
> and their responses.  (Currently disabled by compile-time conditionals.)
>
> (This one goes out to Ivan Zhakov.)
>

I've stated for a long time that I think the send-all mode is a huge
mistake architecturally because it is too prone to double-compression and
TCP pipeline stalls and is a tremendous burden on a properly-configured
httpd (by not taking advantage of server-side parallelism), it's nice to
see it's not *too* hard to shoehorn this bad idea back into ra_serf.  We'd
never be able to shove the non-send-all approach into ra_neon.  =)

Here's my suggestion for consideration - let's experiment with this setting
in the beta release process with the setting as-is - that is we always do
the parallel updates unconditionally (except perhaps when svnrdump is being
stupid).  If we get real users complaining about the update during that
cycle, we can then figure out either switching the default and/or adding a
config-option or even allowing some control via capabilities exchange.

Does that sound reasonable?  -- justin

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Nov 30, 2012 at 5:38 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 11/30/2012 05:25 PM, Mark Phippard wrote:
>> On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato <cm...@collab.net> wrote:
>>> On 11/30/2012 05:00 PM, Mark Phippard wrote:
>>>> On Fri, Nov 30, 2012 at 4:54 PM,  <cm...@apache.org> wrote:
>>>>> Author: cmpilato
>>>>> Date: Fri Nov 30 21:54:35 2012
>>>>> New Revision: 1415864
>>>>>
>>>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>>>> Log:
>>>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>>>
>>>> Sweet!
>>>>
>>>> Would this also resolve the issue with svnrdump, or could it?  When
>>>> Serf is using this mode, I assume it is also now conforming to Ev1?
>>>
>>> I guess it *could* based on what I'm reading is considered the source of
>>> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
>>> used the ra-replay API instead of the ra-update one?
>>
>> Guess I am more wondering if it was another area where the same
>> solution could be applied?
>
> No, that's just it.  ra_serf's implementation of the ra-replay API is
> single-connection, just like ra-neon's was.  What suprises me is that
> svnrdump *does* use the ra-update API.
>
> Ah!  I see why, now.  When not doing an incremental dump, 'svnrdump dump'
> uses the ra-update API to handle that initial checkout-like revision.  After
> that (and otherwise when in incremental mode), it uses the ra-replay API.
> So yes, I believe svnrdump would be in fine shape over ra-serf if it was
> asking the server to use this "send-all" mode, where document Ev1 drive
> ordering *should* be honored.

So this sounds like pretty great news.  Regardless what we decide to
do for Serf with normal updates, it seems like we could
unconditionally make svnrdump tap into the send-all mode and that
would remove a release blocker.


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 11/30/2012 05:25 PM, Mark Phippard wrote:
> On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato <cm...@collab.net> wrote:
>> On 11/30/2012 05:00 PM, Mark Phippard wrote:
>>> On Fri, Nov 30, 2012 at 4:54 PM,  <cm...@apache.org> wrote:
>>>> Author: cmpilato
>>>> Date: Fri Nov 30 21:54:35 2012
>>>> New Revision: 1415864
>>>>
>>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>>> Log:
>>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>>
>>> Sweet!
>>>
>>> Would this also resolve the issue with svnrdump, or could it?  When
>>> Serf is using this mode, I assume it is also now conforming to Ev1?
>>
>> I guess it *could* based on what I'm reading is considered the source of
>> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
>> used the ra-replay API instead of the ra-update one?
> 
> Guess I am more wondering if it was another area where the same
> solution could be applied?

No, that's just it.  ra_serf's implementation of the ra-replay API is
single-connection, just like ra-neon's was.  What suprises me is that
svnrdump *does* use the ra-update API.

Ah!  I see why, now.  When not doing an incremental dump, 'svnrdump dump'
uses the ra-update API to handle that initial checkout-like revision.  After
that (and otherwise when in incremental mode), it uses the ra-replay API.
So yes, I believe svnrdump would be in fine shape over ra-serf if it was
asking the server to use this "send-all" mode, where document Ev1 drive
ordering *should* be honored.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Nov 30, 2012 at 5:23 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 11/30/2012 05:00 PM, Mark Phippard wrote:
>> On Fri, Nov 30, 2012 at 4:54 PM,  <cm...@apache.org> wrote:
>>> Author: cmpilato
>>> Date: Fri Nov 30 21:54:35 2012
>>> New Revision: 1415864
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>>> Log:
>>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>>> and their responses.  (Currently disabled by compile-time conditionals.)
>>
>> Sweet!
>>
>> Would this also resolve the issue with svnrdump, or could it?  When
>> Serf is using this mode, I assume it is also now conforming to Ev1?
>
> I guess it *could* based on what I'm reading is considered the source of
> svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
> used the ra-replay API instead of the ra-update one?

Guess I am more wondering if it was another area where the same
solution could be applied?


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 11/30/2012 05:00 PM, Mark Phippard wrote:
> On Fri, Nov 30, 2012 at 4:54 PM,  <cm...@apache.org> wrote:
>> Author: cmpilato
>> Date: Fri Nov 30 21:54:35 2012
>> New Revision: 1415864
>>
>> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
>> Log:
>> Implement in ra_serf "send-all" mode support for update-style REPORTs
>> and their responses.  (Currently disabled by compile-time conditionals.)
> 
> Sweet!
> 
> Would this also resolve the issue with svnrdump, or could it?  When
> Serf is using this mode, I assume it is also now conforming to Ev1?

I guess it *could* based on what I'm reading is considered the source of
svnrdump+ra_serf's problems, but I'm a bit confused -- I thought svnrdump
used the ra-replay API instead of the ra-update one?

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1415864 - /subversion/trunk/subversion/libsvn_ra_serf/update.c

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Nov 30, 2012 at 4:54 PM,  <cm...@apache.org> wrote:
> Author: cmpilato
> Date: Fri Nov 30 21:54:35 2012
> New Revision: 1415864
>
> URL: http://svn.apache.org/viewvc?rev=1415864&view=rev
> Log:
> Implement in ra_serf "send-all" mode support for update-style REPORTs
> and their responses.  (Currently disabled by compile-time conditionals.)

Sweet!

Would this also resolve the issue with svnrdump, or could it?  When
Serf is using this mode, I assume it is also now conforming to Ev1?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/