You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@subversion.apache.org by cm...@apache.org on 2012/05/08 18:56:43 UTC

svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Author: cmpilato
Date: Tue May  8 16:56:42 2012
New Revision: 1335639

URL: http://svn.apache.org/viewvc?rev=1335639&view=rev
Log:
Avoid opening pristine store file handles until they are actually
required.

* subversion/libsvn_wc/adm_ops.c
  (get_pristine_lazyopen_baton_t): New private callback baton type.
  (get_pristine_lazyopen_func): Callback implementation.
  (svn_wc__get_pristine_contents_by_checksum): Use the new lazyopen
    stream mechanics to avoid opening a file handle to the pristine
    contents until it's first really needed.

Suggested by: gstein

Modified:
    subversion/trunk/subversion/libsvn_wc/adm_ops.c

Modified: subversion/trunk/subversion/libsvn_wc/adm_ops.c
URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/libsvn_wc/adm_ops.c?rev=1335639&r1=1335638&r2=1335639&view=diff
==============================================================================
--- subversion/trunk/subversion/libsvn_wc/adm_ops.c (original)
+++ subversion/trunk/subversion/libsvn_wc/adm_ops.c Tue May  8 16:56:42 2012
@@ -2250,6 +2250,31 @@ svn_wc_get_pristine_contents2(svn_stream
                                                        scratch_pool));
 }
 
+
+typedef struct get_pristine_lazyopen_baton_t
+{
+  svn_wc_context_t *wc_ctx;
+  const char *wri_abspath;
+  const svn_checksum_t *sha1_checksum;
+
+} get_pristine_lazyopen_baton_t;
+
+
+/* Implements svn_stream_lazyopen_func_t */
+static svn_error_t *
+get_pristine_lazyopen_func(svn_stream_t **stream,
+                           void *baton,
+                           apr_pool_t *result_pool,
+                           apr_pool_t *scratch_pool)
+{
+  get_pristine_lazyopen_baton_t *b = baton;
+
+  SVN_ERR(svn_wc__db_pristine_read(stream, NULL, b->wc_ctx->db,
+                                   b->wri_abspath, b->sha1_checksum,
+                                   result_pool, scratch_pool));
+  return SVN_NO_ERROR;
+}
+
 svn_error_t *
 svn_wc__get_pristine_contents_by_checksum(svn_stream_t **contents,
                                           svn_wc_context_t *wc_ctx,
@@ -2258,9 +2283,27 @@ svn_wc__get_pristine_contents_by_checksu
                                           apr_pool_t *result_pool,
                                           apr_pool_t *scratch_pool)
 {
-  return svn_error_trace(svn_wc__db_pristine_read(contents, NULL, wc_ctx->db,
-                                                  wri_abspath, sha1_checksum,
-                                                  result_pool, scratch_pool));
+  svn_boolean_t present;
+  
+  *contents = NULL;
+
+  SVN_ERR(svn_wc__db_pristine_check(&present, wc_ctx->db, wri_abspath,
+                                    sha1_checksum, scratch_pool));
+
+  if (present)
+    {
+      get_pristine_lazyopen_baton_t *gpl_baton;
+
+      gpl_baton = apr_pcalloc(result_pool, sizeof(*gpl_baton));
+      gpl_baton->wc_ctx = wc_ctx;
+      gpl_baton->wri_abspath = wri_abspath;
+      gpl_baton->sha1_checksum = sha1_checksum;
+      
+      SVN_ERR(svn_stream_lazyopen_create(contents, get_pristine_lazyopen_func,
+                                         gpl_baton, result_pool));
+    }
+
+  return SVN_NO_ERROR;
 }
 
 svn_error_t *



Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Tue, May 8, 2012 at 12:56 PM,  <cm...@apache.org> wrote:
>...
> +++ subversion/trunk/subversion/libsvn_wc/adm_ops.c Tue May  8 16:56:42 2012
>...
> +  SVN_ERR(svn_wc__db_pristine_check(&present, wc_ctx->db, wri_abspath,
> +                                    sha1_checksum, scratch_pool));
> +
> +  if (present)
> +    {
> +      get_pristine_lazyopen_baton_t *gpl_baton;
> +
> +      gpl_baton = apr_pcalloc(result_pool, sizeof(*gpl_baton));
> +      gpl_baton->wc_ctx = wc_ctx;
> +      gpl_baton->wri_abspath = wri_abspath;
> +      gpl_baton->sha1_checksum = sha1_checksum;

What are the lifetime guarantees of the parameters? I'm guessing that
WC_CTX can be defined as "must live at least as long as the RA
session" (that you provide the callback to), but that WRI_ABSPATH and
SHA1_CHECKSUM need to be copied into RESULT_POOL.

>...

Great work!

Cheers,
-g

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Tue, May 8, 2012 at 2:49 PM, Mark Phippard <ma...@gmail.com> wrote:
> On Tue, May 8, 2012 at 2:45 PM, C. Michael Pilato <cm...@collab.net> wrote:
>> On 05/08/2012 02:34 PM, Mark Phippard wrote:
>>> Now that I can run the test I wanted, the performance improvement is
>>> pretty nice.  Checking out our code goes from 1m35s down to 0m44s.  I
>>> cannot help but think that number should still be a lot lower though.
>>> This scenario seems like it would be very similar to what a Git
>>> checkout would do now, probably even less work has to be done.  I do
>>> not have a Git-svn version of our codebase to test with, but I am
>>> guessing a Git checkout of our code would be less than 10 seconds.  So
>>> it might be an indication we could be doing more optimization in our
>>> libraries.
>>>
>>> That said, I still think it is a nice improvement and I imagine it
>>> would scale up and down based on size and number of files.
>>>
>>> Does anyone have a git version of our tree they could try this with?
>>> How long does it take git to materialize a working copy of our trunk?
>>
>> As I said before, I suspect your numbers would be much lower if I wasn't
>> sending HEAD requests for each file.  Unfortunately, ra_serf is depending on
>> the ordering of the pipelined requests (PROPFIND results for a given path
>> must be processed before the contents of the file are fetched), and I don't
>> know how to make that happen without ripping apart the pipelining machinery
>> and doing something custom there.
>
> Now that it works for me, I will try it some more with my own server
> so I can check the logs.  When I was trying this before, it seemed
> like all of the HEAD requests were sent in the first second (which I
> thought was also producing the open files problem?).  So I am
> operating under the theory that we are still kind of slow in
> installing the file from our pristine store.
>
> I assume that if I have a 44 second checkout, then I should see HEAD
> requests in the server logs happening during most of that 44 seconds
> if that is contributing to the problem?

Yup. But at a guess, I think we're slower than Git simply because we
do more work. Once the pipe is stuffed full of requests, there really
aren't any gaps/pauses that will occur. I think you'll see an even
spread of HEAD requests. They will just be "in between" other stuff,
without contributing measurably to the overall process duration. (ie.
they control the flow, rather than spend time)

Cheers,
-g

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, May 8, 2012 at 2:45 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 02:34 PM, Mark Phippard wrote:
>> Now that I can run the test I wanted, the performance improvement is
>> pretty nice.  Checking out our code goes from 1m35s down to 0m44s.  I
>> cannot help but think that number should still be a lot lower though.
>> This scenario seems like it would be very similar to what a Git
>> checkout would do now, probably even less work has to be done.  I do
>> not have a Git-svn version of our codebase to test with, but I am
>> guessing a Git checkout of our code would be less than 10 seconds.  So
>> it might be an indication we could be doing more optimization in our
>> libraries.
>>
>> That said, I still think it is a nice improvement and I imagine it
>> would scale up and down based on size and number of files.
>>
>> Does anyone have a git version of our tree they could try this with?
>> How long does it take git to materialize a working copy of our trunk?
>
> As I said before, I suspect your numbers would be much lower if I wasn't
> sending HEAD requests for each file.  Unfortunately, ra_serf is depending on
> the ordering of the pipelined requests (PROPFIND results for a given path
> must be processed before the contents of the file are fetched), and I don't
> know how to make that happen without ripping apart the pipelining machinery
> and doing something custom there.

Now that it works for me, I will try it some more with my own server
so I can check the logs.  When I was trying this before, it seemed
like all of the HEAD requests were sent in the first second (which I
thought was also producing the open files problem?).  So I am
operating under the theory that we are still kind of slow in
installing the file from our pristine store.

I assume that if I have a 44 second checkout, then I should see HEAD
requests in the server logs happening during most of that 44 seconds
if that is contributing to the problem?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Tue, May 8, 2012 at 4:09 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 03:35 PM, Greg Stein wrote:
>> One question: the ordering of PROPFIND and GET. Do you know if that is
>> a requirement, or simply that you were preserving prior behavior?
>
> Upon reflection, it's probably not a hard requirement.  In general, I
> suppose it's easier (and more efficient) to cache properties and stream
> contents while we drive an editor than the reverse, so that's probably why
> that ordering was chosen prior.
>
> I recall running into *some* problem when I tried to handle the contents
> inline (while processing the REPORT response).  I can't recall for sure what
> that was, though (weekends are just fantastic amnesia drugs, sometimes!),
> but I think it boiled down to trying to figure out how to practically drive
> the editor partially, leaving the file baton open until some future
> pipelined PROPFIND request response was finally processed.

Cool. Thanks. Sounds like I may have some flexibility. Ev2 will
simplify a good bit of this, but we still need Ev1 support.

I have no schedule at all for this, as I'm more interested in an Ev2
commit editor. But thanks to you adding it to the tracker, we won't
lose track...

Cheers,
-g

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 05:47 PM, Ivan Zhakov wrote:
> Well, it seems things are more complicated: current mod_dav_svn
> implementation never sends <fetch-props /> tag and ra_serf always asks
> for properties, even if there is no properties.

You know, I had a memory that this had changed at some point -- something
that pburba was working on.  I found this:

{{{
r1063337 | pburba | 2011-01-25 11:26:42 -0500 (Tue, 25 Jan 2011) | 47 lines
Changed paths:
   M /subversion/trunk/subversion/libsvn_ra_neon/fetch.c
   M /subversion/trunk/subversion/mod_dav_svn/reports/update.c

Server-side fix for issue #3657 'dav update report handler in skelta mode
can cause spurious conflicts'.

[...]
}}}

No time to process it this very second, but it's definitely something for me
(or others) to check into later.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Wed, May 16, 2012 at 11:17 AM, Ivan Zhakov <iv...@visualsvn.com> wrote:
> It seems to be a good thing to try to implement on Hackathon in Berlin.
>
> I'm often switching between different branches and I'll benefit a lot
> if this operation will just take one REPORT request, instead of many
> PROPFINDs for each added file.

I'll be in Berlin as well...this would be a great thing to knock out
when a bunch of us are in the same room!  Assuming no one has gotten
around to it by then of course...  =P  -- justin

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Thu, May 10, 2012 at 5:41 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/10/2012 06:21 AM, Ivan Zhakov wrote:
>> It seems that ra_serf unconditionally retrieves properties using
>> PROPFIND for *all* added files:
>
> Yup, that's what I said.  :-)
>
>> subversion\libsvn_ra_serf\update.c:1633 (start_report)
>> [[[
>>   else if ((state == OPEN_DIR || state == ADD_DIR) &&
>>            strcmp(name.name, "add-file") == 0)
>>     {
>>       const char *file_name, *cf, *cr;
>>       report_info_t *info;
>>
>>       file_name = svn_xml_get_attr_value("name", attrs);
>>       cf = svn_xml_get_attr_value("copyfrom-path", attrs);
>>       cr = svn_xml_get_attr_value("copyfrom-rev", attrs);
>>
>>       if (!file_name)
>>         {
>>           return svn_error_create(
>>             SVN_ERR_RA_DAV_MALFORMED_DATA, NULL,
>>             _("Missing name attr in add-file element"));
>>         }
>>
>>       info = push_state(parser, ctx, ADD_FILE);
>>
>>       info->base_rev = SVN_INVALID_REVNUM;
>>       info->fetch_props = TRUE;
>>       info->fetch_file = TRUE;
>> ]]]
>>
>> Do you know why?
>
> "We've always done it that way."  :-)
>
> Seriously, this decision is so old that I can't remember its reasoning.  It
> could be because there are *always* some properties of interest on added
> files, therefore no reason to conditionalize the fetch.  Even if there are
> no user-defined props, the "entry props" for last-committed-rev,
> last-committed-author, etc. need to be fetched.
>
> One easy improvement we could make would be to amend the protocol to go
> ahead and pass directly those "entry props" for added files, too, and to
> return to using the <fetch-props/> tag for added files that have
> user-defined props set.  On the client side, the mere presence of
> <set-prop>...</set-prop> for an added object would mean "Don't assume that
> you need to fetch properties because I'm giving some of them to you
> directly", and of course the presence of the <fetch-props/> tag would then
> indicate "...but in this case, please go ahead and fetch them all".
>
> Such a change would also open the door for us passing *all* the properties
> for added objects inline in the future if we wish:  just
> <set-prop>...</set-prop> them all and *don't* include the <fetch-props/>
> indicator.
>
It seems to be a good thing to try to implement on Hackathon in Berlin.

I'm often switching between different branches and I'll benefit a lot
if this operation will just take one REPORT request, instead of many
PROPFINDs for each added file.

-- 
Ivan Zhakov

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/10/2012 06:21 AM, Ivan Zhakov wrote:
> It seems that ra_serf unconditionally retrieves properties using
> PROPFIND for *all* added files:

Yup, that's what I said.  :-)

> subversion\libsvn_ra_serf\update.c:1633 (start_report)
> [[[
>   else if ((state == OPEN_DIR || state == ADD_DIR) &&
>            strcmp(name.name, "add-file") == 0)
>     {
>       const char *file_name, *cf, *cr;
>       report_info_t *info;
> 
>       file_name = svn_xml_get_attr_value("name", attrs);
>       cf = svn_xml_get_attr_value("copyfrom-path", attrs);
>       cr = svn_xml_get_attr_value("copyfrom-rev", attrs);
> 
>       if (!file_name)
>         {
>           return svn_error_create(
>             SVN_ERR_RA_DAV_MALFORMED_DATA, NULL,
>             _("Missing name attr in add-file element"));
>         }
> 
>       info = push_state(parser, ctx, ADD_FILE);
> 
>       info->base_rev = SVN_INVALID_REVNUM;
>       info->fetch_props = TRUE;
>       info->fetch_file = TRUE;
> ]]]
> 
> Do you know why?

"We've always done it that way."  :-)

Seriously, this decision is so old that I can't remember its reasoning.  It
could be because there are *always* some properties of interest on added
files, therefore no reason to conditionalize the fetch.  Even if there are
no user-defined props, the "entry props" for last-committed-rev,
last-committed-author, etc. need to be fetched.

One easy improvement we could make would be to amend the protocol to go
ahead and pass directly those "entry props" for added files, too, and to
return to using the <fetch-props/> tag for added files that have
user-defined props set.  On the client side, the mere presence of
<set-prop>...</set-prop> for an added object would mean "Don't assume that
you need to fetch properties because I'm giving some of them to you
directly", and of course the presence of the <fetch-props/> tag would then
indicate "...but in this case, please go ahead and fetch them all".

Such a change would also open the door for us passing *all* the properties
for added objects inline in the future if we wish:  just
<set-prop>...</set-prop> them all and *don't* include the <fetch-props/>
indicator.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Wed, May 9, 2012 at 8:51 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 05:47 PM, Ivan Zhakov wrote:
>> On Wed, May 9, 2012 at 1:34 AM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>> On Wed, May 9, 2012 at 12:49 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>>> On 05/08/2012 04:39 PM, Mark Phippard wrote:
>>>>> On Tue, May 8, 2012 at 4:20 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>>>>> On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>>>>>> On 05/08/2012 03:35 PM, Greg Stein wrote:
>>>>>>>> One question: the ordering of PROPFIND and GET. Do you know if that is
>>>>>>>> a requirement, or simply that you were preserving prior behavior?
>>>>>>>
>>>>>>> Upon reflection, it's probably not a hard requirement.  In general, I
>>>>>>> suppose it's easier (and more efficient) to cache properties and stream
>>>>>>> contents while we drive an editor than the reverse, so that's probably why
>>>>>>> that ordering was chosen prior.
>>>>>>>
>>>>>> Another option is to include properties in REPORT even in skelta-mode
>>>>>> if they are small. With defining small something like 0.5-1k.
>>>>>
>>>>> Agreed.  I had forgotten that we would still need these roundtrips to
>>>>> get the properties.  Maybe the REPORT request could at least indicate
>>>>> which items have properties.  It would be better for performance if
>>>>> things like svn:eol-style and svn:mimetype were included in the
>>>>> request so we then only had to go back to the server for custom props
>>>>> (and we knew which files have them).
>>>>
>>>> The REPORT request does include a <fetch-props/> type of indicator which
>>>> says "there's something worth fetching here".
>>>>
>>>> I'm quite in favor of including, say, the "svn:" class of properties in the
>>>> REPORT response proper.
>>>>
>>> We could include all properties if we know there are small. It seems
>>> to be possible implement this on server side, but I'm not sure that
>>> current client code is ready for mixing embedded and external
>>> properties in REPORT response.
>>>
>> Well, it seems things are more complicated: current mod_dav_svn
>> implementation never sends <fetch-props /> tag and ra_serf always asks
>> for properties, even if there is no properties.
>
> I double-checked this claim, Ivan, and don't believe it to be accurate.  I
> started with code inspection, and found things to be as I'd hoped.
>
> I then created an empty repository, and made these changes:
>
>   r1:  add a file, with properties set on it.
>   r2:  modified the file's contents only
>   r3:  modified the file's svn:eol-style property value (which also
>        tweaked its content)
>   r4;  delete the file's svn:eol-style property.
>
> Then, I backdated my working copy to r0, and started updating to each
> successive revision.
>
> $ svn up -r1:
>
>   "OPTIONS /tests/repos HTTP/1.1"
>   "OPTIONS /tests/repos HTTP/1.1"
>   "REPORT /tests/repos/!svn/me HTTP/1.1"
>   "PROPFIND /tests/repos/!svn/rvr/1/file.py HTTP/1.1"
>   "HEAD /tests/repos/!svn/rvr/1/file.py HTTP/1.1"
>
> This update adds the new file back to the working copy.  I get a PROPFIND to
> fetch its properties (the server never transmits props for added files/dirs)
> and a HEAD for the contents (because they are cached in the WC pristine
> store already ... else this would have been a GET).
>
> $ svn up -r2:
>
>   "OPTIONS /tests/repos HTTP/1.1"
>   "OPTIONS /tests/repos HTTP/1.1"
>   "REPORT /tests/repos/!svn/me HTTP/1.1"
>   "HEAD /tests/repos/!svn/rvr/2/file.py HTTP/1.1"
>
> No PROPFIND.  HEAD to install the new contents.
>
> $ svn up -r3:
>
>   "OPTIONS /tests/repos HTTP/1.1"
>   "OPTIONS /tests/repos HTTP/1.1"
>   "REPORT /tests/repos/!svn/me HTTP/1.1"
>   "HEAD /tests/repos/!svn/rvr/3/file.py HTTP/1.1"
>
> Again no PROPFIND, because the <D:set-prop> REPORT response item carried the
> svn:eol-style propchange.  HEAD because the eol-style changed and new
> contents were required.
>
> $ svn up -r4:
>
>   "OPTIONS /tests/repos HTTP/1.1"
>   "OPTIONS /tests/repos HTTP/1.1"
>   "REPORT /tests/repos/!svn/me HTTP/1.1"
>
> Again, no PROPFIND, because the <D:remove-prop> REPORT response item carried
> the propchange info.  No HEAD because the contents didn't change.
>
> So, it's only the added files and directories that seem to necessitate a
> PROPFIND.
>
Hi Mike,

I've only checked that mod_dav_svn never sends <fetch-props /> tags
while you mentioned that it does. Actually I checked it only by
analyzing the server-side code, not by capturing real transfer. Sorry
for confusion.

It seems that ra_serf unconditionally retrieves properties using
PROPFIND for *all* added files:
subversion\libsvn_ra_serf\update.c:1633 (start_report)
[[[
  else if ((state == OPEN_DIR || state == ADD_DIR) &&
           strcmp(name.name, "add-file") == 0)
    {
      const char *file_name, *cf, *cr;
      report_info_t *info;

      file_name = svn_xml_get_attr_value("name", attrs);
      cf = svn_xml_get_attr_value("copyfrom-path", attrs);
      cr = svn_xml_get_attr_value("copyfrom-rev", attrs);

      if (!file_name)
        {
          return svn_error_create(
            SVN_ERR_RA_DAV_MALFORMED_DATA, NULL,
            _("Missing name attr in add-file element"));
        }

      info = push_state(parser, ctx, ADD_FILE);

      info->base_rev = SVN_INVALID_REVNUM;
      info->fetch_props = TRUE;
      info->fetch_file = TRUE;
]]]

Do you know why?

-- 
Ivan Zhakov

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 05:47 PM, Ivan Zhakov wrote:
> On Wed, May 9, 2012 at 1:34 AM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>> On Wed, May 9, 2012 at 12:49 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>> On 05/08/2012 04:39 PM, Mark Phippard wrote:
>>>> On Tue, May 8, 2012 at 4:20 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>>>> On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>>>>> On 05/08/2012 03:35 PM, Greg Stein wrote:
>>>>>>> One question: the ordering of PROPFIND and GET. Do you know if that is
>>>>>>> a requirement, or simply that you were preserving prior behavior?
>>>>>>
>>>>>> Upon reflection, it's probably not a hard requirement.  In general, I
>>>>>> suppose it's easier (and more efficient) to cache properties and stream
>>>>>> contents while we drive an editor than the reverse, so that's probably why
>>>>>> that ordering was chosen prior.
>>>>>>
>>>>> Another option is to include properties in REPORT even in skelta-mode
>>>>> if they are small. With defining small something like 0.5-1k.
>>>>
>>>> Agreed.  I had forgotten that we would still need these roundtrips to
>>>> get the properties.  Maybe the REPORT request could at least indicate
>>>> which items have properties.  It would be better for performance if
>>>> things like svn:eol-style and svn:mimetype were included in the
>>>> request so we then only had to go back to the server for custom props
>>>> (and we knew which files have them).
>>>
>>> The REPORT request does include a <fetch-props/> type of indicator which
>>> says "there's something worth fetching here".
>>>
>>> I'm quite in favor of including, say, the "svn:" class of properties in the
>>> REPORT response proper.
>>>
>> We could include all properties if we know there are small. It seems
>> to be possible implement this on server side, but I'm not sure that
>> current client code is ready for mixing embedded and external
>> properties in REPORT response.
>>
> Well, it seems things are more complicated: current mod_dav_svn
> implementation never sends <fetch-props /> tag and ra_serf always asks
> for properties, even if there is no properties.

I double-checked this claim, Ivan, and don't believe it to be accurate.  I
started with code inspection, and found things to be as I'd hoped.

I then created an empty repository, and made these changes:

   r1:  add a file, with properties set on it.
   r2:  modified the file's contents only
   r3:  modified the file's svn:eol-style property value (which also
        tweaked its content)
   r4;  delete the file's svn:eol-style property.

Then, I backdated my working copy to r0, and started updating to each
successive revision.

$ svn up -r1:

   "OPTIONS /tests/repos HTTP/1.1"
   "OPTIONS /tests/repos HTTP/1.1"
   "REPORT /tests/repos/!svn/me HTTP/1.1"
   "PROPFIND /tests/repos/!svn/rvr/1/file.py HTTP/1.1"
   "HEAD /tests/repos/!svn/rvr/1/file.py HTTP/1.1"

This update adds the new file back to the working copy.  I get a PROPFIND to
fetch its properties (the server never transmits props for added files/dirs)
and a HEAD for the contents (because they are cached in the WC pristine
store already ... else this would have been a GET).

$ svn up -r2:

   "OPTIONS /tests/repos HTTP/1.1"
   "OPTIONS /tests/repos HTTP/1.1"
   "REPORT /tests/repos/!svn/me HTTP/1.1"
   "HEAD /tests/repos/!svn/rvr/2/file.py HTTP/1.1"

No PROPFIND.  HEAD to install the new contents.

$ svn up -r3:

   "OPTIONS /tests/repos HTTP/1.1"
   "OPTIONS /tests/repos HTTP/1.1"
   "REPORT /tests/repos/!svn/me HTTP/1.1"
   "HEAD /tests/repos/!svn/rvr/3/file.py HTTP/1.1"

Again no PROPFIND, because the <D:set-prop> REPORT response item carried the
svn:eol-style propchange.  HEAD because the eol-style changed and new
contents were required.

$ svn up -r4:

   "OPTIONS /tests/repos HTTP/1.1"
   "OPTIONS /tests/repos HTTP/1.1"
   "REPORT /tests/repos/!svn/me HTTP/1.1"

Again, no PROPFIND, because the <D:remove-prop> REPORT response item carried
the propchange info.  No HEAD because the contents didn't change.

So, it's only the added files and directories that seem to necessitate a
PROPFIND.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Wed, May 9, 2012 at 1:34 AM, Ivan Zhakov <iv...@visualsvn.com> wrote:
> On Wed, May 9, 2012 at 12:49 AM, C. Michael Pilato <cm...@collab.net> wrote:
>> On 05/08/2012 04:39 PM, Mark Phippard wrote:
>>> On Tue, May 8, 2012 at 4:20 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>>> On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>>>> On 05/08/2012 03:35 PM, Greg Stein wrote:
>>>>>> One question: the ordering of PROPFIND and GET. Do you know if that is
>>>>>> a requirement, or simply that you were preserving prior behavior?
>>>>>
>>>>> Upon reflection, it's probably not a hard requirement.  In general, I
>>>>> suppose it's easier (and more efficient) to cache properties and stream
>>>>> contents while we drive an editor than the reverse, so that's probably why
>>>>> that ordering was chosen prior.
>>>>>
>>>> Another option is to include properties in REPORT even in skelta-mode
>>>> if they are small. With defining small something like 0.5-1k.
>>>
>>> Agreed.  I had forgotten that we would still need these roundtrips to
>>> get the properties.  Maybe the REPORT request could at least indicate
>>> which items have properties.  It would be better for performance if
>>> things like svn:eol-style and svn:mimetype were included in the
>>> request so we then only had to go back to the server for custom props
>>> (and we knew which files have them).
>>
>> The REPORT request does include a <fetch-props/> type of indicator which
>> says "there's something worth fetching here".
>>
>> I'm quite in favor of including, say, the "svn:" class of properties in the
>> REPORT response proper.
>>
> We could include all properties if we know there are small. It seems
> to be possible implement this on server side, but I'm not sure that
> current client code is ready for mixing embedded and external
> properties in REPORT response.
>
Well, it seems things are more complicated: current mod_dav_svn
implementation never sends <fetch-props /> tag and ra_serf always asks
for properties, even if there is no properties.


-- 
Ivan Zhakov

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Wed, May 9, 2012 at 12:49 AM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 04:39 PM, Mark Phippard wrote:
>> On Tue, May 8, 2012 at 4:20 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>> On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>>> On 05/08/2012 03:35 PM, Greg Stein wrote:
>>>>> One question: the ordering of PROPFIND and GET. Do you know if that is
>>>>> a requirement, or simply that you were preserving prior behavior?
>>>>
>>>> Upon reflection, it's probably not a hard requirement.  In general, I
>>>> suppose it's easier (and more efficient) to cache properties and stream
>>>> contents while we drive an editor than the reverse, so that's probably why
>>>> that ordering was chosen prior.
>>>>
>>> Another option is to include properties in REPORT even in skelta-mode
>>> if they are small. With defining small something like 0.5-1k.
>>
>> Agreed.  I had forgotten that we would still need these roundtrips to
>> get the properties.  Maybe the REPORT request could at least indicate
>> which items have properties.  It would be better for performance if
>> things like svn:eol-style and svn:mimetype were included in the
>> request so we then only had to go back to the server for custom props
>> (and we knew which files have them).
>
> The REPORT request does include a <fetch-props/> type of indicator which
> says "there's something worth fetching here".
>
> I'm quite in favor of including, say, the "svn:" class of properties in the
> REPORT response proper.
>
We could include all properties if we know there are small. It seems
to be possible implement this on server side, but I'm not sure that
current client code is ready for mixing embedded and external
properties in REPORT response.


-- 
Ivan Zhakov

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 04:39 PM, Mark Phippard wrote:
> On Tue, May 8, 2012 at 4:20 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
>> On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
>>> On 05/08/2012 03:35 PM, Greg Stein wrote:
>>>> One question: the ordering of PROPFIND and GET. Do you know if that is
>>>> a requirement, or simply that you were preserving prior behavior?
>>>
>>> Upon reflection, it's probably not a hard requirement.  In general, I
>>> suppose it's easier (and more efficient) to cache properties and stream
>>> contents while we drive an editor than the reverse, so that's probably why
>>> that ordering was chosen prior.
>>>
>> Another option is to include properties in REPORT even in skelta-mode
>> if they are small. With defining small something like 0.5-1k.
> 
> Agreed.  I had forgotten that we would still need these roundtrips to
> get the properties.  Maybe the REPORT request could at least indicate
> which items have properties.  It would be better for performance if
> things like svn:eol-style and svn:mimetype were included in the
> request so we then only had to go back to the server for custom props
> (and we knew which files have them).

The REPORT request does include a <fetch-props/> type of indicator which
says "there's something worth fetching here".

I'm quite in favor of including, say, the "svn:" class of properties in the
REPORT response proper.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, May 8, 2012 at 4:20 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
> On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
>> On 05/08/2012 03:35 PM, Greg Stein wrote:
>>> One question: the ordering of PROPFIND and GET. Do you know if that is
>>> a requirement, or simply that you were preserving prior behavior?
>>
>> Upon reflection, it's probably not a hard requirement.  In general, I
>> suppose it's easier (and more efficient) to cache properties and stream
>> contents while we drive an editor than the reverse, so that's probably why
>> that ordering was chosen prior.
>>
> Another option is to include properties in REPORT even in skelta-mode
> if they are small. With defining small something like 0.5-1k.

Agreed.  I had forgotten that we would still need these roundtrips to
get the properties.  Maybe the REPORT request could at least indicate
which items have properties.  It would be better for performance if
things like svn:eol-style and svn:mimetype were included in the
request so we then only had to go back to the server for custom props
(and we knew which files have them).


-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Ivan Zhakov <iv...@visualsvn.com>.
On Wed, May 9, 2012 at 12:09 AM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 03:35 PM, Greg Stein wrote:
>> One question: the ordering of PROPFIND and GET. Do you know if that is
>> a requirement, or simply that you were preserving prior behavior?
>
> Upon reflection, it's probably not a hard requirement.  In general, I
> suppose it's easier (and more efficient) to cache properties and stream
> contents while we drive an editor than the reverse, so that's probably why
> that ordering was chosen prior.
>
Another option is to include properties in REPORT even in skelta-mode
if they are small. With defining small something like 0.5-1k.


-- 
Ivan Zhakov

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 03:35 PM, Greg Stein wrote:
> One question: the ordering of PROPFIND and GET. Do you know if that is
> a requirement, or simply that you were preserving prior behavior?

Upon reflection, it's probably not a hard requirement.  In general, I
suppose it's easier (and more efficient) to cache properties and stream
contents while we drive an editor than the reverse, so that's probably why
that ordering was chosen prior.

I recall running into *some* problem when I tried to handle the contents
inline (while processing the REPORT response).  I can't recall for sure what
that was, though (weekends are just fantastic amnesia drugs, sometimes!),
but I think it boiled down to trying to figure out how to practically drive
the editor partially, leaving the file baton open until some future
pipelined PROPFIND request response was finally processed.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Tue, May 8, 2012 at 3:23 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 02:51 PM, Greg Stein wrote:
>> On Tue, May 8, 2012 at 2:45 PM, C. Michael Pilato <cm...@collab.net> wrote:
>>> As I said before, I suspect your numbers would be much lower if I wasn't
>>> sending HEAD requests for each file.  Unfortunately, ra_serf is depending on
>>> the ordering of the pipelined requests (PROPFIND results for a given path
>>> must be processed before the contents of the file are fetched), and I don't
>>> know how to make that happen without ripping apart the pipelining machinery
>>> and doing something custom there.
>>
>> I can get that fixed (at some point). A little hand-wavey magic in the
>> callbacks, and we'll be good to go.
>>
>> Would you mind opening an issue (assigned or cc'd to me) to track
>> that? It certainly isn't a blocker, but would be a very nice
>> enhancement.
>
> Does http://subversion.tigris.org/issues/show_bug.cgi?id=4179 carry the
> requisite information?

It does. Thanks!

I've added a few brief thoughts to the issue.

One question: the ordering of PROPFIND and GET. Do you know if that is
a requirement, or simply that you were preserving prior behavior?

Cheers,
-g

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 02:51 PM, Greg Stein wrote:
> On Tue, May 8, 2012 at 2:45 PM, C. Michael Pilato <cm...@collab.net> wrote:
>> As I said before, I suspect your numbers would be much lower if I wasn't
>> sending HEAD requests for each file.  Unfortunately, ra_serf is depending on
>> the ordering of the pipelined requests (PROPFIND results for a given path
>> must be processed before the contents of the file are fetched), and I don't
>> know how to make that happen without ripping apart the pipelining machinery
>> and doing something custom there.
> 
> I can get that fixed (at some point). A little hand-wavey magic in the
> callbacks, and we'll be good to go.
> 
> Would you mind opening an issue (assigned or cc'd to me) to track
> that? It certainly isn't a blocker, but would be a very nice
> enhancement.

Does http://subversion.tigris.org/issues/show_bug.cgi?id=4179 carry the
requisite information?

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Greg Stein <gs...@gmail.com>.
On Tue, May 8, 2012 at 2:45 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 02:34 PM, Mark Phippard wrote:
>> Now that I can run the test I wanted, the performance improvement is
>> pretty nice.  Checking out our code goes from 1m35s down to 0m44s.  I
>> cannot help but think that number should still be a lot lower though.
>> This scenario seems like it would be very similar to what a Git
>> checkout would do now, probably even less work has to be done.  I do
>> not have a Git-svn version of our codebase to test with, but I am
>> guessing a Git checkout of our code would be less than 10 seconds.  So
>> it might be an indication we could be doing more optimization in our
>> libraries.
>>
>> That said, I still think it is a nice improvement and I imagine it
>> would scale up and down based on size and number of files.
>>
>> Does anyone have a git version of our tree they could try this with?
>> How long does it take git to materialize a working copy of our trunk?
>
> As I said before, I suspect your numbers would be much lower if I wasn't
> sending HEAD requests for each file.  Unfortunately, ra_serf is depending on
> the ordering of the pipelined requests (PROPFIND results for a given path
> must be processed before the contents of the file are fetched), and I don't
> know how to make that happen without ripping apart the pipelining machinery
> and doing something custom there.

I can get that fixed (at some point). A little hand-wavey magic in the
callbacks, and we'll be good to go.

Would you mind opening an issue (assigned or cc'd to me) to track
that? It certainly isn't a blocker, but would be a very nice
enhancement.

Thanks,
-g

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 02:34 PM, Mark Phippard wrote:
> Now that I can run the test I wanted, the performance improvement is
> pretty nice.  Checking out our code goes from 1m35s down to 0m44s.  I
> cannot help but think that number should still be a lot lower though.
> This scenario seems like it would be very similar to what a Git
> checkout would do now, probably even less work has to be done.  I do
> not have a Git-svn version of our codebase to test with, but I am
> guessing a Git checkout of our code would be less than 10 seconds.  So
> it might be an indication we could be doing more optimization in our
> libraries.
> 
> That said, I still think it is a nice improvement and I imagine it
> would scale up and down based on size and number of files.
> 
> Does anyone have a git version of our tree they could try this with?
> How long does it take git to materialize a working copy of our trunk?

As I said before, I suspect your numbers would be much lower if I wasn't
sending HEAD requests for each file.  Unfortunately, ra_serf is depending on
the ordering of the pipelined requests (PROPFIND results for a given path
must be processed before the contents of the file are fetched), and I don't
know how to make that happen without ripping apart the pipelining machinery
and doing something custom there.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, May 8, 2012 at 1:11 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 01:03 PM, Mark Phippard wrote:
>> On Tue, May 8, 2012 at 12:59 PM, C. Michael Pilato <cm...@collab.net> wrote:
>>> Mark, can you see if this (and previous commits I've made) fixes the file
>>> handle abuse problem you reported?
>>>
>>> I tested this locally using "ulimit -n 200" to reduce the file handle limit
>>> on my box from 8192 to 200.  Before this change, I saw the same error you
>>> did.  Afterward, no error.  Hoping you experience the same.
>>
>> Confirmed.  This resolves it for me too.
>
> Sweet.  Thanks!

Now that I can run the test I wanted, the performance improvement is
pretty nice.  Checking out our code goes from 1m35s down to 0m44s.  I
cannot help but think that number should still be a lot lower though.
This scenario seems like it would be very similar to what a Git
checkout would do now, probably even less work has to be done.  I do
not have a Git-svn version of our codebase to test with, but I am
guessing a Git checkout of our code would be less than 10 seconds.  So
it might be an indication we could be doing more optimization in our
libraries.

That said, I still think it is a nice improvement and I imagine it
would scale up and down based on size and number of files.

Does anyone have a git version of our tree they could try this with?
How long does it take git to materialize a working copy of our trunk?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 01:03 PM, Mark Phippard wrote:
> On Tue, May 8, 2012 at 12:59 PM, C. Michael Pilato <cm...@collab.net> wrote:
>> Mark, can you see if this (and previous commits I've made) fixes the file
>> handle abuse problem you reported?
>>
>> I tested this locally using "ulimit -n 200" to reduce the file handle limit
>> on my box from 8192 to 200.  Before this change, I saw the same error you
>> did.  Afterward, no error.  Hoping you experience the same.
> 
> Confirmed.  This resolves it for me too.

Sweet.  Thanks!

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development


Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, May 8, 2012 at 12:59 PM, C. Michael Pilato <cm...@collab.net> wrote:
> On 05/08/2012 12:56 PM, cmpilato@apache.org wrote:
>> Author: cmpilato
>> Date: Tue May  8 16:56:42 2012
>> New Revision: 1335639
>>
>> URL: http://svn.apache.org/viewvc?rev=1335639&view=rev
>> Log:
>> Avoid opening pristine store file handles until they are actually
>> required.
>>
>> * subversion/libsvn_wc/adm_ops.c
>>   (get_pristine_lazyopen_baton_t): New private callback baton type.
>>   (get_pristine_lazyopen_func): Callback implementation.
>>   (svn_wc__get_pristine_contents_by_checksum): Use the new lazyopen
>>     stream mechanics to avoid opening a file handle to the pristine
>>     contents until it's first really needed.
>>
>> Suggested by: gstein
>
> Mark, can you see if this (and previous commits I've made) fixes the file
> handle abuse problem you reported?
>
> I tested this locally using "ulimit -n 200" to reduce the file handle limit
> on my box from 8192 to 200.  Before this change, I saw the same error you
> did.  Afterward, no error.  Hoping you experience the same.

Confirmed.  This resolves it for me too.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: svn commit: r1335639 - /subversion/trunk/subversion/libsvn_wc/adm_ops.c

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 05/08/2012 12:56 PM, cmpilato@apache.org wrote:
> Author: cmpilato
> Date: Tue May  8 16:56:42 2012
> New Revision: 1335639
> 
> URL: http://svn.apache.org/viewvc?rev=1335639&view=rev
> Log:
> Avoid opening pristine store file handles until they are actually
> required.
> 
> * subversion/libsvn_wc/adm_ops.c
>   (get_pristine_lazyopen_baton_t): New private callback baton type.
>   (get_pristine_lazyopen_func): Callback implementation.
>   (svn_wc__get_pristine_contents_by_checksum): Use the new lazyopen
>     stream mechanics to avoid opening a file handle to the pristine
>     contents until it's first really needed.
> 
> Suggested by: gstein

Mark, can you see if this (and previous commits I've made) fixes the file
handle abuse problem you reported?

I tested this locally using "ulimit -n 200" to reduce the file handle limit
on my box from 8192 to 200.  Before this change, I saw the same error you
did.  Afterward, no error.  Hoping you experience the same.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Enterprise Cloud Development