Posted to dev@subversion.apache.org by Daniel Shahaf <d....@daniel.shahaf.name> on 2011/08/03 16:40:29 UTC

Re: svn commit: r1153493 - /subversion/branches/1.7.x/STATUS

cmpilato@apache.org wrote on Wed, Aug 03, 2011 at 14:00:39 -0000:
> +     (As an aside, Serf's potential as a platform for future
> +     improvement remains unproven and doubtful.  For example, HTTPv2
> +     removes canonical resource URLs, which works against the caching
> +     proxy concept that seems to be the strongest argument in favor of
> +     Serf's approach.  But that's not strictly germane here.)

It sounds odd for caching not to have been taken into consideration in
HTTPv2's design.  And a glance at the httpv2 design notes suggests that
it was explicitly a goal.

Are you saying that somehow HTTPv2 actually made the cacheability
situation worse in some cases?  Or just that it doesn't make the
situation as good as it promised to?

Re: svn commit: r1153493 - /subversion/branches/1.7.x/STATUS

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 08/04/2011 04:16 PM, Greg Stein wrote:
> Finally, note that our intent is to switch to requesting content by
> <path, SHA1> tuples in the future. Those will be *very* cacheable (the
> SHA1 acts as the created-rev). So even if ra_serf didn't have the
> happy accident of using the server-provided URL, we would be returning
> to a cacheable form in 1.8.

Strictly speaking, in order to adhere to our current authz subsystem APIs
(which require both an svn_fs_root_t and a path), we'd need to use <path,
revision, SHA1>.  mod_authz_svn doesn't take the root into consideration,
but that's only one implementation of authz for Subversion. :-(

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: svn commit: r1153493 - /subversion/branches/1.7.x/STATUS

Posted by Greg Stein <gs...@gmail.com>.
On Thu, Aug 4, 2011 at 16:04, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
>...
> Yes; the problem is that some client-initiated requests do not
> canonicalize a node's identification to its created-path@created-rev,
> and consequently some[1] opportunities for caching are missed.
>
> <handwaving>I suppose that could be remedied --- eg, if the wc started
> caching those created-* coordinates.</handwaving>

Yup. That's what the dav_cache is for. We just avoid using it in HTTPv2 (today).

>
> Thanks for the explanation,
>
> Daniel
>
> [1] Not sure what fraction of opportunities that 'some' is.

Right. The version resource URL stored in the dav_cache only
represents the node in the working copy. If you need to fetch content
during a merge, or you do an 'svn cat', or a diff... or other similar
operations, then you probably do *not* want the version sitting in
your working copy.

Let's also note that a cache can/will fill up with path@latest
resources. If 'latest' doesn't change often (think smaller projects),
then you'll still get cache hits there. Projects (well, repositories)
with quickly increasing revisions (think svn.apache.org) will miss
opportunities to cache some of these alternate requests.
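
A rough way to see the path@latest effect is the toy Python model below.
It is only a sketch: the cache is a plain set, the "path@rev" key format
and the function name are made up for illustration, and the revision
sequences are invented rather than measured.

```python
def count_cache_hits(head_revisions, path="trunk/file.c"):
    """Count cache hits when every request addresses `path` at the HEAD
    revision current at that moment.

    `head_revisions` lists the HEAD seen at each successive request.
    The "path@rev" key is a hypothetical stand-in for a request URL.
    """
    cache = set()
    hits = 0
    for head in head_revisions:
        key = f"{path}@{head}"
        if key in cache:
            hits += 1          # same path@latest was requested before
        else:
            cache.add(key)     # first time this path@latest is seen
    return hits


quiet_repo = [10, 10, 10, 11, 11, 11]  # HEAD rarely moves (smaller project)
busy_repo = [10, 11, 12, 13, 14, 15]   # HEAD moves between every request

quiet_hits = count_cache_hits(quiet_repo)
busy_hits = count_cache_hits(busy_repo)
```

In this model the quiet repository gets 4 hits out of 6 requests and the
busy one gets none, even though the file content never changed.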

Finally, note that our intent is to switch to requesting content by
<path, SHA1> tuples in the future. Those will be *very* cacheable (the
SHA1 acts as the created-rev). So even if ra_serf didn't have the
happy accident of using the server-provided URL, we would be returning
to a cacheable form in 1.8.
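
As a sketch of why such tuples cache well (Python, illustrative only;
the `content_key` helper and the sample history are assumptions, not
planned Subversion code): the key depends on the content rather than on
whichever revision the client happened to ask for.

```python
import hashlib
from typing import Tuple


def content_key(path: str, content: bytes) -> Tuple[str, str]:
    """Return a hypothetical <path, SHA1> cache key for a file's content."""
    return (path, hashlib.sha1(content).hexdigest())


# Invented history: the file is unchanged in r2 and modified in r3.
history = {1: b"hello\n", 2: b"hello\n", 3: b"hello, world\n"}
keys = {rev: content_key("trunk/file.c", data) for rev, data in history.items()}

# r1 and r2 share one key; r3 gets a new one.
distinct_keys = set(keys.values())
```

Requests for r1 and r2 map to the same key, so a cache would serve the
second from the first; only the real change in r3 produces a new entry.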

Cheers,
-g

Re: svn commit: r1153493 - /subversion/branches/1.7.x/STATUS

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
C. Michael Pilato wrote on Thu, Aug 04, 2011 at 10:57:46 -0400:
> On 08/03/2011 10:40 AM, Daniel Shahaf wrote:
> > cmpilato@apache.org wrote on Wed, Aug 03, 2011 at 14:00:39 -0000:
> >> +     (As an aside, Serf's potential as a platform for future
> >> +     improvement remains unproven and doubtful.  For example, HTTPv2
> >> +     removes canonical resource URLs, which works against the caching
> >> +     proxy concept that seems to be the strongest argument in favor of
> >> +     Serf's approach.  But that's not strictly germane here.)
> > 
> > That sounds odd for caching not to be taken into consideration in
> > HTTPv2's design.  And glancing at the httpv2 design notes suggests that
> > it was explicitly a goal.
> > 
> > Are you saying that somehow HTTPv2 actually made the cacheability
> > situation worse in some cases?  Or just that it doesn't make the
> > situation as good as it promised to?
> 
> As I dig into this a bit, I realize that the situation isn't quite as bad as
> I originally thought.  But that's only because of a coding oversight on my
> part.  (Happy accident?)  I'll explain below.  Note that I'm assuming that
> cacheability works best when the RA layers use a single canonical URL to
> fetch a given resource.
> 
> Let's first talk about the cost of addressing any particular server
> node@revision resource.  In HTTPv1, clients couldn't just calculate the URL
> of a resource -- they had to negotiate with the server using WebDAV/DeltaV
> abstractions.  Multiple roundtrips per calculation ... performance shutdown
> ... you get the picture.  mod_dav_svn helps here by transmitting in its
> update-style REPORT responses a "version resource URL", which the client
> caches via the davprops store in the working copy to avoid future costly
> lookups.  HTTPv2 facilitates client-side construction of resource URLs
> without server negotiation, therefore has no need of the davprops persistent
> cache mechanism, and as such the code doesn't use the davprops stuff at all
> when HTTPv2 is active.
> 
> The second factor of interest here is the canonical URL issue.  If I have a
> file that was created in revision 1 and remains unchanged henceforward, a
> client can address that file via any number of URLs.  After all, file@1 ==
> file@2 == file@3 == ..., right?  mod_dav_svn again tries to help here by
> normalizing the version resource URL that it sends to the client for a given
> resource based on the created-path and created-rev of the resource.  So no
> matter which version of our file we're talking about, mod_dav_svn will
> report its version resource URL as:
> 
>    .../!svn/ver/<CREATED-REV>/<CREATED-PATH>
> 
> Here's where I think the current code falls short.  While the update process
> still pays attention to the canonical version resource URL transmitted by
> the server (that was the happy accident ... ra_serf *could* be ignoring that
> today in favor of self-constructed URLs), that URL isn't cached in the WC
> any longer.  This means that future (non-update-style) operations performed
> by the client will be addressing the resources by some self-constructed,
> probably-non-canonical URL.  Stuff still works, of course, but this eats at
> the cache-friendliness.
> 
> Does that help to explain things better?
> 

Yes; the problem is that some client-initiated requests do not
canonicalize a node's identification to its created-path@created-rev,
and consequently some[1] opportunities for caching are missed.

<handwaving>I suppose that could be remedied --- e.g., if the wc started
caching those created-* coordinates.</handwaving>

Thanks for the explanation,

Daniel

[1] Not sure what fraction of opportunities that 'some' is.

> -- C-Mike
> 
> [SIDEBAR:  It just occurred to me that the server is still transmitting
> HTTPv1-style version resource URLs, not HTTPv2-style URLs ... I guess that's
> a separate issue, though.]
> 
> -- 
> C. Michael Pilato <cm...@collab.net>
> CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
> 



Re: svn commit: r1153493 - /subversion/branches/1.7.x/STATUS

Posted by "C. Michael Pilato" <cm...@collab.net>.
On 08/03/2011 10:40 AM, Daniel Shahaf wrote:
> cmpilato@apache.org wrote on Wed, Aug 03, 2011 at 14:00:39 -0000:
>> +     (As an aside, Serf's potential as a platform for future
>> +     improvement remains unproven and doubtful.  For example, HTTPv2
>> +     removes canonical resource URLs, which works against the caching
>> +     proxy concept that seems to be the strongest argument in favor of
>> +     Serf's approach.  But that's not strictly germane here.)
> 
> That sounds odd for caching not to be taken into consideration in
> HTTPv2's design.  And glancing at the httpv2 design notes suggests that
> it was explicitly a goal.
> 
> Are you saying that somehow HTTPv2 actually made the cacheability
> situation worse in some cases?  Or just that it doesn't make the
> situation as good as it promised to?

As I dig into this a bit, I realize that the situation isn't quite as bad as
I originally thought.  But that's only because of a coding oversight on my
part.  (Happy accident?)  I'll explain below.  Note that I'm assuming that
cacheability works best when the RA layers use a single canonical URL to
fetch a given resource.

Let's first talk about the cost of addressing any particular server
node@revision resource.  In HTTPv1, clients couldn't just calculate the URL
of a resource -- they had to negotiate with the server using WebDAV/DeltaV
abstractions.  Multiple roundtrips per calculation ... performance shutdown
... you get the picture.  mod_dav_svn helps here by transmitting in its
update-style REPORT responses a "version resource URL", which the client
caches via the davprops store in the working copy to avoid future costly
lookups.  HTTPv2 facilitates client-side construction of resource URLs
without server negotiation, therefore has no need of the davprops persistent
cache mechanism, and as such the code doesn't use the davprops stuff at all
when HTTPv2 is active.

The second factor of interest here is the canonical URL issue.  If I have a
file that was created in revision 1 and remains unchanged henceforward, a
client can address that file via any number of URLs.  After all, file@1 ==
file@2 == file@3 == ..., right?  mod_dav_svn again tries to help here by
normalizing the version resource URL that it sends to the client for a given
resource based on the created-path and created-rev of the resource.  So no
matter which version of our file we're talking about, mod_dav_svn will
report its version resource URL as:

   .../!svn/ver/<CREATED-REV>/<CREATED-PATH>

Here's where I think the current code falls short.  While the update process
still pays attention to the canonical version resource URL transmitted by
the server (that was the happy accident ... ra_serf *could* be ignoring that
today in favor of self-constructed URLs), that URL isn't cached in the WC
any longer.  This means that future (non-update-style) operations performed
by the client will be addressing the resources by some self-constructed,
probably-non-canonical URL.  Stuff still works, of course, but this eats at
the cache-friendliness.
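
The difference between the two addressing styles can be sketched in a few
lines of Python.  This is an illustration under stated assumptions, not
mod_dav_svn or ra_serf code: the `Node` record and both helpers are
hypothetical, the "..." prefix is left elided as above, and the
"path@rev" form is only a stand-in for whatever URL the client builds
itself.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical view of a node as requested at some revision."""
    path: str          # path as requested by the client
    revision: int      # revision as requested by the client
    created_path: str  # path at which this node-revision was created
    created_rev: int   # revision in which this node-revision was created


def version_resource_url(node: Node, root: str = "...") -> str:
    """Normalized URL: depends only on created-rev and created-path."""
    return f"{root}/!svn/ver/{node.created_rev}/{node.created_path.lstrip('/')}"


def self_constructed_key(node: Node) -> str:
    """Stand-in for a client-built address: uses the *requested* revision."""
    return f"{node.path.lstrip('/')}@{node.revision}"


# One file, created in r1 and never changed, requested at r1, r2 and r3.
requests = [Node("/trunk/file.c", rev, "/trunk/file.c", 1) for rev in (1, 2, 3)]

canonical_urls = {version_resource_url(n) for n in requests}
self_constructed = {self_constructed_key(n) for n in requests}
```

Here `canonical_urls` collapses to a single entry while `self_constructed`
holds three, so a caching proxy keyed on the latter would store the same
content three times.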

Does that help to explain things better?

-- C-Mike

[SIDEBAR:  It just occurred to me that the server is still transmitting
HTTPv1-style version resource URLs, not HTTPv2-style URLs ... I guess that's
a separate issue, though.]

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand