You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Johan Corveleyn <jc...@gmail.com> on 2019/09/02 14:20:16 UTC

Re: Last-Modified HTTP header in GET responses

On Fri, Jan 15, 2016 at 1:58 PM Ivan Zhakov <iv...@visualsvn.com> wrote:
> On 7 January 2016 at 10:34, Ivan Zhakov <iv...@visualsvn.com> wrote:
> > On 6 January 2016 at 08:14, Greg Stein <gs...@gmail.com> wrote:
> >> Personally, I'd be more interested in the effects on the network and its
> >> caching ability. Do we really need to save CPU/IO on the server? Today's
> >> servers seem more than capable, and are there really svn servers out in the
> >> wild getting so crushed, that this is important? It seems that as long as
> >> proxies/etc can properly cache the results, and (thus) avoid future touches
> >> on the backend server, then we're good to go.
> >>
> >> If the patch doesn't affect the caching (which it sounds like "no"), then
> >> just go with it. Sure, it is neat to look at ayscalls, but... why? I don't
> >> understand the need to examine/profile. Educate me?
> >>
> >
> > The patch should not affect HTTP caching for two reasons:
> > 1. Browsers and proxies supports ETag and use it instead of
> > Last-Modified header.
> > 2. ETag and Last-Modified headers are used only for cache
> > re-validation when max-age is expired. But Subversion sets max-age=1
> > week for resources with specific revision in URL
> > (http://server/!svn/rvr/1/path). max-age=0 is only used for public
> > URLs without revision, i.e. http://server/path)
> >
> > As far I know proxy usage are limited to public servers with anonymous
> > access, since caching of HTTP responses with Authorization is
> > prohibited by RFC.
> >
> > Anyway I agree that trading bandwidth usage to save some CPU/IO on the
> > server doesn't make sense, but Last-Modified case is the different:
> > Subversion server wasting 10%+ of server resources to produce unused
> > header.
> >
> > I don't have access to svn.apache.org server performance stats, but I
> > suppose it's pretty busy server and Infra team would welcome any
> > Subversion server performance improvements.
> >
> Committed in r1724790.
>
> --
> Ivan Zhakov

A bit late perhaps, but apparently this change (removing the
Last-Modified header from GET responses) broke a specific use case at
my company (we just upgraded our SVN server from 1.9 to 1.10, bringing
along this particular change):

- We use Apache Ivy (http://ant.apache.org/ivy/) for dependency
management of our Java applications.

- Third party jar files are kept in our svn repo under
/trunk/ivyrepository (and branched / tagged in release branches, so we
have completely reproducible builds, even if our third party jars or
their dependency structures change on trunk).

- We use Ivy's "URL Resolver" [1], which downloads the files with
regular GET requests (and HEAD requests to check the up-to-dateness of
the cache on the client). We effectively use SVN in this case as a
"regular" file server (which coincidentally has branches and tags so
we can resolve against the correct tree when making a build).

This last part now fails, i.e. Ivy's URLResolver no longer detects
that a file has changed. It used to compare its own "last-mod time of
the file on disk in the cache" with the Last-Modified header, which
works fine with all kinds of file servers, and worked with SVN < 1.10.

I think it's unfortunate that we broke compatibility here (even if
it's not usage by a normal svn client) for the sake of some relatively
small performance / load gain on the server. If we could get the old
behavior back with some Apache directive, that would have been fine,
but there is no such option at the moment.

Also: if the Last-Modified would have been removed only for the
"internal GET urls" (like http://server/!svn/rvr/1/path), for
optimizing checkout (as executed by normal svn clients), that would
have been understandable. But why remove it for the "external GET
urls" (http://svnserver/path) as well? Those have nothing to do with
checkout load, those are only used by browsers or "tools using SVN as
a glorified file server" :-).

I am by no means an expert in HTTP standards, and various online
sources give different recommendations for these headers (ETag,
Last-Modified, ... request headers for conditional GETs, ...). But we
found an old discussion thread on the "dev@rapidsvn.tigris.org"
mailinglist from 15 years ago, discussing "a very basic idea: let
mod_dav_svn set the Last-Modified HTTP header ..." [2]. Perhaps the
feature dates from back then (indicating that it wasn't an accidental
feature)?

Anyway, how about bringing this feature back in some form?
- Revert r1724790?
- or only for "external GET urls"?
- or only if some Apache directive is set?

Thoughts?

[1] http://ant.apache.org/ivy/history/2.5.0-rc1/resolver/url.html
[2] https://dev.rapidsvn.tigris.narkive.com/oRlt6xsW/last-modified-http-header-from-mod-dav-svn
-- 
Johan

Re: Last-Modified HTTP header in GET responses

Posted by Doug Robinson <do...@wandisco.com>.
Daniel:

On Mon, Nov 25, 2019 at 1:31 PM Daniel Shahaf <d....@daniel.shahaf.name>
wrote:

> Doug Robinson wrote on Mon, 25 Nov 2019 14:11 +00:00:
> > Can we get this fix back-ported into 1.10.x please? Breaking an LTS is
> > unfortunate as is waiting until the next LTS.
>
> r1866425 is already in
>
> https://svn.apache.org/repos/asf/subversion/branches/1.10.x/STATUS?p=r1870409
> .
> It will likely receive the third vote and be merged before the next
> 1.10.x release.  We haven't scheduled that release yet.
>
> That revision was already merged to 1.12.x, too, so it should be
> reasonably safe to cherry-pick.
>

Thank you.

Cheers!

Doug
-- 
*DOUGLAS B ROBINSON* SENIOR PRODUCT MANAGER

T +1 925 396 1125
*E* doug.robinson@wandisco.com

-- 


* <http://wandisco.com/>*

**The *LiveData* Company
*Find out more 
*wandisco.com <http://wandisco.com/>*



 
<https://www.wandisco.com/liveanalytics>


THIS MESSAGE AND ANY ATTACHMENTS 
ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED
*


If this message was 
misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not 
waive any confidentiality or privilege. If you are not the intended 
recipient, please notify us immediately and destroy the message without 
disclosing its contents to anyone. Any distribution, use or copying of this 
email or the information it contains by other than an intended recipient is 
unauthorized. The views and opinions expressed in this email message are 
the author's own and may not reflect the views and opinions of WANdisco, 
unless the author is authorized by WANdisco to express such views or 
opinions on its behalf. All email sent to or from this address is subject 
to electronic storage and review by WANdisco. Although WANdisco operates 
anti-virus programs, it does not accept responsibility for any damage 
whatsoever caused by viruses being passed.

Re: Last-Modified HTTP header in GET responses

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Doug Robinson wrote on Mon, 25 Nov 2019 14:11 +00:00:
> Can we get this fix back-ported into 1.10.x please? Breaking an LTS is 
> unfortunate as is waiting until the next LTS.

r1866425 is already in
https://svn.apache.org/repos/asf/subversion/branches/1.10.x/STATUS?p=r1870409.
It will likely receive the third vote and be merged before the next
1.10.x release.  We haven't scheduled that release yet.

That revision was already merged to 1.12.x, too, so it should be
reasonably safe to cherry-pick.

Re: Last-Modified HTTP header in GET responses

Posted by Doug Robinson <do...@wandisco.com>.
Folks:

Can we get this fix back-ported into 1.10.x please?  Breaking an LTS is
unfortunate as is waiting until the next LTS.

Cheers.

Doug

On Wed, Sep 4, 2019 at 8:03 PM Johan Corveleyn <jc...@gmail.com> wrote:

> On Wed, Sep 4, 2019 at 2:47 PM Johan Corveleyn <jc...@gmail.com> wrote:
> >
> > On Wed, Sep 4, 2019 at 2:01 PM Branko Čibej <br...@apache.org> wrote:
> > >
> > > On 04.09.2019 11:44, Johan Corveleyn wrote:
> > > > On Tue, Sep 3, 2019 at 8:01 AM Branko Čibej <br...@apache.org>
> wrote:
> > > [...]
> > > >>> Anyway, how about bringing this feature back in some form?
> > > >>> - Revert r1724790?
> > > >> This is clearly the simplest solution, but I have no idea what the
> > > >> performance impact would be. From looking at the diff, my best
> guess is
> > > >> that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
> > > >>
> > > >>> - or only for "external GET urls"?
> > > >>> - or only if some Apache directive is set?
> > > >>>
> > > >>> Thoughts?
> > > >> I would prefer not to add yet another configuration knob to the
> server.
> > > >> I agree that versioned-resource URLs are only interesting for
> DAV-aware
> > > >> clients, and those clients already know how to check for
> modifications
> > > >> without looking at Last-Modified. That would imply that adding the
> > > >> header for external URLs is the right solution.
> > > >>
> > > >> -- Brane
> > > > Thanks for your response, Brane.
> > > >
> > > > I think the below patch would do it (set the Last-Modified header
> only
> > > > for "external URLs").
> > > > It's basically a revert of r1724790 (and adding a test), plus
> wrapping
> > > > the actual setting of Last-Modified inside the following condition:
> > > >
> > > >     if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> > > >         && (resource->info->repos_path ==
> resource->info->uri_path->data))
> > >
> > > Yes, all our resources are "regular", I think. The second part of the
> > > condition just says that if something is called "/foo/bar" in the
> > > repository, it's being accessed as
> > > "http://example.org/repos-root/foo/bar" and not some alias. Which is
> > > exactly what you want.
> >
> > Perfect.
> >
> > > >  const char *
> > > >  dav_svn__getetag(const dav_resource *resource, apr_pool_t *pool)
> > > >  {
> > > > @@ -3219,6 +3263,25 @@ set_headers(request_rec *r, const
> dav_resource *re
> > > >    if (!resource->exists)
> > > >      return NULL;
> > > >
> > > > +  if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> > > > +      && (resource->info->repos_path ==
> resource->info->uri_path->data))
> > > > +    {
> > > > +      /* Include Last-Modified header for 'external' GET requests
> > > > +         (i.e. requests to URI's not under /!svn), to support usage
> of an
> > > > +         SVN server as a file server, where the client needs
> timestamps
> > > > +         for instance to use as "last modification time" of files
> on disk. */
> > > > +      apr_time_t last_modified;
> > > > +
> > > > +      last_modified = get_last_modified(resource);
> > >
> > > "Declaration is initialisation" please, wherever possible. So:
> > >
> > >     const apr_time_t last_modified = get_last_modified(resource);
> > >
> > >
> > > This also prevents you from accidentally modifying last_modified later
> on.
> >
> > Ack, will fix.
> >
> > > > +      if (last_modified != -1)
> > > > +        {
> > > > +          /* Note the modification time for the requested resource,
> and
> > > > +             include the Last-Modified header in the response. */
> > > > +          ap_update_mtime(r, last_modified);
> > > > +          ap_set_last_modified(r);
> > > > +        }
> > > > +    }
> > > > +
> > > >    /* generate our etag and place it into the output */
> > > >    apr_table_setn(r->headers_out, "ETag",
> > > >                   dav_svn__getetag(resource, resource->pool));
> > > > Index: subversion/tests/cmdline/mod_dav_svn_tests.py
> > > > ===================================================================
> > > > --- subversion/tests/cmdline/mod_dav_svn_tests.py    (revision
> 1866345)
> > > > +++ subversion/tests/cmdline/mod_dav_svn_tests.py    (working copy)
> > > > @@ -640,6 +640,29 @@ def propfind_propname(sbox):
> > > >    actual_response = r.read()
> > > >    verify_xml_response(expected_response, actual_response)
> > > >
> > > > +@SkipUnless(svntest.main.is_ra_type_dav)
> > > > +def last_modified_header(sbox):
> > > > +  "verify 'Last-Modified' header on 'external' GETs"
> > > > +
> > > > +  sbox.build(create_wc=False, read_only=True)
> > > > +
> > > > +  headers = {
> > > > +    'Authorization': 'Basic ' +
> > > > base64.b64encode(b'jconstant:rayjandom').decode(),
> > > > +  }
> > > > +
> > > > +  h = svntest.main.create_http_connection(sbox.repo_url)
> > > > +
> > > > +  # GET /repos/iota
> > > > +  # Expect to see a Last-Modified header.
> > > > +  h.request('GET', sbox.repo_url + '/iota', None, headers)
> > > > +  r = h.getresponse()
> > > > +  if r.status != httplib.OK:
> > > > +    raise svntest.Failure('Request failed: %d %s' % (r.status,
> r.reason))
> > > > +  svntest.verify.compare_and_display_lines(None, 'Last-Modified',
> > > > +
>  svntest.verify.RegexOutput('.+'),
> > > > +
>  r.getheader('Last-Modified'))
> > > > +  r.read()
> > >
> > > Interesting approach ... and sure, it should work.
> > >
> > > But I'd also add a test to check that Last-Modified is *not* set on
> > > responses to versioned-resource URLs. And maybe also check that the
> HEAD
> > > method returns this header as well -- it should, and checking for that
> > > won't hurt.
> >
> > OK, I'll add those tests too, should be easy enough.
> >
> > (BTW, I copied the test from the cache_control_header() test in the
> > same file :-))
> >
> > Thanks for the review!
> > --
> > Johan
>
> Committed in r1866425.
>
> --
> Johan
>


-- 
*DOUGLAS B ROBINSON* SENIOR PRODUCT MANAGER

T +1 925 396 1125
*E* doug.robinson@wandisco.com

-- 


* <http://wandisco.com/>*

**The *LiveData* Company
*Find out more 
*wandisco.com <http://wandisco.com/>*



 
<https://www.wandisco.com/liveanalytics>


THIS MESSAGE AND ANY ATTACHMENTS 
ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED
*


If this message was 
misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not 
waive any confidentiality or privilege. If you are not the intended 
recipient, please notify us immediately and destroy the message without 
disclosing its contents to anyone. Any distribution, use or copying of this 
email or the information it contains by other than an intended recipient is 
unauthorized. The views and opinions expressed in this email message are 
the author's own and may not reflect the views and opinions of WANdisco, 
unless the author is authorized by WANdisco to express such views or 
opinions on its behalf. All email sent to or from this address is subject 
to electronic storage and review by WANdisco. Although WANdisco operates 
anti-virus programs, it does not accept responsibility for any damage 
whatsoever caused by viruses being passed.

Re: Last-Modified HTTP header in GET responses

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Sep 4, 2019 at 2:47 PM Johan Corveleyn <jc...@gmail.com> wrote:
>
> On Wed, Sep 4, 2019 at 2:01 PM Branko Čibej <br...@apache.org> wrote:
> >
> > On 04.09.2019 11:44, Johan Corveleyn wrote:
> > > On Tue, Sep 3, 2019 at 8:01 AM Branko Čibej <br...@apache.org> wrote:
> > [...]
> > >>> Anyway, how about bringing this feature back in some form?
> > >>> - Revert r1724790?
> > >> This is clearly the simplest solution, but I have no idea what the
> > >> performance impact would be. From looking at the diff, my best guess is
> > >> that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
> > >>
> > >>> - or only for "external GET urls"?
> > >>> - or only if some Apache directive is set?
> > >>>
> > >>> Thoughts?
> > >> I would prefer not to add yet another configuration knob to the server.
> > >> I agree that versioned-resource URLs are only interesting for DAV-aware
> > >> clients, and those clients already know how to check for modifications
> > >> without looking at Last-Modified. That would imply that adding the
> > >> header for external URLs is the right solution.
> > >>
> > >> -- Brane
> > > Thanks for your response, Brane.
> > >
> > > I think the below patch would do it (set the Last-Modified header only
> > > for "external URLs").
> > > It's basically a revert of r1724790 (and adding a test), plus wrapping
> > > the actual setting of Last-Modified inside the following condition:
> > >
> > >     if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> > >         && (resource->info->repos_path == resource->info->uri_path->data))
> >
> > Yes, all our resources are "regular", I think. The second part of the
> > condition just says that if something is called "/foo/bar" in the
> > repository, it's being accessed as
> > "http://example.org/repos-root/foo/bar" and not some alias. Which is
> > exactly what you want.
>
> Perfect.
>
> > >  const char *
> > >  dav_svn__getetag(const dav_resource *resource, apr_pool_t *pool)
> > >  {
> > > @@ -3219,6 +3263,25 @@ set_headers(request_rec *r, const dav_resource *re
> > >    if (!resource->exists)
> > >      return NULL;
> > >
> > > +  if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> > > +      && (resource->info->repos_path == resource->info->uri_path->data))
> > > +    {
> > > +      /* Include Last-Modified header for 'external' GET requests
> > > +         (i.e. requests to URI's not under /!svn), to support usage of an
> > > +         SVN server as a file server, where the client needs timestamps
> > > +         for instance to use as "last modification time" of files on disk. */
> > > +      apr_time_t last_modified;
> > > +
> > > +      last_modified = get_last_modified(resource);
> >
> > "Declaration is initialisation" please, wherever possible. So:
> >
> >     const apr_time_t last_modified = get_last_modified(resource);
> >
> >
> > This also prevents you from accidentally modifying last_modified later on.
>
> Ack, will fix.
>
> > > +      if (last_modified != -1)
> > > +        {
> > > +          /* Note the modification time for the requested resource, and
> > > +             include the Last-Modified header in the response. */
> > > +          ap_update_mtime(r, last_modified);
> > > +          ap_set_last_modified(r);
> > > +        }
> > > +    }
> > > +
> > >    /* generate our etag and place it into the output */
> > >    apr_table_setn(r->headers_out, "ETag",
> > >                   dav_svn__getetag(resource, resource->pool));
> > > Index: subversion/tests/cmdline/mod_dav_svn_tests.py
> > > ===================================================================
> > > --- subversion/tests/cmdline/mod_dav_svn_tests.py    (revision 1866345)
> > > +++ subversion/tests/cmdline/mod_dav_svn_tests.py    (working copy)
> > > @@ -640,6 +640,29 @@ def propfind_propname(sbox):
> > >    actual_response = r.read()
> > >    verify_xml_response(expected_response, actual_response)
> > >
> > > +@SkipUnless(svntest.main.is_ra_type_dav)
> > > +def last_modified_header(sbox):
> > > +  "verify 'Last-Modified' header on 'external' GETs"
> > > +
> > > +  sbox.build(create_wc=False, read_only=True)
> > > +
> > > +  headers = {
> > > +    'Authorization': 'Basic ' +
> > > base64.b64encode(b'jconstant:rayjandom').decode(),
> > > +  }
> > > +
> > > +  h = svntest.main.create_http_connection(sbox.repo_url)
> > > +
> > > +  # GET /repos/iota
> > > +  # Expect to see a Last-Modified header.
> > > +  h.request('GET', sbox.repo_url + '/iota', None, headers)
> > > +  r = h.getresponse()
> > > +  if r.status != httplib.OK:
> > > +    raise svntest.Failure('Request failed: %d %s' % (r.status, r.reason))
> > > +  svntest.verify.compare_and_display_lines(None, 'Last-Modified',
> > > +                                           svntest.verify.RegexOutput('.+'),
> > > +                                           r.getheader('Last-Modified'))
> > > +  r.read()
> >
> > Interesting approach ... and sure, it should work.
> >
> > But I'd also add a test to check that Last-Modified is *not* set on
> > responses to versioned-resource URLs. And maybe also check that the HEAD
> > method returns this header as well -- it should, and checking for that
> > won't hurt.
>
> OK, I'll add those tests too, should be easy enough.
>
> (BTW, I copied the test from the cache_control_header() test in the
> same file :-))
>
> Thanks for the review!
> --
> Johan

Committed in r1866425.

-- 
Johan

Re: Last-Modified HTTP header in GET responses

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Sep 4, 2019 at 2:01 PM Branko Čibej <br...@apache.org> wrote:
>
> On 04.09.2019 11:44, Johan Corveleyn wrote:
> > On Tue, Sep 3, 2019 at 8:01 AM Branko Čibej <br...@apache.org> wrote:
> [...]
> >>> Anyway, how about bringing this feature back in some form?
> >>> - Revert r1724790?
> >> This is clearly the simplest solution, but I have no idea what the
> >> performance impact would be. From looking at the diff, my best guess is
> >> that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
> >>
> >>> - or only for "external GET urls"?
> >>> - or only if some Apache directive is set?
> >>>
> >>> Thoughts?
> >> I would prefer not to add yet another configuration knob to the server.
> >> I agree that versioned-resource URLs are only interesting for DAV-aware
> >> clients, and those clients already know how to check for modifications
> >> without looking at Last-Modified. That would imply that adding the
> >> header for external URLs is the right solution.
> >>
> >> -- Brane
> > Thanks for your response, Brane.
> >
> > I think the below patch would do it (set the Last-Modified header only
> > for "external URLs").
> > It's basically a revert of r1724790 (and adding a test), plus wrapping
> > the actual setting of Last-Modified inside the following condition:
> >
> >     if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> >         && (resource->info->repos_path == resource->info->uri_path->data))
>
> Yes, all our resources are "regular", I think. The second part of the
> condition just says that if something is called "/foo/bar" in the
> repository, it's being accessed as
> "http://example.org/repos-root/foo/bar" and not some alias. Which is
> exactly what you want.

Perfect.

> >  const char *
> >  dav_svn__getetag(const dav_resource *resource, apr_pool_t *pool)
> >  {
> > @@ -3219,6 +3263,25 @@ set_headers(request_rec *r, const dav_resource *re
> >    if (!resource->exists)
> >      return NULL;
> >
> > +  if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> > +      && (resource->info->repos_path == resource->info->uri_path->data))
> > +    {
> > +      /* Include Last-Modified header for 'external' GET requests
> > +         (i.e. requests to URI's not under /!svn), to support usage of an
> > +         SVN server as a file server, where the client needs timestamps
> > +         for instance to use as "last modification time" of files on disk. */
> > +      apr_time_t last_modified;
> > +
> > +      last_modified = get_last_modified(resource);
>
> "Declaration is initialisation" please, wherever possible. So:
>
>     const apr_time_t last_modified = get_last_modified(resource);
>
>
> This also prevents you from accidentally modifying last_modified later on.

Ack, will fix.

> > +      if (last_modified != -1)
> > +        {
> > +          /* Note the modification time for the requested resource, and
> > +             include the Last-Modified header in the response. */
> > +          ap_update_mtime(r, last_modified);
> > +          ap_set_last_modified(r);
> > +        }
> > +    }
> > +
> >    /* generate our etag and place it into the output */
> >    apr_table_setn(r->headers_out, "ETag",
> >                   dav_svn__getetag(resource, resource->pool));
> > Index: subversion/tests/cmdline/mod_dav_svn_tests.py
> > ===================================================================
> > --- subversion/tests/cmdline/mod_dav_svn_tests.py    (revision 1866345)
> > +++ subversion/tests/cmdline/mod_dav_svn_tests.py    (working copy)
> > @@ -640,6 +640,29 @@ def propfind_propname(sbox):
> >    actual_response = r.read()
> >    verify_xml_response(expected_response, actual_response)
> >
> > +@SkipUnless(svntest.main.is_ra_type_dav)
> > +def last_modified_header(sbox):
> > +  "verify 'Last-Modified' header on 'external' GETs"
> > +
> > +  sbox.build(create_wc=False, read_only=True)
> > +
> > +  headers = {
> > +    'Authorization': 'Basic ' +
> > base64.b64encode(b'jconstant:rayjandom').decode(),
> > +  }
> > +
> > +  h = svntest.main.create_http_connection(sbox.repo_url)
> > +
> > +  # GET /repos/iota
> > +  # Expect to see a Last-Modified header.
> > +  h.request('GET', sbox.repo_url + '/iota', None, headers)
> > +  r = h.getresponse()
> > +  if r.status != httplib.OK:
> > +    raise svntest.Failure('Request failed: %d %s' % (r.status, r.reason))
> > +  svntest.verify.compare_and_display_lines(None, 'Last-Modified',
> > +                                           svntest.verify.RegexOutput('.+'),
> > +                                           r.getheader('Last-Modified'))
> > +  r.read()
>
> Interesting approach ... and sure, it should work.
>
> But I'd also add a test to check that Last-Modified is *not* set on
> responses to versioned-resource URLs. And maybe also check that the HEAD
> method returns this header as well -- it should, and checking for that
> won't hurt.

OK, I'll add those tests too, should be easy enough.

(BTW, I copied the test from the cache_control_header() test in the
same file :-))

Thanks for the review!
-- 
Johan

Re: Last-Modified HTTP header in GET responses

Posted by Branko Čibej <br...@apache.org>.
On 04.09.2019 11:44, Johan Corveleyn wrote:
> On Tue, Sep 3, 2019 at 8:01 AM Branko Čibej <br...@apache.org> wrote:
[...]
>>> Anyway, how about bringing this feature back in some form?
>>> - Revert r1724790?
>> This is clearly the simplest solution, but I have no idea what the
>> performance impact would be. From looking at the diff, my best guess is
>> that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
>>
>>> - or only for "external GET urls"?
>>> - or only if some Apache directive is set?
>>>
>>> Thoughts?
>> I would prefer not to add yet another configuration knob to the server.
>> I agree that versioned-resource URLs are only interesting for DAV-aware
>> clients, and those clients already know how to check for modifications
>> without looking at Last-Modified. That would imply that adding the
>> header for external URLs is the right solution.
>>
>> -- Brane
> Thanks for your response, Brane.
>
> I think the below patch would do it (set the Last-Modified header only
> for "external URLs").
> It's basically a revert of r1724790 (and adding a test), plus wrapping
> the actual setting of Last-Modified inside the following condition:
>
>     if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
>         && (resource->info->repos_path == resource->info->uri_path->data))

Yes, all our resources are "regular", I think. The second part of the
condition just says that if something is called "/foo/bar" in the
repository, it's being accessed as
"http://example.org/repos-root/foo/bar" and not some alias. Which is
exactly what you want.


>  const char *
>  dav_svn__getetag(const dav_resource *resource, apr_pool_t *pool)
>  {
> @@ -3219,6 +3263,25 @@ set_headers(request_rec *r, const dav_resource *re
>    if (!resource->exists)
>      return NULL;
>
> +  if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
> +      && (resource->info->repos_path == resource->info->uri_path->data))
> +    {
> +      /* Include Last-Modified header for 'external' GET requests
> +         (i.e. requests to URI's not under /!svn), to support usage of an
> +         SVN server as a file server, where the client needs timestamps
> +         for instance to use as "last modification time" of files on disk. */
> +      apr_time_t last_modified;
> +
> +      last_modified = get_last_modified(resource);

"Declaration is initialisation" please, wherever possible. So:

    const apr_time_t last_modified = get_last_modified(resource);


This also prevents you from accidentally modifying last_modified later on.

> +      if (last_modified != -1)
> +        {
> +          /* Note the modification time for the requested resource, and
> +             include the Last-Modified header in the response. */
> +          ap_update_mtime(r, last_modified);
> +          ap_set_last_modified(r);
> +        }
> +    }
> +
>    /* generate our etag and place it into the output */
>    apr_table_setn(r->headers_out, "ETag",
>                   dav_svn__getetag(resource, resource->pool));
> Index: subversion/tests/cmdline/mod_dav_svn_tests.py
> ===================================================================
> --- subversion/tests/cmdline/mod_dav_svn_tests.py    (revision 1866345)
> +++ subversion/tests/cmdline/mod_dav_svn_tests.py    (working copy)
> @@ -640,6 +640,29 @@ def propfind_propname(sbox):
>    actual_response = r.read()
>    verify_xml_response(expected_response, actual_response)
>
> +@SkipUnless(svntest.main.is_ra_type_dav)
> +def last_modified_header(sbox):
> +  "verify 'Last-Modified' header on 'external' GETs"
> +
> +  sbox.build(create_wc=False, read_only=True)
> +
> +  headers = {
> +    'Authorization': 'Basic ' +
> base64.b64encode(b'jconstant:rayjandom').decode(),
> +  }
> +
> +  h = svntest.main.create_http_connection(sbox.repo_url)
> +
> +  # GET /repos/iota
> +  # Expect to see a Last-Modified header.
> +  h.request('GET', sbox.repo_url + '/iota', None, headers)
> +  r = h.getresponse()
> +  if r.status != httplib.OK:
> +    raise svntest.Failure('Request failed: %d %s' % (r.status, r.reason))
> +  svntest.verify.compare_and_display_lines(None, 'Last-Modified',
> +                                           svntest.verify.RegexOutput('.+'),
> +                                           r.getheader('Last-Modified'))
> +  r.read()

Interesting approach ... and sure, it should work.

But I'd also add a test to check that Last-Modified is *not* set on
responses to versioned-resource URLs. And maybe also check that the HEAD
method returns this header as well -- it should, and checking for that
won't hurt.

-- Brane

Re: Last-Modified HTTP header in GET responses

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Sep 3, 2019 at 8:01 AM Branko Čibej <br...@apache.org> wrote:
>
> On 02.09.2019 16:20, Johan Corveleyn wrote:
> > On Fri, Jan 15, 2016 at 1:58 PM Ivan Zhakov <iv...@visualsvn.com> wrote:
> >> On 7 January 2016 at 10:34, Ivan Zhakov <iv...@visualsvn.com> wrote:
> >>> On 6 January 2016 at 08:14, Greg Stein <gs...@gmail.com> wrote:
> >>>> Personally, I'd be more interested in the effects on the network and its
> >>>> caching ability. Do we really need to save CPU/IO on the server? Today's
> >>>> servers seem more than capable, and are there really svn servers out in the
> >>>> wild getting so crushed, that this is important? It seems that as long as
> >>>> proxies/etc can properly cache the results, and (thus) avoid future touches
> >>>> on the backend server, then we're good to go.
> >>>>
> >>>> If the patch doesn't affect the caching (which it sounds like "no"), then
> >>>> just go with it. Sure, it is neat to look at ayscalls, but... why? I don't
> >>>> understand the need to examine/profile. Educate me?
> >>>>
> >>> The patch should not affect HTTP caching for two reasons:
> >>> 1. Browsers and proxies supports ETag and use it instead of
> >>> Last-Modified header.
> >>> 2. ETag and Last-Modified headers are used only for cache
> >>> re-validation when max-age is expired. But Subversion sets max-age=1
> >>> week for resources with specific revision in URL
> >>> (http://server/!svn/rvr/1/path). max-age=0 is only used for public
> >>> URLs without revision, i.e. http://server/path)
> >>>
> >>> As far I know proxy usage are limited to public servers with anonymous
> >>> access, since caching of HTTP responses with Authorization is
> >>> prohibited by RFC.
> >>>
> >>> Anyway I agree that trading bandwidth usage to save some CPU/IO on the
> >>> server doesn't make sense, but Last-Modified case is the different:
> >>> Subversion server wasting 10%+ of server resources to produce unused
> >>> header.
> >>>
> >>> I don't have access to svn.apache.org server performance stats, but I
> >>> suppose it's pretty busy server and Infra team would welcome any
> >>> Subversion server performance improvements.
> >>>
> >> Committed in r1724790.
> >>
> >> --
> >> Ivan Zhakov
> > A bit late perhaps, but apparently this change (removing the
> > Last-Modified header from GET responses) broke a specific use case at
> > my company (we just upgraded our SVN server from 1.9 to 1.10, bringing
> > along this particular change):
> >
> > - We use Apache Ivy (http://ant.apache.org/ivy/) for dependency
> > management of our Java applications.
> >
> > - Third party jar files are kept in our svn repo under
> > /trunk/ivyrepository (and branched / tagged in release branches, so we
> > have completely reproducible builds, even if our third party jars or
> > their dependency structures change on trunk).
> >
> > - We use Ivy's "URL Resolver" [1], which downloads the files with
> > regular GET requests (and HEAD requests to check the up-to-dateness of
> > the cache on the client). We effectively use SVN in this case as a
> > "regular" file server (which coincidentally has branches and tags so
> > we can resolve against the correct tree when making a build).
> >
> > This last part now fails, i.e. Ivy's URLResolver no longer detects
> > that a file has changed. It used to compare its own "last-mod time of
> > the file on disk in the cache" with the Last-Modified header, which
> > works fine with all kinds of file servers, and worked with SVN < 1.10.
> >
> > I think it's unfortunate that we broke compatibility here (even if
> > it's not usage by a normal svn client) for the sake of some relatively
> > small performance / load gain on the server. If we could get the old
> > behavior back with some Apache directive, that would have been fine,
> > but there is no such option at the moment.
> >
> > Also: if the Last-Modified would have been removed only for the
> > "internal GET urls" (like http://server/!svn/rvr/1/path), for
> > optimizing checkout (as executed by normal svn clients), that would
> > have been understandable. But why remove it for the "external GET
> > urls" (http://svnserver/path) as well? Those have nothing to do with
> > checkout load, those are only used by browsers or "tools using SVN as
> > a glorified file server" :-).
> >
> > I am by no means an expert in HTTP standards, and various online
> > sources give different recommendations for these headers (ETag,
> > Last-Modified, ... request headers for conditional GETs, ...). But we
> > found an old discussion thread on the "dev@rapidsvn.tigris.org"
> > mailinglist from 15 years ago, discussing "a very basic idea: let
> > mod_dav_svn set the Last-Modified HTTP header ..." [2]. Perhaps the
> > feature dates from back then (indicating that it wasn't an accidental
> > feature)?
>
> I'm fairly sure that it wasn't accidental. The whole idea that you can
> use a browser to look at a Subversion repository was intentional, adding
> appropriate HTTP headers in responses was surely part of that.
>
> That said, Ivan's original argument about caching not being affected by
> this change is correct ... but it ignores your particular use case. The
> error was in assuming it's OK to break compatibility on the protocol
> level like this.
>
> > Anyway, how about bringing this feature back in some form?
> > - Revert r1724790?
>
> This is clearly the simplest solution, but I have no idea what the
> performance impact would be. From looking at the diff, my best guess is
> that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.
>
> > - or only for "external GET urls"?
> > - or only if some Apache directive is set?
> >
> > Thoughts?
>
> I would prefer not to add yet another configuration knob to the server.
> I agree that versioned-resource URLs are only interesting for DAV-aware
> clients, and those clients already know how to check for modifications
> without looking at Last-Modified. That would imply that adding the
> header for external URLs is the right solution.
>
> -- Brane

Thanks for your response, Brane.

I think the below patch would do it (set the Last-Modified header only
for "external URLs").
It's basically a revert of r1724790 (and adding a test), plus wrapping
the actual setting of Last-Modified inside the following condition:

    if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
        && (resource->info->repos_path == resource->info->uri_path->data))

I'm not entirely sure this is the perfect way to test for "only
external url's", but it seems to work (i.e. it adds the header to
/iota, but not to /!svn/rvr/1/iota). If I omit the second part of the
condition, it's not selective enough (apparently /!svn/rvr/1/iota is
also a REGULAR resource). That second part of the condition
corresponds to what is specifically set in the case of an "external
request" in parse_uri(), IIUC:
    http://svn.apache.org/viewvc/subversion/trunk/subversion/mod_dav_svn/repos.c?revision=1724790&view=markup&pathrev=1724790#l791

[[[
Index: subversion/mod_dav_svn/repos.c
===================================================================
--- subversion/mod_dav_svn/repos.c    (revision 1866345)
+++ subversion/mod_dav_svn/repos.c    (working copy)
@@ -3139,6 +3139,50 @@ seek_stream(dav_stream *stream, apr_off_t abs_posi
        && resource->baselined))


+/* Return the last modification time of RESOURCE, or -1 if the DAV
+   resource type is not handled, or if an error occurs.  Temporary
+   allocations are made from RESOURCE->POOL. */
+static apr_time_t
+get_last_modified(const dav_resource *resource)
+{
+  apr_time_t last_modified;
+  svn_error_t *serr;
+  svn_revnum_t created_rev;
+  svn_string_t *date_time;
+
+  if (RESOURCE_LACKS_ETAG_POTENTIAL(resource))
+    return -1;
+
+  if ((serr = svn_fs_node_created_rev(&created_rev, resource->info->root.root,
+                                      resource->info->repos_path,
+                                      resource->pool)))
+    {
+      svn_error_clear(serr);
+      return -1;
+    }
+
+  if ((serr = svn_fs_revision_prop2(&date_time, resource->info->repos->fs,
+                                    created_rev, SVN_PROP_REVISION_DATE,
+                                    TRUE, resource->pool, resource->pool)))
+    {
+      svn_error_clear(serr);
+      return -1;
+    }
+
+  if (date_time == NULL || date_time->data == NULL)
+    return -1;
+
+  if ((serr = svn_time_from_cstring(&last_modified, date_time->data,
+                                    resource->pool)))
+    {
+      svn_error_clear(serr);
+      return -1;
+    }
+
+  return last_modified;
+}
+
+
 const char *
 dav_svn__getetag(const dav_resource *resource, apr_pool_t *pool)
 {
@@ -3219,6 +3263,25 @@ set_headers(request_rec *r, const dav_resource *re
   if (!resource->exists)
     return NULL;

+  if ((resource->type == DAV_RESOURCE_TYPE_REGULAR)
+      && (resource->info->repos_path == resource->info->uri_path->data))
+    {
+      /* Include Last-Modified header for 'external' GET requests
+         (i.e. requests to URI's not under /!svn), to support usage of an
+         SVN server as a file server, where the client needs timestamps
+         for instance to use as "last modification time" of files on disk. */
+      apr_time_t last_modified;
+
+      last_modified = get_last_modified(resource);
+      if (last_modified != -1)
+        {
+          /* Note the modification time for the requested resource, and
+             include the Last-Modified header in the response. */
+          ap_update_mtime(r, last_modified);
+          ap_set_last_modified(r);
+        }
+    }
+
   /* generate our etag and place it into the output */
   apr_table_setn(r->headers_out, "ETag",
                  dav_svn__getetag(resource, resource->pool));
Index: subversion/tests/cmdline/mod_dav_svn_tests.py
===================================================================
--- subversion/tests/cmdline/mod_dav_svn_tests.py    (revision 1866345)
+++ subversion/tests/cmdline/mod_dav_svn_tests.py    (working copy)
@@ -640,6 +640,29 @@ def propfind_propname(sbox):
   actual_response = r.read()
   verify_xml_response(expected_response, actual_response)

+@SkipUnless(svntest.main.is_ra_type_dav)
+def last_modified_header(sbox):
+  "verify 'Last-Modified' header on 'external' GETs"
+
+  sbox.build(create_wc=False, read_only=True)
+
+  headers = {
+    'Authorization': 'Basic ' +
base64.b64encode(b'jconstant:rayjandom').decode(),
+  }
+
+  h = svntest.main.create_http_connection(sbox.repo_url)
+
+  # GET /repos/iota
+  # Expect to see a Last-Modified header.
+  h.request('GET', sbox.repo_url + '/iota', None, headers)
+  r = h.getresponse()
+  if r.status != httplib.OK:
+    raise svntest.Failure('Request failed: %d %s' % (r.status, r.reason))
+  svntest.verify.compare_and_display_lines(None, 'Last-Modified',
+                                           svntest.verify.RegexOutput('.+'),
+                                           r.getheader('Last-Modified'))
+  r.read()
+
 ########################################################################
 # Run the tests

@@ -652,6 +675,7 @@ test_list = [ None,
               propfind_404,
               propfind_allprop,
               propfind_propname,
+              last_modified_header,
              ]
 serial_only = True

]]]

-- 
Johan

Re: Last-Modified HTTP header in GET responses

Posted by Branko Čibej <br...@apache.org>.
On 02.09.2019 16:20, Johan Corveleyn wrote:
> On Fri, Jan 15, 2016 at 1:58 PM Ivan Zhakov <iv...@visualsvn.com> wrote:
>> On 7 January 2016 at 10:34, Ivan Zhakov <iv...@visualsvn.com> wrote:
>>> On 6 January 2016 at 08:14, Greg Stein <gs...@gmail.com> wrote:
>>>> Personally, I'd be more interested in the effects on the network and its
>>>> caching ability. Do we really need to save CPU/IO on the server? Today's
>>>> servers seem more than capable, and are there really svn servers out in the
>>>> wild getting so crushed, that this is important? It seems that as long as
>>>> proxies/etc can properly cache the results, and (thus) avoid future touches
>>>> on the backend server, then we're good to go.
>>>>
>>>> If the patch doesn't affect the caching (which it sounds like "no"), then
>>>> just go with it. Sure, it is neat to look at ayscalls, but... why? I don't
>>>> understand the need to examine/profile. Educate me?
>>>>
>>> The patch should not affect HTTP caching for two reasons:
>>> 1. Browsers and proxies supports ETag and use it instead of
>>> Last-Modified header.
>>> 2. ETag and Last-Modified headers are used only for cache
>>> re-validation when max-age is expired. But Subversion sets max-age=1
>>> week for resources with specific revision in URL
>>> (http://server/!svn/rvr/1/path). max-age=0 is only used for public
>>> URLs without revision, i.e. http://server/path)
>>>
>>> As far I know proxy usage are limited to public servers with anonymous
>>> access, since caching of HTTP responses with Authorization is
>>> prohibited by RFC.
>>>
>>> Anyway I agree that trading bandwidth usage to save some CPU/IO on the
>>> server doesn't make sense, but Last-Modified case is the different:
>>> Subversion server wasting 10%+ of server resources to produce unused
>>> header.
>>>
>>> I don't have access to svn.apache.org server performance stats, but I
>>> suppose it's pretty busy server and Infra team would welcome any
>>> Subversion server performance improvements.
>>>
>> Committed in r1724790.
>>
>> --
>> Ivan Zhakov
> A bit late perhaps, but apparently this change (removing the
> Last-Modified header from GET responses) broke a specific use case at
> my company (we just upgraded our SVN server from 1.9 to 1.10, bringing
> along this particular change):
>
> - We use Apache Ivy (http://ant.apache.org/ivy/) for dependency
> management of our Java applications.
>
> - Third party jar files are kept in our svn repo under
> /trunk/ivyrepository (and branched / tagged in release branches, so we
> have completely reproducible builds, even if our third party jars or
> their dependency structures change on trunk).
>
> - We use Ivy's "URL Resolver" [1], which downloads the files with
> regular GET requests (and HEAD requests to check the up-to-dateness of
> the cache on the client). We effectively use SVN in this case as a
> "regular" file server (which coincidentally has branches and tags so
> we can resolve against the correct tree when making a build).
>
> This last part now fails, i.e. Ivy's URLResolver no longer detects
> that a file has changed. It used to compare its own "last-mod time of
> the file on disk in the cache" with the Last-Modified header, which
> works fine with all kinds of file servers, and worked with SVN < 1.10.
>
> I think it's unfortunate that we broke compatibility here (even if
> it's not usage by a normal svn client) for the sake of some relatively
> small performance / load gain on the server. If we could get the old
> behavior back with some Apache directive, that would have been fine,
> but there is no such option at the moment.
>
> Also: if the Last-Modified would have been removed only for the
> "internal GET urls" (like http://server/!svn/rvr/1/path), for
> optimizing checkout (as executed by normal svn clients), that would
> have been understandable. But why remove it for the "external GET
> urls" (http://svnserver/path) as well? Those have nothing to do with
> checkout load, those are only used by browsers or "tools using SVN as
> a glorified file server" :-).
>
> I am by no means an expert in HTTP standards, and various online
> sources give different recommendations for these headers (ETag,
> Last-Modified, ... request headers for conditional GETs, ...). But we
> found an old discussion thread on the "dev@rapidsvn.tigris.org"
> mailinglist from 15 years ago, discussing "a very basic idea: let
> mod_dav_svn set the Last-Modified HTTP header ..." [2]. Perhaps the
> feature dates from back then (indicating that it wasn't an accidental
> feature)?

I'm fairly sure that it wasn't accidental. The whole idea that you can
use a browser to look at a Subversion repository was intentional, adding
appropriate HTTP headers in responses was surely part of that.

That said, Ivan's original argument about caching not being affected by
this change is correct ... but it ignores your particular use case. The
error was in assuming it's OK to break compatibility on the protocol
level like this.

> Anyway, how about bringing this feature back in some form?
> - Revert r1724790?

This is clearly the simplest solution, but I have no idea what the
performance impact would be. From looking at the diff, my best guess is
that svn_fs_node_created_rev() and svn_fs_revision_prop() dominate.

> - or only for "external GET urls"?
> - or only if some Apache directive is set?
>
> Thoughts?

I would prefer not to add yet another configuration knob to the server.
I agree that versioned-resource URLs are only interesting for DAV-aware
clients, and those clients already know how to check for modifications
without looking at Last-Modified. That would imply that adding the
header for external URLs is the right solution.

-- Brane