You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Ruediger Pluem <rp...@apache.org> on 2010/06/04 07:27:23 UTC

Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

On 04.06.2010 02:17, minfrin@apache.org wrote:
> Author: minfrin
> Date: Fri Jun  4 00:17:16 2010
> New Revision: 951222
> 
> URL: http://svn.apache.org/viewvc?rev=951222&view=rev
> Log:
> mod_disk_cache: Decline the opportunity to cache if the response is
> a 206 Partial Content. This stops a reverse proxied partial response
> from becoming cached, and then being served in subsequent responses.
> 
> Modified:
>     httpd/httpd/trunk/CHANGES
>     httpd/httpd/trunk/modules/cache/mod_disk_cache.c
> 
> Modified: httpd/httpd/trunk/CHANGES
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/CHANGES?rev=951222&r1=951221&r2=951222&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/CHANGES [utf-8] (original)
> +++ httpd/httpd/trunk/CHANGES [utf-8] Fri Jun  4 00:17:16 2010
> @@ -28,6 +28,11 @@ Changes with Apache 2.3.6
>       processing is completed, avoiding orphaned callback pointers.
>       [Brett Gervasoni <brettg senseofsecurity.com>, Jeff Trawick]
>  
> +  *) mod_disk_cache: Decline the opportunity to cache if the response is
> +     a 206 Partial Content. This stops a reverse proxied partial response
> +     from becoming cached, and then being served in subsequent responses.
> +     [Graham Leggett]
> +
>    *) mod_deflate: avoid the risk of forwarding data before headers are set.
>       PR 49369 [Matthew Steele <mdsteele google.com>]
>  
> 
> Modified: httpd/httpd/trunk/modules/cache/mod_disk_cache.c
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/cache/mod_disk_cache.c?rev=951222&r1=951221&r2=951222&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/modules/cache/mod_disk_cache.c (original)
> +++ httpd/httpd/trunk/modules/cache/mod_disk_cache.c Fri Jun  4 00:17:16 2010
> @@ -333,6 +333,14 @@ static int create_entity(cache_handle_t 
>          return DECLINED;
>      }
>  
> +    /* we don't support caching of range requests (yet) */
> +    if (r->status == HTTP_PARTIAL_CONTENT) {
> +        ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
> +                     "disk_cache: URL %s partial content response not cached",
> +                     key);
> +        return DECLINED;
> +    }
> +

Why is this needed? mod_cache itself does not allow partial content to be cached
and even if this does not work it should be fixed there and not in one of the
storage providers.

Regards

Rüdiger


Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Rainer Jung <ra...@kippdata.de>.
On 06.06.2010 13:35, Graham Leggett wrote:
> On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:
> The reason we don't see this problem very often is because we don't see
> a 206 very often at this point in the code. The filter that does content
> ranges sits after the CACHE_SAVE filter, and so under most
> circumstances, the response is still a complete 200 OK when it gets
> here, and the cache caches the full 200 OK response.
>
> If we reverse proxy to an application server of some kind, many
> applications don't consider the range request at all, and return a 200
> OK anyway. Again, the cache does "the right thing" at this point and
> caches the complete response, even if the browser has requested a range,
> the range filter turning the response into a range response after the
> cache has finished caching the full response.
>
> It's only when we reverse proxy to another server that understands and
> responds to range requests itself that we will see a 206 at this point,
> and this isn't a common configuration.
>
> Because the cache doesn't know any better, it passes the 206 back to
> mod_disk_cache. Because mod_disk_cache previously didn't know better
> either, it cached the 206 (this is now fixed). Subsequent ordinary
> requests were handed cached 206 partial responses in return even though
> the browser asked for a normal request, until an end user goes "huh",
> clicks "shift reload" and the cache fixes itself again.

I'm not very experienced with mod_cache, but that's exactly a situation 
I noticed with an early 2.2 release, where a caching reverse proxy 
cached parts of PDF documents served by a backend and served those parts 
on later requests for the document. The Byte Range requests came from 
Acrobat (loading individual pages). At that time I didn't investigate 
closer, but at least for an old 2.2 release I can confirm this behaviour.

Regards,

Rainer

Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 07 Jun 2010, at 8:19 AM, Ruediger Pluem wrote:

> I still fail to see how to integrate this easily with the current  
> mod_cache
> API, since you sure want to use all the generic freshness checks  
> within
> mod_cache. And if a copy is seen as fresh it is simply served.
> But I guess this now goes OT and it would be better to discuss this  
> in a
> separate thread as soon as someone offers an implementation to  
> actually
> cache 206's. It is easier to discuss the exact code then.
> Does someone actually plan to do this?

I may need to do this soon. We have been promoting the use of inline  
HTTP caching within our service oriented platform, and have been  
stress testing mod_cache quite significantly in the process. We  
deliver all sorts of content, including video file fragments.

One of the things that needs to be fixed is the current situation  
where a cached entity has one header file and one data file. This  
creates races in the cache. What really needs to happen is that there  
is one header file only, and the data file has a unique name contained  
in the header file, so that a second new data file can be safely moved  
into place at leisure without affecting the original data file, and  
the corresponding headers file renamed in atomically.

I predict this won't be a backportable change, so I'm getting the  
backportable fixes done first before I attempt this.

Regards,
Graham
--


Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Ruediger Pluem <rp...@apache.org>.

On 06/06/2010 01:35 PM, Graham Leggett wrote:
> On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:
> 
>>>> IMHO it does not (at least according to the comments and the code
>>>> looks like
>>>> to follow these):
>>>
>>> This is only present on trunk, and this needs to be fixed too. The
>>> problem we saw was in httpd v2.2.
>>
>> A similar code is part of 2.2. Hence I am still astonished why you
>> see this problem in 2.2.
> 
> I think we're talking across purposes, the code in the most recent v2.2
> branch I am looking at looks like this with the comments stripped:

Ah, sorry I missed the inner if which was added in r727599. You are correct,
2.2 caches 206 with expire or cache-control set.

> 
>> I am currently worried that a fresh cached copy of a 206 response cannot
>> be updated accordingly if a full response is requested. That puts the
>> burden
>> of this stuff on each provider instead of some general framework solution
>> in mod_cache. So I think mod_cache should provide a better framework for
>> 206 responses first before providers should implement it.
> 
> I don't follow what mod_cache would need to do in this case with respect
> to a 206 that makes a cache implementation require special support.
> 
> If a cache implementation wanted to, it could cache the 206, marking the
> range in the cache as appropriate. When a full request came along, the
> cache could potentially modify the request to ask for the missing range,
> and then decline the opportunity to serve from cache. The backend
> supplies the missing range, and the cache_save filter saves the missing
> range to complete the request, and inserts the existing cached data into
> the response as a simple file bucket, completing the response.

I still fail to see how to integrate this easily with the current mod_cache
API, since you sure want to use all the generic freshness checks within
mod_cache. And if a copy is seen as fresh it is simply served.
But I guess this now goes OT and it would be better to discuss this in a
separate thread as soon as someone offers an implementation to actually
cache 206's. It is easier to discuss the exact code then.
Does someone actually plan to do this?

> If we did find a way to support a generic 206 mechanism in mod_cache,
> that would be great (assuming it was necessary), but the opposite,
> banning any caching implementation from even attempting to cache a 206
> by enforcing a ban in mod_cache itself isn't ideal, if that makes sense?

Ok, I am fine with that now. I guess the discussion might continue as soon
as we see the first implementation for caching 206's.

Regards

Rüdiger

Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:

>>> IMHO it does not (at least according to the comments and the code
>>> looks like
>>> to follow these):
>>
>> This is only present on trunk, and this needs to be fixed too. The
>> problem we saw was in httpd v2.2.
>
> A similar code is part of 2.2. Hence I am still astonished why you
> see this problem in 2.2.

I think we're talking across purposes, the code in the most recent  
v2.2 branch I am looking at looks like this with the comments stripped:

     if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
         && r->status != HTTP_MULTIPLE_CHOICES
         && r->status != HTTP_MOVED_PERMANENTLY
         && r->status != HTTP_NOT_MODIFIED) {
         if (exps != NULL || cc_out != NULL) {
// noop
         }
         else {
             reason = apr_psprintf(p, "Response status %d", r->status);
         }
     }

If the request survives to the end of this code with "reason" still  
NULL, the response is still cacheable at this point.

A 206 always gets past the first if, and is then tested against the  
second. If an Expires or a Cache-Control is present (and this is true  
in many cases, and particularly in our case where the news people  
found the bug), the "noop" path is followed, and we reach the end of  
the if statements with reason still NULL. The cache implementation  
then duly incorrectly caches the partial response.

The reason we don't see this problem very often is because we don't  
see a 206 very often at this point in the code. The filter that does  
content ranges sits after the CACHE_SAVE filter, and so under most  
circumstances, the response is still a complete 200 OK when it gets  
here, and the cache caches the full 200 OK response.

If we reverse proxy to an application server of some kind, many  
applications don't consider the range request at all, and return a 200  
OK anyway. Again, the cache does "the right thing" at this point and  
caches the complete response, even if the browser has requested a  
range, the range filter turning the response into a range response  
after the cache has finished caching the full response.

It's only when we reverse proxy to another server that understands and  
responds to range requests itself that we will see a 206 at this  
point, and this isn't a common configuration.

Because the cache doesn't know any better, it passes the 206 back to  
mod_disk_cache. Because mod_disk_cache previously didn't know better  
either, it cached the 206 (this is now fixed). Subsequent ordinary  
requests were handed cached 206 partial responses in return even  
though the browser asked for a normal request, until an end user goes  
"huh", clicks "shift reload" and the cache fixes itself again.

The fix to this problem comes in two parts, the first is for  
mod_disk_cache to say "I explicitly decline to cache range requests, I  
don't yet know how", the second part is to make the comments in  
mod_cache match the code: The comments in mod_cache say that a 206 is  
not cacheable, the code said it is, but only if CC or Expires is  
present. You then fixed this on trunk by banning 206 caching entirely  
for all mod_cache implementations, but this comes at the price of  
removing the option for an implementation to choose to cache a range  
request if it so wished.

> I am currently worried that a fresh cached copy of a 206 response  
> cannot
> be updated accordingly if a full response is requested. That puts  
> the burden
> of this stuff on each provider instead of some general framework  
> solution
> in mod_cache. So I think mod_cache should provide a better framework  
> for
> 206 responses first before providers should implement it.

I don't follow what mod_cache would need to do in this case with  
respect to a 206 that makes a cache implementation require special  
support.

If a cache implementation wanted to, it could cache the 206, marking  
the range in the cache as appropriate. When a full request came along,  
the cache could potentially modify the request to ask for the missing  
range, and then decline the opportunity to serve from cache. The  
backend supplies the missing range, and the cache_save filter saves  
the missing range to complete the request, and inserts the existing  
cached data into the response as a simple file bucket, completing the  
response.

If we did find a way to support a generic 206 mechanism in mod_cache,  
that would be great (assuming it was necessary), but the opposite,  
banning any caching implementation from even attempting to cache a 206  
by enforcing a ban in mod_cache itself isn't ideal, if that makes sense?

Regards,
Graham
--


Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Ruediger Pluem <rp...@apache.org>.

On 06/04/2010 05:09 PM, Graham Leggett wrote:
> On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote:
> 
>> IMHO it does not (at least according to the comments and the code
>> looks like
>> to follow these):
> 
> This is only present on trunk, and this needs to be fixed too. The
> problem we saw was in httpd v2.2.

A similar code is part of 2.2. Hence I am still astonished why you
see this problem in 2.2.

> 
>>> implementation (mod_disk_cache) to decide whether it wants to
>>> handle a
>>> 206 or not.
>>>
>>> mod_cache is not the place to fix this. It is entirely valid for a
>>
>> So you think that should be fixed in every single provider?
> 
> Yes.
> 
> Each provider should have the opportunity to cache a 206 if it so
> wishes, as RFC2616 allows it. Remember that providers don't have to be
> written by us.
> 
> Any provider that chooses not to support a 206 should explicitly do so,
> not rely on mod_cache to enforce a blanket ban on supporting 206
> response caching.
> 
>> I am currently not convinced that any provider could cache a 206 with
>> the current mod_cache infrastructure.

I am currently worried that a fresh cached copy of a 206 response cannot
be updated accordingly if a full response is requested. That puts the burden
of this stuff on each provider instead of some general framework solution
in mod_cache. So I think mod_cache should provide a better framework for
206 responses first before providers should implement it.

Regards

Rüdiger

Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote:

> IMHO it does not (at least according to the comments and the code  
> looks like
> to follow these):

This is only present on trunk, and this needs to be fixed too. The  
problem we saw was in httpd v2.2.

>> implementation (mod_disk_cache) to decide whether it wants to
>> handle a
>> 206 or not.
>>
>> mod_cache is not the place to fix this. It is entirely valid for a
>
> So you think that should be fixed in every single provider?

Yes.

Each provider should have the opportunity to cache a 206 if it so  
wishes, as RFC2616 allows it. Remember that providers don't have to be  
written by us.

Any provider that chooses not to support a 206 should explicitly do  
so, not rely on mod_cache to enforce a blanket ban on supporting 206  
response caching.

> I am currently not convinced that any provider could cache a 206 with
> the current mod_cache infrastructure.

There was nothing in the original design for mod_cache that stopped a  
provider trying to cache a 206.

>> cache implementation to be given the opportunity to cache a 206, if
>
> Right, RFC2616 permits caching 206's.

Regards,
Graham
--


Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote:

>> cache implementation to be given the opportunity to cache a 206, if
>
> Right, RFC2616 permits caching 206's.

What I have in mind is this below. mod_cache supports the caching of a  
206, but mod_disk_cache chooses not to. Once backported to 2.2, this  
will fix the current problem where if a 206 arrives, and contains an  
Expires or Cache-Control header, it gets cached.

Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c	(revision 951708)
+++ modules/cache/mod_cache.c	(working copy)
@@ -726,16 +726,17 @@
       * They are tested here one by one to be clear and unambiguous.
       */
      if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
+        && r->status != HTTP_PARTIAL_CONTENT
          && r->status != HTTP_MULTIPLE_CHOICES
          && r->status != HTTP_MOVED_PERMANENTLY
          && r->status != HTTP_NOT_MODIFIED) {
          /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300,  
301 or 410
-         * We don't cache 206, because we don't (yet) cache partial  
responses.
+         * We allow the caching of 206, but a cache implementation  
might choose
+         * to decline to cache a 206 if it doesn't know how to.
           * We include 304 Not Modified here too as this is the  
origin server
           * telling us to serve the cached copy.
           */
-        if ((exps != NULL || cc_out != NULL)
-            && r->status != HTTP_PARTIAL_CONTENT) {
+        if (exps != NULL || cc_out != NULL) {
              /* We are also allowed to cache any response given that  
it has a
               * valid Expires or Cache Control header. If we find a  
either of
               * those here,  we pass request through the rest of the  
tests. From
@@ -748,9 +749,6 @@
               * include the following: an Expires header (section  
14.21); a
               * "max-age", "s-maxage",  "must-revalidate", "proxy- 
revalidate",
               * "public" or "private" cache-control directive  
(section 14.9).
-             *
-             * But do NOT store 206 responses in any case since we
-             * don't (yet) cache partial responses.
               */
          }
          else {

Regards,
Graham
--


RE: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com>.
 

> -----Original Message-----
> From: Graham Leggett 
> Sent: Freitag, 4. Juni 2010 15:44
> To: dev@httpd.apache.org
> Subject: Re: svn commit: r951222 - in /httpd/httpd/trunk: 
> CHANGES modules/cache/mod_disk_cache.c
> 
> On 04 Jun 2010, at 7:27 AM, Ruediger Pluem wrote:
> 
> > Why is this needed? mod_cache itself does not allow partial 
> content  
> > to be cached
> > and even if this does not work it should be fixed there and not in  
> > one of the
> > storage providers.
> 
> mod_cache does allow a 206 to be cached, it is up to the cache

IMHO it does not (at least according to the comments and the code looks like
to follow these):

    /*
     * what responses should we not cache?
     *
     * At this point we decide based on the response headers whether it
     * is appropriate _NOT_ to cache the data from the server. There are
     * a whole lot of conditions that prevent us from caching this data.
     * They are tested here one by one to be clear and unambiguous.
     */
    if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
        && r->status != HTTP_MULTIPLE_CHOICES
        && r->status != HTTP_MOVED_PERMANENTLY
        && r->status != HTTP_NOT_MODIFIED) {
        /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410
         * We don't cache 206, because we don't (yet) cache partial responses.
         * We include 304 Not Modified here too as this is the origin server
         * telling us to serve the cached copy.
         */
        if ((exps != NULL || cc_out != NULL)
            && r->status != HTTP_PARTIAL_CONTENT) {
            /* We are also allowed to cache any response given that it has a
             * valid Expires or Cache Control header. If we find a either of
             * those here,  we pass request through the rest of the tests. From
             * the RFC:
             *
             * A response received with any other status code (e.g. status
             * codes 302 and 307) MUST NOT be returned in a reply to a
             * subsequent request unless there are cache-control directives or
             * another header(s) that explicitly allow it. For example, these
             * include the following: an Expires header (section 14.21); a
             * "max-age", "s-maxage",  "must-revalidate", "proxy-revalidate",
             * "public" or "private" cache-control directive (section 14.9).
             *
             * But do NOT store 206 responses in any case since we
             * don't (yet) cache partial responses.
             */
        }
        else {
            reason = apr_psprintf(p, "Response status %d", r->status);
        }
    }
  
> implementation (mod_disk_cache) to decide whether it wants to 
> handle a  
> 206 or not.
> 
> mod_cache is not the place to fix this. It is entirely valid for a  

So you think that should be fixed in every single provider?
I am currently not convinced that any provider could cache a 206 with
the current mod_cache infrastructure.

> cache implementation to be given the opportunity to cache a 206, if  

Right, RFC2616 permits caching 206's.


Regards

Rüdiger


 

Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c

Posted by Graham Leggett <mi...@sharp.fm>.
On 04 Jun 2010, at 7:27 AM, Ruediger Pluem wrote:

> Why is this needed? mod_cache itself does not allow partial content  
> to be cached
> and even if this does not work it should be fixed there and not in  
> one of the
> storage providers.

mod_cache does allow a 206 to be cached, it is up to the cache  
implementation (mod_disk_cache) to decide whether it wants to handle a  
206 or not.

mod_cache is not the place to fix this. It is entirely valid for a  
cache implementation to be given the opportunity to cache a 206, if  
the cache implementation wants to. It is also entirely valid for a  
cache implementation to choose not to cache a 206 if it doesn't yet  
know how to, and that's what this fix does.

This patch is in response to buggy behaviour discovered in one of the  
UK's news websites, where mod_cache on cluster A was reverse proxying  
content served from origin cluster B. Origin cluster B was reacting to  
range requests with a 206, and cluster A then cached the 206,  
returning it to subsequent visitors. This patch fixes the problem.

Regards,
Graham
--