You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Ruediger Pluem <rp...@apache.org> on 2010/06/04 07:27:23 UTC
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
On 04.06.2010 02:17, minfrin@apache.org wrote:
> Author: minfrin
> Date: Fri Jun 4 00:17:16 2010
> New Revision: 951222
>
> URL: http://svn.apache.org/viewvc?rev=951222&view=rev
> Log:
> mod_disk_cache: Decline the opportunity to cache if the response is
> a 206 Partial Content. This stops a reverse proxied partial response
> from becoming cached, and then being served in subsequent responses.
>
> Modified:
> httpd/httpd/trunk/CHANGES
> httpd/httpd/trunk/modules/cache/mod_disk_cache.c
>
> Modified: httpd/httpd/trunk/CHANGES
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/CHANGES?rev=951222&r1=951221&r2=951222&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/CHANGES [utf-8] (original)
> +++ httpd/httpd/trunk/CHANGES [utf-8] Fri Jun 4 00:17:16 2010
> @@ -28,6 +28,11 @@ Changes with Apache 2.3.6
> processing is completed, avoiding orphaned callback pointers.
> [Brett Gervasoni <brettg senseofsecurity.com>, Jeff Trawick]
>
> + *) mod_disk_cache: Decline the opportunity to cache if the response is
> + a 206 Partial Content. This stops a reverse proxied partial response
> + from becoming cached, and then being served in subsequent responses.
> + [Graham Leggett]
> +
> *) mod_deflate: avoid the risk of forwarding data before headers are set.
> PR 49369 [Matthew Steele <mdsteele google.com>]
>
>
> Modified: httpd/httpd/trunk/modules/cache/mod_disk_cache.c
> URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/modules/cache/mod_disk_cache.c?rev=951222&r1=951221&r2=951222&view=diff
> ==============================================================================
> --- httpd/httpd/trunk/modules/cache/mod_disk_cache.c (original)
> +++ httpd/httpd/trunk/modules/cache/mod_disk_cache.c Fri Jun 4 00:17:16 2010
> @@ -333,6 +333,14 @@ static int create_entity(cache_handle_t
> return DECLINED;
> }
>
> + /* we don't support caching of range requests (yet) */
> + if (r->status == HTTP_PARTIAL_CONTENT) {
> + ap_log_error(APLOG_MARK, APLOG_DEBUG, 0, r->server,
> + "disk_cache: URL %s partial content response not cached",
> + key);
> + return DECLINED;
> + }
> +
Why is this needed? mod_cache itself does not allow partial content to be cached
and even if this does not work it should be fixed there and not in one of the
storage providers.
Regards
Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Rainer Jung <ra...@kippdata.de>.
On 06.06.2010 13:35, Graham Leggett wrote:
> On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:
> The reason we don't see this problem very often is because we don't see
> a 206 very often at this point in the code. The filter that does content
> ranges sits after the CACHE_SAVE filter, and so under most
> circumstances, the response is still a complete 200 OK when it gets
> here, and the cache caches the full 200 OK response.
>
> If we reverse proxy to an application server of some kind, many
> applications don't consider the range request at all, and return a 200
> OK anyway. Again, the cache does "the right thing" at this point and
> caches the complete response, even if the browser has requested a range,
> the range filter turning the response into a range response after the
> cache has finished caching the full response.
>
> It's only when we reverse proxy to another server that understands and
> responds to range requests itself that we will see a 206 at this point,
> and this isn't a common configuration.
>
> Because the cache doesn't know any better, it passes the 206 back to
> mod_disk_cache. Because mod_disk_cache previously didn't know better
> either, it cached the 206 (this is now fixed). Subsequent ordinary
> requests were handed cached 206 partial responses in return even though
> the browser asked for a normal request, until an end user goes "huh",
> clicks "shift reload" and the cache fixes itself again.
I'm not very experienced with mod_cache, but that's exactly a situation
I noticed with an early 2.2 release, where a caching reverse proxy
cached parts of PDF documents served by a backend and served those parts
on later requests for the document. The Byte Range requests came from
Acrobat (loading individual pages). At that time I didn't investigate
closer, but at least for an old 2.2 release I can confirm this behaviour.
Regards,
Rainer
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Graham Leggett <mi...@sharp.fm>.
On 07 Jun 2010, at 8:19 AM, Ruediger Pluem wrote:
> I still fail to see how to integrate this easily with the current
> mod_cache
> API, since you sure want to use all the generic freshness checks
> within
> mod_cache. And if a copy is seen as fresh it is simply served.
> But I guess this now goes OT and it would be better to discuss this
> in a
> separate thread as soon as someone offers an implementation to
> actually
> cache 206's. It is easier to discuss the exact code then.
> Does someone actually plan to do this?
I may need to do this soon. We have been promoting the use of inline
HTTP caching within our service oriented platform, and have been
stress testing mod_cache quite significantly in the process. We
deliver all sorts of content, including video file fragments.
One of the things that needs to be fixed is the current situation
where a cached entity has one header file and one data file. This
creates races in the cache. What really needs to happen is that there
is one header file only, and the data file has a unique name contained
in the header file, so that a second new data file can be safely moved
into place at leisure without affecting the original data file, and
the corresponding headers file renamed in atomically.
I predict this won't be a backportable change, so I'm getting the
backportable fixes done first before I attempt this.
Regards,
Graham
--
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Ruediger Pluem <rp...@apache.org>.
On 06/06/2010 01:35 PM, Graham Leggett wrote:
> On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:
>
>>>> IMHO it does not (at least according to the comments and the code
>>>> looks like
>>>> to follow these):
>>>
>>> This is only present on trunk, and this needs to be fixed too. The
>>> problem we saw was in httpd v2.2.
>>
>> A similar code is part of 2.2. Hence I am still astonished why you
>> see this problem in 2.2.
>
> I think we're talking across purposes, the code in the most recent v2.2
> branch I am looking at looks like this with the comments stripped:
Ah, sorry I missed the inner if which was added in r727599. You are correct,
2.2 caches 206 with expire or cache-control set.
>
>> I am currently worried that a fresh cached copy of a 206 response cannot
>> be updated accordingly if a full response is requested. That puts the
>> burden
>> of this stuff on each provider instead of some general framework solution
>> in mod_cache. So I think mod_cache should provide a better framework for
>> 206 responses first before providers should implement it.
>
> I don't follow what mod_cache would need to do in this case with respect
> to a 206 that makes a cache implementation require special support.
>
> If a cache implementation wanted to, it could cache the 206, marking the
> range in the cache as appropriate. When a full request came along, the
> cache could potentially modify the request to ask for the missing range,
> and then decline the opportunity to serve from cache. The backend
> supplies the missing range, and the cache_save filter saves the missing
> range to complete the request, and inserts the existing cached data into
> the response as a simple file bucket, completing the response.
I still fail to see how to integrate this easily with the current mod_cache
API, since you sure want to use all the generic freshness checks within
mod_cache. And if a copy is seen as fresh it is simply served.
But I guess this now goes OT and it would be better to discuss this in a
separate thread as soon as someone offers an implementation to actually
cache 206's. It is easier to discuss the exact code then.
Does someone actually plan to do this?
> If we did find a way to support a generic 206 mechanism in mod_cache,
> that would be great (assuming it was necessary), but the opposite,
> banning any caching implementation from even attempting to cache a 206
> by enforcing a ban in mod_cache itself isn't ideal, if that makes sense?
Ok, I am fine with that now. I guess the discussion might continue as soon
as we see the first implementation for caching 206's.
Regards
Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Graham Leggett <mi...@sharp.fm>.
On 05 Jun 2010, at 6:51 PM, Ruediger Pluem wrote:
>>> IMHO it does not (at least according to the comments and the code
>>> looks like
>>> to follow these):
>>
>> This is only present on trunk, and this needs to be fixed too. The
>> problem we saw was in httpd v2.2.
>
> A similar code is part of 2.2. Hence I am still astonished why you
> see this problem in 2.2.
I think we're talking across purposes, the code in the most recent
v2.2 branch I am looking at looks like this with the comments stripped:
if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
&& r->status != HTTP_MULTIPLE_CHOICES
&& r->status != HTTP_MOVED_PERMANENTLY
&& r->status != HTTP_NOT_MODIFIED) {
if (exps != NULL || cc_out != NULL) {
// noop
}
else {
reason = apr_psprintf(p, "Response status %d", r->status);
}
}
If the request survives to the end of this code with "reason" still
NULL, the response is still cacheable at this point.
A 206 always gets past the first if, and is then tested against the
second. If an Expires or a Cache-Control is present (and this is true
in many cases, and particularly in our case where the news people
found the bug), the "noop" path is followed, and we reach the end of
the if statements with reason still NULL. The cache implementation
then duly incorrectly caches the partial response.
The reason we don't see this problem very often is because we don't
see a 206 very often at this point in the code. The filter that does
content ranges sits after the CACHE_SAVE filter, and so under most
circumstances, the response is still a complete 200 OK when it gets
here, and the cache caches the full 200 OK response.
If we reverse proxy to an application server of some kind, many
applications don't consider the range request at all, and return a 200
OK anyway. Again, the cache does "the right thing" at this point and
caches the complete response, even if the browser has requested a
range, the range filter turning the response into a range response
after the cache has finished caching the full response.
It's only when we reverse proxy to another server that understands and
responds to range requests itself that we will see a 206 at this
point, and this isn't a common configuration.
Because the cache doesn't know any better, it passes the 206 back to
mod_disk_cache. Because mod_disk_cache previously didn't know better
either, it cached the 206 (this is now fixed). Subsequent ordinary
requests were handed cached 206 partial responses in return even
though the browser asked for a normal request, until an end user goes
"huh", clicks "shift reload" and the cache fixes itself again.
The fix to this problem comes in two parts, the first is for
mod_disk_cache to say "I explicitly decline to cache range requests, I
don't yet know how", the second part is to make the comments in
mod_cache match the code: The comments in mod_cache say that a 206 is
not cacheable, the code said it is, but only if CC or Expires is
present. You then fixed this on trunk by banning 206 caching entirely
for all mod_cache implementations, but this comes at the price of
removing the option for an implementation to choose to cache a range
request if it so wished.
> I am currently worried that a fresh cached copy of a 206 response
> cannot
> be updated accordingly if a full response is requested. That puts
> the burden
> of this stuff on each provider instead of some general framework
> solution
> in mod_cache. So I think mod_cache should provide a better framework
> for
> 206 responses first before providers should implement it.
I don't follow what mod_cache would need to do in this case with
respect to a 206 that makes a cache implementation require special
support.
If a cache implementation wanted to, it could cache the 206, marking
the range in the cache as appropriate. When a full request came along,
the cache could potentially modify the request to ask for the missing
range, and then decline the opportunity to serve from cache. The
backend supplies the missing range, and the cache_save filter saves
the missing range to complete the request, and inserts the existing
cached data into the response as a simple file bucket, completing the
response.
If we did find a way to support a generic 206 mechanism in mod_cache,
that would be great (assuming it was necessary), but the opposite,
banning any caching implementation from even attempting to cache a 206
by enforcing a ban in mod_cache itself isn't ideal, if that makes sense?
Regards,
Graham
--
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Ruediger Pluem <rp...@apache.org>.
On 06/04/2010 05:09 PM, Graham Leggett wrote:
> On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote:
>
>> IMHO it does not (at least according to the comments and the code
>> looks like
>> to follow these):
>
> This is only present on trunk, and this needs to be fixed too. The
> problem we saw was in httpd v2.2.
A similar code is part of 2.2. Hence I am still astonished why you
see this problem in 2.2.
>
>>> implementation (mod_disk_cache) to decide whether it wants to
>>> handle a
>>> 206 or not.
>>>
>>> mod_cache is not the place to fix this. It is entirely valid for a
>>
>> So you think that should be fixed in every single provider?
>
> Yes.
>
> Each provider should have the opportunity to cache a 206 if it so
> wishes, as RFC2616 allows it. Remember that providers don't have to be
> written by us.
>
> Any provider that chooses not to support a 206 should explicitly do so,
> not rely on mod_cache to enforce a blanket ban on supporting 206
> response caching.
>
>> I am currently not convinced that any provider could cache a 206 with
>> the current mod_cache infrastructure.
I am currently worried that a fresh cached copy of a 206 response cannot
be updated accordingly if a full response is requested. That puts the burden
of this stuff on each provider instead of some general framework solution
in mod_cache. So I think mod_cache should provide a better framework for
206 responses first before providers should implement it.
Regards
Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Graham Leggett <mi...@sharp.fm>.
On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote:
> IMHO it does not (at least according to the comments and the code
> looks like
> to follow these):
This is only present on trunk, and this needs to be fixed too. The
problem we saw was in httpd v2.2.
>> implementation (mod_disk_cache) to decide whether it wants to
>> handle a
>> 206 or not.
>>
>> mod_cache is not the place to fix this. It is entirely valid for a
>
> So you think that should be fixed in every single provider?
Yes.
Each provider should have the opportunity to cache a 206 if it so
wishes, as RFC2616 allows it. Remember that providers don't have to be
written by us.
Any provider that chooses not to support a 206 should explicitly do
so, not rely on mod_cache to enforce a blanket ban on supporting 206
response caching.
> I am currently not convinced that any provider could cache a 206 with
> the current mod_cache infrastructure.
There was nothing in the original design for mod_cache that stopped a
provider trying to cache a 206.
>> cache implementation to be given the opportunity to cache a 206, if
>
> Right, RFC2616 permits caching 206's.
Regards,
Graham
--
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Graham Leggett <mi...@sharp.fm>.
On 04 Jun 2010, at 4:15 PM, Plüm, Rüdiger, VF-Group wrote:
>> cache implementation to be given the opportunity to cache a 206, if
>
> Right, RFC2616 permits caching 206's.
What I have in mind is this below. mod_cache supports the caching of a
206, but mod_disk_cache chooses not to. Once backported to 2.2, this
will fix the current problem where if a 206 arrives, and contains an
Expires or Cache-Control header, it gets cached.
Index: modules/cache/mod_cache.c
===================================================================
--- modules/cache/mod_cache.c (revision 951708)
+++ modules/cache/mod_cache.c (working copy)
@@ -726,16 +726,17 @@
* They are tested here one by one to be clear and unambiguous.
*/
if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
+ && r->status != HTTP_PARTIAL_CONTENT
&& r->status != HTTP_MULTIPLE_CHOICES
&& r->status != HTTP_MOVED_PERMANENTLY
&& r->status != HTTP_NOT_MODIFIED) {
/* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300,
301 or 410
- * We don't cache 206, because we don't (yet) cache partial
responses.
+ * We allow the caching of 206, but a cache implementation
might choose
+ * to decline to cache a 206 if it doesn't know how to.
* We include 304 Not Modified here too as this is the
origin server
* telling us to serve the cached copy.
*/
- if ((exps != NULL || cc_out != NULL)
- && r->status != HTTP_PARTIAL_CONTENT) {
+ if (exps != NULL || cc_out != NULL) {
/* We are also allowed to cache any response given that
it has a
* valid Expires or Cache Control header. If we find a
either of
* those here, we pass request through the rest of the
tests. From
@@ -748,9 +749,6 @@
* include the following: an Expires header (section
14.21); a
* "max-age", "s-maxage", "must-revalidate", "proxy-
revalidate",
* "public" or "private" cache-control directive
(section 14.9).
- *
- * But do NOT store 206 responses in any case since we
- * don't (yet) cache partial responses.
*/
}
else {
Regards,
Graham
--
RE: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com>.
> -----Original Message-----
> From: Graham Leggett
> Sent: Freitag, 4. Juni 2010 15:44
> To: dev@httpd.apache.org
> Subject: Re: svn commit: r951222 - in /httpd/httpd/trunk:
> CHANGES modules/cache/mod_disk_cache.c
>
> On 04 Jun 2010, at 7:27 AM, Ruediger Pluem wrote:
>
> > Why is this needed? mod_cache itself does not allow partial
> content
> > to be cached
> > and even if this does not work it should be fixed there and not in
> > one of the
> > storage providers.
>
> mod_cache does allow a 206 to be cached, it is up to the cache
IMHO it does not (at least according to the comments and the code looks like
to follow these):
/*
* what responses should we not cache?
*
* At this point we decide based on the response headers whether it
* is appropriate _NOT_ to cache the data from the server. There are
* a whole lot of conditions that prevent us from caching this data.
* They are tested here one by one to be clear and unambiguous.
*/
if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
&& r->status != HTTP_MULTIPLE_CHOICES
&& r->status != HTTP_MOVED_PERMANENTLY
&& r->status != HTTP_NOT_MODIFIED) {
/* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410
* We don't cache 206, because we don't (yet) cache partial responses.
* We include 304 Not Modified here too as this is the origin server
* telling us to serve the cached copy.
*/
if ((exps != NULL || cc_out != NULL)
&& r->status != HTTP_PARTIAL_CONTENT) {
/* We are also allowed to cache any response given that it has a
* valid Expires or Cache Control header. If we find a either of
* those here, we pass request through the rest of the tests. From
* the RFC:
*
* A response received with any other status code (e.g. status
* codes 302 and 307) MUST NOT be returned in a reply to a
* subsequent request unless there are cache-control directives or
* another header(s) that explicitly allow it. For example, these
* include the following: an Expires header (section 14.21); a
* "max-age", "s-maxage", "must-revalidate", "proxy-revalidate",
* "public" or "private" cache-control directive (section 14.9).
*
* But do NOT store 206 responses in any case since we
* don't (yet) cache partial responses.
*/
}
else {
reason = apr_psprintf(p, "Response status %d", r->status);
}
}
> implementation (mod_disk_cache) to decide whether it wants to
> handle a
> 206 or not.
>
> mod_cache is not the place to fix this. It is entirely valid for a
So you think that should be fixed in every single provider?
I am currently not convinced that any provider could cache a 206 with
the current mod_cache infrastructure.
> cache implementation to be given the opportunity to cache a 206, if
Right, RFC2616 permits caching 206's.
Regards
Rüdiger
Re: svn commit: r951222 - in /httpd/httpd/trunk: CHANGES modules/cache/mod_disk_cache.c
Posted by Graham Leggett <mi...@sharp.fm>.
On 04 Jun 2010, at 7:27 AM, Ruediger Pluem wrote:
> Why is this needed? mod_cache itself does not allow partial content
> to be cached
> and even if this does not work it should be fixed there and not in
> one of the
> storage providers.
mod_cache does allow a 206 to be cached, it is up to the cache
implementation (mod_disk_cache) to decide whether it wants to handle a
206 or not.
mod_cache is not the place to fix this. It is entirely valid for a
cache implementation to be given the opportunity to cache a 206, if
the cache implementation wants to. It is also entirely valid for a
cache implementation to choose not to cache a 206 if it doesn't yet
know how to, and that's what this fix does.
This patch is in response to buggy behaviour discovered in one of the
UK's news websites, where mod_cache on cluster A was reverse proxying
content served from origin cluster B. Origin cluster B was reacting to
range requests with a 206, and cluster A then cached the 206,
returning it to subsequent visitors. This patch fixes the problem.
Regards,
Graham
--