You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Sander Striker <st...@apache.org> on 2005/03/04 23:55:12 UTC

mod_cache

Hi,

I'm going over mod_cache and I'm wondering about some things:

--
modules/cache/mod_cache.c:271

    /* If the request has Cache-Control: no-store from RFC 2616, don't store
     * unless CacheStoreNoStore is active.
     */
    cc_in = apr_table_get(r->headers_in, "Cache-Control");
    if (r->no_cache ||
        (!conf->store_nostore &&
         ap_cache_liststr(NULL, cc_in, "no-store", NULL))) {
        ap_remove_output_filter(f);
        return ap_pass_brigade(f->next, in);
    }

What happens if the 'Cache-Control: no-store' header came in with a
304 Not Modified and the original request wasn't conditional?
If I read the spec correctly a 304 can carry a Cache-Control header,
if it has a different value since a previous 200 (or 304).

--
modules/cache/mod_cache.c:308

    /* have we already run the cachability check and set up the
     * cached file handle? 
     */
    if (cache->in_checked) {
        /* pass the brigades into the cache, then pass them
         * up the filter stack
         */

I haven't tracked cache->in_checked fully, so I wonder if it is
possible that this is set on a validating request?  That would
cause the cache not being updated, which is what I am trying to
track down FWIW.  This is not 'my' bug though, since I am seeing
the following line in the log:

  [debug] mod_disk_cache.c(616): disk_cache: Stored headers for URL xxx

However the cache files on disk don't change... I'm a bit puzzled why
not from looking at the code.

--
modules/cache/mod_cache.c:371

    /*
     * what responses should we not cache?
     *
     * At this point we decide based on the response headers whether it
     * is appropriate _NOT_ to cache the data from the server. There are
     * a whole lot of conditions that prevent us from caching this data.
     * They are tested here one by one to be clear and unambiguous. 
     */
    if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
        && r->status != HTTP_MULTIPLE_CHOICES
        && r->status != HTTP_MOVED_PERMANENTLY
        && r->status != HTTP_NOT_MODIFIED) {
        /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or 410
         * We don't cache 206, because we don't (yet) cache partial responses.
         * We include 304 Not Modified here too as this is the origin server
         * telling us to serve the cached copy.
         */
        reason = apr_psprintf(p, "Response status %d", r->status);
    } 

AIUI, we can cache "302 Found" (HTTP_MOVED_TEMPORARILY) when it has an Expires
or Cache-Control indicating that the request can be cached.

--
modules/cache/mod_cache.c:685

    /* Did we just update the cached headers on a revalidated response?
     *
     * If so, we can now decide what to serve to the client:
     * - If the original request was conditional and is satisified, send 304.
     * - Otherwise, send the cached body.
    */
    if (rv == APR_SUCCESS && cache->stale_handle) {
        apr_bucket_brigade *bb;
        apr_bucket *bkt;

        bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);

        /* Were we initially a conditional request? */
        if (ap_cache_request_is_conditional(cache->stale_headers)) {
            /* FIXME: We must ensure that the request meets conditions. */

            /* Set the status to be a 304. */
            r->status = HTTP_NOT_MODIFIED;

Is this as simple as clearing r->headers_in, overwriting with cache->stale_headers,
and the calling ap_meets_conditions()?



Sander

Re: mod_cache

Posted by Sander Striker <st...@apache.org>.
> Justin Erenkrantz wrote:
>> Sander Striker wrote:
> 
>> I completely agree.  So much even that I just committed it (r156306).
>> Why are we storing the header fd in the disk object anyways?  I haven't
>> gone through mod_disk_cache.c yet, but at least for store_headers() it
>> doesn't seem to make any sense.
> 
> It's not really the 'disk' object per se - it's part of mod_disk_cache's 
> private cache handle.  When we open the cache handle, we also open the 
> associated file descriptors in open_entity to ensure that we have 
> something valid cached.  We then use those open descriptors in 
> recall_headers and recall_body later on.

Right.
 
> Note that we're a little tricky and we partially read the header file in 
> open_entity to get some meta-data via file_cache_recall_mydata, but we 
> hold off parsing the full MIME headers from disk until recall_headers time.

Gotcha.

>> When it comes to store_headers(), shouldn't that be done using a temp
>> file and moving it into place atomically just like store_body?  Are
>> we relying on OS buffering and header size being small enough to pull
>> that off?  Or am I just being paranoid? :)
> 
> Yes, there's probably a race condition here.  However, store_headers is 
> only called once, so the window is relatively small compared to 
> store_body which can be called multiple times.  Feel free to have at 
> it.  ;-)

*grin*  Will do.


Sander


Re: mod_cache

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Sunday, March 6, 2005 1:54 PM +0100 Sander Striker <st...@apache.org> 
wrote:

> I completely agree.  So much even that I just committed it (r156306).
> Why are we storing the header fd in the disk object anyways?  I haven't
> gone through mod_disk_cache.c yet, but at least for store_headers() it
> doesn't seem to make any sense.

It's not really the 'disk' object per se - it's part of mod_disk_cache's 
private cache handle.  When we open the cache handle, we also open the 
associated file descriptors in open_entity to ensure that we have something 
valid cached.  We then use those open descriptors in recall_headers and 
recall_body later on.

Note that we're a little tricky and we partially read the header file in 
open_entity to get some meta-data via file_cache_recall_mydata, but we hold 
off parsing the full MIME headers from disk until recall_headers time.

> When it comes to store_headers(), shouldn't that be done using a temp
> file and moving it into place atomically just like store_body?  Are
> we relying on OS buffering and header size being small enough to pull
> that off?  Or am I just being paranoid? :)

Yes, there's probably a race condition here.  However, store_headers is only 
called once, so the window is relatively small compared to store_body which 
can be called multiple times.  Feel free to have at it.  ;-)  -- justin

Re: mod_cache

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Monday, March 7, 2005 7:47 AM +0100 Sander Striker <st...@apache.org> 
wrote:

> That's what I initially thought when I glanced over it.  Then I started
> wondering why headers are retrieved from h->req_hdrs, instead of
> r->headers_in.  I noticed we save the request headers of the request
> that got a resource into the cache.  Are we maybe mistakenly using
> those stored headers instead of the ones from the current request?
> I need to go over the code, but this is what popped into my head on
> first glance.

Doh.  req_hdrs shouldn't be used there at all.  Fixed in r156401.

Thanks!  -- justin

Re: mod_cache

Posted by Sander Striker <st...@apache.org>.
>Justin Erenkrantz wrote:
>> Sander Striker wrote:

[...]
>> Luckily for us, there is more work left even in mod_cache.  Right now,
>> whenever we hit a Cache-Control: no-cache in the request, the cache 
>> declines to handle the request, while it could be handling it (be it
>> with a required validation request to the backend).  That would be a
>> lot more efficient.  And within bounds of the spec.
> 
> I'm not sure what you mean.  Do you mean that if we get a Cache-Control: 
> no-cache, that we should attempt to treat it as mandatory revalidation 
> of the content?

Exactly.

> Interesting.  Right now, we just get out of the way.

Right.  We don't have to AFAUI, as long as we revalidate.
 
> One approach would be to remove the no-cache check and move it down into 
> ap_cache_check_freshness() and make our response always be stale in this 
> case.

*nod*

>> Furthermore, Cache-Control: max-age=0 or even max-age=X seems to be
>> completely ignored.  A response is given back with an Age header with
>> a larger value then what max-age was set to in the request.
> 
> This *should* be handled by ap_cache_check_freshness().  I'll admit that 
> I haven't spent a lot of time in there.  On a cursory look, it seems 
> that it should be handling this correctly.

That's what I initially thought when I glanced over it.  Then I started
wondering why headers are retrieved from h->req_hdrs, instead of
r->headers_in.  I noticed we save the request headers of the request
that got a resource into the cache.  Are we maybe mistakenly using
those stored headers instead of the ones from the current request?
I need to go over the code, but this is what popped into my head on
first glance.


Sander

Re: mod_cache

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Monday, March 7, 2005 2:03 AM +0100 Sander Striker <st...@apache.org> 
wrote:

> However, I do think you are right that ap_meets_conditions() doesn't do the
> right thing.  But that is in general, not just in this case.  It doesn't
> seem to take responses other than 2xx into account.  In those cases it
> shouldn't
> return a 304, yet it does.  We'll have to visit all the places where
> ap_meets_conditions() is called to make sure r->status is set, and check
> r->status in ap_meets_conditions() to fix this.

*nod*  This is what I was worried about.

> Luckily for us, there is more work left even in mod_cache.  Right now,
> whenever we hit a Cache-Control: no-cache in the request, the cache declines
> to handle the request, while it could be handling it (be it with a required
> validation request to the backend).  That would be a lot more efficient.
> And within bounds of the spec.

I'm not sure what you mean.  Do you mean that if we get a Cache-Control: 
no-cache, that we should attempt to treat it as mandatory revalidation of the 
content?  Interesting.  Right now, we just get out of the way.

One approach would be to remove the no-cache check and move it down into 
ap_cache_check_freshness() and make our response always be stale in this case.

> Furthermore, Cache-Control: max-age=0 or even max-age=X seems to be
> completely
> ignored.  A response is given back with an Age header with a larger value
> then
> what max-age was set to in the request.

This *should* be handled by ap_cache_check_freshness().  I'll admit that I 
haven't spent a lot of time in there.  On a cursory look, it seems that it 
should be handling this correctly.  -- justin


Re: mod_cache

Posted by Sander Striker <st...@apache.org>.
> Sander Striker wrote:
>> Justin Erenkrantz wrote:
>>> Sander Striker wrote:

>>> AIUI, we can cache "302 Found" (HTTP_MOVED_TEMPORARILY) when it has an
>>> Expires or Cache-Control indicating that the request can be cached.
>>
>>
>> Fair enough.  Feel free to add it, if you like.
> 
> 
> Well, I'm first going to check if we are doing the right thing with
> cached 301s (I saw some wonkiness earlier, but need to reverify).  If
> that is all good, I'll add 302 support.

I'm fairly sure this doesn't work too well yet.  See below.
 
> [...]
> 
>>>         /* Were we initially a conditional request? */
>>>         if (ap_cache_request_is_conditional(cache->stale_headers)) {
>>>             /* FIXME: We must ensure that the request meets 
>>> conditions. */
>>>
>>>             /* Set the status to be a 304. */
>>>             r->status = HTTP_NOT_MODIFIED;
>>>
>>> Is this as simple as clearing r->headers_in, overwriting with
>>> cache->stale_headers,
>>> and the calling ap_meets_conditions()?
>>
>> Yes, I *believe* so.  But, we should double-check that 
>> ap_meets_condition would do the 'right' thing.  I'm not 100% sure here.
> 
> I'm fairly sure it would.  Considering we have the final response headers,
> and the original request headers, this is just like a normal request.
> So ap_meets_condition should do the trick just fine when it comes to
> figuring out what to send back to the client.  I'll test, and if it works,
> I'll commit it.

It works (gave the correct results on cached, cached revalidating, not cached)
so I committed it (r156330).

However, I do think you are right that ap_meets_conditions() doesn't do the
right thing.  But that is in general, not just in this case.  It doesn't
seem to take responses other than 2xx into account.  In those cases it shouldn't
return a 304, yet it does.  We'll have to visit all the places where
ap_meets_conditions() is called to make sure r->status is set, and check
r->status in ap_meets_conditions() to fix this.

Luckily for us, there is more work left even in mod_cache.  Right now,
whenever we hit a Cache-Control: no-cache in the request, the cache declines
to handle the request, while it could be handling it (be it with a required
validation request to the backend).  That would be a lot more efficient.
And within bounds of the spec.

Furthermore, Cache-Control: max-age=0 or even max-age=X seems to be completely
ignored.  A response is given back with an Age header with a larger value then
what max-age was set to in the request.


Sander

Re: mod_cache

Posted by Sander Striker <st...@apache.org>.
Justin Erenkrantz wrote:
> --On Friday, March 4, 2005 11:55 PM +0100 Sander Striker 

[...]
>> What happens if the 'Cache-Control: no-store' header came in with a
>> 304 Not Modified and the original request wasn't conditional?
>> If I read the spec correctly a 304 can carry a Cache-Control header,
>> if it has a different value since a previous 200 (or 304).
> 
> 
> Hmm.  Good point.  What a goofy corner case though.  I won't lose much 
> sleep if we don't fix this.  Care to add a FIXME?  =)

Sure thing.  Done (r156304).

>> I haven't tracked cache->in_checked fully, so I wonder if it is
>> possible that this is set on a validating request?  That would
>> cause the cache not being updated, which is what I am trying to
>> track down FWIW. [...]

> Yes, it would be set.  in_checked is set after we know that we want to 
> save the response (i.e. all of the cacheability checks passed 
> successfully). However, note that if we are 'blocking' the response 
> (i.e. we revalidated the response succesfully with a 304), we set 
> block_response which is checked right before in_checked.

Ah, the coin is dropping here.  Since it is a filter, cache save can
be called multiple times in succession, with different brigades.  Makes
total sense.
 
[...]
>> AIUI, we can cache "302 Found" (HTTP_MOVED_TEMPORARILY) when it has an
>> Expires or Cache-Control indicating that the request can be cached.
> 
> Fair enough.  Feel free to add it, if you like.

Well, I'm first going to check if we are doing the right thing with
cached 301s (I saw some wonkiness earlier, but need to reverify).  If
that is all good, I'll add 302 support.

[...]
>>         /* Were we initially a conditional request? */
>>         if (ap_cache_request_is_conditional(cache->stale_headers)) {
>>             /* FIXME: We must ensure that the request meets conditions. */
>>
>>             /* Set the status to be a 304. */
>>             r->status = HTTP_NOT_MODIFIED;
>>
>> Is this as simple as clearing r->headers_in, overwriting with
>> cache->stale_headers,
>> and the calling ap_meets_conditions()?
> 
> Yes, I *believe* so.  But, we should double-check that 
> ap_meets_condition would do the 'right' thing.  I'm not 100% sure here.

I'm fairly sure it would.  Considering we have the final response headers,
and the original request headers, this is just like a normal request.
So ap_meets_condition should do the trick just fine when it comes to
figuring out what to send back to the client.  I'll test, and if it works,
I'll commit it.
 
> Okay, back to your real bug:

[...]

> Please correct me if I'm wrong.  My understanding is that you have an 
> expired cache entry which needs to be revalidated and the updated 
> headers aren't updating properly.

Correct.
 
> Now that I read the code I'm betting you are hitting that else block 
> with 'XXX log message' in mod_disk_cache's store_headers.

What is that doing there, second guessing its caller (mod_cache)?!

> The sequence that I'm envisioning is:
> 
> - We have a stale cached entity/handle.
> - We send an If-Modified-Since/etc request
> - We get the 304 Not modified on revalidation with an updated Expires 
> header.
> - At mod_cache.c:533, we restore the cached handle.
> - Around mod_cache.c:658, we merge in the updated response headers
> - Then at mod_cache.c:683, we call the provider to store the headers.
> - Then, we hit store_headers in mod_disk_cache.
> - Because mod_disk_cache *had* a stale response, it would have loaded 
> the stale headers into its handle.
> - Hence, the check at mod_disk_cache.c:532 will fail because hfd is 
> 'valid.'
> - We never update the cached headers with the new Expires header
> 
> Can you confirm this sequence?

Yes, I can.  Sheesh.

>  I'm thinking we shouldn't even check for 
> the !hfd case anyway.  That check has probably been there forever and 
> was likely to be completely bogus from the start.  What do you think?  

I completely agree.  So much even that I just committed it (r156306).
Why are we storing the header fd in the disk object anyways?  I haven't
gone through mod_disk_cache.c yet, but at least for store_headers() it
doesn't seem to make any sense.

When it comes to store_headers(), shouldn't that be done using a temp
file and moving it into place atomically just like store_body?  Are
we relying on OS buffering and header size being small enough to pull
that off?  Or am I just being paranoid? :)

Sander

Re: mod_cache

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Friday, March 4, 2005 11:55 PM +0100 Sander Striker <st...@apache.org> 
wrote:

> --
> modules/cache/mod_cache.c:271
>
>     /* If the request has Cache-Control: no-store from RFC 2616, don't store
>      * unless CacheStoreNoStore is active.
>      */
>     cc_in = apr_table_get(r->headers_in, "Cache-Control");
>     if (r->no_cache ||
>         (!conf->store_nostore &&
>          ap_cache_liststr(NULL, cc_in, "no-store", NULL))) {
>         ap_remove_output_filter(f);
>         return ap_pass_brigade(f->next, in);
>     }
>
> What happens if the 'Cache-Control: no-store' header came in with a
> 304 Not Modified and the original request wasn't conditional?
> If I read the spec correctly a 304 can carry a Cache-Control header,
> if it has a different value since a previous 200 (or 304).

Hmm.  Good point.  What a goofy corner case though.  I won't lose much sleep 
if we don't fix this.  Care to add a FIXME?  =)

> --
> modules/cache/mod_cache.c:308
>
>     /* have we already run the cachability check and set up the
>      * cached file handle?      */
>     if (cache->in_checked) {
>         /* pass the brigades into the cache, then pass them
>          * up the filter stack
>          */
>
> I haven't tracked cache->in_checked fully, so I wonder if it is
> possible that this is set on a validating request?  That would
> cause the cache not being updated, which is what I am trying to
> track down FWIW.  This is not 'my' bug though, since I am seeing
> the following line in the log:
>
>   [debug] mod_disk_cache.c(616): disk_cache: Stored headers for URL xxx
>
> However the cache files on disk don't change... I'm a bit puzzled why
> not from looking at the code.

Yes, it would be set.  in_checked is set after we know that we want to save 
the response (i.e. all of the cacheability checks passed successfully). 
However, note that if we are 'blocking' the response (i.e. we revalidated the 
response succesfully with a 304), we set block_response which is checked right 
before in_checked.

(I'll come back to your mod_disk_cache problem at the end.)

> --
> modules/cache/mod_cache.c:371
>
>     /*
>      * what responses should we not cache?
>      *
>      * At this point we decide based on the response headers whether it
>      * is appropriate _NOT_ to cache the data from the server. There are
>      * a whole lot of conditions that prevent us from caching this data.
>      * They are tested here one by one to be clear and unambiguous.      */
>     if (r->status != HTTP_OK && r->status != HTTP_NON_AUTHORITATIVE
>         && r->status != HTTP_MULTIPLE_CHOICES
>         && r->status != HTTP_MOVED_PERMANENTLY
>         && r->status != HTTP_NOT_MODIFIED) {
>         /* RFC2616 13.4 we are allowed to cache 200, 203, 206, 300, 301 or
> 410
>          * We don't cache 206, because we don't (yet) cache partial
> responses.
>          * We include 304 Not Modified here too as this is the origin server
>          * telling us to serve the cached copy.
>          */
>         reason = apr_psprintf(p, "Response status %d", r->status);
>     }
> AIUI, we can cache "302 Found" (HTTP_MOVED_TEMPORARILY) when it has an
> Expires
> or Cache-Control indicating that the request can be cached.

Fair enough.  Feel free to add it, if you like.

> --
> modules/cache/mod_cache.c:685
>
>     /* Did we just update the cached headers on a revalidated response?
>      *
>      * If so, we can now decide what to serve to the client:
>      * - If the original request was conditional and is satisified, send 304.
>      * - Otherwise, send the cached body.
>     */
>     if (rv == APR_SUCCESS && cache->stale_handle) {
>         apr_bucket_brigade *bb;
>         apr_bucket *bkt;
>
>         bb = apr_brigade_create(r->pool, r->connection->bucket_alloc);
>
>         /* Were we initially a conditional request? */
>         if (ap_cache_request_is_conditional(cache->stale_headers)) {
>             /* FIXME: We must ensure that the request meets conditions. */
>
>             /* Set the status to be a 304. */
>             r->status = HTTP_NOT_MODIFIED;
>
> Is this as simple as clearing r->headers_in, overwriting with
> cache->stale_headers,
> and the calling ap_meets_conditions()?

Yes, I *believe* so.  But, we should double-check that ap_meets_condition 
would do the 'right' thing.  I'm not 100% sure here.

Okay, back to your real bug:

> possible that this is set on a validating request?  That would
> cause the cache not being updated, which is what I am trying to
> track down FWIW.  This is not 'my' bug though, since I am seeing
> the following line in the log:
>
>   [debug] mod_disk_cache.c(616): disk_cache: Stored headers for URL xxx
>
> However the cache files on disk don't change... I'm a bit puzzled why
> not from looking at the code.

Please correct me if I'm wrong.  My understanding is that you have an expired 
cache entry which needs to be revalidated and the updated headers aren't 
updating properly.

Now that I read the code I'm betting you are hitting that else block with 'XXX 
log message' in mod_disk_cache's store_headers.  The sequence that I'm 
envisioning is:

- We have a stale cached entity/handle.
- We send an If-Modified-Since/etc request
- We get the 304 Not modified on revalidation with an updated Expires header.
- At mod_cache.c:533, we restore the cached handle.
- Around mod_cache.c:658, we merge in the updated response headers
- Then at mod_cache.c:683, we call the provider to store the headers.
- Then, we hit store_headers in mod_disk_cache.
- Because mod_disk_cache *had* a stale response, it would have loaded the 
stale headers into its handle.
- Hence, the check at mod_disk_cache.c:532 will fail because hfd is 'valid.'
- We never update the cached headers with the new Expires header

Can you confirm this sequence?  I'm thinking we shouldn't even check for the 
!hfd case anyway.  That check has probably been there forever and was likely 
to be completely bogus from the start.  What do you think?  -- justin