You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Paul Querna <ch...@force-elite.com> on 2005/05/12 10:29:52 UTC

[PATCH] mod_cache, don't always run as a quick handler.

Okay, because of the quirky behavior of a 'sometimes' cached page, this
one had me going in circles for a little while.

What this patch does, is add a new command to mod_cache,
'CacheRunAfterOthers_RenameThisCmd'[1]

The effect this has is to run mod_cache as a normal handler, instead of
a Quick Handler.  This means all the other Translate Name, Map to
Storage, and Fixup hooks are ran.  These include mod_rewrite and
mod_alias, among others.

What this allows, is configurations like the following:
CacheEnable disk /
CacheRoot /tmp/cacheroot
RewriteEngine On
CacheIgnoreNoLastMod on
CacheDefaultExpire 90
CacheMaxExpire 90
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*\sFirefox/1\.0\.[1234]\b
RewriteRule .* http://www.mozilla.org/products/firefox/upgrade/ [R=302,L]

Since mod_cache is ran as a quick handler, once the / page was cached,
the RewriteRule would never be ran.  So, after the first non-Firefox
Client connected, all clients, even those that are Firefox, would get a
cached page, not the redirect.

Add this:
CacheRunAfterOthers_RenameThisCmd on

And now it works. It does take a slight performance hit, but this is
nothing compared to using a mod_proxy backend.

Moving the Cache handler to be ran as an APR_HOOK_REALLY_FIRST means
mod_rewrite will get it's chance to rewrite it, but it will still run
before a really expensive handler, like a Proxy Backend or a PHP Script.
 This won't be useful to everyone, but for most configurations, the
initial hooks are insignificant compared to the Content Handler, so
allowing mod_cache to be called like this still makes sense.

I have lightly tested the attached patch, and it fixes my test cases.

Some details on this are also in PR 34885:
http://issues.apache.org/bugzilla/show_bug.cgi?id=34885

Thoughts/Comments/Code Review... all welcome.

-Paul

[1] Yes, I did name it 'CacheRunAfterOthers_RenameThisCmd'.  I hope that
before this patch is committed, someone thinks of a good name for this
behavior, because I can't think of one.

Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by Bill Stoddard <bi...@wstoddard.com>.
Paul Querna wrote:
> Okay, because of the quirky behavior of a 'sometimes' cached page, this
> one had me going in circles for a little while.
> 
> What this patch does, is add a new command to mod_cache,
> 'CacheRunAfterOthers_RenameThisCmd'[1]

+1 in concept (patch not reviewed). This was on my todo list from a couple of years back. No idea what to name 
the directive tho...

Bill


Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 11:35 AM 5/13/2005, Paul Querna wrote:
>Uhm, in my specific case yes, I needed it to get around not setting
>'Vary: User-Agent'.  This is not the only use.  Please don't -1 it based
>on my example of how I abused it in real life.

Please clarify how not setting that Vary header does not produce
erroneous results in other proxies?

Please also clarify a specific use case which would turn my vote
towards a +1 (or at least +/-0).

Bill

[A former Perl 'TMTOWTDI' who's becoming more of a Python believer.]



Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by Paul Querna <ch...@force-elite.com>.
William A. Rowe, Jr. wrote:
> At 04:55 AM 5/13/2005, Paul Querna wrote:
> 
>>Brian Akins wrote:
>>
>>>Paul Querna wrote:
>>>
>>>
>>>>CacheEnable disk /
>>>
>>>Maybe have it as an option to CacheEnable instead?
>>>
>>>CacheEnable disk /special_stuff normal
>>>CacheEnable disk / quick
> 
> 
> Unless I totally missed the point, any mistake in either mod_cache,
> or the content headers is masked by CacheEnable normal.  
> 
> What we do to 'hack around' broken content headers will hit other 
> proxies and cause exactly the same problems, no?
> 
> What we do to 'hack around' bugs in mod_cache prevents it from
> behaving as a proper proxy, no?
> 
> So I guess, I'm -1 on this option, seeing as the real issue was
> that mod_rewrite hadn't set the no-cache bits properly to prevent
> mod_cache from stashing away vary-by-browser content.


Uhm, in my specific case yes, I needed it to get around not setting
'Vary: User-Agent'.  This is not the only use.  Please don't -1 it based
on my example of how I abused it in real life.

-Paul

Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
At 04:55 AM 5/13/2005, Paul Querna wrote:
>Brian Akins wrote:
>> Paul Querna wrote:
>> 
>>> CacheEnable disk /
>> 
>> Maybe have it as an option to CacheEnable instead?
>> 
>> CacheEnable disk /special_stuff normal
>> CacheEnable disk / quick

Unless I totally missed the point, any mistake in either mod_cache,
or the content headers is masked by CacheEnable normal.  

What we do to 'hack around' broken content headers will hit other 
proxies and cause exactly the same problems, no?

What we do to 'hack around' bugs in mod_cache prevents it from
behaving as a proper proxy, no?

So I guess, I'm -1 on this option, seeing as the real issue was
that mod_rewrite hadn't set the no-cache bits properly to prevent
mod_cache from stashing away vary-by-browser content.

Bill





Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by Paul Querna <ch...@force-elite.com>.
Brian Akins wrote:
> Paul Querna wrote:
> 
>> CacheEnable disk /
> 
> Maybe have it as an option to CacheEnable instead?
> 
> CacheEnable disk /special_stuff normal
> CacheEnable disk / quick
> 
> with quick being the default.  That way, you would not have to do it
> globally, but could be more specific.
> 
>
> Also, have the "both" option where it will try in quick and normal
> handlers.

Attached is a patch that adds 'normal', 'both' and 'quick' to the
CacheEnable command.  It looks okay locally, but I haven't done
extensive testing.

FWIW, the last night's patch went live on update.mozilla.org, and
survived the morning release of Firefox 1.0.4.

Little propaganda on 2.1...

Basic Setup is two reverse proxy + cache machines in front of a single
PHP based Application Server.

Both cache/proxies were running Worker MPM and 2.1.3+patches.  Peaked at
about concurrent 3800 connections each. (only 4 gigs of ram on those
machines)  Once everything was configured correctly, the machines were
not even breaking a sweat.  Previously they were using several Squids,
but the features in 2.1 are very compelling for their uses...

-Paul

Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by Brian Akins <ba...@web.turner.com>.
Paul Querna wrote:

> CacheEnable disk /


Maybe have it as an option to CacheEnable instead?

CacheEnable disk /special_stuff normal
CacheEnable disk / quick

with quick being the default.  That way, you would not have to do it 
globally, but could be more specific.

Also, have the "both" option where it will try in quick and normal 
handlers.

-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Thu, May 12, 2005 at 11:34:38AM -0500, William A. Rowe, Jr. wrote:
> Let me suggest, instead, that every proxy between here and
> Timbuktu suffers the same problem from your example.
> 
> Better, methinks, is for mod_rewrite to toggle Vary: for the
> envvar that triggered its rewrite.  This is symptomatic of a
> more sinister side effect of mod_rewrite.

Exactly my thoughts as well.  If the Vary header were in the response, then
mod_cache would act accordingly.

Without that header, every HTTP cache in between the client and the server
would operate identically to mod_cache as a quick_handler.  So, this doesn't
actually fix the problem if mod_rewrite is lame.  In short, this patch doesn't
fix anything and only hides the underlying cause.  -- justin

Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.
Let me suggest, instead, that every proxy between here and
Timbuktu suffers the same problem from your example.

Better, methinks, is for mod_rewrite to toggle Vary: for the
envvar that triggered its rewrite.  This is symptomatic of a
more sinister side effect of mod_rewrite.

Bill

At 03:29 AM 5/12/2005, Paul Querna wrote:
>Okay, because of the quirky behavior of a 'sometimes' cached page, this
>one had me going in circles for a little while.
>
>What this patch does, is add a new command to mod_cache,
>'CacheRunAfterOthers_RenameThisCmd'[1]
>
>The effect this has is to run mod_cache as a normal handler, instead of
>a Quick Handler.  This means all the other Translate Name, Map to
>Storage, and Fixup hooks are ran.  These include mod_rewrite and
>mod_alias, among others.
>
>What this allows, is configurations like the following:
>CacheEnable disk /
>CacheRoot /tmp/cacheroot
>RewriteEngine On
>CacheIgnoreNoLastMod on
>CacheDefaultExpire 90
>CacheMaxExpire 90
>RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*\sFirefox/1\.0\.[1234]\b
>RewriteRule .* http://www.mozilla.org/products/firefox/upgrade/ [R=302,L]
>
>Since mod_cache is ran as a quick handler, once the / page was cached,
>the RewriteRule would never be ran.  So, after the first non-Firefox
>Client connected, all clients, even those that are Firefox, would get a
>cached page, not the redirect.
>
>Add this:
>CacheRunAfterOthers_RenameThisCmd on
>
>And now it works. It does take a slight performance hit, but this is
>nothing compared to using a mod_proxy backend.
>
>Moving the Cache handler to be ran as an APR_HOOK_REALLY_FIRST means
>mod_rewrite will get it's chance to rewrite it, but it will still run
>before a really expensive handler, like a Proxy Backend or a PHP Script.
> This won't be useful to everyone, but for most configurations, the
>initial hooks are insignificant compared to the Content Handler, so
>allowing mod_cache to be called like this still makes sense.
>
>I have lightly tested the attached patch, and it fixes my test cases.
>
>Some details on this are also in PR 34885:
>http://issues.apache.org/bugzilla/show_bug.cgi?id=34885
>
>Thoughts/Comments/Code Review... all welcome.
>
>-Paul
>
>[1] Yes, I did name it 'CacheRunAfterOthers_RenameThisCmd'.  I hope that
>before this patch is committed, someone thinks of a good name for this
>behavior, because I can't think of one.
>
>Index: modules/cache/mod_cache.c =================================================================== --- modules/cache/mod_cache.c      (revision 169788) +++ modules/cache/mod_cache.c (working copy) @@ -45,7 +45,8 @@   *     oh well.   */ -static int cache_url_handler(request_rec *r, int lookup) +static int cache_url_real_handler(request_rec *r, int lookup, +                                  cache_server_conf *conf, int is_quick) {      apr_status_t rv;      const char *auth; @@ -55,22 +56,13 @@      cache_provider_list *providers;      cache_info *info;      cache_request_rec *cache; -    cache_server_conf *conf;      apr_bucket_brigade *out; -    /* Delay initialization until we know we are handling a GET */ -    if (r->method_number != M_GET) { -        return DECLINED; -    } -      uri = r->parsed_uri;      url = r->unparsed_uri;      path = uri.path;      info = NULL; -    conf = (cache_server_conf *) ap_get_module_config(r->server->module_config, -                                                      &cache_module); -      /*       * Which cache module (if any) should handle this request?       */ @@ -156,9 +148,12 @@      /* We are in the quick handler hook, which means that no output       * filters have been set. So lets run the insert_filter hook.       */ -    ap_run_insert_filter(r); +    if (is_quick) { +        ap_run_insert_filter(r); +    } +      ap_add_output_filter_handle(cache_out_filter_handle, NULL, -                                r, r->connection); +                                    r, r->connection);      /* kick off the filter stack */      out = apr_brigade_create(r->pool, r->connection->bucket_alloc); @@ -174,6 +169,45 @@      return OK; } +static int cache_url_quick_handler(request_rec *r, int lookup) +{ +    cache_server_conf *conf; + +    /* Delay initialization until we know we are handling a GET */ +    if (r->method_number != M_GET) { +        return DECLINED; +    } + +    conf = (cache_server_conf *) ap_get_module_config(r->server->module_config, +                                                      &cache_module); + +    if (conf->run_handler_later) { +        return DECLINED; +    } + +    return cache_url_real_handler(r, lookup, conf, 1); +} + +static int cache_url_late_handler(request_rec *r) +{ +    cache_server_conf *conf; + +    /* Delay initialization until we know we are handling a GET */ +    if (r->method_number != M_GET) { +        return DECLINED; +    } + +    conf = (cache_server_conf *) ap_get_module_config(r->server->module_config, +                                                      &cache_module); + +    if (conf->run_handler_later) { +        return cache_url_real_handler(r, 0, conf, 0); +    } + +    return DECLINED; +} + + /*   * CACHE_OUT filter   * ---------------- @@ -742,6 +776,7 @@      /* array of headers that should not be stored in cache */      ps->ignore_headers = apr_array_make(p, 10, sizeof(char *));      ps->ignore_headers_set = CACHE_IGNORE_HEADERS_UNSET; +    ps->run_handler_later = 0;      return ps; } @@ -787,6 +822,10 @@          (overrides->ignore_headers_set == CACHE_IGNORE_HEADERS_UNSET)          ? base->ignore_headers          : overrides->ignore_headers; +    ps->run_handler_later = +        (overrides->run_handler_later == 0) +        ? base->run_handler_later +        : overrides->run_handler_later;      return ps; } static const char *set_cache_ignore_no_last_mod(cmd_parms *parms, void *dummy, @@ -829,6 +868,18 @@      return NULL; } +static const char *set_cache_run_later(cmd_parms *parms, void *dummy, +                                           int flag) +{ +    cache_server_conf *conf; + +    conf = +        (cache_server_conf *)ap_get_module_config(parms->server->module_config, +                                                  &cache_module); +    conf->run_handler_later = flag; +    return NULL; +} + static const char *set_cache_store_nostore(cmd_parms *parms, void *dummy,                                             int flag) { @@ -993,6 +1044,9 @@      AP_INIT_FLAG("CacheStorePrivate", set_cache_store_private,                   NULL, RSRC_CONF,                   "Ignore 'Cache-Control: private' and store private content"), +    AP_INIT_FLAG("CacheRunAfterOthers_RenameThisCmd", set_cache_run_later, +                 NULL, RSRC_CONF, +                 "Run the Cache Handler as a Normal Handler."),      AP_INIT_FLAG("CacheStoreNoStore", set_cache_store_nostore,                   NULL, RSRC_CONF,                   "Ignore 'Cache-Control: no-store' and store sensitive content"), @@ -1007,9 +1061,12 @@ static void register_hooks(apr_pool_t *p) { -    /* cache initializer */ -    /* cache handler */ -    ap_hook_quick_handler(cache_url_handler, NULL, NULL, APR_HOOK_FIRST); +    /* cache handlers */ +    ap_hook_quick_handler(cache_url_quick_handler, NULL, NULL, +                          APR_HOOK_FIRST); +    ap_hook_handler(cache_url_late_handler, NULL, NULL, +                    APR_HOOK_REALLY_FIRST); +      /* cache filters       * XXX The cache filters need to run right after the handlers and before       * any other filters. Consider creating AP_FTYPE_CACHE for this purpose. @@ -1030,6 +1087,8 @@                                    cache_out_filter,                                    NULL,                                    AP_FTYPE_CONTENT_SET-1); + +    /* cache initializer */      ap_hook_post_config(cache_post_config, NULL, NULL, APR_HOOK_REALLY_FIRST); } Index: modules/cache/mod_cache.h =================================================================== --- modules/cache/mod_cache.h       (revision 169788) +++ modules/cache/mod_cache.h (working copy) @@ -145,6 +145,8 @@      #define CACHE_IGNORE_HEADERS_SET   1      #define CACHE_IGNORE_HEADERS_UNSET 0      int ignore_headers_set; +    /** Flag to disable the Quick Handler, and run after meta-hooks */ +    int run_handler_later; } cache_server_conf; /* cache info information */ 



Re: [PATCH] mod_cache, don't always run as a quick handler.

Posted by Sander Striker <st...@apache.org>.
Paul Querna wrote:
> Okay, because of the quirky behavior of a 'sometimes' cached page, this
> one had me going in circles for a little while.
> 
> What this patch does, is add a new command to mod_cache,
> 'CacheRunAfterOthers_RenameThisCmd'[1]
> 
> The effect this has is to run mod_cache as a normal handler, instead of
> a Quick Handler.  This means all the other Translate Name, Map to
> Storage, and Fixup hooks are ran.  These include mod_rewrite and
> mod_alias, among others.

+1.  And FWIW, I like Brian Atkins' solution to add an extra flag
to CacheEnable.

Sander