Posted to dev@httpd.apache.org by Brian Akins <ba...@web.turner.com> on 2004/08/04 14:56:31 UTC

[PATCH] mod_disk cached fixed

Sorry about this, but the last patch had a mistake in the writev

-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies


Re: [PATCH] mod_disk cached fixed

Posted by Brian Akins <ba...@web.turner.com>.
Graham Leggett wrote:

>
> How resilient is this to garbage data on the disk? A risk exists of 
> somebody getting write access to the headers cache file, and then 
> crafting a cache headers file which when read causes a takeover of the 
> webserver. Just want to check that it's covered.
>
>  


That risk exists with the current approach as well.  You could do a quick
check to make sure the numbers look reasonable, I suppose.
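
A minimal sketch of such a check, assuming the headers file starts with a fixed-size record of integer fields. The struct layout, field names, and limits below are hypothetical and are not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header record: integers only, no pointers. */
typedef struct {
    uint32_t format;       /* format/version tag */
    uint32_t name_len;     /* length of the stored key that follows */
    uint32_t entity_count; /* number of header lines that follow */
} disk_hdr_t;

#define HDR_FORMAT    1u     /* assumed current format number */
#define HDR_MAX_NAME  8192u  /* assumed upper bound on key length */
#define HDR_MAX_LINES 1024u  /* assumed upper bound on header count */

/* Copy a header record out of a raw buffer and reject implausible values.
 * Returns 0 on success, -1 if the buffer is short or a field is out of range. */
static int read_disk_hdr(const unsigned char *buf, size_t len, disk_hdr_t *out)
{
    if (len < sizeof(*out)) {
        return -1;                      /* truncated file */
    }
    memcpy(out, buf, sizeof(*out));
    if (out->format != HDR_FORMAT) {
        return -1;                      /* unknown format */
    }
    if (out->name_len == 0 || out->name_len > HDR_MAX_NAME) {
        return -1;                      /* key length out of range */
    }
    if (out->entity_count > HDR_MAX_LINES) {
        return -1;                      /* too many header lines */
    }
    if (out->name_len > len - sizeof(*out)) {
        return -1;                      /* key would run past the data read */
    }
    return 0;
}
```

A check like this only bounds the integers before they are used as lengths or counts; it does not detect a file that is well-formed but holds the wrong content.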

-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies


Re: [PATCH] mod_disk cached fixed

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 5:26 PM +0200 Graham Leggett <mi...@sharp.fm> 
wrote:

> How resilient is this to garbage data on the disk? A risk exists of somebody
> getting write access to the headers cache file, and then crafting a cache
> headers file which when read causes a takeover of the webserver. Just want
> to check that it's covered.

It's only reading in integers not pointers.  So I don't see how it'd cause a 
security risk.  -- justin

Re: [PATCH] mod_disk cached fixed

Posted by Graham Leggett <mi...@sharp.fm>.
Brian Akins wrote:

> Sorry about this, but the last patch had a mistake in the writev

How resilient is this to garbage data on the disk? A risk exists of 
somebody getting write access to the headers cache file, and then 
crafting a cache headers file which when read causes a takeover of the 
webserver. Just want to check that it's covered.

Regards,
Graham
--

Re: [PATCH] mod_disk cached fixed

Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:

> Looks okay - I'll take a look at incorporating it into my local changes
> and see how it helps.  The one thing I'd change is the sizeof(char) to
> sizeof(newline): since it's a constant, that allows '\r\n' to be sized
> accordingly.  -- justin
>
Ok.

It may not help you much, since you're limited by your bandwidth.  You
should see lower disk and CPU usage, though.

Can you e-mail details about your setup (Apache versions and patches)?
I'll try to run some tests here.

-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies


Re: [PATCH] mod_disk cached fixed

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 8:56 AM -0400 Brian Akins 
<ba...@web.turner.com> wrote:

> Sorry about this, but the last patch had a mistake in the writev

Looks okay - I'll take a look at incorporating it into my local changes and see
how it helps.  The one thing I'd change is the sizeof(char) to
sizeof(newline): since it's a constant, that allows '\r\n' to be sized
accordingly.  -- justin
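
A minimal sketch of the idea, assuming `newline` is a string constant passed to writev(). The declaration of `newline` and the helper name are hypothetical; the patch may define them differently. If the constant is declared as a NUL-terminated array, as here, sizeof counts the terminator, so one byte is subtracted:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical line terminator; the real patch may define it differently. */
static const char newline[] = "\r\n";

/* Write "name: value" plus the terminator with a single writev() call.
 * sizeof(newline) includes the array's trailing NUL, hence the -1.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t write_header_line(int fd, const char *name, const char *value)
{
    struct iovec iov[4];

    iov[0].iov_base = (void *)name;
    iov[0].iov_len  = strlen(name);
    iov[1].iov_base = (void *)": ";
    iov[1].iov_len  = 2;
    iov[2].iov_base = (void *)value;
    iov[2].iov_len  = strlen(value);
    iov[3].iov_base = (void *)newline;
    iov[3].iov_len  = sizeof(newline) - 1;

    return writev(fd, iov, 4);
}
```

Sizing the last iovec from the constant means that changing the terminator from "\n" to "\r\n" needs no other edit.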

mod_cache filter priorities was Re: [PATCH] mod_disk cached fixed

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 4:26 PM -0400 Brian Akins 
<ba...@web.turner.com> wrote:

> Notice the plus in the second.

I thought about that, too.  If you place it with the +1, then you'd be after 
mod_deflate.  I'm not yet fully sure what the implication of that would be.

Moving the filters around may have some benefits.  Ideally, we should come up 
with different strategies for moving the filter around...  The only thing I 
know for sure is that CACHE_SAVE and CACHE_OUT need to be aligned at the same 
level.  I guess you could have multiple variants: one if the client supports 
caching, the other if it doesn't.  I'd have to start looking at our Vary code 
in depth though.  -- justin
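
To make the ordering concrete, here is a small standalone model of how the filter type decides position in the output chain. It assumes that output filters run in ascending type order and that mod_deflate's filter is registered at AP_FTYPE_CONTENT_SET; the numeric values are copied from httpd's util_filter.h only so the sketch compiles on its own, and the struct and function names are made up for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* Values mirror the ap_filter_type enum in httpd's util_filter.h. */
enum { FTYPE_RESOURCE = 10, FTYPE_CONTENT_SET = 20, FTYPE_PROTOCOL = 30 };

/* Illustrative stand-in for a registered filter. */
typedef struct {
    const char *name;
    int ftype;
} filter_t;

/* Order filters by ascending type, the order in which they would run. */
static int by_ftype(const void *a, const void *b)
{
    const filter_t *fa = a;
    const filter_t *fb = b;
    return (fa->ftype > fb->ftype) - (fa->ftype < fb->ftype);
}

/* Sort the chain and return the position of `name`, or -1 if absent. */
static int chain_pos(filter_t *chain, size_t n, const char *name)
{
    size_t i;

    qsort(chain, n, sizeof(*chain), by_ftype);
    for (i = 0; i < n; i++) {
        if (strcmp(chain[i].name, name) == 0) {
            return (int)i;
        }
    }
    return -1;
}
```

In this model, CACHE_IN at AP_FTYPE_CONTENT_SET-1 sorts ahead of DEFLATE and would see the uncompressed body, while at AP_FTYPE_CONTENT_SET+1 it sorts behind DEFLATE and would see the compressed one.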

Re: [PATCH] mod_disk cached fixed

Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:

> -
> Disk space is cheap.  ;-)
>
> I think the vary header would still be preserved in the cached copy, 
> so I'm not sure how down-stream caches would be affected.  -- justin
>

Here are some interesting stats from a large news site:

Sample time of 6 hours => 1,039,361 hits from a single box

20,493 distinct User-Agents

Here's the top few:
152260: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
127111: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
117145: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
43571: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
37977: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
30698: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
29425: -
25695: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
20288: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
15862: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
13413: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)
11447: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
11172: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
10759: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
10264: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
8743: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
7710: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)
6635: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)
5938: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
5539: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)
5487: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461)
5327: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts)
5269: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)
5156: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
5019: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
5014: Mozilla/3.01 (compatible;)

 

So, if you varied simply on the value of User-Agent, you would wind up with
more than a dozen or so entries for each "object."  Disk space isn't the
problem; it's that you would have to regenerate each object for each
User-Agent.  Couple this with varying on Accept-Encoding, and it grows
somewhat larger.  In some instances it may take a few database queries
and parsing to produce the page.  Besides, the regex is not that
expensive: in my tests, it's in 11th place.

-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies


Re: [PATCH] mod_disk cached fixed

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 5:18 PM -0400 Joshua Slive <jo...@slive.ca> 
wrote:

> But it couldn't be as expensive as caching a variant for every User-Agent
> that accesses your site.  What is probably needed is
> CacheVaryOn env-variable
> which would override the vary-matching decision to vary only on the content
> of env-variable (which could be just on/off in this case), but not touch the
> actual Vary: header, since down-stream caches wouldn't have the extra logic
> needed to determine if they needed to vary.

Disk space is cheap.  ;-)

I think the vary header would still be preserved in the cached copy, so I'm 
not sure how down-stream caches would be affected.  -- justin

Re: [PATCH] mod_disk cached fixed

Posted by Brian Akins <ba...@web.turner.com>.
Joshua Slive wrote:

>
> But it couldn't be as expensive as caching a variant for every 
> User-Agent that accesses your site.  What is probably needed is
> CacheVaryOn env-variable
> which would override the vary-matching decision to vary only on the 
> content of env-variable (which could be just on/off in this case), but 
> not touch the actual Vary: header, since down-stream caches wouldn't 
> have the extra logic needed to determine if they needed to vary.


Yes.  Or maybe per header:

CacheVary User-Agent browser-gzip

where "browser-gzip" is an environment variable.

-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies


Re: [PATCH] mod_disk cached fixed

Posted by Joshua Slive <jo...@slive.ca>.
On Wed, 4 Aug 2004, Justin Erenkrantz wrote:

> --On Wednesday, August 4, 2004 4:52 PM -0400 Brian Akins 
> <ba...@web.turner.com> wrote:
>> The thing that sucks is if you vary on User-Agent.  You wind up with a ton
>> of entries per uri.  I cheated in another module by "varying" on an
>> environment variable.  Kind of like this:
>> 
>> BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip
>> 
>> and just "vary" on no-gzip (1 or 0), but this may be hard to do just using
>> headers...
>
> Note that BrowserMatch with regexp's is ridiculously expensive.  Minimizing 
> the need for that would be goodness, I think.  -- justin

But it couldn't be as expensive as caching a variant for every User-Agent 
that accesses your site.  What is probably needed is
CacheVaryOn env-variable
which would override the vary-matching decision to vary only 
on the content of env-variable (which could be just on/off in this case), 
but not touch the actual Vary: header, since down-stream caches wouldn't 
have the extra logic needed to determine if they needed to vary.

Joshua.


Re: [PATCH] mod_disk cached fixed

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 4:52 PM -0400 Brian Akins 
<ba...@web.turner.com> wrote:

> Possible scenario:
>
> Serving cached content:
>
> -  lookup uri in cache (via md5?).
> -  check varies - a list of headers to vary on
> - calculate new key (md5) based on uri and client's values of these headers
> - lookup new uri in cache
> - continue as normal
>
> Caching an object:
> -see if object has been cached before by looking up uri in cache
> -do the Vary's match?
>     -no, discard old entry(?) and create new uri entry
>     -yes, generate new key based on client values
> -continue as normal

I wouldn't discard the old entry, but store it as a variant to also cache. 
But, yes, a two-level scheme like this may make sense.

This allows us to cache the gzipped and non-gzipped versions - which is what 
we'd want.

> The thing that sucks is if you vary on User-Agent.  You wind up with a ton
> of entries per uri.  I cheated in another module by "varying" on an
> environment variable.  Kind of like this:
>
> BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip
>
> and just "vary" on no-gzip (1 or 0), but this may be hard to do just using
> headers...

Note that BrowserMatch with regexp's is ridiculously expensive.  Minimizing 
the need for that would be goodness, I think.  -- justin

Re: [PATCH] mod_disk cached fixed

Posted by Brian Akins <ba...@web.turner.com>.
Bill Stoddard wrote:

> This is the area in which mod_cache is most broken. It does not handle
> Vary at all, thus the content needs to be stored before it is touched
> by any filters. But that doesn't work either because some filters will
> not properly run when serving content out of a quick_handler (i.e., they
> might rely on some special something happening in the fixups hook, for
> instance). Can't recall any exact scenarios right off the top of my
> busy brain but I know they exist. Would be real good to get this fixed
> in 2.2.


So, basically, we would need to parse Vary headers and, possibly, store
multiple versions of the same uri.  The Vary header should just be a list
of header names.

Possible scenario:

Serving cached content:

- lookup uri in cache (via md5?)
- check varies - a list of headers to vary on
- calculate new key (md5) based on uri and client's values of these headers
- lookup new uri in cache
- continue as normal

Caching an object:

- see if object has been cached before by looking up uri in cache
- do the Vary's match?
    - no, discard old entry(?) and create new uri entry
    - yes, generate new key based on client values
- continue as normal


Also, if there are no Vary's on an object, there is no second key/entry.
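
The two-level key in the scenario above could be sketched roughly as follows. This is only an illustration: FNV-1a is swapped in for md5 to keep the example short and self-contained, and the function names are hypothetical rather than anything in mod_cache:

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* FNV-1a over a C string, continuing from state h. A stand-in for md5. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= FNV_PRIME;
    }
    return h;
}

/* First-level key: the uri alone. This entry would hold the Vary list. */
static uint64_t uri_key(const char *uri)
{
    return fnv1a(FNV_OFFSET, uri);
}

/* Second-level key: the uri plus the client's value for each varied header.
 * values[i] is the request's value for the i-th header named in Vary, or
 * NULL if the client did not send it. With n == 0 there is no second level
 * and the first-level key is returned unchanged. */
static uint64_t variant_key(const char *uri, const char *const *values, size_t n)
{
    uint64_t h = uri_key(uri);
    size_t i;

    for (i = 0; i < n; i++) {
        h = fnv1a(h, "\n");  /* separator so ("ab","c") differs from ("a","bc") */
        h = fnv1a(h, values[i] != NULL ? values[i] : "");
    }
    return h;
}
```

Serving would call uri_key() first, read the stored Vary list, then call variant_key() with the request's values for those headers and look up that second key.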

Sound reasonable?

The thing that sucks is if you vary on User-Agent.  You wind up with a
ton of entries per uri.  I cheated in another module by "varying" on an
environment variable.  Kind of like this:

BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip

and just "vary" on no-gzip (1 or 0), but this may be hard to do just 
using headers...
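
As a rough illustration of that trick, the sketch below collapses a User-Agent string to the single bit the cache would vary on. It uses POSIX regcomp()/regexec() as a stand-in for whatever regex engine the server uses, and the function name is hypothetical:

```c
#include <regex.h>
#include <stddef.h>

/* The pattern from the BrowserMatch line above, as a POSIX extended regex. */
static const char NO_GZIP_PATTERN[] =
    ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav";

/* Returns 1 if the pattern matches (no-gzip would be set), 0 if it does
 * not, and -1 if the pattern fails to compile. The pattern is compiled on
 * every call for brevity; real code would compile it once. */
static int no_gzip_flag(const char *user_agent)
{
    regex_t re;
    int rc;

    if (regcomp(&re, NO_GZIP_PATTERN, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    rc = regexec(&re, user_agent, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Keying the cache on this flag instead of the raw User-Agent gives at most two variants per uri for this dimension, however many distinct agents show up.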

-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies


Re: [PATCH] mod_disk cached fixed

Posted by Bill Stoddard <bi...@wstoddard.com>.
Brian Akins wrote:

> Should this:
>   cache_in_filter_handle =
>        ap_register_output_filter("CACHE_IN",
>                                  cache_in_filter,
>                                  NULL,
>                                  AP_FTYPE_CONTENT_SET-1);
> 
> 
> 
> Actually be this:
> 
> cache_in_filter_handle =
>        ap_register_output_filter("CACHE_IN",
>                                  cache_in_filter,
>                                  NULL,
>                                  AP_FTYPE_CONTENT_SET+1);
> 
> 
> 
> Notice the plus in the second.
> 
> 
This is the area in which mod_cache is most broken. It does not handle Vary at all, thus the content needs to 
be stored before it is touched by any filters. But that doesn't work either because some filters will not 
properly run when serving content out of a quick_handler (i.e., they might rely on some special something 
happening in the fixups hook, for instance). Can't recall any exact scenarios right off the top of my busy 
brain but I know they exist. Would be real good to get this fixed in 2.2.

Bill

Re: [PATCH] mod_disk cached fixed

Posted by Brian Akins <ba...@web.turner.com>.
Should this:

cache_in_filter_handle =
        ap_register_output_filter("CACHE_IN",
                                  cache_in_filter,
                                  NULL,
                                  AP_FTYPE_CONTENT_SET-1);

Actually be this:

cache_in_filter_handle =
        ap_register_output_filter("CACHE_IN",
                                  cache_in_filter,
                                  NULL,
                                  AP_FTYPE_CONTENT_SET+1);

Notice the plus in the second.


-- 
Brian Akins
Senior Systems Engineer
CNN Internet Technologies