Posted to dev@httpd.apache.org by Brian Akins <ba...@web.turner.com> on 2004/08/04 14:56:31 UTC
[PATCH] mod_disk cached fixed
Sorry about this, but the last patch had a mistake in the writev() call.
--
Brian Akins
Senior Systems Engineer
CNN Internet Technologies
Re: [PATCH] mod_disk cached fixed
Posted by Brian Akins <ba...@web.turner.com>.
Graham Leggett wrote:
>
> How resilient is this to garbage data on the disk? A risk exists of
> somebody getting write access to the headers cache file, and then
> crafting a cache headers file which when read causes a takeover of the
> webserver. Just want to check that it's covered.
>
>
That risk exists with the current format as well. You could add a quick check
to make sure the numbers look reasonable, I suppose.
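A quick sanity check of that kind might look roughly like the C sketch below. It assumes a hypothetical fixed-size record at the start of the headers file; `disk_hdr_info`, `HDR_FORMAT` and `MAX_NAME_LEN` are invented for illustration and are not the patch's actual layout. The idea is to reject the entry if any integer is out of range before it is used to size a read or an allocation.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-size record at the start of a headers file.
 * This is NOT the real mod_disk_cache layout; it only illustrates
 * that every field is a plain integer, never a pointer. */
typedef struct {
    uint32_t format;    /* magic/version number */
    uint32_t name_len;  /* length of the URL string that follows */
    int64_t  date;      /* response date, seconds since the epoch */
    int64_t  expire;    /* expiry time, seconds since the epoch */
} disk_hdr_info;

#define HDR_FORMAT   0x44435631u  /* hypothetical magic value */
#define MAX_NAME_LEN 8192u        /* hypothetical sanity limit */

/* Returns 0 if the integers look reasonable, -1 otherwise. */
static int validate_hdr(const disk_hdr_info *h)
{
    if (h->format != HDR_FORMAT)
        return -1;                       /* wrong or garbage file */
    if (h->name_len == 0 || h->name_len > MAX_NAME_LEN)
        return -1;                       /* would drive a huge allocation */
    if (h->date < 0 || h->expire < h->date)
        return -1;                       /* timestamps out of order */
    return 0;
}

/* Reads one record; a short read or a failed check rejects the entry. */
static int read_hdr(FILE *fp, disk_hdr_info *out)
{
    if (fread(out, sizeof(*out), 1, fp) != 1)
        return -1;
    return validate_hdr(out);
}
```

A rejected record would simply be treated as a cache miss, so crafted input can at worst force a refetch rather than drive an oversized read.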
Re: [PATCH] mod_disk cached fixed
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 5:26 PM +0200 Graham Leggett <mi...@sharp.fm>
wrote:
> How resilient is this to garbage data on the disk? A risk exists of somebody
> getting write access to the headers cache file, and then crafting a cache
> headers file which when read causes a takeover of the webserver. Just want
> to check that it's covered.
It's only reading in integers, not pointers, so I don't see how it could cause
a security risk. -- justin
Re: [PATCH] mod_disk cached fixed
Posted by Graham Leggett <mi...@sharp.fm>.
Brian Akins wrote:
> Sorry about this, but the last patch had a mistake in the writev
How resilient is this to garbage data on the disk? A risk exists of
somebody getting write access to the headers cache file, and then
crafting a cache headers file which when read causes a takeover of the
webserver. Just want to check that it's covered.
Regards,
Graham
--
Re: [PATCH] mod_disk cached fixed
Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:
> Looks okay - I'll take a look at incorporating it to my local changes
> and see how it helps. The one thing I'd change is the sizeof(char) to
> sizeof(newline). Since it's a constant that allows '\r\n' to be sized
> accordingly. -- justin
>
Ok.
It may not help you much since you're limited by your bandwidth. You
should see lower disk and CPU usage.
Can you e-mail details about your setup (apache versions and patches)
and I'll try to do some tests here.
Re: [PATCH] mod_disk cached fixed
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 8:56 AM -0400 Brian Akins
<ba...@web.turner.com> wrote:
> Sorry about this, but the last patch had a mistake in the writev
Looks okay. I'll take a look at incorporating it into my local changes and see
how it helps. The one thing I'd change is sizeof(char) to
sizeof(newline): since newline is a constant, that allows '\r\n' to be sized
accordingly. -- justin
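The point of the change is that sizeof(char) is always 1, so it silently assumes a one-byte newline, while sizeof(newline) follows whatever the constant holds. One caveat, sketched below under the assumption that `newline` is a char array initialized from a string literal (the patch's actual declaration isn't shown): sizeof then also counts the terminating NUL, so the number of bytes to write is sizeof(newline) - 1. `NEWLINE_LEN` and `append_newline` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical declaration; the patch's actual one isn't shown here. */
static const char newline[] = "\r\n";

/* sizeof counts the array's terminating NUL, so subtract one to get
 * the number of bytes that actually belong in the file. */
#define NEWLINE_LEN (sizeof(newline) - 1)

/* Appends the terminator at buf[len] (no NUL added); returns new length.
 * The caller must ensure buf has room for NEWLINE_LEN more bytes. */
static size_t append_newline(char *buf, size_t len)
{
    memcpy(buf + len, newline, NEWLINE_LEN);
    return len + NEWLINE_LEN;
}
```

Changing the constant back to "\n" would make NEWLINE_LEN 1 with no other edits, which is the flexibility being asked for.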
mod_cache filter priorities was Re: [PATCH] mod_disk cached fixed
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 4:26 PM -0400 Brian Akins
<ba...@web.turner.com> wrote:
> Notice the plus in the second.
I thought about that, too. If you place it with the +1, then you'd be after
mod_deflate. I'm not yet fully sure what the implication of that would be.
Moving the filters around may have some benefits. Ideally, we should come up
with different strategies for moving the filter around... The only thing I
know for sure is that CACHE_SAVE and CACHE_OUT need to be aligned at the same
level. I guess you could have multiple variants: one if the client supports
caching, the other if it doesn't. I'd have to start looking at our Vary code
in depth though. -- justin
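To make the -1/+1 question concrete: httpd orders an output chain by filter type, lowest first, so data leaves the handler, passes resource filters, then content-set filters, then protocol filters, and so on. The sketch below mirrors those `ap_filter_type` values locally so it compiles without httpd headers, and assumes DEFLATE is registered at AP_FTYPE_CONTENT_SET, as the remark about landing "after mod_deflate" implies. Only CACHE_IN and DEFLATE are modeled, and tie-breaking between equal types is ignored.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Values mirror the ap_filter_type enum in httpd's util_filter.h,
 * repeated here so the sketch compiles without httpd headers. */
enum {
    FTYPE_RESOURCE    = 10,
    FTYPE_CONTENT_SET = 20,
    FTYPE_PROTOCOL    = 30
};

struct filt {
    const char *name;
    int ftype;
};

static int by_ftype(const void *a, const void *b)
{
    const struct filt *x = a, *y = b;
    return (x->ftype > y->ftype) - (x->ftype < y->ftype);
}

/* Orders the chain the way output data flows (lowest ftype first,
 * nearest the content handler) and returns the position of `name`.
 * Ties between equal ftypes are not modeled here. */
static int position_of(struct filt *chain, size_t n, const char *name)
{
    qsort(chain, n, sizeof(*chain), by_ftype);
    for (size_t i = 0; i < n; i++)
        if (strcmp(chain[i].name, name) == 0)
            return (int)i;
    return -1;
}
```

With CONTENT_SET-1 the cache sees the body before compression; with CONTENT_SET+1 it stores whatever DEFLATE produced, which is why the choice interacts with Vary: Accept-Encoding.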
Re: [PATCH] mod_disk cached fixed
Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:
> -
> Disk space is cheap. ;-)
>
> I think the vary header would still be preserved in the cached copy,
> so I'm not sure how down-stream caches would be affected. -- justin
>
Here are some interesting stats from a large news site:
Sample time of 6 hours => 1,039,361 hits from a single box
20,493 distinct User-Agents
Here's the top few:
152260: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
127111: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
117145: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
43571: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)
37977: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
30698: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
29425: -
25695: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
20288: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
15862: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0) Fetch API Request
13413: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705)
11447: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
11172: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
10759: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
10264: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
8743: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; T312461)
7710: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 4.0)
6635: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; T312461)
5938: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
5539: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)
5487: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461)
5327: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FunWebProducts)
5269: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Q312461)
5156: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
5019: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-us) AppleWebKit/125.2 (KHTML, like Gecko) Safari/125.8
5014: Mozilla/3.01 (compatible;)
So, if you varied simply on the value of User-Agent, you would wind up with
more than a dozen entries for each "object." Disk space isn't the
problem; it's that you would have to regenerate each object for each
User-Agent. Couple this with varying on Accept-Encoding, and it grows
somewhat larger. In some instances it may take a few database queries
and some parsing to produce the page. Besides, the regex is not that
expensive: in my tests, it's in 11th place.
Re: [PATCH] mod_disk cached fixed
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 5:18 PM -0400 Joshua Slive <jo...@slive.ca>
wrote:
> But it couldn't be as expensive as caching a variant for every User-Agent
> that accesses your site. What is probably needed is
> CacheVaryOn env-variable
> which would override the vary-matching decision to vary only on the content
> of env-variable (which could be just on/off in this case), but not touch the
> actual Vary: header, since down-stream caches wouldn't have the extra logic
> needed determine if they needed to vary.
Disk space is cheap. ;-)
I think the vary header would still be preserved in the cached copy, so I'm
not sure how down-stream caches would be affected. -- justin
Re: [PATCH] mod_disk cached fixed
Posted by Brian Akins <ba...@web.turner.com>.
Joshua Slive wrote:
>
> But it couldn't be as expensive as caching a variant for every
> User-Agent that accesses your site. What is probably needed is
> CacheVaryOn env-variable
> which would override the vary-matching decision to vary only on the
> content of env-variable (which could be just on/off in this case), but
> not touch the actual Vary: header, since down-stream caches wouldn't
> have the extra logic needed determine if they needed to vary.
Yes. Or maybe per header:
CacheVary User-Agent browser-gzip
where "browser-gzip" is an environment variable.
Re: [PATCH] mod_disk cached fixed
Posted by Joshua Slive <jo...@slive.ca>.
On Wed, 4 Aug 2004, Justin Erenkrantz wrote:
> --On Wednesday, August 4, 2004 4:52 PM -0400 Brian Akins
> <ba...@web.turner.com> wrote:
>> The thing that sucks is if you vary on User-Agent. You wind up with a ton
>> of entries per uri. I cheated in another modules by "varying" on an
>> environmental variable. Kind of like this:
>>
>> BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip
>>
>> and just "vary" on no-gzip (1 or 0), but this may be hard to do just using
>> headers...
>
> Note that BrowserMatch with regexp's is ridiculously expensive. Minimizing
> the need for that would be goodness, I think. -- justin
But it couldn't be as expensive as caching a variant for every User-Agent
that accesses your site. What is probably needed is
CacheVaryOn env-variable
which would override the vary-matching decision to vary only
on the content of env-variable (which could be just on/off in this case),
but not touch the actual Vary: header, since down-stream caches wouldn't
have the extra logic needed to determine if they needed to vary.
Joshua.
Re: [PATCH] mod_disk cached fixed
Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Wednesday, August 4, 2004 4:52 PM -0400 Brian Akins
<ba...@web.turner.com> wrote:
> Possible scenerio:
>
> Serving cached content:
>
> - lookup uri in cache (via md5?).
> - check varies - a list of headers to vary on
> - caculate new key (md5) based on uri and clients value of these headers
> - lookup new uri in cache
> - continue as normal
>
> Caching an object:
> -see if object has been cached before by looking up uri in cache
> -do the Vary's match?
> -no, discard old entry(?) and create new uri entry
> -yes, generate new key based on client values
> -continue as normal
I wouldn't discard the old entry, but would store it as another variant to cache.
But, yes, a two-level scheme like this may make sense.
This allows us to cache the gzipped and non-gzipped versions, which is what
we'd want.
> The thing that sucks is if you vary on User-Agent. You wind up with a ton
> of entries per uri. I cheated in another modules by "varying" on an
> environmental variable. Kind of like this:
>
> BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip
>
> and just "vary" on no-gzip (1 or 0), but this may be hard to do just using
> headers...
Note that BrowserMatch with regexps is ridiculously expensive. Minimizing
the need for that would be goodness, I think. -- justin
Re: [PATCH] mod_disk cached fixed
Posted by Brian Akins <ba...@web.turner.com>.
Bill Stoddard wrote:
> This is the area in which mod_cache is most broken. It does not handle
> vary at all, thus the content needs to be stored before it is touched
> by any filters. But that doesn't work either because some filters will
> not properly run when serving content out of a quick_handler (ie, they
> might rely on some special something happening in the fixups hook for
> instance). Can't recall any exact scenarios right off the top of my
> busy brain but I know they exist. Would be real good to get this fixed
> in 2.2
So, basically, we would need to parse Vary headers and, possibly, store
multiple versions of the same URI. The Vary header should just be a list of
header names.
Possible scenario:
Serving cached content:
- look up URI in cache (via MD5?)
- check varies: a list of headers to vary on
- calculate new key (MD5) based on the URI and the client's values of these headers
- look up the new key in cache
- continue as normal
Caching an object:
- see if object has been cached before by looking up URI in cache
- do the Vary headers match?
  - no: discard old entry(?) and create new URI entry
  - yes: generate new key based on client values
- continue as normal
Also, if there are no Vary headers on an object, there is no second key/entry.
Sound reasonable?
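The two-level scheme might be sketched in C roughly as follows. This is an in-memory illustration only: the `request` struct, the `uri_entry`/`variant` tables and all helper names are hypothetical, the first level is a plain string compare on the URI rather than an MD5 lookup, and 64-bit FNV-1a stands in for MD5 when deriving the second-level key. With an empty Vary list the key degenerates to a hash of the URI alone, so no per-client entry is created.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a (64-bit) stands in for the MD5 suggested in the thread. */
static uint64_t fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* HTTP header names compare case-insensitively. */
static int hdr_name_eq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
    return *a == *b;
}

/* Hypothetical minimal request: URI plus parallel header arrays. */
typedef struct {
    const char *uri;
    const char **hdr_names;
    const char **hdr_values;
    size_t nhdrs;
} request;

static const char *req_header(const request *r, const char *name)
{
    for (size_t i = 0; i < r->nhdrs; i++)
        if (hdr_name_eq(r->hdr_names[i], name))
            return r->hdr_values[i];
    return "";  /* absent header treated as empty */
}

/* Second-level key: the URI plus this client's value for each header
 * in the stored Vary list. Input past the buffer is truncated. */
static uint64_t variant_key(const request *r,
                            const char *const *vary, size_t nvary)
{
    char buf[1024];
    size_t off = (size_t)snprintf(buf, sizeof buf, "%s", r->uri);
    for (size_t i = 0; i < nvary && off < sizeof buf; i++)
        off += (size_t)snprintf(buf + off, sizeof buf - off, "|%s=%s",
                                vary[i], req_header(r, vary[i]));
    return fnv1a(buf);
}

/* First level: URI -> Vary list. Second level: variant key -> body. */
typedef struct { const char *uri; const char *const *vary; size_t nvary; } uri_entry;
typedef struct { uint64_t key; const char *body; } variant;

static const char *cache_lookup(const request *r,
                                const uri_entry *uris, size_t nuris,
                                const variant *vars, size_t nvars)
{
    for (size_t i = 0; i < nuris; i++) {
        if (strcmp(uris[i].uri, r->uri) != 0)
            continue;
        uint64_t k = variant_key(r, uris[i].vary, uris[i].nvary);
        for (size_t j = 0; j < nvars; j++)
            if (vars[j].key == k)
                return vars[j].body;
        return NULL;  /* URI known, no variant for this client yet */
    }
    return NULL;      /* URI never cached */
}
```

Keeping both the gzip and identity variants side by side, instead of discarding one on a mismatch, only requires appending to the second-level table.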
The thing that sucks is if you vary on User-Agent: you wind up with a
ton of entries per URI. I cheated in another module by "varying" on an
environment variable, kind of like this:
BrowserMatch ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav" no-gzip
and just "vary" on no-gzip (1 or 0), but this may be hard to do using
only headers...
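Varying on the no-gzip flag instead of on User-Agent means every client maps to one of at most two variants per URI, however many distinct User-Agent strings show up. A rough C sketch, using POSIX `<regex.h>` as a stand-in for httpd's own regex handling; `init_no_gzip`, `no_gzip`, `env_variant_key` and `same_variant` are hypothetical names. Compiling the pattern once at startup, rather than per request, addresses part of the cost concern, though `regexec` still runs on every request.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Pattern copied from the BrowserMatch line above. */
static const char no_gzip_pattern[] =
    ".*MSIE [1-3]|MSIE [1-5].*Mac.*|^Mozilla/[1-4].*Nav";

static regex_t no_gzip_re;

/* Compile once at startup; returns 0 on success. */
static int init_no_gzip(void)
{
    return regcomp(&no_gzip_re, no_gzip_pattern, REG_EXTENDED | REG_NOSUB);
}

/* 1 if the User-Agent matches (no-gzip would be set), else 0. */
static int no_gzip(const char *user_agent)
{
    return regexec(&no_gzip_re, user_agent, 0, NULL, 0) == 0;
}

/* Cache key built from the URI and the 0/1 flag, not the raw User-Agent. */
static void env_variant_key(char *buf, size_t len,
                            const char *uri, const char *user_agent)
{
    snprintf(buf, len, "%s|no-gzip=%d", uri, no_gzip(user_agent));
}

/* True when two clients would share one cached variant of `uri`. */
static int same_variant(const char *uri, const char *ua1, const char *ua2)
{
    char k1[512], k2[512];
    env_variant_key(k1, sizeof k1, uri, ua1);
    env_variant_key(k2, sizeof k2, uri, ua2);
    return strcmp(k1, k2) == 0;
}
```

For example, the MSIE 6.0 and MSIE 5.5 Windows strings from the stats share a variant, while an MSIE 3.x client gets the other one.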
Re: [PATCH] mod_disk cached fixed
Posted by Bill Stoddard <bi...@wstoddard.com>.
Brian Akins wrote:
> Should this:
> cache_in_filter_handle =
> ap_register_output_filter("CACHE_IN",
> cache_in_filter,
> NULL,
> AP_FTYPE_CONTENT_SET-1);
>
>
>
> Actually be this:
>
> cache_in_filter_handle =
> ap_register_output_filter("CACHE_IN",
> cache_in_filter,
> NULL,
> AP_FTYPE_CONTENT_SET+1);
>
>
>
> Notice the plus in the second.
>
>
This is the area in which mod_cache is most broken. It does not handle Vary at all, thus the content needs to
be stored before it is touched by any filters. But that doesn't work either because some filters will not
properly run when serving content out of a quick_handler (i.e., they might rely on some special something
happening in the fixups hook, for instance). Can't recall any exact scenarios right off the top of my busy
brain, but I know they exist. Would be really good to get this fixed in 2.2.
Bill
Re: [PATCH] mod_disk cached fixed
Posted by Brian Akins <ba...@web.turner.com>.
Should this:
cache_in_filter_handle =
ap_register_output_filter("CACHE_IN",
cache_in_filter,
NULL,
AP_FTYPE_CONTENT_SET-1);
Actually be this:
cache_in_filter_handle =
ap_register_output_filter("CACHE_IN",
cache_in_filter,
NULL,
AP_FTYPE_CONTENT_SET+1);
Notice the plus in the second.