Posted to dev@apr.apache.org by Justin Erenkrantz <je...@ebuilt.com> on 2001/05/01 01:25:38 UTC

Re: Buckets destroy not cleaning up private structures?

On Mon, Apr 30, 2001 at 03:20:17PM -0400, Cliff Woolley wrote:
> > How do we decide that?  Doesn't that depend on your OS and the size of
> > the MMAP extents (imagine 2 2GB files in your cache - which may be fine
> > on certain OSes but not on others - imagine 64bit address spaces).  That
> > is my inherent problem with using MMAP.
> 
> To be more precise, "too many MMAPs" is not just a factor of sheer number,
> it's (size1+size2+size3+...+sizen > OVERALL_MMAP_BYTES_LIMIT).  I just
> made that constant up, obviously, but apr_buckets_file.c defines
> MMAP_LIMIT for its own purposes (if it's not already defined, that is), so
> I see no reason that mod_file_cache couldn't similarly define a global
> limit in the same vein.
> 
> The file buckets code has a default MMAP_LIMIT for a given file of 4MB,
> just to give you an idea.  Any file bigger than that won't get MMAPed in
> the first place, at least not by the buckets code.  Multiply that times
> some sane number of files you wish to have MMAPed at one time, and voila,
> you've got your global limit for mod_file_cache.
> 
> Right?

Correct, but wouldn't it be "fairly" expensive to calculate how much 
space you have allocated with MMAP?  Or you could try to keep a static 
"cached" value of how much space you've allocated.  I bet this might be
a good place to use a reader/writer lock (wonder if anyone has 
implemented that - <G>).  -- justin
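
A minimal sketch of that scheme, assuming the apr_thread_rwlock_t API
that APR eventually grew; the counter and helper names are hypothetical:

#include <apr.h>
#include <apr_thread_rwlock.h>

static apr_size_t cached_mmap_bytes;    /* running total of MMAPed bytes */
static apr_thread_rwlock_t *mmap_lock;  /* created once from a pool at
                                         * module init; guards the total */

/* Readers: every request can check the budget concurrently. */
static int under_mmap_limit(apr_size_t want, apr_size_t limit)
{
    int ok;
    apr_thread_rwlock_rdlock(mmap_lock);
    ok = (cached_mmap_bytes + want <= limit);
    apr_thread_rwlock_unlock(mmap_lock);
    return ok;
}

/* Writer: taken only when a file is actually MMAPed or unmapped. */
static void account_mmap_bytes(apr_size_t delta, int mapped)
{
    apr_thread_rwlock_wrlock(mmap_lock);
    if (mapped)
        cached_mmap_bytes += delta;
    else
        cached_mmap_bytes -= delta;
    apr_thread_rwlock_unlock(mmap_lock);
}

The point of a rwlock over a plain mutex is that the common path (the
budget check) takes only the shared lock, so requests serialize only
when something is actually being mapped or unmapped.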


Re: Buckets destroy not cleaning up private structures?

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 30 Apr 2001, Justin Erenkrantz wrote:

> Yes, even a linear scan scares me.  I'm not the performance expert Dean
> is, but that seems "fairly" expensive to me to do on *every* request.

I agree.

> Yup.  It really depends on the scale of what you are expecting
> mod_file_cache to handle.  I'd suppose that the more files you have, the
> *more* you would want to use mod_file_cache.  I doubt that a site with a
> hundred-odd pages would even think about caching it.

True.  But there is also a limit to the number of file descriptors that a
process can have open at one time, though that limit can usually be
tweaked.  Regardless, whatever that limit is, it puts a cap on how many
pages can be cached by mod_file_cache and therefore a cap on the amount of
address space we might be talking about here...
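
For scale, a sketch of reading that descriptor limit on a POSIX system;
the 4MB figure in the comment is just the buckets code's per-file
default, used for back-of-the-envelope math:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    /* The soft limit caps how many files one process can hold open,
     * and therefore how many mod_file_cache could cache at once. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
        printf("open-file limit: %lu (hard max %lu)\n",
               (unsigned long)rl.rlim_cur, (unsigned long)rl.rlim_max);
        /* e.g. 1024 descriptors * 4MB per-file MMAP_LIMIT bounds the
         * MMAPed address space at about 4GB in the worst case. */
    }
    return 0;
}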

> > At any rate, even if we don't try to track how much address space we've
> > used up, it would still be way, way better after fixing the leak than what
> > we have now, which uses up address space something like:
> >
> > sum_foreach_file_cached(sizeof(file)*num_requests_where_file_read).
>
> I know *anything* is an improvement over what is there right now.  I'm
> just thinking of the extreme cases for the proposed solution.  This is a
> minor quibble - don't mind me.  =)  -- justin

=-)  It's a useful quibble, no doubt... in the end, I'm guessing some very
conservative approximation will act as a kind of "soft" limit, as we
probably want to avoid locking and the other hoops necessary for an exact
answer.  Just what that approximation is, I don't know.
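
One candidate for such a soft limit, sketched with APR's 32-bit atomics
(names hypothetical; the check and the add are deliberately separate, so
racing requests can overshoot the limit slightly):

#include <apr_atomic.h>

static volatile apr_uint32_t mmap_bytes;  /* approximate MMAPed-byte total */

/* Lock-free check-then-add: two requests may both pass the check and
 * push the total a bit past soft_limit; that's the price of having no
 * locking at all, and it's acceptable for a "soft" limit. */
static int try_reserve_mmap(apr_uint32_t len, apr_uint32_t soft_limit)
{
    if (apr_atomic_read32(&mmap_bytes) + len > soft_limit) {
        return 0;                         /* over budget: don't MMAP */
    }
    apr_atomic_add32(&mmap_bytes, len);
    return 1;
}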

Bill?  Thoughts on this?

--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA


Re: Buckets destroy not cleaning up private structures?

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Mon, Apr 30, 2001 at 07:57:52PM -0400, Cliff Woolley wrote:
> By "fairly expensive", I presume you mean this little block, which is
> linear with the number of files cached:
> 
<exactly what I had in mind>
> 
> It's certainly no worse than that.

Yes, even a linear scan scares me.  I'm not the performance expert Dean 
is, but that seems "fairly" expensive to me to do on *every* request.  
mod_file_cache should be as fast as we can make it.  But you could make 
the case that the linear-scan tradeoff is worth it.  My gut feeling is 
that a reader/writer lock implementation might scale better.  But I'd 
like to see the code to prove that...

> You can even make it constant time by assuming that none of the files are
> mmaped to start with.  Just before you serve a request, check to see if
> the file is MMAPed.  If it's not, but it is after the request, mmaped_size
> += b->length again.  But that might require some kind of locking, which is
> (I'm guessing) what you were getting at.  Yeah, it could be a bit hairy to
> get a precise answer.  An estimate might be sufficient and easier, I don't
> know for sure.

Yup.  It really depends on the scale of what you are expecting 
mod_file_cache to handle.  I'd suppose that the more files you have, the 
*more* you would want to use mod_file_cache.  I doubt that a site with a 
hundred-odd pages would even think about caching it.  They'd get such a low 
amount of traffic that almost any implementation of HTTP would do fine.  
High-volume sites (cnet.com, for example) are sensitive to such 
"fairly expensive" things, but could get a big performance win by 
leveraging mod_file_cache (although if they use SSIs, that might not 
matter much, since mod_file_cache wouldn't be involved...).

Didn't Ian just submit a patch to skip evaluation of some environment 
variables in mod_include that increased his numbers by 25%?  If so, 
that's big...

> At any rate, even if we don't try to track how much address space we've
> used up, it would still be way, way better after fixing the leak than what
> we have now, which uses up address space something like:
> 
> sum_foreach_file_cached(sizeof(file)*num_requests_where_file_read).
> 
> <shrug>

I know *anything* is an improvement over what is there right now.  I'm
just thinking of the extreme cases for the proposed solution.  This is a 
minor quibble - don't mind me.  =)  -- justin


Re: Buckets destroy not cleaning up private structures?

Posted by Justin Erenkrantz <je...@ebuilt.com>.
> Correct, but wouldn't it be "fairly" expensive to calculate how much 
> space you have allocated with MMAP?  Or you could try to keep a static 
> "cached" value of how much space you've allocated.  I bet this might be
> a good place to use a reader/writer lock (wonder if anyone has 
> implemented that - <G>).  -- justin

BTW, this would only catch what mod_file_cache has allocated.  Another
module could well do the same thing.  This just seems icky all over.  
Am I alone in thinking this?  -- justin


Re: Buckets destroy not cleaning up private structures?

Posted by Cliff Woolley <cl...@yahoo.com>.
On Mon, 30 Apr 2001, Justin Erenkrantz wrote:

> > The file buckets code has a default MMAP_LIMIT for a given file of 4MB,
> > just to give you an idea.  Any file bigger than that won't get MMAPed in
> > the first place, at least not by the buckets code.  Multiply that times
> > some sane number of files you wish to have MMAPed at one time, and voila,
> > you've got your global limit for mod_file_cache.
> >
> > Right?
>
> Correct, but wouldn't it be "fairly" expensive to calculate how much
> space you have allocated with MMAP?  Or you could try to keep a static
> "cached" value of how much space you've allocated.  I bet this might be
> a good place to use a reader/writer lock (wonder if anyone has
> implemented that - <G>).  -- justin

By "fairly expensive", I presume you mean this little block, which is
linear with the number of files cached:

apr_off_t mmaped_size = 0;
apr_bucket *b;

/* Walk the cached file buckets, summing the length of each one
 * that has been MMAPed.  (Assumes a brigade bb holding them.) */
for (b = APR_BRIGADE_FIRST(bb); b != APR_BRIGADE_SENTINEL(bb);
     b = APR_BUCKET_NEXT(b)) {
    if (APR_BUCKET_IS_FILE(b)) {
        apr_bucket_file *f = b->data;
        if (f->mmap) {
            mmaped_size += b->length;
        }
    }
}

It's certainly no worse than that.

You can even make it constant time by assuming that none of the files are
mmaped to start with.  Just before you serve a request, check to see if
the file is MMAPed.  If it's not, but it is after the request, mmaped_size
+= b->length again.  But that might require some kind of locking, which is
(I'm guessing) what you were getting at.  Yeah, it could be a bit hairy to
get a precise answer.  An estimate might be sufficient and easier, I don't
know for sure.
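
A sketch of that constant-time version; serve_file_bucket() is a
hypothetical stand-in for sending the bucket down the filter stack, and
the locking around mmaped_size is exactly the open question above:

/* Compare the bucket's MMAP state before and after serving; if this
 * request is what caused the file to be MMAPed, charge its length to
 * the running total exactly once. */
static void serve_and_account(apr_bucket *b)
{
    apr_bucket_file *f = b->data;
    int was_mmaped = (f->mmap != 0);

    serve_file_bucket(b);          /* hypothetical: writes the bucket out */

    if (!was_mmaped && f->mmap != 0) {
        mmaped_size += b->length;  /* needs locking, or stays approximate */
    }
}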

In response to your followup message, yes, this only keeps track of what
mod_file_cache has done; it doesn't account for what any other code that
uses MMAP has done.  That's why the upper limit must be conservative (as
is the 4MB per-file limit imposed by the buckets code).  The 4MB per-file
limit might even be enough by itself, since dividing the address space
we're willing to use for MMAPs by 4MB yields a large number of files that
can be MMAPed without running out of address space.  (For example,
budgeting 1GB of a 32-bit address space for MMAPs leaves room for 256
files of 4MB each.)

At any rate, even if we don't try to track how much address space we've
used up, it would still be way, way better after fixing the leak than what
we have now, which uses up address space something like:

sum_foreach_file_cached(sizeof(file)*num_requests_where_file_read).
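
(Concretely: a single 1MB file, re-MMAPed on each of 1,000 requests,
eats roughly 1GB of address space under the current behavior.)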

<shrug>

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA