Posted to dev@httpd.apache.org by Joe Orton <jo...@redhat.com> on 2009/02/18 11:16:11 UTC

FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

On Mon, Feb 16, 2009 at 03:12:11PM +0100, Ruediger Pluem wrote:
> On 02/16/2009 02:13 PM, Joe Orton wrote:
> > Why is it invalid use of the filtering/buckets API to close the file 
> > after sending the FILE-containing brigade up the filter stack?  
> > 
> > It seems counter-intuitive to me that *anything* you do after the 
> > ap_pass_brigade() call should make a difference to what was passed up 
> > the filter stack.  But I suppose this is the case; even most of the 
> > memory-backed bucket types don't duplicate referenced memory in the 
> > setaside method.
> 
> I guess this is not needed in most cases and is skipped for performance
> reasons. But maybe we can change the setaside method for file buckets
> to do an additional dup on the fd.
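
A dup-based setaside might look roughly like this (a completely
untested sketch; the fields are those of apr_bucket_file in apr-util,
apr_file_dup() performs the descriptor duplication, and the real
setaside's pool-ancestry short-circuit is omitted):

    static apr_status_t file_bucket_setaside_dup(apr_bucket *b,
                                                 apr_pool_t *pool)
    {
        apr_bucket_file *a = b->data;
        apr_file_t *dupfd = NULL;
        apr_status_t rv;

        /* Duplicate the fd into the target pool rather than merely
         * reparenting it, so the caller may safely close the original
         * fd after ap_pass_brigade() returns. */
        rv = apr_file_dup(&dupfd, a->fd, pool);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        a->fd = dupfd;
        a->readpool = pool;
        return APR_SUCCESS;
    }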

The choices seem to be:

1) declare the FLUSH to be necessary in this case as you said 
originally ;) - and document appropriately

2) rewrite/reinvent all the setaside methods to dup/copy as appropriate

3) declare the core output filter broken and fix it

Of these, (2) and (3) will presumably have some negative performance 
impact, though I'm not sure to what degree.  Not sure (2) is even going 
to be possible, e.g. with MMAP buckets, scary stuff.

Looking at the cray-zee non-blocking core output filter on the trunk, it 
will set buckets aside with wild abandon unless it hits a FLUSH.  I think all 
these options are pretty distasteful, but (1) seems like the only 
feasible route.  What do you think?  

Would be good to get some more opinions on this.

Regards, Joe

Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

Posted by "Plüm, Rüdiger, VF-Group" <ru...@vodafone.com>.
 

> -----Original Message-----
> From: Joe Orton 
> Sent: Friday, February 20, 2009 10:15
> To: dev@httpd.apache.org
> Subject: Re: FLUSH, filtering, setaside, etc (was Re: 
> Problems with EOS optimisation in ap_core_output_filter() and 
> file buckets.)
> 
> On Thu, Feb 19, 2009 at 10:00:50PM +0100, Ruediger Pluem wrote:
> > On 02/19/2009 12:32 PM, Joe Orton wrote:
> ...
> > > @@ -497,13 +500,17 @@
> > >                  next = APR_BUCKET_NEXT(bucket);
> > >              }
> > >              bytes_in_brigade += bucket->length;
> > > -            if (!APR_BUCKET_IS_FILE(bucket)) {
> > > +            if (APR_BUCKET_IS_FILE(bucket)) {
> > > +                num_files_in_brigade++;
> > > +            } 
> > > +            else {                
> > >                  non_file_bytes_in_brigade += bucket->length;
> > >              }
> > >          }
> > >      }
> > >  
> > > -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
> > > +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
> > > +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
> > 
> > If the 16 FD's were split over more than one brigade and the
> > brigades before us were set aside, the FD's already belong to the
> > wrong pool (the connection pool). Deleting a file bucket doesn't
> > close the FD it uses.
> 
> Not sure what the concern is there - this loop is iterating over the 
> concatenation of the buffered brigade and the "new" brigade 
> (right?), so 
> it will count the total number of buckets which are potentially left 
> buffered after this c_o_f invocation terminates.

The problem is that the apr_file objects are originally created in the
request pool. If you now send two brigades down the chain, let's say
the first one with 15 files and the second one with 1 file, and the first
one gets set aside for later writing for whatever reason, then the 15
apr_file objects from the first brigade move over to the connection
pool and thus do not get closed until the connection gets closed (which
may take a long time). This is where the leak happens.
So I guess for a final solution we must find a way to avoid setasides
into the connection pool.
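
To make the lifetime problem concrete, here is a minimal handler-side
sketch (hypothetical and untested; fname and len stand in for whatever
is being served):

    apr_file_t *f;
    apr_bucket_brigade *bb = apr_brigade_create(r->pool,
                                                r->connection->bucket_alloc);

    /* The fd is created in, and owned by, the request pool. */
    apr_file_open(&f, fname, APR_READ, APR_OS_DEFAULT, r->pool);
    apr_brigade_insert_file(bb, f, 0, len, r->pool);

    /* If the core output filter sets the FILE bucket aside, the fd is
     * reparented into the connection pool and stays open until the
     * connection dies, long after the request that opened it is gone. */
    ap_pass_brigade(r->output_filters, bb);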

Regards

Rüdiger


Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

Posted by Joe Orton <jo...@redhat.com>.
On Thu, Feb 19, 2009 at 10:00:50PM +0100, Ruediger Pluem wrote:
> On 02/19/2009 12:32 PM, Joe Orton wrote:
...
> > @@ -497,13 +500,17 @@
> >                  next = APR_BUCKET_NEXT(bucket);
> >              }
> >              bytes_in_brigade += bucket->length;
> > -            if (!APR_BUCKET_IS_FILE(bucket)) {
> > +            if (APR_BUCKET_IS_FILE(bucket)) {
> > +                num_files_in_brigade++;
> > +            } 
> > +            else {                
> >                  non_file_bytes_in_brigade += bucket->length;
> >              }
> >          }
> >      }
> >  
> > -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
> > +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
> > +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
> 
> If the 16 FD's were split over more than one brigade and the
> brigades before us were set aside, the FD's already belong to the wrong pool
> (the connection pool). Deleting a file bucket doesn't close the FD it uses.

Not sure what the concern is there - this loop is iterating over the 
concatenation of the buffered brigade and the "new" brigade (right?), so 
it will count the total number of buckets which are potentially left 
buffered after this c_o_f invocation terminates.

w.r.t. MMAP buckets: the 64K byte limit will apply here, since FILE 
only morphs into MMAP if the file size is > 8K.  (And no, theoretically, 
the fd from which an MMAP bucket was derived is not needed after the 
mmap() call, but I don't think the fds actually get closed earlier than 
pool closure, normally.)
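
For completeness, the FILE->MMAP morph can also be ruled out per bucket
via apr_bucket_file_enable_mmap(); an untested one-liner, with
hypothetical bb/f/len variables:

    apr_bucket *e = apr_brigade_insert_file(bb, f, 0, len, r->pool);
    apr_bucket_file_enable_mmap(e, 0); /* keep this bucket a FILE */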

w.r.t. multiple FILE buckets for a single fd: failing to buffer as much 
as possible will not be the end of the world, but I suppose that case is 
common, so we could cope with it with something like this (completely 
untested):

Index: server/core_filters.c
===================================================================
--- server/core_filters.c	(revision 734690)
+++ server/core_filters.c	(working copy)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define THRESHOLD_MAX_FILES 16
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -380,7 +381,8 @@
     core_output_filter_ctx_t *ctx = net->out_ctx;
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
-    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
+    apr_file_t *last_file_seen;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +468,9 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    num_files_in_brigade = 0;
+    last_file_seen = NULL;
+
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -497,13 +502,20 @@
                 next = APR_BUCKET_NEXT(bucket);
             }
             bytes_in_brigade += bucket->length;
-            if (!APR_BUCKET_IS_FILE(bucket)) {
+            if (APR_BUCKET_IS_FILE(bucket) 
+                && (last_file_seen == NULL
+                    || last_file_seen != ((apr_bucket_file *)bucket->data)->fd)) {
+                num_files_in_brigade++;
+                last_file_seen = ((apr_bucket_file *)bucket->data)->fd;
+            } 
+            else {                
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
         /* ### Writing the entire brigade may be excessive; we really just
          * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
          */
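
One caveat with the last_file_seen trick, as far as I can see: it only
remembers the most recent fd, so buckets alternating between two
different files would bump the counter at every transition, and a FILE
bucket repeating the last fd falls through into the non-file byte count
instead.  For the common case it targets, one fd split up by interleaved
heap buckets, it at least avoids counting the same descriptor sixteen
times over.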

Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/19/2009 12:32 PM, Joe Orton wrote:
> On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
>> On 02/18/2009 11:16 AM, Joe Orton wrote:
>> There is still a nasty issue with the trunk code that can cause you to
>> run out of FD's, as the new non-blocking core output filter has some trouble
>> setting aside the file buckets to the correct pool (it puts them in the
>> connection pool, which is far too long-lived).
>>
>> See thread starting at
>>
>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495FF1B9.9070607@kippdata.de%3e
>>
>> I admit that I lost momentum on this, but this definitely needs to be fixed.
> 
> Yes, that looks broken.
> 
> Something like the patch below should fix it, though I'm not sure it's 
> really a good idea to be allowing a pathological case of 16 fds consumed 
> per client.  It might present some interesting tuning challenges; 
> hitting fd limits before hitting num(client) limits.
> 
> Index: core_filters.c
> ===================================================================
> --- core_filters.c	(revision 734690)
> +++ core_filters.c	(working copy)
> @@ -367,6 +367,7 @@
>  
>  #define THRESHOLD_MIN_WRITE 4096
>  #define THRESHOLD_MAX_BUFFER 65536
> +#define THRESHOLD_MAX_FILES 16
>  
>  /* Optional function coming from mod_logio, used for logging of output
>   * traffic
> @@ -380,7 +381,7 @@
>      core_output_filter_ctx_t *ctx = net->out_ctx;
>      apr_bucket_brigade *bb;
>      apr_bucket *bucket, *next;
> -    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
> +    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
>  
>      /* Fail quickly if the connection has already been aborted. */
>      if (c->aborted) {
> @@ -466,6 +467,8 @@
>  
>      bytes_in_brigade = 0;
>      non_file_bytes_in_brigade = 0;
> +    num_files_in_brigade = 0;
> +
>      for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
>           bucket = next) {
>          next = APR_BUCKET_NEXT(bucket);
> @@ -497,13 +500,17 @@
>                  next = APR_BUCKET_NEXT(bucket);
>              }
>              bytes_in_brigade += bucket->length;
> -            if (!APR_BUCKET_IS_FILE(bucket)) {
> +            if (APR_BUCKET_IS_FILE(bucket)) {
> +                num_files_in_brigade++;
> +            } 
> +            else {                
>                  non_file_bytes_in_brigade += bucket->length;
>              }
>          }
>      }
>  
> -    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
> +    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
> +        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {

If the 16 FD's were split over more than one brigade and the
brigades before us were set aside, the FD's already belong to the wrong pool
(the connection pool). Deleting a file bucket doesn't close the FD it uses.

Regards

Rüdiger



Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/19/2009 12:32 PM, Joe Orton wrote:
> On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
>> On 02/18/2009 11:16 AM, Joe Orton wrote:
>> There is still a nasty issue with the trunk code that can cause you to
>> run out of FD's, as the new non-blocking core output filter has some trouble
>> setting aside the file buckets to the correct pool (it puts them in the
>> connection pool, which is far too long-lived).
>>
>> See thread starting at
>>
>> http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495FF1B9.9070607@kippdata.de%3e
>>
>> I admit that I lost momentum on this, but this definitely needs to be fixed.
> 
> Yes, that looks broken.
> 
> Something like the patch below should fix it, though I'm not sure it's 
> really a good idea to be allowing a pathological case of 16 fds consumed 
> per client.  It might present some interesting tuning challenges; 
> hitting fd limits before hitting num(client) limits.

Thanks for the patch, and sorry for nitpicking the details :-)

> 
> Index: core_filters.c
> ===================================================================
> --- core_filters.c	(revision 734690)
> +++ core_filters.c	(working copy)
> @@ -367,6 +367,7 @@
>  
>  #define THRESHOLD_MIN_WRITE 4096
>  #define THRESHOLD_MAX_BUFFER 65536
> +#define THRESHOLD_MAX_FILES 16
>  
>  /* Optional function coming from mod_logio, used for logging of output
>   * traffic
> @@ -380,7 +381,7 @@
>      core_output_filter_ctx_t *ctx = net->out_ctx;
>      apr_bucket_brigade *bb;
>      apr_bucket *bucket, *next;
> -    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
> +    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
>  
>      /* Fail quickly if the connection has already been aborted. */
>      if (c->aborted) {
> @@ -466,6 +467,8 @@
>  
>      bytes_in_brigade = 0;
>      non_file_bytes_in_brigade = 0;
> +    num_files_in_brigade = 0;
> +
>      for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
>           bucket = next) {
>          next = APR_BUCKET_NEXT(bucket);
> @@ -497,13 +500,17 @@
>                  next = APR_BUCKET_NEXT(bucket);
>              }
>              bytes_in_brigade += bucket->length;
> -            if (!APR_BUCKET_IS_FILE(bucket)) {
> +            if (APR_BUCKET_IS_FILE(bucket)) {
> +                num_files_in_brigade++;

Hm, how do you know that the FD in this bucket is different from the one
in the last file bucket(s)? It may be possible that we have file buckets
in the brigade that represent different parts of the same file, as a filter
might have put something in between them in the form of heap buckets or
something like that.
Furthermore, what about MMAP buckets? Don't they have an open FD as well?

Regards

Rüdiger


Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

Posted by Joe Orton <jo...@redhat.com>.
On Wed, Feb 18, 2009 at 09:39:31PM +0100, Ruediger Pluem wrote:
> On 02/18/2009 11:16 AM, Joe Orton wrote:
> There is still a nasty issue with the trunk code that can cause you to
> run out of FD's, as the new non-blocking core output filter has some trouble
> setting aside the file buckets to the correct pool (it puts them in the
> connection pool, which is far too long-lived).
> 
> See thread starting at
> 
> http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495FF1B9.9070607@kippdata.de%3e
> 
> I admit that I lost momentum on this, but this definitely needs to be fixed.

Yes, that looks broken.

Something like the patch below should fix it, though I'm not sure it's 
really a good idea to be allowing a pathological case of 16 fds consumed 
per client.  It might present some interesting tuning challenges; 
hitting fd limits before hitting num(client) limits.
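
As a rough worked example (numbers purely illustrative): 256 busy
connections each buffering 16 file buckets would pin 256 * 16 = 4096
descriptors in a single process, comfortably past a default 1024-fd
ulimit before the client count itself becomes the limit.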

Index: core_filters.c
===================================================================
--- core_filters.c	(revision 734690)
+++ core_filters.c	(working copy)
@@ -367,6 +367,7 @@
 
 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define THRESHOLD_MAX_FILES 16
 
 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -380,7 +381,7 @@
     core_output_filter_ctx_t *ctx = net->out_ctx;
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
-    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    apr_size_t bytes_in_brigade, non_file_bytes_in_brigade, num_files_in_brigade;
 
     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +467,8 @@
 
     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    num_files_in_brigade = 0;
+
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -497,13 +500,17 @@
                 next = APR_BUCKET_NEXT(bucket);
             }
             bytes_in_brigade += bucket->length;
-            if (!APR_BUCKET_IS_FILE(bucket)) {
+            if (APR_BUCKET_IS_FILE(bucket)) {
+                num_files_in_brigade++;
+            } 
+            else {                
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
     }
 
-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER
+        || num_files_in_brigade >= THRESHOLD_MAX_FILES) {
         /* ### Writing the entire brigade may be excessive; we really just
          * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
          */


Re: FLUSH, filtering, setaside, etc (was Re: Problems with EOS optimisation in ap_core_output_filter() and file buckets.)

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/18/2009 11:16 AM, Joe Orton wrote:
> On Mon, Feb 16, 2009 at 03:12:11PM +0100, Ruediger Pluem wrote:
>> On 02/16/2009 02:13 PM, Joe Orton wrote:
>>> Why is it invalid use of the filtering/buckets API to close the file 
>>> after sending the FILE-containing brigade up the filter stack?  
>>>
>>> It seems counter-intuitive to me that *anything* you do after the 
>>> ap_pass_brigade() call should make a difference to what was passed up 
>>> the filter stack.  But I suppose this is the case; even most of the 
>>> memory-backed bucket types don't duplicate referenced memory in the 
>>> setaside method.
>> I guess this is not needed in most cases and is skipped for performance
>> reasons. But maybe we can change the setaside method for file buckets
>> to do an additional dup on the fd.
> 
> The choices seem to be:
> 
> 1) declare the FLUSH to be necessary in this case as you said 
> originally ;) - and document appropriately
> 
> 2) rewrite/reinvent all the setaside methods to dup/copy as appropriate
> 
> 3) declare the core output filter broken and fix it
> 
> Of these, (2) and (3) will presumably have some negative performance 
> impact, though I'm not sure to what degree.  Not sure (2) is even going 
> to be possible, e.g. with MMAP buckets, scary stuff.

Yes, I don't know how MMAP would work with this, so dup might be a bad
approach in that regard. In any case it looks like it would cause a lot
of trouble before things work stably again :-)

> 
> Looking at the cray-zee non-blocking core output filter on the trunk, it 
> will set buckets aside with wild abandon unless it hits a FLUSH.  I think all 

There is still a nasty issue with the trunk code that can cause you to
run out of FD's, as the new non-blocking core output filter has some trouble
setting aside the file buckets to the correct pool (it puts them in the
connection pool, which is far too long-lived).

See thread starting at

http://mail-archives.apache.org/mod_mbox/httpd-dev/200901.mbox/%3c495FF1B9.9070607@kippdata.de%3e

I admit that I lost momentum on this, but this definitely needs to be fixed.

> these options are pretty distasteful, but (1) seems like the only 
> feasible route.  What do you think?  

I guess at least for the moment (1) is the way to go, exactly for the
reasons you stated.

Regards

Rüdiger