Posted to dev@httpd.apache.org by Rainer Jung <ra...@kippdata.de> on 2009/01/04 00:16:09 UTC

Problem with file descriptor handling in httpd 2.3.1

During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many 
open files". I used strace and the problem looks like this:

- The test case is using ab with HTTP keep alive, concurrency 20 and a 
small file, so doing about 2000 requests per second. 
MaxKeepAliveRequests=100 (Default)

- the file leading to EMFILE is the static content file, which can be 
observed to be open more than 1000 times in parallel although ab 
concurrency is only 20

- From looking at the code it seems the file is closed during a cleanup 
function associated with the request pool, which is triggered by an EOR bucket

Now what happens under KeepAlive is that the content files are kept open 
longer than the handling of the request, more precisely until the 
connection is closed. So when MaxKeepAliveRequests * Concurrency > 
MaxNumberOfFDs, we run out of file descriptors.
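
For illustration, the test corresponds to an invocation along the lines of

    ab -k -c 20 -n 100000 http://myhost:8000/

(the request count is arbitrary; host and port match the Listen directive 
used in the test). With the default MaxKeepAliveRequests of 100 and a 
concurrency of 20, up to 100 * 20 = 2000 descriptors can be held open at 
the same time, which is already more than a typical default limit of 1024 
open files per process.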

I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with Event, 
Worker and Prefork. I haven't yet had time to retest with 2.2.

For Event and Worker I also get crashes (more precisely, httpd processes 
stopping) because apr_socket_accept() also fails with EMFILE.

Regards,

Rainer


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Jan 4, 2009, at 11:57 AM, Rainer Jung wrote:

>
> Here's the gdb story:
>
> When the content file gets opened, its cleanup is correctly  
> registered with the request pool. Later in core_filters.c at the end  
> of function ap_core_output_filter() line 528 we call  
> setaside_remaining_output().
>
> This goes down the stack via ap_save_brigade(),  
> file_bucket_setaside() to apr_file_setaside(). This kills the  
> cleanup for the request pool and adds it instead to the transaction  
> (=connection) pool. There we are.
>
> 2.2.x has a different structure, although I can also see two calls  
> to ap_save_brigade() in ap_core_output_filter(), but they use  
> different pools as new targets, namely a deferred_write_pool resp.  
> input_pool.
>

Uggg... so we need to do the 'same' with the 2.3/2.4 arch
as well...


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/04/2009 06:28 PM, Rainer Jung wrote:
> On 04.01.2009 17:57, Rainer Jung wrote:
>> When the content file gets opened, its cleanup is correctly registered
>> with the request pool. Later in core_filters.c at the end of function
>> ap_core_output_filter() line 528 we call setaside_remaining_output().
> 
> ...
> 
>> 2.2.x has a different structure, although I can also see two calls to
>> ap_save_brigade() in ap_core_output_filter(), but they use different
>> pools as new targets, namely a deferred_write_pool resp. input_pool.
> 
> And the code already contains the appropriate hint:
> 
> static void setaside_remaining_output(...)
> {
> ...
>         if (make_a_copy) {
>             /* XXX should this use a separate deferred write pool, like
>              * the original ap_core_output_filter?
>              */
>             ap_save_brigade(f, &(ctx->buffered_bb), &bb, c->pool);
> ...
> }
> 

Thanks for the analysis and good catch. Maybe I'll have a look into this by
tomorrow.

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 17:57, Rainer Jung wrote:
> When the content file gets opened, its cleanup is correctly registered
> with the request pool. Later in core_filters.c at the end of function
> ap_core_output_filter() line 528 we call setaside_remaining_output().

...

> 2.2.x has a different structure, although I can also see two calls to
> ap_save_brigade() in ap_core_output_filter(), but they use different
> pools as new targets, namely a deferred_write_pool resp. input_pool.

And the code already contains the appropriate hint:

static void setaside_remaining_output(...)
{
...
         if (make_a_copy) {
             /* XXX should this use a separate deferred write pool, like
              * the original ap_core_output_filter?
              */
             ap_save_brigade(f, &(ctx->buffered_bb), &bb, c->pool);
...
}
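
To make the idea behind that XXX comment concrete, here is a minimal 
standalone sketch (plain APR, not httpd code; the pool names, the loop 
count and the file path are purely illustrative, and error checking is 
omitted). It sets files aside into a dedicated subpool of the connection 
pool and clears that subpool once the buffered output has been written, 
instead of letting the cleanups pile up in c->pool until the connection 
is closed:

/* standalone demo: compile and link against APR only */
#include "apr_general.h"
#include "apr_pools.h"
#include "apr_file_io.h"

int main(void)
{
    apr_pool_t *conn_pool;      /* stands in for c->pool */
    apr_pool_t *req_pool;       /* stands in for r->pool */
    apr_pool_t *deferred_pool;  /* a hypothetical deferred write pool */
    apr_file_t *f;
    int i;

    apr_initialize();
    apr_pool_create(&conn_pool, NULL);
    apr_pool_create(&deferred_pool, conn_pool);

    for (i = 0; i < 100; i++) {
        /* one keep-alive request: the content file is opened in the
         * request pool, as the handler does */
        apr_pool_create(&req_pool, conn_pool);
        apr_file_open(&f, "/etc/hosts", APR_READ, APR_OS_DEFAULT, req_pool);

        /* set it aside into the deferred pool rather than conn_pool,
         * then let the request pool go away */
        apr_file_setaside(&f, f, deferred_pool);
        apr_pool_destroy(req_pool);

        /* once the buffered output for this request has been written,
         * clearing the deferred pool closes the descriptor right away,
         * so FDs no longer accumulate for the lifetime of the connection */
        apr_pool_clear(deferred_pool);
    }

    apr_pool_destroy(conn_pool);
    apr_terminate();
    return 0;
}

Whether this is exactly what the 2.2.x deferred_write_pool does I have not 
checked; it is just the shape the comment suggests.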

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 15:04, Ruediger Pluem wrote:
>
> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>> On 04.01.2009 00:36, Paul Querna wrote:
>>> Rainer Jung wrote:
>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>>> many open files". I used strace and the problem looks like this:
>>>>
>>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>>> small file, so doing about 2000 requests per second.
>>>> MaxKeepAliveRequests=100 (Default)
>>>>
>>>> - the file leading to EMFILE is the static content file, which can be
>>>> observed to be open more than 1000 times in parallel although ab
>>>> concurrency is only 20
>>>>
>>>> - From looking at the code it seems the file is closed during a
>>>> cleanup function associated to the request pool, which is triggered by
>>>> an EOR bucket
>>>>
>>>> Now what happens under KeepAlive is that the content files are kept
>>>> open longer than the handling of the request, more precisely until the
>>>> closing of the connection. So when MaxKeepAliveRequests*Concurrency>
>>>> MaxNumberOfFDs we run out of file descriptors.
>>>>
>>>> I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
>>>> Event, Worker and Prefork. I didn't yet have the time to retest with
>>>> 2.2.
>>> It should only happen in 2.3.x/trunk because the EOR bucket is a new
>>> feature to let MPMs do async writes once the handler has finished
>>> running.
>>>
>>> And yes, this sounds like a nasty bug.
>> I verified I can't reproduce with the same platform and 2.2.11.
>>
>> Not sure I understand the EOR asynchronicity good enough to analyze the
>> root cause.
>
> Can you try the following patch please?

Here's the gdb story:

When the content file gets opened, its cleanup is correctly registered 
with the request pool. Later in core_filters.c at the end of function 
ap_core_output_filter() line 528 we call setaside_remaining_output().

This goes down the stack via ap_save_brigade() and file_bucket_setaside() 
to apr_file_setaside(). This kills the cleanup in the request pool and 
registers it with the transaction (= connection) pool instead. There we are.

2.2.x has a different structure; I can also see two calls to 
ap_save_brigade() in ap_core_output_filter(), but they use different 
pools as new targets, namely a deferred_write_pool and an input_pool, 
respectively.

So now we know how it happens, but I don't have an immediate idea how 
to solve it.
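
If it helps, the effect is easy to see outside httpd with a minimal 
standalone APR program (plain APR only, not httpd code; the file path is 
just a placeholder and error checking is omitted):

#include <stdio.h>
#include "apr_general.h"
#include "apr_pools.h"
#include "apr_file_io.h"

int main(void)
{
    apr_pool_t *conn_pool, *req_pool;
    apr_file_t *f;

    apr_initialize();
    apr_pool_create(&conn_pool, NULL);     /* like c->pool */
    apr_pool_create(&req_pool, conn_pool); /* like r->pool */

    /* the content file is opened in the request pool, so its close
     * cleanup is registered there */
    apr_file_open(&f, "/etc/hosts", APR_READ, APR_OS_DEFAULT, req_pool);

    /* this is what the setaside boils down to for the file bucket:
     * the cleanup is killed in req_pool and re-registered in conn_pool */
    apr_file_setaside(&f, f, conn_pool);

    /* destroying the request pool (which is what the EOR bucket triggers)
     * therefore no longer closes the file ... */
    apr_pool_destroy(req_pool);
    printf("request pool gone, file still open\n");

    /* ... it is only closed when the connection pool is destroyed */
    apr_pool_destroy(conn_pool);
    apr_terminate();
    return 0;
}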

Regards,

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/04/2009 12:49 AM, Rainer Jung wrote:
> On 04.01.2009 00:36, Paul Querna wrote:
>> Rainer Jung wrote:
>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>> many open files". I used strace and the problem looks like this:
>>>
>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>> small file, so doing about 2000 requests per second.
>>> MaxKeepAliveRequests=100 (Default)
>>>
>>> - the file leading to EMFILE is the static content file, which can be
>>> observed to be open more than 1000 times in parallel although ab
>>> concurrency is only 20
>>>
>>> - From looking at the code it seems the file is closed during a
>>> cleanup function associated to the request pool, which is triggered by
>>> an EOR bucket
>>>
>>> Now what happens under KeepAlive is that the content files are kept
>>> open longer than the handling of the request, more precisely until the
>>> closing of the connection. So when MaxKeepAliveRequests*Concurrency >
>>> MaxNumberOfFDs we run out of file descriptors.
>>>
>>> I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
>>> Event, Worker and Prefork. I didn't yet have the time to retest with
>>> 2.2.
>>
>> It should only happen in 2.3.x/trunk because the EOR bucket is a new
>> feature to let MPMs do async writes once the handler has finished
>> running.
>>
>> And yes, this sounds like a nasty bug.
> 
> I verified I can't reproduce with the same platform and 2.2.11.
> 
> Not sure I understand the EOR asynchronicity good enough to analyze the
> root cause.

Can you try the following patch please?

Index: server/core_filters.c
===================================================================
--- server/core_filters.c       (Revision 731238)
+++ server/core_filters.c       (Arbeitskopie)
@@ -367,6 +367,7 @@

 #define THRESHOLD_MIN_WRITE 4096
 #define THRESHOLD_MAX_BUFFER 65536
+#define MAX_REQUESTS_QUEUED 10

 /* Optional function coming from mod_logio, used for logging of output
  * traffic
@@ -381,6 +382,7 @@
     apr_bucket_brigade *bb;
     apr_bucket *bucket, *next;
     apr_size_t bytes_in_brigade, non_file_bytes_in_brigade;
+    int requests;

     /* Fail quickly if the connection has already been aborted. */
     if (c->aborted) {
@@ -466,6 +468,7 @@

     bytes_in_brigade = 0;
     non_file_bytes_in_brigade = 0;
+    requests = 0;
     for (bucket = APR_BRIGADE_FIRST(bb); bucket != APR_BRIGADE_SENTINEL(bb);
          bucket = next) {
         next = APR_BUCKET_NEXT(bucket);
@@ -501,11 +504,22 @@
                 non_file_bytes_in_brigade += bucket->length;
             }
         }
+        else if (bucket->type == &ap_bucket_type_eor) {
+            /*
+             * Count the number of requests still queued in the brigade.
+             * Pipelining of a high number of small files can cause
+             * a high number of open file descriptors, which if it happens
+             * on many threads in parallel can cause us to hit the OS limits.
+             */
+            requests++;
+        }
     }

-    if (non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER) {
+    if ((non_file_bytes_in_brigade >= THRESHOLD_MAX_BUFFER)
+        || (requests > MAX_REQUESTS_QUEUED)) {
         /* ### Writing the entire brigade may be excessive; we really just
-         * ### need to send enough data to be under THRESHOLD_MAX_BUFFER.
+         * ### need to send enough data to be under THRESHOLD_MAX_BUFFER or
+         * ### under MAX_REQUESTS_QUEUED
          */
         apr_status_t rv = send_brigade_blocking(net->client_socket, bb,
                                                 &(ctx->bytes_written), c);


This is still some sort of a hack, but it may help to understand whether 
this is the problem.
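
The intent, in case it is not obvious from the diff: the loop counts the 
EOR buckets still sitting in the brigade, and once more than 
MAX_REQUESTS_QUEUED of them are queued it forces a blocking write. Writing 
the brigade consumes the EOR buckets, and destroying an EOR bucket releases 
the corresponding request pool, which in turn runs the cleanup that closes 
the file.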

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 16:22, Rainer Jung wrote:
> On 04.01.2009 15:56, Ruediger Pluem wrote:
>>
>> On 01/04/2009 03:48 PM, Rainer Jung wrote:
>>> On 04.01.2009 15:40, Ruediger Pluem wrote:
>>>> On 01/04/2009 03:26 PM, Rainer Jung wrote:
>>>>> On 04.01.2009 14:14, Ruediger Pluem wrote:
>>>>>> On 01/04/2009 11:24 AM, Rainer Jung wrote:
>>>>>>> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>>>>>>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>>>>>>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>>>>>>>> Rainer Jung wrote:
>>>>>>>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE:
>>>>>>>>>>> "Too
>>>>>>>>>>> many open files". I used strace and the problem looks like this:
>>>>>>>>>>>
>>>>>>>>>>> - The test case is using ab with HTTP keep alive, concurrency 20
>>>>>>>>>>> and a
>>>>>>>>>>> small file, so doing about 2000 requests per second.
>>>>>>>> What is the exact size of the file?
>>>>>>> It is the index.html, via URL /, so size is 45 Bytes.
>>>>>> Can you try if you run in the same problem on 2.2.x with a file of
>>>>>> size 257 bytes?
>>>>> I tried on the same type of system with event MPM and 2.2.11. Can't
>>>>> reproduce even with content file of size 257 bytes.
>>>> Possibly you need to increase the number of threads per process with
>>>> event MPM
>>>> and the number of concurrent requests from ab.
>>> I increased the maximum KeepAlive Requests and the KeepAlive timeout a
>>> lot and during a longer running test I see always exactly as many open
>>> FDs for the content file in /proc/PID/fd as I had concurrency in ab. So
>>> it seems the FDs always get closed before handling the next request in
>>> the connection.
>>>
>>> After testing the patch, I'll try it again with 257 bytes on 2.2.11 with
>>> prefork or worker.
>>
>> IMHO this cannot happen with prefork on 2.2.x. So I guess it is not
>> worth testing.
>> It still confuses me that this happens on trunk as it looks like that
>> ab does not
>> do pipelining.
>
> The strace log shows that the sequence really is
>
> - new connection
>
> - read request
> - open file
> - send response
> - log request
>
> repeat this triplet a lot of times (maybe as long as KeepAlive is
> active) and then there are a lot of close() for the content files. Not
> sure, about the exact thing that triggers the close.
>
> So I don't necessarily see pipelining (in the sense of sending more
> requests before responses return) being necessary.
>
> I tested your patch (worker, trunk): It does not help. I then added an
> error log statement directly after the requests++ and it shows this
> number is always "1".

I can now even reproduce this without load. Simply open a connection and 
send hand-crafted KeepAlive requests via telnet. The file descriptors are 
kept open as long as the connection is alive. I'll run it under the 
debugger to see how the stack looks when the file gets closed.
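
For reference, the hand-crafted requests are nothing special; something 
like the following, repeated on the same connection (host and port as in 
my test setup), is enough. Each request ends with an empty line, and 
HTTP/1.1 keeps the connection open by default:

telnet myhost 8000

GET / HTTP/1.1
Host: myhost:8000

GET / HTTP/1.1
Host: myhost:8000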

Since the logging is done much earlier (directly after each request), the 
problem does not seem to be directly related to EOR. It looks like the 
file close cleanup somehow does not run when the request pool is 
destroyed, or maybe it is registered with the connection pool. gdb should 
help.

More later.

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 15:56, Ruediger Pluem wrote:
>
> On 01/04/2009 03:48 PM, Rainer Jung wrote:
>> On 04.01.2009 15:40, Ruediger Pluem wrote:
>>> On 01/04/2009 03:26 PM, Rainer Jung wrote:
>>>> On 04.01.2009 14:14, Ruediger Pluem wrote:
>>>>> On 01/04/2009 11:24 AM, Rainer Jung wrote:
>>>>>> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>>>>>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>>>>>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>>>>>>> Rainer Jung wrote:
>>>>>>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE:
>>>>>>>>>> "Too
>>>>>>>>>> many open files". I used strace and the problem looks like this:
>>>>>>>>>>
>>>>>>>>>> - The test case is using ab with HTTP keep alive, concurrency 20
>>>>>>>>>> and a
>>>>>>>>>> small file, so doing about 2000 requests per second.
>>>>>>> What is the exact size of the file?
>>>>>> It is the index.html, via URL /, so size is 45 Bytes.
>>>>> Can you try if you run in the same problem on 2.2.x with a file of
>>>>> size 257 bytes?
>>>> I tried on the same type of system with event MPM and 2.2.11. Can't
>>>> reproduce even with content file of size 257 bytes.
>>> Possibly you need to increase the number of threads per process with
>>> event MPM
>>> and the number of concurrent requests from ab.
>> I increased the maximum KeepAlive Requests and the KeepAlive timeout a
>> lot and during a longer running test I see always exactly as many open
>> FDs for the content file in /proc/PID/fd as I had concurrency in ab. So
>> it seems the FDs always get closed before handling the next request in
>> the connection.
>>
>> After testing the patch, I'll try it again with 257 bytes on 2.2.11 with
>> prefork or worker.
>
> IMHO this cannot happen with prefork on 2.2.x. So I guess it is not worth testing.
> It still confuses me that this happens on trunk as it looks like that ab does not
> do pipelining.

The strace log shows that the sequence really is

- new connection

- read request
- open file
- send response
- log request

This sequence is repeated many times (maybe for as long as KeepAlive is 
active), and then there are a lot of close() calls for the content files. 
I'm not sure what exactly triggers the close.

So I don't think pipelining (in the sense of sending further requests 
before the responses return) is necessary for this to happen.

I tested your patch (worker, trunk): it does not help. I then added an 
error log statement directly after the requests++, and it shows that this 
number is always "1".

Regards,

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/04/2009 03:48 PM, Rainer Jung wrote:
> On 04.01.2009 15:40, Ruediger Pluem wrote:
>>
>> On 01/04/2009 03:26 PM, Rainer Jung wrote:
>>> On 04.01.2009 14:14, Ruediger Pluem wrote:
>>>> On 01/04/2009 11:24 AM, Rainer Jung wrote:
>>>>> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>>>>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>>>>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>>>>>> Rainer Jung wrote:
>>>>>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE:
>>>>>>>>> "Too
>>>>>>>>> many open files". I used strace and the problem looks like this:
>>>>>>>>>
>>>>>>>>> - The test case is using ab with HTTP keep alive, concurrency 20
>>>>>>>>> and a
>>>>>>>>> small file, so doing about 2000 requests per second.
>>>>>> What is the exact size of the file?
>>>>> It is the index.html, via URL /, so size is 45 Bytes.
>>>> Can you try if you run in the same problem on 2.2.x with a file of
>>>> size 257 bytes?
>>> I tried on the same type of system with event MPM and 2.2.11. Can't
>>> reproduce even with content file of size 257 bytes.
>>
>> Possibly you need to increase the number of threads per process with
>> event MPM
>> and the number of concurrent requests from ab.
> 
> I increased the maximum KeepAlive Requests and the KeepAlive timeout a
> lot and during a longer running test I see always exactly as many open
> FDs for the content file in /proc/PID/fd as I had concurrency in ab. So
> it seems the FDs always get closed before handling the next request in
> the connection.
> 
> After testing the patch, I'll try it again with 257 bytes on 2.2.11 with
> prefork or worker.

IMHO this cannot happen with prefork on 2.2.x, so I guess it is not worth testing.
It still confuses me that this happens on trunk, as it looks like ab does not
do pipelining.

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 15:40, Ruediger Pluem wrote:
>
> On 01/04/2009 03:26 PM, Rainer Jung wrote:
>> On 04.01.2009 14:14, Ruediger Pluem wrote:
>>> On 01/04/2009 11:24 AM, Rainer Jung wrote:
>>>> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>>>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>>>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>>>>> Rainer Jung wrote:
>>>>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>>>>>>> many open files". I used strace and the problem looks like this:
>>>>>>>>
>>>>>>>> - The test case is using ab with HTTP keep alive, concurrency 20
>>>>>>>> and a
>>>>>>>> small file, so doing about 2000 requests per second.
>>>>> What is the exact size of the file?
>>>> It is the index.html, via URL /, so size is 45 Bytes.
>>> Can you try if you run in the same problem on 2.2.x with a file of
>>> size 257 bytes?
>> I tried on the same type of system with event MPM and 2.2.11. Can't
>> reproduce even with content file of size 257 bytes.
>
> Possibly you need to increase the number of threads per process with event MPM
> and the number of concurrent requests from ab.

I increased the maximum KeepAlive requests and the KeepAlive timeout a 
lot, and during a longer running test I always see exactly as many open 
FDs for the content file in /proc/PID/fd as the ab concurrency. So it 
seems the FDs always get closed before the next request on the connection 
is handled.

After testing the patch, I'll try it again with 257 bytes on 2.2.11 with 
prefork or worker.

Regards,

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/04/2009 03:26 PM, Rainer Jung wrote:
> On 04.01.2009 14:14, Ruediger Pluem wrote:
>>
>> On 01/04/2009 11:24 AM, Rainer Jung wrote:
>>> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>>>> Rainer Jung wrote:
>>>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>>>>>> many open files". I used strace and the problem looks like this:
>>>>>>>
>>>>>>> - The test case is using ab with HTTP keep alive, concurrency 20
>>>>>>> and a
>>>>>>> small file, so doing about 2000 requests per second.
>>>> What is the exact size of the file?
>>> It is the index.html, via URL /, so size is 45 Bytes.
>>
>> Can you try if you run in the same problem on 2.2.x with a file of
>> size 257 bytes?
> 
> I tried on the same type of system with event MPM and 2.2.11. Can't
> reproduce even with content file of size 257 bytes.

Possibly you need to increase the number of threads per process with event MPM
and the number of concurrent requests from ab.

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 14:14, Ruediger Pluem wrote:
>
> On 01/04/2009 11:24 AM, Rainer Jung wrote:
>> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>>> Rainer Jung wrote:
>>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>>>>> many open files". I used strace and the problem looks like this:
>>>>>>
>>>>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>>>>> small file, so doing about 2000 requests per second.
>>> What is the exact size of the file?
>> It is the index.html, via URL /, so size is 45 Bytes.
>
> Can you try if you run in the same problem on 2.2.x with a file of size 257 bytes?

I tried on the same type of system with the event MPM and 2.2.11. I can't 
reproduce it even with a content file of 257 bytes.

The same file with trunk immediately reproduces the problem.

Will try your patch/hack next.

Thanks

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/04/2009 11:24 AM, Rainer Jung wrote:
> On 04.01.2009 01:51, Ruediger Pluem wrote:
>>
>> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>>> On 04.01.2009 00:36, Paul Querna wrote:
>>>> Rainer Jung wrote:
>>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>>>> many open files". I used strace and the problem looks like this:
>>>>>
>>>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>>>> small file, so doing about 2000 requests per second.
>>
>> What is the exact size of the file?
> 
> It is the index.html, via URL /, so size is 45 Bytes.

Can you check whether you run into the same problem on 2.2.x with a file of size 257 bytes?

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 01:51, Ruediger Pluem wrote:
>
> On 01/04/2009 12:49 AM, Rainer Jung wrote:
>> On 04.01.2009 00:36, Paul Querna wrote:
>>> Rainer Jung wrote:
>>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>>> many open files". I used strace and the problem looks like this:
>>>>
>>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>>> small file, so doing about 2000 requests per second.
>
> What is the exact size of the file?

It is the index.html, served via URL /, so the size is 45 bytes.

The configuration is very close to the original, except for:

40c40
< Listen myhost:8000
---
> Listen 80

455,456c455,456
< EnableMMAP off
< EnableSendfile off
---
> #EnableMMAP off
> #EnableSendfile off

(because installation is on NFS, but the problem also occurs with those 
switches on)

The following modules are loaded:

LoadModule authn_file_module modules/mod_authn_file.so
LoadModule authn_anon_module modules/mod_authn_anon.so
LoadModule authn_core_module modules/mod_authn_core.so
LoadModule authz_host_module modules/mod_authz_host.so
LoadModule authz_groupfile_module modules/mod_authz_groupfile.so
LoadModule authz_user_module modules/mod_authz_user.so
LoadModule authz_owner_module modules/mod_authz_owner.so
LoadModule authz_core_module modules/mod_authz_core.so
LoadModule access_compat_module modules/mod_access_compat.so
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule auth_digest_module modules/mod_auth_digest.so
LoadModule log_config_module modules/mod_log_config.so
LoadModule env_module modules/mod_env.so
LoadModule mime_magic_module modules/mod_mime_magic.so
LoadModule cern_meta_module modules/mod_cern_meta.so
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so
LoadModule ident_module modules/mod_ident.so
LoadModule usertrack_module modules/mod_usertrack.so
LoadModule unique_id_module modules/mod_unique_id.so
LoadModule setenvif_module modules/mod_setenvif.so
LoadModule version_module modules/mod_version.so
LoadModule mime_module modules/mod_mime.so
LoadModule unixd_module modules/mod_unixd.so
LoadModule status_module modules/mod_status.so
LoadModule autoindex_module modules/mod_autoindex.so
LoadModule asis_module modules/mod_asis.so
LoadModule info_module modules/mod_info.so
LoadModule suexec_module modules/mod_suexec.so
LoadModule vhost_alias_module modules/mod_vhost_alias.so
LoadModule negotiation_module modules/mod_negotiation.so
LoadModule dir_module modules/mod_dir.so
LoadModule imagemap_module modules/mod_imagemap.so
LoadModule actions_module modules/mod_actions.so
LoadModule speling_module modules/mod_speling.so
LoadModule userdir_module modules/mod_userdir.so
LoadModule alias_module modules/mod_alias.so
LoadModule rewrite_module modules/mod_rewrite.so

To reproduce, you must use KeepAlive, and MaxKeepAliveRequests 
(default: 100) times the concurrency must exceed the maximum number of FDs. 
Even without exceeding the limit, you can run "httpd -X" and look at 
/proc/PID/fd during the test run. You should see a huge number of FDs, 
all pointing to the index.html.
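
For example, with the PID of the "httpd -X" process, something like

    ls -l /proc/PID/fd

will list more and more descriptors as the test runs, almost all of them 
symlinks pointing at the index.html.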

Regards,

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/04/2009 12:49 AM, Rainer Jung wrote:
> On 04.01.2009 00:36, Paul Querna wrote:
>> Rainer Jung wrote:
>>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>>> many open files". I used strace and the problem looks like this:
>>>
>>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>>> small file, so doing about 2000 requests per second.

What is the exact size of the file?

Regards

Rüdiger


Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Rainer Jung <ra...@kippdata.de>.
On 04.01.2009 00:36, Paul Querna wrote:
> Rainer Jung wrote:
>> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too
>> many open files". I used strace and the problem looks like this:
>>
>> - The test case is using ab with HTTP keep alive, concurrency 20 and a
>> small file, so doing about 2000 requests per second.
>> MaxKeepAliveRequests=100 (Default)
>>
>> - the file leading to EMFILE is the static content file, which can be
>> observed to be open more than 1000 times in parallel although ab
>> concurrency is only 20
>>
>> - From looking at the code it seems the file is closed during a
>> cleanup function associated to the request pool, which is triggered by
>> an EOR bucket
>>
>> Now what happens under KeepAlive is that the content files are kept
>> open longer than the handling of the request, more precisely until the
>> closing of the connection. So when MaxKeepAliveRequests*Concurrency >
>> MaxNumberOfFDs we run out of file descriptors.
>>
>> I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with
>> Event, Worker and Prefork. I didn't yet have the time to retest with 2.2.
>
> It should only happen in 2.3.x/trunk because the EOR bucket is a new
> feature to let MPMs do async writes once the handler has finished running.
>
> And yes, this sounds like a nasty bug.

I verified I can't reproduce with the same platform and 2.2.11.

I'm not sure I understand the EOR asynchronicity well enough to analyze the 
root cause.

Rainer

Re: Problem with file descriptor handling in httpd 2.3.1

Posted by Paul Querna <ch...@force-elite.com>.
Rainer Jung wrote:
> During testing 2.3.1 I noticed a lot of errors of type EMFILE: "Too many 
> open files". I used strace and the problem looks like this:
> 
> - The test case is using ab with HTTP keep alive, concurrency 20 and a 
> small file, so doing about 2000 requests per second. 
> MaxKeepAliveRequests=100 (Default)
> 
> - the file leading to EMFILE is the static content file, which can be 
> observed to be open more than 1000 times in parallel although ab 
> concurrency is only 20
> 
> - From looking at the code it seems the file is closed during a cleanup 
> function associated to the request pool, which is triggered by an EOR 
> bucket
> 
> Now what happens under KeepAlive is that the content files are kept open 
> longer than the handling of the request, more precisely until the 
> closing of the connection. So when  MaxKeepAliveRequests*Concurrency > 
> MaxNumberOfFDs we run out of file descriptors.
> 
> I observed the behaviour with 2.3.1 on Linux (SLES10 64Bit) with Event, 
> Worker and Prefork. I didn't yet have the time to retest with 2.2.

It should only happen in 2.3.x/trunk because the EOR bucket is a new 
feature to let MPMs do async writes once the handler has finished running.

And yes, this sounds like a nasty bug.

-Paul