Posted to bugs@httpd.apache.org by bu...@apache.org on 2008/08/09 09:40:35 UTC

DO NOT REPLY [Bug 45605] New: [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

https://issues.apache.org/bugzilla/show_bug.cgi?id=45605

           Summary: [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39
                    2008] file fdqueue.c, line 293, assertion "!((queue)-
                    >nelts == (queue)->bounds)" failed
           Product: Apache httpd-2
           Version: 2.2.8
          Platform: PC
        OS/Version: Windows Vista
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: rmattison190@gmail.com


I have a mod_jk server, and the log shows this warning before the server dies:
 [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c,
line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

I think it happens when some (but not all) of the backend servers are down and
the rest are therefore slower.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org


DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Denis Ustimenko <de...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |denusk@gmail.com




--- Comment #4 from Denis Ustimenko <de...@gmail.com>  2008-10-07 07:08:42 PST ---
The same was reproduced several times under heavy load with a 1 process / 2
thread configuration. The worker_queue overflow corrupts data and the process
dumps core. Inspecting the core files shows an underflow of
worker_queue_info.idlers.

This looks like a race between the condition signal and the atomic
update of the idlers variable.

The following scheduling scenario leads to the idlers underflow:

0. one listener + worker thread

1. listener gets a connection, decreases idlers to 0, then context switch
2. worker finishes its job, sets idlers from 0 to 1,
   then context switch before the condition signal
3. listener gets a connection, sees that idlers is 1,
   so decreases it to 0, gets another connection,
   waits on the condition variable
4. worker, remembering that idlers was 0,
   does the cond_signal, then context switch
5. listener wakes up and sets idlers to -1
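
The schedule above can be replayed deterministically without any threads. The
sketch below is a hypothetical single-threaded model, not httpd code: the
function name replay_schedule and the recheck flag are made up for
illustration. With recheck set to false it models a listener that tests
idlers == 0 only once before waiting; with recheck set to true it models a
listener that tests again after waking up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the five-step schedule; returns the final idlers. */
int replay_schedule(bool recheck)
{
    int idlers = 1;               /* step 0: one worker, initially idle */

    /* Step 1: listener takes a connection and claims the worker. */
    idlers--;                     /* 1 -> 0 */

    /* Step 2: worker finishes, remembers the old value and increments.
     * It is preempted before it can signal. */
    int prev = idlers;            /* 0 */
    idlers++;                     /* 0 -> 1 */
    assert(idlers == 1);

    /* Step 3: listener sees idlers == 1, so it does not wait. */
    idlers--;                     /* 1 -> 0 */
    /* It accepts another connection, finds idlers == 0 and blocks. */
    bool listener_waiting = (idlers == 0);

    /* Step 4: worker resumes and signals because prev was 0. */
    bool signalled = (prev == 0);

    /* Step 5: listener wakes up. */
    if (listener_waiting && signalled) {
        if (recheck && idlers == 0) {
            return idlers;        /* predicate still false: wait again */
        }
        idlers--;                 /* single check: decrement regardless */
    }
    return idlers;
}
```

In this model the single-check listener ends with idlers at -1, while the
re-checking listener leaves it at 0 and goes back to waiting.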

The patch for 2.2.9 follows. A similar patch for 2.2.3 is currently
under test.

--- server/mpm/worker/fdqueue.c.fdqueue-overflow    2006-07-12 07:38:44.000000000 +0400
+++ server/mpm/worker/fdqueue.c     2008-10-07 13:53:28.000000000 +0400
@@ -166,7 +166,7 @@
          *     now nonzero, it's safe for this function to
          *     return immediately.
          */
-        if (queue_info->idlers == 0) {
+        while (queue_info->idlers == 0) {
             rv = apr_thread_cond_wait(queue_info->wait_for_idler,
                                   queue_info->idlers_mutex);
             if (rv != APR_SUCCESS) {
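
Changing the test from if to while is the usual predicate-loop idiom for
condition variables: the waiter re-checks the condition after every wakeup,
because a signal may be stale by the time the waiter runs (POSIX also permits
spurious wakeups). A minimal sketch with plain pthreads follows. It is not the
httpd code: the struct idle_info and the functions claim_idler and set_idle
are hypothetical names, and unlike fdqueue.c, which updates idlers atomically,
the sketch keeps the counter entirely under the mutex for simplicity.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the queue_info structure. */
struct idle_info {
    pthread_mutex_t mutex;
    pthread_cond_t  wait_for_idler;
    int             idlers;
};

/* Listener side: block until at least one worker is idle, then claim it. */
void claim_idler(struct idle_info *qi)
{
    pthread_mutex_lock(&qi->mutex);
    /* Re-test the predicate after every wakeup, never just once. */
    while (qi->idlers == 0) {
        pthread_cond_wait(&qi->wait_for_idler, &qi->mutex);
    }
    qi->idlers--;
    assert(qi->idlers >= 0);      /* the counter must never underflow */
    pthread_mutex_unlock(&qi->mutex);
}

/* Worker side: announce that this worker is idle again. */
void set_idle(struct idle_info *qi)
{
    pthread_mutex_lock(&qi->mutex);
    qi->idlers++;
    pthread_cond_signal(&qi->wait_for_idler);
    pthread_mutex_unlock(&qi->mutex);
}
```

A caller would initialise the structure with PTHREAD_MUTEX_INITIALIZER and
PTHREAD_COND_INITIALIZER, run set_idle from worker threads and claim_idler
from the listener.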




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Ruediger Pluem <rp...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |RESOLVED
         Resolution|                            |FIXED




--- Comment #14 from Ruediger Pluem <rp...@apache.org>  2008-10-18 03:20:19 PST ---
Backported to 2.2.x as r705872
(http://svn.apache.org/viewvc?rev=705872&view=rev).




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Jeff Lawson <jl...@omniture.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jlawson@omniture.com
         OS/Version|Windows Vista               |All
            Version|2.2.8                       |2.2.9




--- Comment #1 from Jeff Lawson <jl...@omniture.com>  2008-09-18 13:11:09 PST ---
I was able to reproduce this bug under heavy load on Linux CentOS 4. Without
maintainer mode configured, ap_queue_push() would simply write past the end
of the worker_queue->data[] array and into the worker_queue_info structure that
happened to be allocated directly after it in memory. The end result was that
the condition variable pointer at worker_queue_info->wait_for_idler would be
overwritten (as well as the other parts of worker_queue_info before it) and
the child process would hang when it attempted to shut down. The call chain was
queue_info_cleanup(), apr_thread_cond_destroy(), apr_pool_cleanup_run(),
thread_cond_cleanup(), pthread_cond_destroy(), and it hung in the last of these
while attempting to lock the mutex embedded in the condition variable.

I also observed other problems (seg faults in two other places), which isn't
surprising since we are looking at a buffer overrun into who knows what memory.

Duplicated with 2.2.9. When maintainer mode was enabled I got the same assert as
reported in this bug.

I would be willing to work on a fix for this, but haven't had time to dig deep
enough to know if there already is a mechanism to prevent this buffer array
overflow that simply is broken, or if one needs to be built. Any pointers would
be appreciated.
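
For illustration only, here is one way a push could refuse to write past the
end of the array at run time instead of relying on an assertion that is only
active in maintainer mode. This is a sketch under stated assumptions, not the
fix that was adopted and not the real ap_queue_push(): struct fd_queue,
QUEUE_BOUNDS, queue_init and queue_push_checked are hypothetical names, and
the element type is reduced to an int.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_BOUNDS 2            /* hypothetical capacity for the example */

/* Hypothetical, simplified queue: a fixed array plus a fill count. */
struct fd_queue {
    int    data[QUEUE_BOUNDS];
    size_t nelts;                 /* number of elements currently stored */
    size_t bounds;                /* capacity of data[] */
};

void queue_init(struct fd_queue *q)
{
    q->nelts = 0;
    q->bounds = QUEUE_BOUNDS;
}

/* Returns 0 on success, -1 if the queue is already full. */
int queue_push_checked(struct fd_queue *q, int fd)
{
    if (q->nelts == q->bounds) {
        return -1;                /* full: report it, do not overrun data[] */
    }
    q->data[q->nelts++] = fd;
    assert(q->nelts <= q->bounds);
    return 0;
}
```

A check like this only turns memory corruption into a reportable error; the
caller still has to decide what to do with a connection that cannot be queued.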




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Chris Luby <cl...@omniture.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cluby@omniture.com
          Component|Core                        |worker






DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Chris Luby <cl...@omniture.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #22614|0                           |1
        is obsolete|                            |




--- Comment #3 from Chris Luby <cl...@omniture.com>  2008-09-19 16:53:46 PST ---
Created an attachment (id=22615)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22615)
worker_queue wait for not full patch for 2.2.6 - version 2 (added warning)

Slight update to the patch that I'm testing to include an error log warning to
help out validation




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #7 from Ruediger Pluem <rp...@apache.org>  2008-10-08 06:33:05 PST ---
Committed to trunk as r702867
(http://svn.apache.org/viewvc?rev=702867&view=rev).




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #8 from Jeff Lawson <jl...@omniture.com>  2008-10-08 08:41:36 PST ---
This latest report is a legitimate bug, as can be seen by following events
through the given scenario. However, it is not exactly the same bug as
originally reported. As the assert condition hit by the original reporter
shows, the original condition was an overflow, not an underflow. Here is the
scenario where the overflow can happen.

1. Listener thread is waiting for workers.
2. Worker threads are all busy but one that just finished.
3. That worker thread atomically increments idlers from 0 to 1 and awakens the
listener.
4. That worker thread context switches (before getting into ap_queue_pop())
5. Listener awakens and finds that there is an idle worker and begins to fill
the queue (ap_queue_push()), repeatedly until queue is overfilled.

Unfortunately the fix provided by Denis does not address this problem. 

The root of the problem is that there is no way for the listener to indicate
that it is idle, then execute code outside of a critical section, and then pick
up the work to be done without this timing window being present. 

The possible solutions are:
- Have the idler not indicate it is ready to process a request before it is
actually in the critical section where it will pick up the work to be
processed. This is not trivial using the current architecture of having the
queue and queueinfo structures being separate structures.
- Have the listener wait if the queue is full, as Chris' patch does. This
introduces an extra condition variable and an extra mutex lock for each
request. (It might be possible to reduce the mutex lock cost to almost zero by
only locking when the queue is full.)
- To minimize code changes we could simply gracefully exit the child when the
queue is full allowing those requests to finish but no more requests to be
processed by this child. I don't like this solution because it makes more work
(child startup/shutdown) right when the system is already overloaded.

Comments? Other possibilities?




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #9 from Jeff Lawson <jl...@omniture.com>  2008-10-08 09:07:29 PST ---
There is a mistake in the last comment. The paragraph:
The root of the problem is that there is no way for the listener to indicate
that it is idle, then execute code outside of a critical section, and then pick
up the work to be done without this timing window being present. 

should read:
The root of the problem is that there is no way for the worker (not listener)
to indicate that it is idle, then execute code outside of a critical section,
and then pick up the work to be done without this timing window being present. 




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #12 from Jeff Lawson <jl...@omniture.com>  2008-10-10 12:48:08 PST ---
This patch also fixes my scenario. No more assert.




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #11 from Jeff Lawson <jl...@omniture.com>  2008-10-08 14:36:12 PST ---
Yes I see that, thanks for the explanation.

I will test with the new patch. 

I see how the underflow of queue_info->idlers can cause an overflow of
queue->data[]:
1. Underflow explained by Denis happens.
2. Listener fills the queue, decrementing queue_info->idlers each insert making
it more and more negative until queue->data[] overflows and bad things happen.
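
The two steps above can be modelled in a few lines. The sketch below is a
simplified single-threaded model, not httpd code: pushes_before_blocking and
BOUNDS are hypothetical names, no worker ever pops, and the loop stops at the
capacity rather than actually overrunning anything. It assumes, as described
in this thread, that the listener blocks only when idlers is exactly 0 and
decrements idlers for every connection it queues.

```c
#include <assert.h>

#define BOUNDS 4                  /* hypothetical queue capacity */

/* Counts how many pushes the listener makes before it either blocks
 * (idlers == 0) or has filled the queue to BOUNDS. */
int pushes_before_blocking(int idlers)
{
    int nelts = 0;

    while (nelts < BOUNDS) {
        if (idlers == 0) {
            break;                /* listener would wait for an idle worker */
        }
        idlers--;                 /* claim an "idle" worker */
        nelts++;                  /* queue one connection */
    }
    assert(nelts <= BOUNDS);
    return nelts;
}
```

Starting from a correct count of 1 the listener queues one connection and then
waits. Starting from -1 the equality test never succeeds again, so the model
fills the queue completely; the next push in real code would go past the end
of the array.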




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Ruediger Pluem <rp...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEEDINFO




--- Comment #10 from Ruediger Pluem <rp...@apache.org>  2008-10-08 09:29:17 PST ---
(In reply to comment #8)
> This latest report is a legitimate bug, as can be seen by following events
> through the given scenario. However, it is not exactly the same bug as
> originally reported. As the assert condition hit by the original reporter
> shows, the original condition was an overflow, not an underflow. Here is the
> scenario where the overflow can happen.
> 
> 1. Listener thread is waiting for workers.
> 2. Worker threads are all busy but one that just finished.
> 3. That worker thread atomically increments idlers from 0 to 1 and awakens the
> listener.
> 4. That worker thread context switches (before getting into ap_queue_pop())
> 5. Listener awakens and finds that there is an idle worker and begins to fill
> the queue (ap_queue_push()), repeatedly until queue is overfilled.

I cannot follow this last point. Once the listener awakes again from 
apr_thread_cond_wait in ap_queue_info_wait_for_idler it knows for sure that
there is at least one idle thread (after the patch from Denis is applied). But
if there is only one idle thread queue_info->idlers is decreased to zero again
by apr_atomic_dec32(&(queue_info->idlers)); in ap_queue_info_wait_for_idler.
After returning from ap_queue_info_wait_for_idler the listener thread tries to
accept *one* connection and pushes it to the queue. Afterwards it has to wait
again for an idle thread in ap_queue_info_wait_for_idler (exactly in the call
apr_thread_cond_wait). So the queue is not filled repeatedly by the listener
thread until overfilled.
Have you applied the patch and checked whether you still experience the same
kind of SegFaults as without?
Do you still see the assertion error message found by the original reporter?




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #13 from Ruediger Pluem <rp...@apache.org>  2008-10-11 11:43:46 PST ---
Proposed for backport as r703707
(http://svn.apache.org/viewvc?rev=703707&view=rev).




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605





--- Comment #2 from Chris Luby <cl...@omniture.com>  2008-09-19 16:14:17 PST ---
Created an attachment (id=22614)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22614)
worker_queue wait for not full patch for 2.2.6

This is the patch I'm currently testing for this against 2.2.6.  I just
finished it about an hour ago and I would love any feedback.  What I've done is
add a second condition to the queue structure that the listener thread waits
on when the queue is full, until one of the worker threads signals it after
popping a socket off of the queue.  It's kind of the opposite of the not_empty
condition.
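
The idea described above, a second condition that mirrors not_empty, is the
classic bounded-buffer pattern. The sketch below shows that pattern with plain
pthreads. It is not the attached patch: struct bounded_queue, CAPACITY, bq_push
and bq_pop are hypothetical names, sockets are reduced to ints, and pop returns
the most recently pushed element to keep the code short.

```c
#include <assert.h>
#include <pthread.h>

#define CAPACITY 2                /* hypothetical queue capacity */

struct bounded_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;    /* workers wait here when nelts == 0 */
    pthread_cond_t  not_full;     /* listener waits here when nelts == CAPACITY */
    int             data[CAPACITY];
    int             nelts;
};

/* Listener side: wait while the queue is full, then add one element. */
void bq_push(struct bounded_queue *q, int sd)
{
    pthread_mutex_lock(&q->lock);
    while (q->nelts == CAPACITY) {
        pthread_cond_wait(&q->not_full, &q->lock);
    }
    q->data[q->nelts++] = sd;
    assert(q->nelts <= CAPACITY);
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Worker side: wait while the queue is empty, then remove one element. */
int bq_pop(struct bounded_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->nelts == 0) {
        pthread_cond_wait(&q->not_empty, &q->lock);
    }
    int sd = q->data[--q->nelts];
    pthread_cond_signal(&q->not_full);   /* there is room again */
    pthread_mutex_unlock(&q->lock);
    return sd;
}
```

The cost is the one raised elsewhere in this thread: every push and pop takes
the mutex, and there is one more condition variable to create and destroy.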




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Denis Ustimenko <de...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |ASSIGNED




--- Comment #6 from Denis Ustimenko <de...@gmail.com>  2008-10-08 01:39:39 PST ---
(In reply to comment #5)
> (In reply to comment #4)
> > 
> > The 2.2.9 patch is the following. The similar patch for 2.2.3 is currently
> > under the test.
> 
> Very nice analysis. Just one question for clarification: after applying the
> patch you submitted, was the issue gone and no longer reproducible?

Right! No segfault anymore.




DO NOT REPLY [Bug 45605] [Mon Aug 04 16:30:39 2008] [crit] [Mon Aug 04 16:30:39 2008] file fdqueue.c, line 293, assertion "!((queue)->nelts == (queue)->bounds)" failed

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45605


Ruediger Pluem <rp...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO




--- Comment #5 from Ruediger Pluem <rp...@apache.org>  2008-10-07 08:52:54 PST ---
(In reply to comment #4)
> The same was reproduced several times under a heavy load with the 1 process / 2
> thread configuration. Due to the data corruption by the worker_queue overflow
> the core is dumped. Inspecting the core files, the underflow of
> worker_queue_info.idlers is found.
> 
> Finally, that looks like the race between the condition signal and the atomic
> update of the idlers variable.
> 
> The following scheduling scenario leads to the idlers underflow:
> 
> 0. one listener + worker thread
> 
> 1. listener gets a connection, decreases idlers to 0, then context switch
> 2. worker finishes its job, sets idlers from 0 to 1,
>    then context switch before the condition signal
> 3. listener gets a connection, sees that idlers is 1,
>    so decreases it to 0, gets another connection,
>    waits on the condition variable
> 4. worker, remembering that idlers was 0,
>    does the cond_signal, then context switch
> 5. listener wakes up and sets idlers to -1
> 
> The patch for 2.2.9 follows. A similar patch for 2.2.3 is currently
> under test.

Very nice analysis. Just one question for clarification: after applying the
patch you submitted, was the issue gone and no longer reproducible?

