You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Paul Querna <ch...@force-elite.com> on 2005/12/05 08:14:17 UTC

Event MPM: Spinning on cleanups?

I finally got around to upgrading to trunk w/ the Event MPM on one of my
machines. Within a couple hours of starting it, I had a process
spinning, and consuming 100% CPU.

Backtrace from the spinning thread:

(gdb) thread 6
[Switching to thread 6 (Thread 0x20450000 (LWP 100189))]#0
apr_allocator_free
    (allocator=0x2054ab80, node=0x205a2000) at memory/unix/apr_pools.c:334
334             if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
(gdb) where
#0  apr_allocator_free (allocator=0x2054ab80, node=0x205a2000)
    at memory/unix/apr_pools.c:334
#1  0x280eb952 in apr_bucket_free (mem=0x0) at
buckets/apr_buckets_alloc.c:182
#2  0x280e9915 in heap_bucket_destroy (data=0x205a4090)
    at buckets/apr_buckets_heap.c:36
#3  0x280e9a54 in apr_brigade_cleanup (data=0x2059e6b0)
    at buckets/apr_brigade.c:44
#4  0x280e9a8b in brigade_cleanup (data=0x2059e6b0) at
buckets/apr_brigade.c:34
#5  0x282021bd in run_cleanups (cref=0x2059e028)
    at memory/unix/apr_pools.c:2044
#6  0x28202f39 in apr_pool_clear (pool=0x2059e018)
    at memory/unix/apr_pools.c:689
#7  0x08081063 in worker_thread (thd=0x81d1660, dummy=0x0) at event.c:682
#8  0x2820b3e4 in dummy_worker (opaque=0x0) at threadproc/unix/thread.c:138
#9  0x2823720b in pthread_create () from /usr/lib/libpthread.so.2
#10 0x282fa1ef in _ctx_start () from /lib/libc.so.6


OS: FreeBSD 6.0-STABLE
APR: Trunk
APR-Util: Trunk
HTTPD: Trunk

Best guess is that we corrupted a bucket brigade by double freeing it,
or something of that kind.  This is definitely a new behavior since the
async-write code was merged into trunk.

It is odd that we could of double-free'ed something on the connection
pool. Maybe it isn't a double-free issue at all...

I'm too tired to debug much of it tonight. Maybe later this week I will
dig deeper.

-Paul

Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

On Dec 5, 2005, at 10:30 AM, Brian Pane wrote:

> William A. Rowe, Jr. wrote:
>
>> Paul Querna wrote:
>>
>>>
>>> Best guess is that we corrupted a bucket brigade by double  
>>> freeing it,
>>> or something of that kind.  This is definitely a new behavior  
>>> since the
>>> async-write code was merged into trunk.
>>>
>>> It is odd that we could of double-free'ed something on the  
>>> connection
>>> pool. Maybe it isn't a double-free issue at all...
>>
>>
>> My other thought comes down to the thread locking.  Perhaps we are  
>> looking
>> at an issue of two threads touching the per-thread non-locking  
>> allocator
>> at once?  The reason I suspect is that it -seems- your process  
>> ended up
>> with a linked block list with a cyclic loop.
>
>
> That might well be the problem, since the thread that frees the  
> brigade
> isn't necessarily the same thread that allocated it.  I'll take a  
> look at the
> code later today.

No, on closer inspection, that's not it.  The event MPM creates a new
allocator whenever it creates a new ptrans pool (line 1000).  If the
ptrans pool ever gets destroyed, we might have problems, because it's
created as a child of pconf, and I don't think pconf is thread-safe.  It
doesn't appear that the ptrans pool is ever destroyed within the code,
though, so pconf thread-safety problems wouldn't explain the error
Paul saw.

Brian

Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

William A. Rowe, Jr. wrote:

> Paul Querna wrote:
>
>>
>> Best guess is that we corrupted a bucket brigade by double freeing it,
>> or something of that kind.  This is definitely a new behavior since the
>> async-write code was merged into trunk.
>>
>> It is odd that we could of double-free'ed something on the connection
>> pool. Maybe it isn't a double-free issue at all...
>
>
> My other thought comes down to the thread locking.  Perhaps we are 
> looking
> at an issue of two threads touching the per-thread non-locking allocator
> at once?  The reason I suspect is that it -seems- your process ended up
> with a linked block list with a cyclic loop.


That might well be the problem, since the thread that frees the brigade
isn't necessarily the same thread that allocated it.  I'll take a look 
at the
code later today.

Brian

Re: Event MPM: Spinning on cleanups?

Posted by "William A. Rowe, Jr." <wr...@rowe-clan.net>.

Paul Querna wrote:
> 
> Best guess is that we corrupted a bucket brigade by double freeing it,
> or something of that kind.  This is definitely a new behavior since the
> async-write code was merged into trunk.
> 
> It is odd that we could of double-free'ed something on the connection
> pool. Maybe it isn't a double-free issue at all...

My other thought comes down to the thread locking.  Perhaps we are looking
at an issue of two threads touching the per-thread non-locking allocator
at once?  The reason I suspect is that it -seems- your process ended up
with a linked block list with a cyclic loop.

Pools bug?, WAS: Re: Event MPM: Spinning on cleanups?

Posted by Sander Striker <st...@apache.org>.

Roy T. Fielding wrote:
> On Dec 30, 2005, at 5:51 PM, Brian Pane wrote:
> 
>> I haven't been able to find the bug yet.  As a next step, I'll try  using
>> valgrind on a build with pool debugging enabled.
> 
> 
> On entry to allocator_free, if
> 
>    (node == node->next && node->index > current_free_index)
> 
> is true, then the do { } while ((node = next) != NULL);
> will go into an infinite loop.  This is because
> 
>         if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
>             && index > current_free_index) {
>             node->next = freelist;
>             freelist = node;
>         }
> 
> does not update current_free_index.  I don't know if that is the
> problem, but it may be the symptom.
> 
> ....Roy

Hmm, I'll go over the pools code once more to see if there really
is an obscure edge case bug hitting us.

Sander

Pools bug?, WAS: Re: Event MPM: Spinning on cleanups?

Posted by Sander Striker <st...@apache.org>.

Roy T. Fielding wrote:
> On Dec 30, 2005, at 5:51 PM, Brian Pane wrote:
> 
>> I haven't been able to find the bug yet.  As a next step, I'll try  using
>> valgrind on a build with pool debugging enabled.
> 
> 
> On entry to allocator_free, if
> 
>    (node == node->next && node->index > current_free_index)
> 
> is true, then the do { } while ((node = next) != NULL);
> will go into an infinite loop.  This is because
> 
>         if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
>             && index > current_free_index) {
>             node->next = freelist;
>             freelist = node;
>         }
> 
> does not update current_free_index.  I don't know if that is the
> problem, but it may be the symptom.
> 
> ....Roy

Hmm, I'll go over the pools code once more to see if there really
is an obscure edge case bug hitting us.

Sander

apr_allocator Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

On Dec 30, 2005, at 6:48 PM, Roy T. Fielding wrote:

> On Dec 30, 2005, at 5:51 PM, Brian Pane wrote:
>
>> I haven't been able to find the bug yet.  As a next step, I'll try  
>> using
>> valgrind on a build with pool debugging enabled.
>
> On entry to allocator_free, if
>
>    (node == node->next && node->index > current_free_index)
>
> is true, then the do { } while ((node = next) != NULL);
> will go into an infinite loop.  This is because
>
>         if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
>             && index > current_free_index) {
>             node->next = freelist;
>             freelist = node;
>         }
>
> does not update current_free_index.  I don't know if that is the
> problem, but it may be the symptom.

After reading and instrumenting the apr_allocator code, I just found
that odd things happen when
   max_free_index  == APR_ALLOCATOR_MAX_FREE_UNLIMITED.

The doesn't appear to be the cause of the problem in the Event
MPM, but it's worth noting.

APR_ALLOCATOR_MAX_FREE_UNLIMITED is zero, and it's used
as the initial value for allocator->current_free_index.  The problem is
that when apr_allocator_free() stores a block in one of its free  
lists, it
subtracts that block's index (a value from 1 to 20) from
allocator->current_free_index.

Note that allocator->current_free_index is unsigned.  Thus, when
it starts out at 0, subsequent calls to apr_allocator_free() turn it  
into
a number slightly smaller than 2^32.  On the next call to
apr_allocator_alloc() that recycles a block from one of the free
lists, this bit of defensive code ends up returning the value of
allocator->current_free_index to zero:

             allocator->current_free_index += node->index;
             if (allocator->current_free_index > allocator- 
 >max_free_index)
                 allocator->current_free_index = allocator- 
 >max_free_index;

We probably should have a check in apr_allocator_free() to
make sure that we're not causing an unsigned int overflow in
allocator->current_free_index, e.g.,
+    if (index > current_tree_index) {
+         current_tree_index = 0;
+    }
+    else {
           current_free_index -= index;
+    }

In cases where max_free_index is not set to
APR_ALLOCATOR_MAX_FREE_UNLIMITED, all the
current_free_index logic appears to work in a sane
manner: the values of current_free_index and
max_free_index are basically base-2 logarithms
of the amount of space currently in the
allocator->free[] lists and of the max amount
of space that may be stored in these lists.

These variables really could use clearer names
and some descriptive comments, however.  They're
not really indices: even without the unsigned
int overflow problem, they can have values much
larger than MAX_INDEX.  Maybe it's just me, but
the naming made it very tough to figure out the
underlying powers-of-two trick.

I'll commit a patch when I have time, hopefully
in the next couple of days.

Brian

apr_allocator Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

On Dec 30, 2005, at 6:48 PM, Roy T. Fielding wrote:

> On Dec 30, 2005, at 5:51 PM, Brian Pane wrote:
>
>> I haven't been able to find the bug yet.  As a next step, I'll try  
>> using
>> valgrind on a build with pool debugging enabled.
>
> On entry to allocator_free, if
>
>    (node == node->next && node->index > current_free_index)
>
> is true, then the do { } while ((node = next) != NULL);
> will go into an infinite loop.  This is because
>
>         if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
>             && index > current_free_index) {
>             node->next = freelist;
>             freelist = node;
>         }
>
> does not update current_free_index.  I don't know if that is the
> problem, but it may be the symptom.

After reading and instrumenting the apr_allocator code, I just found
that odd things happen when
   max_free_index  == APR_ALLOCATOR_MAX_FREE_UNLIMITED.

The doesn't appear to be the cause of the problem in the Event
MPM, but it's worth noting.

APR_ALLOCATOR_MAX_FREE_UNLIMITED is zero, and it's used
as the initial value for allocator->current_free_index.  The problem is
that when apr_allocator_free() stores a block in one of its free  
lists, it
subtracts that block's index (a value from 1 to 20) from
allocator->current_free_index.

Note that allocator->current_free_index is unsigned.  Thus, when
it starts out at 0, subsequent calls to apr_allocator_free() turn it  
into
a number slightly smaller than 2^32.  On the next call to
apr_allocator_alloc() that recycles a block from one of the free
lists, this bit of defensive code ends up returning the value of
allocator->current_free_index to zero:

             allocator->current_free_index += node->index;
             if (allocator->current_free_index > allocator- 
 >max_free_index)
                 allocator->current_free_index = allocator- 
 >max_free_index;

We probably should have a check in apr_allocator_free() to
make sure that we're not causing an unsigned int overflow in
allocator->current_free_index, e.g.,
+    if (index > current_tree_index) {
+         current_tree_index = 0;
+    }
+    else {
           current_free_index -= index;
+    }

In cases where max_free_index is not set to
APR_ALLOCATOR_MAX_FREE_UNLIMITED, all the
current_free_index logic appears to work in a sane
manner: the values of current_free_index and
max_free_index are basically base-2 logarithms
of the amount of space currently in the
allocator->free[] lists and of the max amount
of space that may be stored in these lists.

These variables really could use clearer names
and some descriptive comments, however.  They're
not really indices: even without the unsigned
int overflow problem, they can have values much
larger than MAX_INDEX.  Maybe it's just me, but
the naming made it very tough to figure out the
underlying powers-of-two trick.

I'll commit a patch when I have time, hopefully
in the next couple of days.

Brian

Re: Event MPM: Spinning on cleanups?

Posted by "Roy T. Fielding" <fi...@gbiv.com>.

On Dec 30, 2005, at 5:51 PM, Brian Pane wrote:

> I haven't been able to find the bug yet.  As a next step, I'll try  
> using
> valgrind on a build with pool debugging enabled.

On entry to allocator_free, if

    (node == node->next && node->index > current_free_index)

is true, then the do { } while ((node = next) != NULL);
will go into an infinite loop.  This is because

         if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
             && index > current_free_index) {
             node->next = freelist;
             freelist = node;
         }

does not update current_free_index.  I don't know if that is the
problem, but it may be the symptom.

....Roy

Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

I haven't been able to find the bug yet.  As a next step, I'll try using
valgrind on a build with pool debugging enabled.

Brian

On Dec 4, 2005, at 11:14 PM, Paul Querna wrote:

> I finally got around to upgrading to trunk w/ the Event MPM on one  
> of my
> machines. Within a couple hours of starting it, I had a process
> spinning, and consuming 100% CPU.
>
> Backtrace from the spinning thread:
>
> (gdb) thread 6
> [Switching to thread 6 (Thread 0x20450000 (LWP 100189))]#0
> apr_allocator_free
>     (allocator=0x2054ab80, node=0x205a2000) at memory/unix/ 
> apr_pools.c:334
> 334             if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
> (gdb) where
> #0  apr_allocator_free (allocator=0x2054ab80, node=0x205a2000)
>     at memory/unix/apr_pools.c:334
> #1  0x280eb952 in apr_bucket_free (mem=0x0) at
> buckets/apr_buckets_alloc.c:182
> #2  0x280e9915 in heap_bucket_destroy (data=0x205a4090)
>     at buckets/apr_buckets_heap.c:36
> #3  0x280e9a54 in apr_brigade_cleanup (data=0x2059e6b0)
>     at buckets/apr_brigade.c:44
> #4  0x280e9a8b in brigade_cleanup (data=0x2059e6b0) at
> buckets/apr_brigade.c:34
> #5  0x282021bd in run_cleanups (cref=0x2059e028)
>     at memory/unix/apr_pools.c:2044
> #6  0x28202f39 in apr_pool_clear (pool=0x2059e018)
>     at memory/unix/apr_pools.c:689
> #7  0x08081063 in worker_thread (thd=0x81d1660, dummy=0x0) at  
> event.c:682
> #8  0x2820b3e4 in dummy_worker (opaque=0x0) at threadproc/unix/ 
> thread.c:138
> #9  0x2823720b in pthread_create () from /usr/lib/libpthread.so.2
> #10 0x282fa1ef in _ctx_start () from /lib/libc.so.6
>
>
> OS: FreeBSD 6.0-STABLE
> APR: Trunk
> APR-Util: Trunk
> HTTPD: Trunk
>
> Best guess is that we corrupted a bucket brigade by double freeing it,
> or something of that kind.  This is definitely a new behavior since  
> the
> async-write code was merged into trunk.
>
> It is odd that we could of double-free'ed something on the connection
> pool. Maybe it isn't a double-free issue at all...
>
> I'm too tired to debug much of it tonight. Maybe later this week I  
> will
> dig deeper.
>
> -Paul
>

Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

On Dec 31, 2005, at 3:01 AM, Paul Querna wrote:

> I haven't been able to reproduce this since reporting it, and I am  
> still running the Event MPM, but I did recently upgrade from  
> FreeBSD 5.4 -> 6.0... If its a rare race condition, that might be  
> enough to hide it.

The bucket allocator cleanup problem might have been a contributing  
factor.
Here's the patch I checked in to the trunk late yesterday:

http://svn.apache.org/viewcvs?rev=360257&view=rev

The async write logic may have increased the probability of buckets
remaining in brigades in the connection pool after the last request has
been processed on that connection (e.g., if a client drops the  
connection
before the response is completely sent).  That would be enough to
trigger the "freeing buckets allocated from an allocator that has
already been destroyed" situation.

Brian

Re: Event MPM: Spinning on cleanups?

Posted by Paul Querna <ch...@force-elite.com>.

Brian Pane wrote:
> On Dec 4, 2005, at 11:14 PM, Paul Querna wrote:
> 
>> I finally got around to upgrading to trunk w/ the Event MPM on one of my
>> machines. Within a couple hours of starting it, I had a process
>> spinning, and consuming 100% CPU.
>>
>> Backtrace from the spinning thread:
>>
>> (gdb) thread 6
>> [Switching to thread 6 (Thread 0x20450000 (LWP 100189))]#0
>> apr_allocator_free
>>     (allocator=0x2054ab80, node=0x205a2000) at 
>> memory/unix/apr_pools.c:334
>> 334             if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
>> (gdb) where
>> #0  apr_allocator_free (allocator=0x2054ab80, node=0x205a2000)
>>     at memory/unix/apr_pools.c:334
> 
> Paul, if you're able to reproduce this condition, can you post a
> dump of the free lists in the allocator?  If something allocated
> from the apr_allocator_t is being freed twice, it may be detectable
> as a loop in one of the allocator->free[] lists.

I haven't been able to reproduce this since reporting it, and I am still 
running the Event MPM, but I did recently upgrade from FreeBSD 5.4 -> 
6.0... If its a rare race condition, that might be enough to hide it.

-Paul

Re: Event MPM: Spinning on cleanups?

Posted by Brian Pane <br...@apache.org>.

On Dec 4, 2005, at 11:14 PM, Paul Querna wrote:

> I finally got around to upgrading to trunk w/ the Event MPM on one  
> of my
> machines. Within a couple hours of starting it, I had a process
> spinning, and consuming 100% CPU.
>
> Backtrace from the spinning thread:
>
> (gdb) thread 6
> [Switching to thread 6 (Thread 0x20450000 (LWP 100189))]#0
> apr_allocator_free
>     (allocator=0x2054ab80, node=0x205a2000) at memory/unix/ 
> apr_pools.c:334
> 334             if (max_free_index != APR_ALLOCATOR_MAX_FREE_UNLIMITED
> (gdb) where
> #0  apr_allocator_free (allocator=0x2054ab80, node=0x205a2000)
>     at memory/unix/apr_pools.c:334

Paul, if you're able to reproduce this condition, can you post a
dump of the free lists in the allocator?  If something allocated
from the apr_allocator_t is being freed twice, it may be detectable
as a loop in one of the allocator->free[] lists.

Thanks,
Brian