Posted to dev@httpd.apache.org by Cliff Woolley <jw...@virginia.edu> on 2002/04/08 20:33:21 UTC

[PATCH] Re: the most common seg fault on daedalus

On Mon, 8 Apr 2002, Cliff Woolley wrote:

> Ha. ;-)  Actually I think I see a number of potential problems in
> apr/mmap/unix/mmap.c.  I'm working on a patch and I'll post it here.

Okay, there are three issues (or potential issues) addressed here.

(1) apr_mmap_dup() would register a cleanup on the *new_mmap even if the
new one was not the owner.  That turned out to be okay though because
mmap_cleanup would detect the !mm->is_owner case and just pretend
everything was cool.  It's just extra work that doesn't need to be done.

(2) In mmap_cleanup(), if munmap() fails for some reason, we would not set
mm->mm to -1, meaning if we ever tried to run the cleanup a second time
for some reason, we would not know we'd already tried to munmap() the
thing and we'd do it again.  This is the same kind of bug that killed us a
while back on directory cleanups if you'll recall.

(3) Finally (and this is, I believe, the source of our current segfaults):
okay, here's a weird sequence of events:

   - create an mmap
   - dup it (with or without transferring ownership)
   - cleanup or delete one copy
   - delete the other one

Because apr_mmap_delete() would assume that any APR_SUCCESS from
mmap_cleanup() meant "OK, go ahead and kill the cleanup", and
!mm->is_owner would return APR_SUCCESS, we would kill the cleanup even if
there wasn't one to kill.  Boom.  There are two or three related sequences
of events that cause this, but they all boil down to essentially the
above.
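
In code, that sequence boils down to roughly the following (a sketch
only: error checking omitted, file name and sizes made up, and the
current four-argument apr_mmap_dup() with its transfer_ownership flag
assumed):

#include "apr_general.h"
#include "apr_pools.h"
#include "apr_file_io.h"
#include "apr_mmap.h"

static void reproduce(void)
{
    apr_pool_t *p;
    apr_file_t *f;
    apr_mmap_t *orig, *copy;

    apr_initialize();
    apr_pool_create(&p, NULL);
    apr_file_open(&f, "somefile", APR_READ, APR_OS_DEFAULT, p);

    /* create an mmap */
    apr_mmap_create(&orig, f, 0, 8192, APR_MMAP_READ, p);

    /* dup it, transferring ownership to the copy */
    apr_mmap_dup(&copy, orig, p, 1);

    /* cleanup or delete one copy (the owner after the transfer) */
    apr_mmap_delete(copy);

    /* delete the other one: it isn't the owner, so mmap_cleanup()
     * returns APR_SUCCESS anyway, and apr_mmap_delete() takes that as
     * "OK, go ahead and kill the cleanup" even though there isn't one
     * to kill.  Boom. */
    apr_mmap_delete(orig);
}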

There are parallel problems in the win32 mmap code that will need patching
as well.

As a side note, the buckets code is okay because (and it even has a
comment to this effect), it assumes that apr_mmap_delete() will do the
Right Thing, and if the mmap is not owned or already deleted, it will just
be a no-op.

Greg, can you try this patch on daedalus for me? (sorry if there were line
wraps)

Thanks,
Cliff



Index: mmap.c
===================================================================
RCS file: /home/cvs/apr/mmap/unix/mmap.c,v
retrieving revision 1.40
diff -u -d -r1.40 mmap.c
--- mmap.c      13 Mar 2002 20:39:24 -0000      1.40
+++ mmap.c      8 Apr 2002 18:26:36 -0000
@@ -85,25 +85,21 @@
     apr_mmap_t *mm = themmap;
     int rv;

-    if (!mm->is_owner) {
-        return APR_SUCCESS;
+    if ((!mm->is_owner) || (mm->mm == (void *)-1)) {
+        /* XXX: we shouldn't ever get here */
+        return APR_ENOENT;
     }

 #ifdef BEOS
     rv = delete_area(mm->area);
-
-    if (rv == 0) {
-        mm->mm = (void *)-1;
-        return APR_SUCCESS;
-    }
 #else
     rv = munmap(mm->mm, mm->size);
+#endif
+    mm->mm = (void *)-1;

     if (rv == 0) {
-        mm->mm = (void *)-1;
         return APR_SUCCESS;
     }
-#endif
     return errno;
 }

@@ -189,24 +185,21 @@
             (*new_mmap)->is_owner = 1;
             old_mmap->is_owner = 0;
             apr_pool_cleanup_kill(old_mmap->cntxt, old_mmap, mmap_cleanup);
+            apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
+                                      apr_pool_cleanup_null);
         }
         else {
             (*new_mmap)->is_owner = 0;
         }
-        apr_pool_cleanup_register(p, *new_mmap, mmap_cleanup,
-                                  apr_pool_cleanup_null);
     }
     return APR_SUCCESS;
 }

 APR_DECLARE(apr_status_t) apr_mmap_delete(apr_mmap_t *mm)
 {
-    apr_status_t rv;
+    apr_status_t rv = APR_SUCCESS;

-    if (mm->mm == (void *)-1)
-        return APR_ENOENT;
-
-    if ((rv = mmap_cleanup(mm)) == APR_SUCCESS) {
+    if (mm->is_owner && ((rv = mmap_cleanup(mm)) == APR_SUCCESS)) {
         apr_pool_cleanup_kill(mm->cntxt, mm, mmap_cleanup);
         return APR_SUCCESS;
     }


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On Mon, 8 Apr 2002, Greg Ames wrote:

> hmmm, I wonder what changed since 2.0.32 to trigger the failure then?
> Maybe it's some of the internal_fast_redirect stuff, which affects this
> URL.

Could be.  It's most likely some combination of setaside being called and
then a brigade being cleaned up without all of the buckets being emptied
from it beforehand.  If the internal_fast_redirect stuff could cause that
(I suppose it possibly could), then....

> > Greg, can you try this patch on daedalus for me?
>
> Sure, no problem.  But it will probably be a while before it goes live.
> Last time I checked we had 415 active requests, and we've hit 600
> (ServerLimit) today.  Mondays typically are heavy load days, plus I
> think we're getting /.ed .

'kay.  In the meanwhile I'll try to pin down an exact request that will
cause it to segfault.

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Greg Ames <gr...@remulak.net>.
Cliff Woolley wrote:

> As a side note, the buckets code is okay because (and it even has a
> comment to this effect), it assumes that apr_mmap_delete() will do the
> Right Thing, and if the mmap is not owned or already deleted, it will just
> be a no-op.

Sorry, I didn't notice that the code involved left bucket-land until you
pointed it out.

hmmm, I wonder what changed since 2.0.32 to trigger the failure then?  Maybe
it's some of the internal_fast_redirect stuff, which affects this URL. 

> Greg, can you try this patch on daedalus for me? 

Sure, no problem.  But it will probably be a while before it goes live.  Last
time I checked we had 415 active requests, and we've hit 600 (ServerLimit)
today.  Mondays typically are heavy load days, plus I think we're getting /.ed.

Greg

Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 9 Apr 2002, Greg Ames wrote:

> But this morning we have a core dump
> (/usr/local/apache2.0.35c/corefiles/httpd.core.1).  The good news is
> that it doesn't look anything like the mmap cleanup dumps, so I think
> this patch is good.

Cool.  Yeah, I was pretty confident I had that one pinned down.  I'll go
ahead and do the corresponding Win32 patch.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Greg Ames <gr...@remulak.net>.
Greg Ames wrote:
> 
> Cliff Woolley wrote:
> 
> > Greg, can you try this patch on daedalus for me? (sorry if there were line
> > wraps)
> 
> OK, it's running on port 8092, and passes everything I know how/remember to
> throw at it in test mode.  I'll put it into production after band practice
> tonight when I have time to watch it for a while.  Hopefully it will be another
> Maytag Man experience.

We did go live with 2.0.35 + this patch last night.  I posted a message to that
effect from an unusual mail server but don't see it this AM.  It may have been
eaten by a spam filter or may be waiting for the moderator.  The biggest excitement
while I was watching it was to see that the 2.0.35 download rate is about 33%
higher than 1.3.24.

But this morning we have a core dump
(/usr/local/apache2.0.35c/corefiles/httpd.core.1).  The good news is that it
doesn't look anything like the mmap cleanup dumps, so I think this patch is
good.  I'll post the info from the new dump in a separate thread.

Greg

Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Greg Ames <gr...@remulak.net>.
Cliff Woolley wrote:

> Greg, can you try this patch on daedalus for me? (sorry if there were line
> wraps)

OK, it's running on port 8092, and passes everything I know how/remember to
throw at it in test mode.  I'll put it into production after band practice
tonight when I have time to watch it for a while.  Hopefully it will be another
Maytag Man experience.

Greg

p.s. I did have to hand-apply the second half of the patch.  I couldn't see what
the mismatch was via eyeball.  It wasn't line wrap; it may have been differences
in whitespace; diff'ing the patches did highlight what looked like blank lines. 
No big deal.

Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On Wed, 10 Apr 2002, Greg Ames wrote:

> We have another similar looking dump, with this patch on.  It's
> /usr/local/apache2.0.35c/corefiles/httpd.core.2
>
> #0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
> #1  0x280cc8f0 in apr_mmap_delete (mm=0x8153ac0) at mmap.c:202

Ahh, this is different.

> Didn't I see a refcount inside the object that's being deleted?  If so,
> how can we trust it after the thing is deleted?

Nope, no refcount.  The bucket is refcounted, but not the mmap.  And the
is_owner should guarantee us that only _one_ bucket will ever try to
delete the thing and succeed.  The check for -1 should be saving us the
rest of the time... don't know what's going on.  Will investigate.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Justin Erenkrantz <je...@apache.org>.
On Mon, Apr 15, 2002 at 01:15:50PM -0400, Jeff Trawick wrote:
> What happens if we kill the cleanup on the apr_mmap_t when we create
> an mmap bucket?

FWIW, this is exactly the solution I mentioned to Cliff on IRC.

I think we need some way to kill *all* cleanups related to a
data variable.  Perhaps that would work without exposing each
cleanup function?  -- justin

Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On Mon, 15 Apr 2002, Brian Pane wrote:

> I like Justin's suggestion: a generic function that removes all cleanups
> registered for a given object.
>
> In fact, we could even do this by overloading apr_pool_cleanup_kill() to
> allow NULL as the cleanup pointer, where NULL means "unregister all cleanups
> that match this object."

It could help in this case, but watch out though; when the cleanup gets
called, we have little to no time left before the apr_foo_t itself goes
away and we lose access to whatever resource it represented.  In some
cases we can make assumptions based on our knowledge of how pool cleanups
work internally, but especially with the changes proposed for
apr_allocator_* to have a freelist size limit and so forth, we could get
ourselves into trouble easily with a feature like this.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Brian Pane <br...@cnet.com>.
Jeff Trawick wrote:

>Cliff Woolley <jw...@virginia.edu> writes:
>
>>On 15 Apr 2002, Jeff Trawick wrote:
>>
>>>What happens if we kill the cleanup on the apr_mmap_t when we create
>>>an mmap bucket?
>>>
>>That would work [and would be the preferable solution as far as I'm
>>concerned], but there's currently no API to do it with.  To kill the
>>cleanup, you need access to the cleanup function itself, but it's static
>>to apr/mmap/*/mmap.c.
>>
>
>I know; I didn't want to clutter the attempt to find the right
>solution with such details :)
>
>I'm having a hard time thinking of a reasonable way to expose enough
>information so that the cleanup can be killed.  Some ugly helper
>function could be exported by APR.  Or maybe force the address of the
>cleanup function to be stored at offset 4 of the apr_mmap_t :)
>
>(I actually prefer the latter...  It buys us time until the
>hypothetical point where there are similar cleanup ordering problems
>with other parts of APR and we have to come up with a solution that
>solves today's problem and some new problem.)
>

I like Justin's suggestion: a generic function that removes all cleanups
registered for a given object.

In fact, we could even do this by overloading apr_pool_cleanup_kill() to
allow NULL as the cleanup pointer, where NULL means "unregister all cleanups
that match this object."
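
For the mmap case that would boil down to something like this
(hypothetical, of course; apr_pool_cleanup_kill() does not accept a
NULL cleanup pointer today):

    /* hypothetical: NULL means "kill every cleanup registered for mm" */
    apr_pool_cleanup_kill(mm->cntxt, mm, NULL);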

--Brian






Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On 15 Apr 2002, Jeff Trawick wrote:

> I know; I didn't want to clutter the attempt to find the right
> solution with such details :)

Oh.  Oops.  :)

> I'm having a hard time thinking of a reasonable way to expose enough
> information so that the cleanup can be killed.  Some ugly helper
> function could be exported by APR.  Or maybe force the address of the
> cleanup function to be stored at offset 4 of the apr_mmap_t :)

I've been pondering that myself.  I can't say I'm a fan of the magic
address idea... this time.  =-]  I was wondering if we could find some
sensible addition to the semantics of apr_mmap_dup that would do the trick
for us.  Right now we have a flag that tells it whether to maintain
ownership or not.  We could have another value there that means "external
owner" or something like that.  I suppose that doesn't make any more
sense, though, than exporting a totally separate function.
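
Roughly, hypothetically (the APR_MMAP_EXTERNAL_OWNER name is made up on
the spot; today that last argument is just 0 or 1):

    /* hypothetical third ownership mode: some external party (e.g. a
     * bucket) owns the mmap's lifetime, so register no pool cleanup at
     * all and leave deletion entirely to that party */
    #define APR_MMAP_EXTERNAL_OWNER 2

    apr_mmap_dup(&new_mm, old_mm, p, APR_MMAP_EXTERNAL_OWNER);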

BUT........................

None of this really gets to the bottom of the trouble.  Because in the
corefile I looked at, it wasn't just that we had already munmapped, it was
that the apr_mmap_t *itself* was no longer accessible.  I still haven't
figured out how that's possible with today's pool implementation, but
nevertheless, it happened.  Or so the corefile said.  No amount of cleanup
jiggery will fix that.  Perhaps this is indicative of something else
entirely: maybe a subrequest was run and the results of the subrequest
were not setaside into r->main's pool?  Could that explain any of this?

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Jeff Trawick <tr...@attglobal.net>.
Cliff Woolley <jw...@virginia.edu> writes:

> On 15 Apr 2002, Jeff Trawick wrote:
> 
> > What happens if we kill the cleanup on the apr_mmap_t when we create
> > an mmap bucket?
> 
> That would work [and would be the preferable solution as far as I'm
> concerned], but there's currently no API to do it with.  To kill the
> cleanup, you need access to the cleanup function itself, but it's static
> to apr/mmap/*/mmap.c.

I know; I didn't want to clutter the attempt to find the right
solution with such details :)

I'm having a hard time thinking of a reasonable way to expose enough
information so that the cleanup can be killed.  Some ugly helper
function could be exported by APR.  Or maybe force the address of the
cleanup function to be stored at offset 4 of the apr_mmap_t :)

(I actually prefer the latter...  It buys us time until the
hypothetical point where there are similar cleanup ordering problems
with other parts of APR and we have to come up with a solution that
solves today's problem and some new problem.)

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On 15 Apr 2002, Jeff Trawick wrote:

> What happens if we kill the cleanup on the apr_mmap_t when we create
> an mmap bucket?

That would work [and would be the preferable solution as far as I'm
concerned], but there's currently no API to do it with.  To kill the
cleanup, you need access to the cleanup function itself, but it's static
to apr/mmap/*/mmap.c.
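
If it weren't, the bucket code could just do something like this when
it takes over an apr_mmap_t (a sketch only; it won't compile against
the current headers for exactly the reason above):

    /* only possible if mmap_cleanup were exported somehow */
    apr_pool_cleanup_kill(mm->cntxt, mm, mmap_cleanup);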

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Jeff Trawick <tr...@attglobal.net>.
Cliff Woolley <jw...@virginia.edu> writes:

> Okay, so this is a totally different problem.  Tangentially related at
> best.
> 
> Here's the situation:
> 
>  1) create a brigade in pool p (call it r->pool)
>  2) create an mmap in the same pool or a subpool
>       (note: it might have to be a subpool, but I'm not sure)
>  3) put the mmap in an mmap bucket
>  4) clear pool p
> 
> Because the mmap was created *after* the brigade, and pool cleanups get run
> in LIFO order, #4 implies the following:
> 
>   4a) mmap_cleanup in mmap.c will munmap the region and set mm->mm to -1
>       (note: the -1 is what was *supposed* to save us here)
>   4b) brigade_cleanup will destroy the mmap bucket via mmap_destroy from
>       apr_buckets_mmap.c.
>   4c) mmap_destroy discovers it's the last remaining reference to the
>       mmap, and so calls apr_mmap_delete
>   4d) boom

What happens if we kill the cleanup on the apr_mmap_t when we create
an mmap bucket?

-- 
Jeff Trawick | trawick@attglobal.net
Born in Roswell... married an alien...

Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Cliff Woolley <jw...@virginia.edu>.
On Wed, 10 Apr 2002, Greg Ames wrote:

> We have another similar looking dump, with this patch on.  It's
> /usr/local/apache2.0.35c/corefiles/httpd.core.2
>
> #0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
> #1  0x280cc8f0 in apr_mmap_delete (mm=0x8153ac0) at mmap.c:202
> #2  0x280ad926 in mmap_destroy (data=0x8131088) at apr_buckets_mmap.c:82
> #3  0x280adf08 in apr_brigade_cleanup (data=0x814c4f8) at apr_brigade.c:86
> #4  0x280adebe in brigade_cleanup (data=0x814c4f8) at apr_brigade.c:72
> #5  0x280cdd3b in run_cleanups (c=0x81393f8) at apr_pools.c:1713
> #6  0x280cd51c in apr_pool_destroy (pool=0x813f010) at apr_pools.c:638
> #7  0x805ea1c in ap_process_http_connection (c=0x812c120) at http_core.c:300

Okay, so this is a totally different problem.  Tangentially related at
best.

Here's the situation:

 1) create a brigade in pool p (call it r->pool)
 2) create an mmap in the same pool or a subpool
      (note: it might have to be a subpool, but I'm not sure)
 3) put the mmap in an mmap bucket
 4) clear pool p

Because the mmap was created *after* the brigade, and pool cleanups get run
in LIFO order, #4 implies the following:

  4a) mmap_cleanup in mmap.c will munmap the region and set mm->mm to -1
      (note: the -1 is what was *supposed* to save us here)
  4b) brigade_cleanup will destroy the mmap bucket via mmap_destroy from
      apr_buckets_mmap.c.
  4c) mmap_destroy discovers it's the last remaining reference to the
      mmap, and so calls apr_mmap_delete
  4d) boom
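
In code, that sequence is roughly the following (again just a sketch:
error checking omitted, names and sizes made up, and the signatures
approximate, so they may not match this exact revision):

#include "apr_general.h"
#include "apr_pools.h"
#include "apr_file_io.h"
#include "apr_mmap.h"
#include "apr_buckets.h"

static void reproduce(void)
{
    apr_pool_t *p;
    apr_bucket_alloc_t *list;
    apr_bucket_brigade *bb;
    apr_file_t *f;
    apr_mmap_t *mm;
    apr_bucket *b;

    apr_initialize();
    apr_pool_create(&p, NULL);          /* think r->pool */
    list = apr_bucket_alloc_create(p);

    /* 1) create a brigade in pool p */
    bb = apr_brigade_create(p, list);

    /* 2) create an mmap in the same pool; this registers mmap_cleanup
     *    on p *after* the brigade's cleanup was registered */
    apr_file_open(&f, "somefile", APR_READ, APR_OS_DEFAULT, p);
    apr_mmap_create(&mm, f, 0, 8192, APR_MMAP_READ, p);

    /* 3) put the mmap in an mmap bucket */
    b = apr_bucket_mmap_create(mm, 0, 8192, list);
    APR_BRIGADE_INSERT_TAIL(bb, b);

    /* 4) clear pool p: cleanups run LIFO, so mmap_cleanup runs first
     *    (munmaps and sets mm->mm to -1), then brigade_cleanup destroys
     *    the mmap bucket, and mmap_destroy, seeing the last reference,
     *    calls apr_mmap_delete.  Boom. */
    apr_pool_clear(p);
}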

The question is, why does step 4c fail?  Even though we call
apr_mmap_delete, mm->mm == -1 should make it return APR_ENOENT rather
than crash.  But here's the kicker: mm itself is no longer in accessible
memory:

Program terminated with signal 11, Segmentation fault.
(gdb) frame 0
#0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
98          mm->mm = (void *)-1;
(gdb) p mm
$1 = (apr_mmap_t *) 0x8153ac0
(gdb) p *mm
Cannot access memory at address 0x8153ac0.

Ouch.

So what can we do?

 Option 1: go back to having mmap_destroy in apr_buckets_mmap.c *not* call
           apr_mmap_delete().  But the reason we put it in there in the
           first place was to accommodate large file buckets that get
           apr_bucket_read(); we mmap the thing in like 4MB increments,
           but we only want 4MB mapped at a time, not the whole file.

    Note that having mmap_destroy call apr_mmap_delete works
    fine as long as you delete the mmap bucket before cleaning up
    the pool, or at least call apr_brigade_cleanup when you're done
    with the brigade rather than leaving it to the pool cleanup.
    But that defeats the purpose.  Ideally mmap_destroy would be
    able to figure out whether the apr_mmap_t had already been cleaned
    up, and if not, skip the apr_mmap_delete.  Which brings me to:

 Option 2: apr_bucket_mmap_make() could register a cleanup on the
           apr_mmap_t's pool that, when run, would do the Right Thing
           to account for the mmap having been cleaned up.  But what
           is the Right Thing?

    Option 2a: The Right Thing might be to just set a flag in the
               mmap bucket that says "this mmap is no longer valid,
               never try to access it again."  (But that's inconsistent
               with what pool buckets do, so...)  A rough sketch of
               this follows below the list.

    Option 2b: The Right Thing might be to copy the data in the mmap
               to the heap and morph ourselves to a heap bucket just
               as the pool buckets do.  (Ouch. *)

    Option 2c: The Right Thing might be to morph ourselves to an
               immortal bucket containing the empty string [or an
               error metadata bucket of some type? that might be
               better]

    Note that an option that might come to mind that would NOT work is
    to just have the cleanup delete the mmap bucket from whatever brigade
    it's in.  That won't work simply because it's not safe -- the bucket
    has no way to know whether any functions might still be holding
    a pointer to it.
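
Very roughly, option 2a might look something like this (entirely
hypothetical: the "invalidated" flag does not exist in apr_bucket_mmap
today, and the names are invented):

    /* registered by apr_bucket_mmap_make() on the apr_mmap_t's pool */
    static apr_status_t mmap_bucket_invalidate(void *data)
    {
        apr_bucket_mmap *m = data;
        m->invalidated = 1;  /* mmap is gone; never touch m->mm again */
        return APR_SUCCESS;
    }

    /* ...and in apr_bucket_mmap_make(): */
    apr_pool_cleanup_register(mm->cntxt, m, mmap_bucket_invalidate,
                              apr_pool_cleanup_null);

The mmap bucket's read and destroy functions would then have to check
that flag before touching m->mm.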


* I say ouch because it also means that any time we clean up a brigade's
pool while the brigade still has pool buckets in it, and the data in a
pool bucket was allocated from the same pool as the brigade, we morph
the pool bucket to a heap bucket (copying the data) and then immediately
turn around and destroy it again.  I can't think of an easy solution for
this one.
Granted, it's not killer like the problem above, just suboptimal.


--Cliff


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA


Re: [PATCH] Re: the most common seg fault on daedalus

Posted by Greg Ames <gr...@remulak.net>.
Cliff Woolley wrote:

> Greg, can you try this patch on daedalus for me? 

:-( 

We have another similar looking dump, with this patch on.  It's
/usr/local/apache2.0.35c/corefiles/httpd.core.2

#0  mmap_cleanup (themmap=0x8153ac0) at mmap.c:98
#1  0x280cc8f0 in apr_mmap_delete (mm=0x8153ac0) at mmap.c:202
#2  0x280ad926 in mmap_destroy (data=0x8131088) at apr_buckets_mmap.c:82
#3  0x280adf08 in apr_brigade_cleanup (data=0x814c4f8) at apr_brigade.c:86
#4  0x280adebe in brigade_cleanup (data=0x814c4f8) at apr_brigade.c:72
#5  0x280cdd3b in run_cleanups (c=0x81393f8) at apr_pools.c:1713
#6  0x280cd51c in apr_pool_destroy (pool=0x813f010) at apr_pools.c:638
#7  0x805ea1c in ap_process_http_connection (c=0x812c120) at http_core.c:300
#8  0x806ecf3 in ap_run_process_connection (c=0x812c120) at connection.c:85
#9  0x806f008 in ap_process_connection (c=0x812c120, csd=0x812c048)
    at connection.c:207
#10 0x80648f0 in child_main (child_num_arg=547) at prefork.c:671

Didn't I see a refcount inside the object that's being deleted?  If so, how can
we trust it after the thing is deleted?

Greg