Posted to dev@httpd.apache.org by Brian Pane <bp...@pacbell.net> on 2001/11/15 05:08:34 UTC

segv after client closes connection

I'm seeing a repeatable crash with the current CVS head.
Test case:
  * Prefork mpm on Linux
  * Run ab -c1 -n {some large number} {url}
  * While ab is running, kill it to cause a SIGPIPE
    in the httpd.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1024 (LWP 30860)]
0x4003cc96 in apr_pool_clear (a=0x80fea44) at apr_pools.c:957
957        free_blocks(a->first->h.next);

(gdb) where
#0  0x4003cc96 in apr_pool_clear (a=0x80fea44) at apr_pools.c:957
#1  0x0808c3c8 in core_output_filter (f=0x80f8d4c, b=0x0) at core.c:3220
#2  0x08085654 in ap_pass_brigade (next=0x80f8d4c, bb=0x80f909c)
    at util_filter.c:276
#3  0x08084083 in ap_flush_conn (c=0x80f8b24) at connection.c:142
#4  0x080840d5 in ap_lingering_close (dummy=0x80f8b14) at connection.c:179
#5  0x4003cb24 in run_cleanups (c=0x80f908c) at apr_pools.c:833
#6  0x4003cc7c in apr_pool_clear (a=0x80f8a14) at apr_pools.c:949
#7  0x080799df in child_main (child_num_arg=0) at prefork.c:598
#8  0x08079cc5 in make_child (s=0x80b0a2c, slot=0) at prefork.c:770
#9  0x08079f6a in perform_idle_server_maintenance (p=0x80af7cc)
    at prefork.c:911
#10 0x0807a27e in ap_mpm_run (_pconf=0x80af7cc, plog=0x80e396c, s=0x80b0a2c)
    at prefork.c:1069
#11 0x0807f21c in main (argc=1, argv=0xbffffa1c) at main.c:432
#12 0x40114177 in __libc_start_main (main=0x807ecdc <main>, argc=1,
    ubp_av=0xbffffa1c, init=0x805c950 <_init>, fini=0x8096440 <_fini>,
    rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa0c)
    at ../sysdeps/generic/libc-start.c:129

(gdb) print *a
$1 = {first = 0x0, last = 0x80fea38, cleanups = 0x0, subprocesses = 0x0,
  sub_pools = 0x0, sub_next = 0x813bb94, sub_prev = 0x0, parent = 
0x80f8a14,
  free_first_avail = 0x80fea74 "Dê\017\bxê\017\bxê\017\b", apr_abort = 0,
  prog_data = 0x0}

(gdb)  print *a->parent
$2 = {first = 0x80f8a08, last = 0x80f8a08, cleanups = 0x80f908c,
  subprocesses = 0x0, sub_pools = 0x0, sub_next = 0x0, sub_prev = 0x0,
  parent = 0x80e798c, free_first_avail = 0x80f8a44 "\024\212\017\b\t",
  apr_abort = 0, prog_data = 0x0}
(gdb)



Re: segv after client closes connection

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 14 November 2001 10:25 pm, Cliff Woolley wrote:
> On Wed, 14 Nov 2001, Ryan Bloom wrote:
> > Okay, I found a patch that solves the problem for me.  Cliff, please
> > test and let me know if it fixes your problem.  I am also not sure
> > that I like this patch.  But, I will post it here and let people
> > comment.
>
> Sorry... it fixes that problem, but other stuff is acting weird now.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x401408c0 in mem_ctrl () from /usr/lib/libcrypto.so.0.9.6
> (gdb) bt
> #0  0x401408c0 in mem_ctrl () from /usr/lib/libcrypto.so.0.9.6
> #1  0x401affd4 in __DTOR_END__ () from /usr/lib/libcrypto.so.0.9.6
> Cannot access memory at address 0x1
> (gdb)
>
> This during one of the mod_ssl tests, obviously.

Okay.  I have a few other thoughts that I will look into first thing tomorrow.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: segv after client closes connection

Posted by Cliff Woolley <cl...@yahoo.com>.
On Wed, 14 Nov 2001, Ryan Bloom wrote:

> Okay, I found a patch that solves the problem for me.  Cliff, please
> test and let me know if it fixes your problem.  I am also not sure
> that I like this patch.  But, I will post it here and let people
> comment.

Sorry... it fixes that problem, but other stuff is acting weird now.

Program received signal SIGSEGV, Segmentation fault.
0x401408c0 in mem_ctrl () from /usr/lib/libcrypto.so.0.9.6
(gdb) bt
#0  0x401408c0 in mem_ctrl () from /usr/lib/libcrypto.so.0.9.6
#1  0x401affd4 in __DTOR_END__ () from /usr/lib/libcrypto.so.0.9.6
Cannot access memory at address 0x1
(gdb)

This during one of the mod_ssl tests, obviously.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: segv after client closes connection

Posted by Cliff Woolley <cl...@yahoo.com>.
On Wed, 14 Nov 2001, Ryan Bloom wrote:

> I couldn't decide if it was bad mojo or not, which is why I didn't
> just commit it.  The structures will still be valid BTW, the only real
> difference is that we run cleanups before we destroy memory.

Not exactly.  Before, the subpools were destroyed before the current
pool's cleanups were run.  So if you allocate something (foo) out of the
subpool whose cleanup depends on a thing (bar) allocated out of the
parent pool, foo's cleanup probably needs bar to still exist in order to
work.  The old ordering met that guarantee, since we recurse into the
subpools before we clean up ourselves.  But with this patch, we clean up
ourselves before we recurse, so bar would be cleaned up before foo, and
foo's cleanup could quite likely barf.
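
A minimal sketch of that ordering, using current APR names
(apr_pool_create, apr_pool_cleanup_register); foo_cleanup/bar_cleanup are
hypothetical stand-ins for real cleanups.  With apr_pool_clear() as it
stands, foo's cleanup runs before bar's; with the patch the order flips:

#include <stdio.h>
#include <apr_general.h>
#include <apr_pools.h>

static apr_status_t bar_cleanup(void *data)
{
    /* stands in for the cleanup of something owned by the parent pool */
    puts("bar cleanup (parent pool)");
    return APR_SUCCESS;
}

static apr_status_t foo_cleanup(void *data)
{
    /* in real code this might use bar; if bar's cleanup already ran,
       this is where things barf */
    puts("foo cleanup (subpool)");
    return APR_SUCCESS;
}

int main(void)
{
    apr_pool_t *parent, *child;

    apr_initialize();
    apr_pool_create(&parent, NULL);
    apr_pool_create(&child, parent);

    apr_pool_cleanup_register(parent, NULL, bar_cleanup, apr_pool_cleanup_null);
    apr_pool_cleanup_register(child, NULL, foo_cleanup, apr_pool_cleanup_null);

    /* unpatched: subpools destroyed first, so foo then bar;
       patched:   our own cleanups first, so bar then foo */
    apr_pool_clear(parent);

    apr_terminate();
    return 0;
}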

> The other solution may be to remove the sub-pool from the
> core_output_filter. I need to understand why we have it first though.

I'll have to look more at that, too... it does seem like a possible
alternative.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: segv after client closes connection

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 14 November 2001 09:52 pm, Cliff Woolley wrote:
> On Wed, 14 Nov 2001, Ryan Bloom wrote:
> > Okay, I found a patch that solves the problem for me.  Cliff, please
> > test and let me know if it fixes your problem.  I am also not sure
> > that I like this patch.  But, I will post it here and let people
> > comment.
>
> I'm running the tests on this.  But doesn't it break some of the
> guarantees that pools make?  Normally, you'd expect in a subpool cleanup
> that structures allocated in the parent pool would still be valid.  With
> this patch, that's no longer the case.  Seems like bad mojo to me...

I couldn't decide if it was bad mojo or not, which is why I didn't just commit
it.  The structures will still be valid BTW, the only real difference is that we
run cleanups before we destroy memory.

The other solution may be to remove the sub-pool from the core_output_filter.
I need to understand why we have it first though.

Ryan

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: segv after client closes connection

Posted by Cliff Woolley <cl...@yahoo.com>.
On Wed, 14 Nov 2001, Ryan Bloom wrote:

> Okay, I found a patch that solves the problem for me.  Cliff, please
> test and let me know if it fixes your problem.  I am also not sure
> that I like this patch.  But, I will post it here and let people
> comment.

I'm running the tests on this.  But doesn't it break some of the
guarantees that pools make?  Normally, you'd expect in a subpool cleanup
that structures allocated in the parent pool would still be valid.  With
this patch, that's no longer the case.  Seems like bad mojo to me...

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: segv after client closes connection

Posted by Ryan Bloom <rb...@covalent.net>.
Okay, I found a patch that solves the problem for me.  Cliff, please test and
let me know if it fixes your problem.  I am also not sure that I like this
patch.  But, I will post it here and let people comment.

Index: memory/unix/apr_pools.c
===================================================================
RCS file: /home/cvs/apr/memory/unix/apr_pools.c,v
retrieving revision 1.115
diff -u -d -b -w -u -r1.115 apr_pools.c
--- memory/unix/apr_pools.c	2001/10/29 14:54:19	1.115
+++ memory/unix/apr_pools.c	2001/11/15 04:52:01
@@ -939,15 +939,15 @@
  */
 APR_DECLARE(void) apr_pool_clear(apr_pool_t *a)
 {
+    /* run cleanups and free any subprocesses. */
+    run_cleanups(a->cleanups);
+    a->cleanups = NULL;
     /* free the subpools. we can just loop -- the subpools will detach
        themselve from us, so this is easy. */
     while (a->sub_pools) {
 	apr_pool_destroy(a->sub_pools);
     }
 
-    /* run cleanups and free any subprocesses. */
-    run_cleanups(a->cleanups);
-    a->cleanups = NULL;
     free_proc_chain(a->subprocesses);
     a->subprocesses = NULL;
 

Ryan

On Wednesday 14 November 2001 08:53 pm, Cliff Woolley wrote:
> On Wed, 14 Nov 2001, Ryan Bloom wrote:
> > Essentially, what is happening, is that we register a cleanup on the
> > connection pool to call lingering_close.  We have a sub-pool from the
> > connection pool in the core_output_filter.  The bug happens anytime we
> > have not written all of the data when we call lingering_close.  Because
> > we clear all sub-pools before calling the cleanups, we end up calling
> > into the core_output_filter, looking for a sub-pool that doesn't exist
> > anymore.
>
> Yeah, that's what I reported two days ago.  ;)
>
> > I'm still looking for solutions to the bug.
>
> There probably needs to be some way for the core_output_filter to know its
> ctx is no longer valid.  Maybe have the core_output_filter register a
> cleanup on the subpool that will NULL out the ctx->subpool pointer when
> the subpool goes away?
>
> > > telnet localhost 80
> > > CONNECT www.google.com HTTP/1.0
>
> FYI, you'll also run into this bug if you have the latest LWP perl package
> installed when running test #2 of limits.t from the test suite.

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: segv after client closes connection

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 14 November 2001 08:53 pm, Cliff Woolley wrote:
> On Wed, 14 Nov 2001, Ryan Bloom wrote:
> > Essentially, what is happening, is that we register a cleanup on the
> > connection pool to call lingering_close.  We have a sub-pool from the
> > connection pool in the core_output_filter.  The bug happens anytime we
> > have not written all of the data when we call lingering_close.  Because
> > we clear all sub-pools before calling the cleanups, we end up calling
> > into the core_output_filter, looking for a sub-pool that doesn't exist
> > anymore.
>
> Yeah, that's what I reported two days ago.  ;)
>
> > I'm still looking for solutions to the bug.
>
> There probably needs to be some way for the core_output_filter to know its
> ctx is no longer valid.  Maybe have the core_output_filter register a
> cleanup on the subpool that will NULL out the ctx->subpool pointer when
> the subpool goes away?

The thing is, we can't just NULL out the ctx->subpool pointer.  There is real
data there.  I may have a solution within the hour.

Ryan
______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: segv after client closes connection

Posted by Cliff Woolley <cl...@yahoo.com>.
On Wed, 14 Nov 2001, Ryan Bloom wrote:

> Essentially, what is happening, is that we register a cleanup on the
> connection pool to call lingering_close.  We have a sub-pool from the
> connection pool in the core_output_filter.  The bug happens anytime we
> have not written all of the data when we call lingering_close.  Because
> we clear all sub-pools before calling the cleanups, we end up calling
> into the core_output_filter, looking for a sub-pool that doesn't exist
> anymore.

Yeah, that's what I reported two days ago.  ;)

> I'm still looking for solutions to the bug.

There probably needs to be some way for the core_output_filter to know its
ctx is no longer valid.  Maybe have the core_output_filter register a
cleanup on the subpool that will NULL out the ctx->subpool pointer when
the subpool goes away?
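
A rough sketch of that idea, with a hypothetical struct and names standing
in for the real core_output_filter ctx (current APR names assumed):

#include <apr_pools.h>

typedef struct fake_ctx {
    apr_pool_t *subpool;            /* stands in for ctx->subpool */
} fake_ctx;

static apr_status_t subpool_gone(void *data)
{
    fake_ctx *ctx = data;
    ctx->subpool = NULL;            /* filter would check this before use */
    return APR_SUCCESS;
}

void make_subpool(fake_ctx *ctx, apr_pool_t *conn_pool)
{
    apr_pool_create(&ctx->subpool, conn_pool);
    apr_pool_cleanup_register(ctx->subpool, ctx, subpool_gone,
                              apr_pool_cleanup_null);
}

The filter would then need a sane fallback for ctx->subpool == NULL, since
there may still be data pending in that subpool when it goes away.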

> > telnet localhost 80
> > CONNECT www.google.com HTTP/1.0

FYI, you'll also run into this bug if you have the latest LWP perl package
installed when running test #2 of limits.t from the test suite.

--Cliff

--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA



Re: segv after client closes connection

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 14 November 2001 08:22 pm, Ryan Bloom wrote:

Okay, I have isolated this bug finally.  This is a bit of a stickler.

Essentially, what is happening, is that we register a cleanup on the
connection pool to call lingering_close.  We have a sub-pool from the
connection pool in the core_output_filter.  The bug happens anytime we
have not written all of the data when we call lingering_close.  Because
we clear all sub-pools before calling the cleanups, we end up calling
into the core_output_filter, looking for a sub-pool that doesn't exist
anymore.
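
Boiled down to plain APR (current function names assumed; the
lingering_close_like cleanup is a hypothetical stand-in), the sequence is
roughly this -- a cleanup registered on the connection pool ends up using
a subpool that apr_pool_clear() has already destroyed:

#include <apr_general.h>
#include <apr_pools.h>

static apr_pool_t *subpool;         /* stands in for ctx->subpool */

static apr_status_t lingering_close_like(void *data)
{
    /* ap_lingering_close -> ap_flush_conn -> core_output_filter does the
       moral equivalent of this, but by now the subpool has been destroyed
       along with the connection pool's other subpools, so this pokes at
       freed memory -- hence the segfault inside apr_pool_clear() above */
    apr_pool_clear(subpool);
    return APR_SUCCESS;
}

int main(void)
{
    apr_pool_t *conn_pool;

    apr_initialize();
    apr_pool_create(&conn_pool, NULL);
    apr_pool_create(&subpool, conn_pool);

    apr_pool_cleanup_register(conn_pool, NULL, lingering_close_like,
                              apr_pool_cleanup_null);

    /* destroys the subpool first, then runs the cleanup */
    apr_pool_clear(conn_pool);

    apr_terminate();
    return 0;
}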

I'm still looking for solutions to the bug.

Ryan

> On Wednesday 14 November 2001 08:08 pm, Brian Pane wrote:
>
> I have an even more repeatable case.  :-)
>
> telnet localhost 80
> CONNECT www.google.com HTTP/1.0
>
>
> This always produces the same segfault.  This is on my list for tonight.
>
> Ryan
>
> > I'm seeing a repeatable crash with the current CVS head.
> > Test case:
> >   * Prefork mpm on Linux
> >   * Run ab -c1 -n {some large number} {url}
> >   * While ab is running, kill it to cause a SIGPIPE
> >     in the httpd.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 1024 (LWP 30860)]
> > 0x4003cc96 in apr_pool_clear (a=0x80fea44) at apr_pools.c:957
> > 957        free_blocks(a->first->h.next);
> >
> > (gdb) where
> > #0  0x4003cc96 in apr_pool_clear (a=0x80fea44) at apr_pools.c:957
> > #1  0x0808c3c8 in core_output_filter (f=0x80f8d4c, b=0x0) at core.c:3220
> > #2  0x08085654 in ap_pass_brigade (next=0x80f8d4c, bb=0x80f909c)
> >     at util_filter.c:276
> > #3  0x08084083 in ap_flush_conn (c=0x80f8b24) at connection.c:142
> > #4  0x080840d5 in ap_lingering_close (dummy=0x80f8b14) at connection.c:179
> > #5  0x4003cb24 in run_cleanups (c=0x80f908c) at apr_pools.c:833
> > #6  0x4003cc7c in apr_pool_clear (a=0x80f8a14) at apr_pools.c:949
> > #7  0x080799df in child_main (child_num_arg=0) at prefork.c:598
> > #8  0x08079cc5 in make_child (s=0x80b0a2c, slot=0) at prefork.c:770
> > #9  0x08079f6a in perform_idle_server_maintenance (p=0x80af7cc)
> >     at prefork.c:911
> > #10 0x0807a27e in ap_mpm_run (_pconf=0x80af7cc, plog=0x80e396c, s=0x80b0a2c)
> >     at prefork.c:1069
> > #11 0x0807f21c in main (argc=1, argv=0xbffffa1c) at main.c:432
> > #12 0x40114177 in __libc_start_main (main=0x807ecdc <main>, argc=1,
> >     ubp_av=0xbffffa1c, init=0x805c950 <_init>, fini=0x8096440 <_fini>,
> >     rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa0c)
> >     at ../sysdeps/generic/libc-start.c:129
> >
> > (gdb) print *a
> > $1 = {first = 0x0, last = 0x80fea38, cleanups = 0x0, subprocesses = 0x0,
> >   sub_pools = 0x0, sub_next = 0x813bb94, sub_prev = 0x0, parent =
> > 0x80f8a14,
> >   free_first_avail = 0x80fea74 "Dê\017\bxê\017\bxê\017\b", apr_abort = 0,
> >   prog_data = 0x0}
> >
> > (gdb)  print *a->parent
> > $2 = {first = 0x80f8a08, last = 0x80f8a08, cleanups = 0x80f908c,
> >   subprocesses = 0x0, sub_pools = 0x0, sub_next = 0x0, sub_prev = 0x0,
> >   parent = 0x80e798c, free_first_avail = 0x80f8a44 "\024\212\017\b\t",
> >   apr_abort = 0, prog_data = 0x0}
> > (gdb)

-- 

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: segv after client closes connection

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 14 November 2001 08:08 pm, Brian Pane wrote:

I have an even more repeatable case.  :-)

telnet localhost 80
CONNECT www.google.com HTTP/1.0


This always produces the same segfault.  This is on my list for tonight.

Ryan

> I'm seeing a repeatable crash with the current CVS head.
> Test case:
>   * Prefork mpm on Linux
>   * Run ab -c1 -n {some large number} {url}
>   * While ab is running, kill it to cause a SIGPIPE
>     in the httpd.
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1024 (LWP 30860)]
> 0x4003cc96 in apr_pool_clear (a=0x80fea44) at apr_pools.c:957
> 957        free_blocks(a->first->h.next);
>
> (gdb) where
> #0  0x4003cc96 in apr_pool_clear (a=0x80fea44) at apr_pools.c:957
> #1  0x0808c3c8 in core_output_filter (f=0x80f8d4c, b=0x0) at core.c:3220
> #2  0x08085654 in ap_pass_brigade (next=0x80f8d4c, bb=0x80f909c)
>     at util_filter.c:276
> #3  0x08084083 in ap_flush_conn (c=0x80f8b24) at connection.c:142
> #4  0x080840d5 in ap_lingering_close (dummy=0x80f8b14) at connection.c:179
> #5  0x4003cb24 in run_cleanups (c=0x80f908c) at apr_pools.c:833
> #6  0x4003cc7c in apr_pool_clear (a=0x80f8a14) at apr_pools.c:949
> #7  0x080799df in child_main (child_num_arg=0) at prefork.c:598
> #8  0x08079cc5 in make_child (s=0x80b0a2c, slot=0) at prefork.c:770
> #9  0x08079f6a in perform_idle_server_maintenance (p=0x80af7cc)
>     at prefork.c:911
> #10 0x0807a27e in ap_mpm_run (_pconf=0x80af7cc, plog=0x80e396c, s=0x80b0a2c)
>     at prefork.c:1069
> #11 0x0807f21c in main (argc=1, argv=0xbffffa1c) at main.c:432
> #12 0x40114177 in __libc_start_main (main=0x807ecdc <main>, argc=1,
>     ubp_av=0xbffffa1c, init=0x805c950 <_init>, fini=0x8096440 <_fini>,
>     rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa0c)
>     at ../sysdeps/generic/libc-start.c:129
>
> (gdb) print *a
> $1 = {first = 0x0, last = 0x80fea38, cleanups = 0x0, subprocesses = 0x0,
>   sub_pools = 0x0, sub_next = 0x813bb94, sub_prev = 0x0, parent =
> 0x80f8a14,
>   free_first_avail = 0x80fea74 "Dê\017\bxê\017\bxê\017\b", apr_abort = 0,
>   prog_data = 0x0}
>
> (gdb)  print *a->parent
> $2 = {first = 0x80f8a08, last = 0x80f8a08, cleanups = 0x80f908c,
>   subprocesses = 0x0, sub_pools = 0x0, sub_next = 0x0, sub_prev = 0x0,
>   parent = 0x80e798c, free_first_avail = 0x80f8a44 "\024\212\017\b\t",
>   apr_abort = 0, prog_data = 0x0}
> (gdb)

-- 

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------