
Posted to modules-dev@httpd.apache.org by Michael B Allen <io...@gmail.com> on 2009/10/16 18:28:33 UTC

Debugging: child process 14446 still did not exit, sending a SIGTERM

I have a customer who very occasionally sees apache workers hang. I'm
pretty sure this is caused by an errant module but I don't know which
one.

Is there any way to determine which module is causing Apache workers to hang?

Can I temporarily disable that SIGTERM so that I can have enough time
to attach GDB to the hanging processes?

Mike

Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Michael B Allen <io...@gmail.com>.

On Fri, Oct 16, 2009 at 8:04 PM, Chris Kukuchka <ch...@sequoiagroup.com> wrote:
> Michael B Allen wrote:
>>
>> Can I temporarily disable that SIGTERM so that I can have enough time
>> to attach GDB to the hanging processes?
>>
>
> Mike,
>
> The code which sends the SIGTERM is in mpm_common.c:
>
> static int reclaim_one_pid(pid_t pid, action_t action)
> {
> ...
> case SEND_SIGTERM:
>   /* ok, now it's being annoying */
>   ap_log_error(APLOG_MARK, APLOG_WARNING,
>                0, ap_server_conf,
>                "child process %" APR_PID_T_FMT
>                " still did not exit, "
>                "sending a SIGTERM",
>                pid);
>   kill(pid, SIGTERM);
>   break;
> ...
> }
>
> The time delay is in this table (also in mpm_common.c):
>
> void ap_reclaim_child_processes(int terminate)
> {
> ...
> struct {
>   action_t action;
>   apr_time_t action_time;
> } action_table[] = {
>   {DO_NOTHING, 0}, /* dummy entry for iterations where we re
>                     * children but take no action against
>                     * stragglers
>                     */
>   {SEND_SIGTERM, apr_time_from_sec(3)},
>   {SEND_SIGTERM, apr_time_from_sec(5)},
>   {SEND_SIGTERM, apr_time_from_sec(7)},
>   {SEND_SIGKILL, apr_time_from_sec(9)},
>   {GIVEUP,       apr_time_from_sec(10)}
> };
> ...
> }
>
> I am not certain, but I would guess changing mpm_common.c would require
> recompiling the full package.  Rather than go through that, you might first
> try using gdb to attach to a running lead Apache process and suppress that
> function.

Hi Chris,

Thanks for the references. At least I know it's hard coded and that
trying to backtrace a worker proc in 3 seconds would be completely
hopeless. And I don't see the customer recompiling anything. I'll just
have to find a different angle.

Mike

-- 
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/

Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Chris Kukuchka <ch...@sequoiagroup.com>.

Michael B Allen wrote:
> Can I temporarily disable that SIGTERM so that I can have enough time
> to attach GDB to the hanging processes?
>   

Mike,

The code which sends the SIGTERM is in mpm_common.c:

static int reclaim_one_pid(pid_t pid, action_t action)
{
...
case SEND_SIGTERM:
    /* ok, now it's being annoying */
    ap_log_error(APLOG_MARK, APLOG_WARNING,
                 0, ap_server_conf,
                 "child process %" APR_PID_T_FMT
                 " still did not exit, "
                 "sending a SIGTERM",
                 pid);
    kill(pid, SIGTERM);
    break;
...
}

The time delay is in this table (also in mpm_common.c):

void ap_reclaim_child_processes(int terminate)
{
...
struct {
    action_t action;
    apr_time_t action_time;
} action_table[] = {
    {DO_NOTHING, 0}, /* dummy entry for iterations where we reap
                      * children but take no action against
                      * stragglers
                      */
    {SEND_SIGTERM, apr_time_from_sec(3)},
    {SEND_SIGTERM, apr_time_from_sec(5)},
    {SEND_SIGTERM, apr_time_from_sec(7)},
    {SEND_SIGKILL, apr_time_from_sec(9)},
    {GIVEUP,       apr_time_from_sec(10)}
};
...
}

I am not certain, but I would guess changing mpm_common.c would require 
recompiling the full package.  Rather than go through that, you might 
first try using gdb to attach to a running lead Apache process and 
suppress that function.

Regards,

Chris Kukuchka
Sequoia Group, Inc.



Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Joe Lewis <jo...@joe-lewis.com>.

Michael B Allen wrote:
> On Fri, Oct 16, 2009 at 2:42 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>   
>> Michael B Allen wrote:
>>     
>>> On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>>>
>>>       
>>>> Michael B Allen wrote:
>>>>
>>>>         
>>>>> I have a customer who very occasionally sees apache workers hang. I'm
>>>>> pretty sure this is caused by an errant module but I don't know which
>>>>> one.
>>>>>
>>>>> Is there any way to determine which module is causing Apache workers to
>>>>> hang?
>>>>>
>>>>> Can I temporarily disable that SIGTERM so that I can have enough time
>>>>> to attach GDB to the hanging processes?
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>           
>>>> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb
>>>> and
>>>> see what it hangs on?
>>>>
>>>>         
>>> If I run it in gdb like you suggest:
>>>
>>>  # gdb httpd
>>>  (gdb) run -X -k start
>>>
>>> I cannot get httpd to run module deinitialization. Meaning if I do
>>> apachectl stop or httpd -X -k stop or graceful-stop in another
>>> terminal, it just kills the whole process group. Since the problem is
>>> hanging during module deinitialization I don't think this is going to
>>> help me. How do I shutdown httpd so that it runs the module
>>> deinitialization routines?
>>>
>>> Otherwise does anyone have a web-svn pointer to the code that's
>>> calling the SIGTERM? Maybe I can find a way to disable it.
>>>
>>> Mike
>>>
>>>       
>> Disabling SIGTERM for apache would be akin to leaving the landing gear of
>> your airplane on the ground when you take off.  How are you going to
>> properly shutdown apache if you completely kill the SIGTERM signals?
>>     
>
> SIGTERM should not be used to stop processes. A process should
> complete gracefully and call exit(2). Normally, this is what httpd
> does. However if a child process takes too long, something is sending
> a SIGTERM to *kill* the process. I assume this is Apache since it's
> writing a message in error_log to that effect. This is what I want to
> disable. Meaning, if a child process hangs, I want it to just sit
> there stuck forever until an operator can login and attach gdb to it.
>
> If I could find that part of the code, I might find a directive that
> controls how long Apache waits before it sends the SIGTERM.
>
>   
>> The "deinitialization" - are you just not seeing the messages you'd normally
>> see?  Or did apache just terminate (which is normal in gdb, which causes the
>> gdb session to terminate as well).
>>     
>
> Right. I have an Apache module that writes to a separate log. When the
> module is deinitialized, information is written to the log. Without
> gdb, that information is correctly written to the log. When running in
> gdb, nothing is written to the log. It seems the entire process group
> is simply being killed. And thus the part of interest is not
> accessible.
>
> Mike
>   

The SIGTERMs are occurring because apache has already attempted to stop 
a process gracefully, and it isn't stopping.  Rather than endlessly try 
and "gracefully" shutdown a child process, apache will presume that the 
process is just not going to respond.

You can always try the worker MPM rather than the prefork MPM.

As it stands, from the sound of the problem and the rarity of it (your 
previous descriptions), you are going to be "hit and miss" on tracking 
it down.  You could potentially recompile all of the modules and apache 
itself (placing debug log lines in each one), but the problems may 
actually go away in that case.  Especially if you switch versions.

I do know that some distributions' versions of apache exhibited behavior 
similar to what you have described (specifically, SuSE), so I don't know 
if compiling a new version would alleviate the customer gripe.

I only have two real suggestions : strace the processes, and hope the 
hard drive is big enough to capture the output from strace until the 
problems are encountered, or try upgrading the version of Apache.

Joe
-- 
Joe Lewis
Chief Nerd 	SILVERHAWK <http://www.silverhawk.net/> 	

------------------------------------------------------------------------
/With every passing hour our solar system comes forty-three thousand 
miles closer to globular cluster 13 in the constellation Hercules, and 
still there are some misfits who continue to insist that there is no 
such thing as progress.
    --Ransom K. Ferm/

Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Michael B Allen <io...@gmail.com>.

On Fri, Oct 16, 2009 at 2:42 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
> Michael B Allen wrote:
>>
>> On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>>
>>>
>>> Michael B Allen wrote:
>>>
>>>>
>>>> I have a customer who very occasionally sees apache workers hang. I'm
>>>> pretty sure this is caused by an errant module but I don't know which
>>>> one.
>>>>
>>>> Is there any way to determine which module is causing Apache workers to
>>>> hang?
>>>>
>>>> Can I temporarily disable that SIGTERM so that I can have enough time
>>>> to attach GDB to the hanging processes?
>>>>
>>>> Mike
>>>>
>>>>
>>>
>>> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb
>>> and
>>> see what it hangs on?
>>>
>>
>> If I run it in gdb like you suggest:
>>
>>  # gdb httpd
>>  (gdb) run -X -k start
>>
>> I cannot get httpd to run module deinitialization. Meaning if I do
>> apachectl stop or httpd -X -k stop or graceful-stop in another
>> terminal, it just kills the whole process group. Since the problem is
>> hanging during module deinitialization I don't think this is going to
>> help me. How do I shutdown httpd so that it runs the module
>> deinitialization routines?
>>
>> Otherwise does anyone have a web-svn pointer to the code that's
>> calling the SIGTERM? Maybe I can find a way to disable it.
>>
>> Mike
>>
>
> Disabling SIGTERM for apache would be akin to leaving the landing gear of
> your airplane on the ground when you take off.  How are you going to
> properly shutdown apache if you completely kill the SIGTERM signals?

SIGTERM should not be used to stop processes. A process should
complete gracefully and call exit(2). Normally, this is what httpd
does. However if a child process takes too long, something is sending
a SIGTERM to *kill* the process. I assume this is Apache since it's
writing a message in error_log to that effect. This is what I want to
disable. Meaning, if a child process hangs, I want it to just sit
there stuck forever until an operator can login and attach gdb to it.

If I could find that part of the code, I might find a directive that
controls how long Apache waits before it sends the SIGTERM.

> The "deinitialization" - are you just not seeing the messages you'd normally
> see?  Or did apache just terminate (which is normal in gdb, which causes the
> gdb session to terminate as well).

Right. I have an Apache module that writes to a separate log. When the
module is deinitialized, information is written to the log. Without
gdb, that information is correctly written to the log. When running in
gdb, nothing is written to the log. It seems the entire process group
is simply being killed. And thus the part of interest is not
accessible.

Mike

Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Joe Lewis <jo...@joe-lewis.com>.

Michael B Allen wrote:
> On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>   
>> Michael B Allen wrote:
>>     
>>> I have a customer who very occasionally sees apache workers hang. I'm
>>> pretty sure this is caused by an errant module but I don't know which
>>> one.
>>>
>>> Is there any way to determine which module is causing Apache workers to
>>> hang?
>>>
>>> Can I temporarily disable that SIGTERM so that I can have enough time
>>> to attach GDB to the hanging processes?
>>>
>>> Mike
>>>
>>>       
>> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb and
>> see what it hangs on?
>>     
>
> If I run it in gdb like you suggest:
>
>   # gdb httpd
>   (gdb) run -X -k start
>
> I cannot get httpd to run module deinitialization. Meaning if I do
> apachectl stop or httpd -X -k stop or graceful-stop in another
> terminal, it just kills the whole process group. Since the problem is
> hanging during module deinitialization I don't think this is going to
> help me. How do I shutdown httpd so that it runs the module
> deinitialization routines?
>
> Otherwise does anyone have a web-svn pointer to the code that's
> calling the SIGTERM? Maybe I can find a way to disable it.
>
> Mike
>   
Disabling SIGTERM for apache would be akin to leaving the landing gear 
of your airplane on the ground when you take off.  How are you going to 
properly shutdown apache if you completely kill the SIGTERM signals?

The "deinitialization" - are you just not seeing the messages you'd 
normally see?  Or did apache just terminate (which is normal in gdb, 
which causes the gdb session to terminate as well).

Two possibilities - gdb and attach to a currently running child (and 
hope you get lucky), or strace the processes.

-- 
Joe Lewis
Chief Nerd 	SILVERHAWK <http://www.silverhawk.net/> 	

------------------------------------------------------------------------
/The folly of intelligent people, clear-headed and narrow-visioned, has 
precipitated many catastrophes.
    --Alfred North Whitehead/

Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Michael B Allen <io...@gmail.com>.

On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
> Michael B Allen wrote:
>>
>> I have a customer who very occasionally sees apache workers hang. I'm
>> pretty sure this is caused by an errant module but I don't know which
>> one.
>>
>> Is there any way to determine which module is causing Apache workers to
>> hang?
>>
>> Can I temporarily disable that SIGTERM so that I can have enough time
>> to attach GDB to the hanging processes?
>>
>> Mike
>>
>
> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb and
> see what it hangs on?

If I run it in gdb like you suggest:

  # gdb httpd
  (gdb) run -X -k start

I cannot get httpd to run module deinitialization. Meaning if I do
apachectl stop or httpd -X -k stop or graceful-stop in another
terminal, it just kills the whole process group. Since the problem is
hanging during module deinitialization I don't think this is going to
help me. How do I shutdown httpd so that it runs the module
deinitialization routines?

Otherwise does anyone have a web-svn pointer to the code that's
calling the SIGTERM? Maybe I can find a way to disable it.

Mike

Re: Debugging: child process 14446 still did not exit, sending a SIGTERM

Posted by Joe Lewis <jo...@joe-lewis.com>.

Michael B Allen wrote:
> I have a customer who very occasionally sees apache workers hang. I'm
> pretty sure this is caused by an errant module but I don't know which
> one.
>
> Is there any way to determine which module is causing Apache workers to hang?
>
> Can I temporarily disable that SIGTERM so that I can have enough time
> to attach GDB to the hanging processes?
>
> Mike
>   
Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb 
and see what it hangs on?

Joe

-- 
Joe Lewis
Chief Nerd 	SILVERHAWK <http://www.silverhawk.net/> 	(801) 660-1900

------------------------------------------------------------------------
/"Computers in the future may weigh no more than 1.5 tons."
    --Popular Mechanics, forecasting the relentless march of science, 
1949/