Posted to modules-dev@httpd.apache.org by Michael B Allen <io...@gmail.com> on 2009/10/16 18:28:33 UTC
Debugging: child process 14446 still did not exit, sending a SIGTERM
I have a customer who very occasionally sees apache workers hang. I'm
pretty sure this is caused by an errant module but I don't know which
one.
Is there any way to determine which module is causing Apache workers to hang?
Can I temporarily disable that SIGTERM so that I can have enough time
to attach GDB to the hanging processes?
Mike
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Michael B Allen <io...@gmail.com>.
On Fri, Oct 16, 2009 at 8:04 PM, Chris Kukuchka <ch...@sequoiagroup.com> wrote:
> Michael B Allen wrote:
>>
>> Can I temporarily disable that SIGTERM so that I can have enough time
>> to attach GDB to the hanging processes?
>>
>
> Mike,
>
> The code which sends the SIGTERM is in mpm_common.c:
>
> static int reclaim_one_pid(pid_t pid, action_t action)
> {
> ...
> case SEND_SIGTERM:
> /* ok, now it's being annoying */
> ap_log_error(APLOG_MARK, APLOG_WARNING,
> 0, ap_server_conf,
> "child process %" APR_PID_T_FMT
> " still did not exit, "
> "sending a SIGTERM",
> pid);
> kill(pid, SIGTERM);
> break;
> ...
> }
>
> The time delay is in this table (also in mpm_common.c):
>
> void ap_reclaim_child_processes(int terminate)
> {
> ...
> struct {
> action_t action;
> apr_time_t action_time;
> } action_table[] = {
> {DO_NOTHING, 0}, /* dummy entry for iterations where we reap
> * children but take no action against
> * stragglers
> */
> {SEND_SIGTERM, apr_time_from_sec(3)},
> {SEND_SIGTERM, apr_time_from_sec(5)},
> {SEND_SIGTERM, apr_time_from_sec(7)},
> {SEND_SIGKILL, apr_time_from_sec(9)},
> {GIVEUP, apr_time_from_sec(10)}
> };
> ...
> }
>
> I am not certain, but I would guess changing mpm_common.c would require
> recompiling the full package. Rather than go through that, you might first
> try using gdb to attach to a running lead Apache process and suppress that
> function.
Hi Chris,
Thanks for the references. At least I know it's hard coded and that
trying to backtrace a worker proc in 3 seconds would be completely
hopeless. And I don't see the customer recompiling anything. I'll just
have to find a different angle.
Mike
--
Michael B Allen
Java Active Directory Integration
http://www.ioplex.com/
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Chris Kukuchka <ch...@sequoiagroup.com>.
Michael B Allen wrote:
> Can I temporarily disable that SIGTERM so that I can have enough time
> to attach GDB to the hanging processes?
>
Mike,
The code which sends the SIGTERM is in mpm_common.c:

static int reclaim_one_pid(pid_t pid, action_t action)
{
    ...
    case SEND_SIGTERM:
        /* ok, now it's being annoying */
        ap_log_error(APLOG_MARK, APLOG_WARNING,
                     0, ap_server_conf,
                     "child process %" APR_PID_T_FMT
                     " still did not exit, "
                     "sending a SIGTERM",
                     pid);
        kill(pid, SIGTERM);
        break;
    ...
}

The time delay is in this table (also in mpm_common.c):

void ap_reclaim_child_processes(int terminate)
{
    ...
    struct {
        action_t action;
        apr_time_t action_time;
    } action_table[] = {
        {DO_NOTHING, 0}, /* dummy entry for iterations where we reap
                          * children but take no action against
                          * stragglers
                          */
        {SEND_SIGTERM, apr_time_from_sec(3)},
        {SEND_SIGTERM, apr_time_from_sec(5)},
        {SEND_SIGTERM, apr_time_from_sec(7)},
        {SEND_SIGKILL, apr_time_from_sec(9)},
        {GIVEUP, apr_time_from_sec(10)}
    };
    ...
}

I am not certain, but I would guess changing mpm_common.c would require
recompiling the full package. Rather than go through that, you might
first try using gdb to attach to a running lead Apache process and
suppress that function.
Regards,
Chris Kukuchka
Sequoia Group, Inc.
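
The table Chris quotes can be read as a schedule: each entry names an action and the number of seconds after which it becomes due. The following is a minimal sketch of that reading, not the httpd code. The helper action_reached() and the plain-integer seconds are assumptions made for illustration; the real loop in mpm_common.c works in apr_time_t and polls the children between steps, so actual timing may differ somewhat.

```c
#include <stddef.h>

/* Action names mirror the quoted table. The lookup below is a simplified
 * model for illustration, not the httpd implementation. */
typedef enum { DO_NOTHING, SEND_SIGTERM, SEND_SIGKILL, GIVEUP } action_t;

struct schedule_entry {
    action_t action;
    int after_sec;   /* seconds since the reclaim began */
};

/* Same thresholds as the table quoted from mpm_common.c. */
static const struct schedule_entry schedule[] = {
    {DO_NOTHING,   0},
    {SEND_SIGTERM, 3},
    {SEND_SIGTERM, 5},
    {SEND_SIGTERM, 7},
    {SEND_SIGKILL, 9},
    {GIVEUP,      10},
};

/* Hypothetical helper: return the most recent action whose threshold has
 * been reached after waiting `waited_sec` seconds. */
static action_t action_reached(int waited_sec)
{
    action_t reached = DO_NOTHING;
    size_t i;

    for (i = 0; i < sizeof(schedule) / sizeof(schedule[0]); i++) {
        if (schedule[i].after_sec <= waited_sec) {
            reached = schedule[i].action;
        }
    }
    return reached;
}
```

Under this model a hung child gets roughly three seconds before the first SIGTERM and roughly nine before SIGKILL, which is why attaching a debugger by hand inside that window is impractical.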
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Joe Lewis <jo...@joe-lewis.com>.
Michael B Allen wrote:
> On Fri, Oct 16, 2009 at 2:42 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>
>> Michael B Allen wrote:
>>
>>> On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>>>
>>>
>>>> Michael B Allen wrote:
>>>>
>>>>
>>>>> I have a customer who very occasionally sees apache workers hang. I'm
>>>>> pretty sure this is caused by an errant module but I don't know which
>>>>> one.
>>>>>
>>>>> Is there any way to determine which module is causing Apache workers to
>>>>> hang?
>>>>>
>>>>> Can I temporarily disable that SIGTERM so that I can have enough time
>>>>> to attach GDB to the hanging processes?
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>>
>>>> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb
>>>> and
>>>> see what it hangs on?
>>>>
>>>>
>>> If I run it in gdb like you suggest:
>>>
>>> # gdb httpd
>>> (gdb) run -X -k start
>>>
>>> I cannot get httpd to run module deinitialization. Meaning if I do
>>> apachectl stop or httpd -X -k stop or graceful-stop in another
>>> terminal, it just kills the whole process group. Since the problem is
>>> hanging during module deinitialization I don't think this is going to
>>> help me. How do I shutdown httpd so that it runs the module
>>> deinitialization routines?
>>>
>>> Otherwise does anyone have a web-svn pointer to the code that's
>>> calling the SIGTERM? Maybe I can find a way to disable it.
>>>
>>> Mike
>>>
>>>
>> Disabling SIGTERM for apache would be akin to leaving the landing gear of
>> your airplane on the ground when you take off. How are you going to
>> properly shutdown apache if you completely kill the SIGTERM signals?
>>
>
> SIGTERM should not be used to stop processes. A process should
> complete gracefully and call exit(2). Normally, this is what httpd
> does. However if a child process takes too long, something is sending
> a SIGTERM to *kill* the process. I assume this is Apache since it's
> writing a message in error_log to that effect. This is what I want to
> disable. Meaning, if a child process hangs, I want it to just sit
> there stuck forever until an operator can login and attach gdb to it.
>
> If I could find that part of the code, I might find a directive that
> controls how long Apache waits before it sends the SIGTERM.
>
>
>> The "deinitialization" - are you just not seeing the messages you'd normally
>> see? Or did apache just terminate (which is normal in gdb, which causes the
>> gdb session to terminate as well).
>>
>
> Right. I have an Apache module that writes to a separate log. When the
> module is deinitialized, information is written to the log. Without
> gdb, that information is correctly written to the log. When running in
> gdb, nothing is written to the log. It seems the entire process group
> is simply being killed. And thus the part of interest is not
> accessible.
>
> Mike
>
The SIGTERMs are occurring because apache has already attempted to stop
a process gracefully, and it isn't stopping. Rather than endlessly try
to "gracefully" shut down a child process, apache will presume that the
process is just not going to respond.
You can always try the worker MPM rather than the prefork MPM.
As it stands, from the sound of the problem and the rarity of it (your
previous descriptions), you are going to be "hit and miss" on tracking
it down. You could potentially recompile all of the modules and apache
itself (placing debug log lines in each one), but the problems may
actually go away in that case. Especially if you switch versions.
I do know that some distributions' versions of apache exhibited behavior
similar to what you have described (specifically, SuSE), so I don't know
if compiling a new version would alleviate the customer gripe.
I only have two real suggestions: strace the processes, and hope the
hard drive is big enough to capture the output from strace until the
problems are encountered, or try upgrading the version of Apache.
Joe
--
Joe Lewis
Chief Nerd SILVERHAWK <http://www.silverhawk.net/>
------------------------------------------------------------------------
/With every passing hour our solar system comes forty-three thousand
miles closer to globular cluster 13 in the constellation Hercules, and
still there are some misfits who continue to insist that there is no
such thing as progress.
--Ransom K. Ferm/
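
The escalation Joe describes (ask the child to stop, wait, then stop assuming it will respond) can be reproduced outside Apache with plain POSIX calls. The sketch below is a stand-alone demo under stated assumptions: the function name reclaim_stubborn_child(), the 100 ms polling interval, and the child that deliberately ignores SIGTERM are all invented for illustration and are not taken from httpd.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that ignores SIGTERM, ask it to stop,
 * then fall back to SIGKILL. Returns the signal that ended the child,
 * 0 if it exited normally, or -1 on error. */
static int reclaim_stubborn_child(void)
{
    int ready[2];
    int status = 0;
    int i;
    char byte = 0;
    pid_t pid;
    struct timespec pause_100ms = {0, 100 * 1000 * 1000};

    if (pipe(ready) != 0) {
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stands in for a worker stuck inside a module. */
        close(ready[0]);
        signal(SIGTERM, SIG_IGN);
        if (write(ready[1], "r", 1) != 1) {   /* tell the parent we are set up */
            _exit(1);
        }
        close(ready[1]);
        for (;;) {
            pause();
        }
    }

    /* Parent: wait until the child has installed its signal disposition. */
    close(ready[1]);
    if (read(ready[0], &byte, 1) != 1) {
        close(ready[0]);
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    close(ready[0]);

    /* Polite request first, with a few short polls for an exit. */
    kill(pid, SIGTERM);
    for (i = 0; i < 3; i++) {
        if (waitpid(pid, &status, WNOHANG) == pid) {
            return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
        }
        nanosleep(&pause_100ms, NULL);
    }

    /* The child did not respond, so stop asking. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Because the child ignores SIGTERM, the call should report SIGKILL as the terminating signal. A worker that is merely slow rather than stuck would instead be reaped in one of the earlier polls.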
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Michael B Allen <io...@gmail.com>.
On Fri, Oct 16, 2009 at 2:42 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
> Michael B Allen wrote:
>>
>> On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>>
>>>
>>> Michael B Allen wrote:
>>>
>>>>
>>>> I have a customer who very occasionally sees apache workers hang. I'm
>>>> pretty sure this is caused by an errant module but I don't know which
>>>> one.
>>>>
>>>> Is there any way to determine which module is causing Apache workers to
>>>> hang?
>>>>
>>>> Can I temporarily disable that SIGTERM so that I can have enough time
>>>> to attach GDB to the hanging processes?
>>>>
>>>> Mike
>>>>
>>>>
>>>
>>> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb
>>> and
>>> see what it hangs on?
>>>
>>
>> If I run it in gdb like you suggest:
>>
>> # gdb httpd
>> (gdb) run -X -k start
>>
>> I cannot get httpd to run module deinitialization. Meaning if I do
>> apachectl stop or httpd -X -k stop or graceful-stop in another
>> terminal, it just kills the whole process group. Since the problem is
>> hanging during module deinitialization I don't think this is going to
>> help me. How do I shutdown httpd so that it runs the module
>> deinitialization routines?
>>
>> Otherwise does anyone have a web-svn pointer to the code that's
>> calling the SIGTERM? Maybe I can find a way to disable it.
>>
>> Mike
>>
>
> Disabling SIGTERM for apache would be akin to leaving the landing gear of
> your airplane on the ground when you take off. How are you going to
> properly shutdown apache if you completely kill the SIGTERM signals?
SIGTERM should not be used to stop processes. A process should
complete gracefully and call exit(2). Normally, this is what httpd
does. However if a child process takes too long, something is sending
a SIGTERM to *kill* the process. I assume this is Apache since it's
writing a message in error_log to that effect. This is what I want to
disable. Meaning, if a child process hangs, I want it to just sit
there stuck forever until an operator can login and attach gdb to it.
If I could find that part of the code, I might find a directive that
controls how long Apache waits before it sends the SIGTERM.
> The "deinitialization" - are you just not seeing the messages you'd normally
> see? Or did apache just terminate (which is normal in gdb, which causes the
> gdb session to terminate as well).
Right. I have an Apache module that writes to a separate log. When the
module is deinitialized, information is written to the log. Without
gdb, that information is correctly written to the log. When running in
gdb, nothing is written to the log. It seems the entire process group
is simply being killed. And thus the part of interest is not
accessible.
Mike
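
Mike's module already writes to its own log during deinitialization, and Joe's suggestion of debug log lines in each module extends the same idea. One way to picture it is a pair of "enter" and "leave" markers around every cleanup, flushed immediately so they reach the file even if the process is later killed. The sketch below is generic C, not Apache module code: the named_cleanup struct, the function names, and the module names used with it are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a module's shutdown routine. */
typedef void (*cleanup_fn)(void);

struct named_cleanup {
    const char *module;   /* illustrative module name */
    cleanup_fn  run;
};

/* Example cleanup that returns promptly. */
static void quick_cleanup(void)
{
}

/* Run each cleanup with an "enter" line before it and a "leave" line
 * after it. Each line is flushed right away so it is handed to the OS
 * before the cleanup gets a chance to hang. */
static void run_with_breadcrumbs(FILE *log,
                                 const struct named_cleanup *list,
                                 size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        fprintf(log, "enter %s\n", list[i].module);
        fflush(log);
        list[i].run();
        fprintf(log, "leave %s\n", list[i].module);
        fflush(log);
    }
}

/* Scan a breadcrumb log from the start. If the last "enter" has no
 * "leave" after it, copy that name into `out` and return 1; otherwise
 * return 0. */
static int find_unfinished(FILE *log, char *out, size_t out_len)
{
    char line[256];
    char pending[256] = "";

    rewind(log);
    while (fgets(line, sizeof(line), log) != NULL) {
        line[strcspn(line, "\n")] = '\0';
        if (strncmp(line, "enter ", 6) == 0) {
            strcpy(pending, line + 6);   /* fits: line is at most 255 chars */
        } else if (strncmp(line, "leave ", 6) == 0) {
            pending[0] = '\0';
        }
    }
    if (pending[0] == '\0' || out_len == 0) {
        return 0;
    }
    strncpy(out, pending, out_len - 1);
    out[out_len - 1] = '\0';
    return 1;
}
```

After a worker has been killed, running find_unfinished() over its log would name the cleanup that never returned. As Joe notes, rebuilding with extra logging can change timing enough to hide the problem, so this is a diagnostic aid rather than a guaranteed answer.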
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Joe Lewis <jo...@joe-lewis.com>.
Michael B Allen wrote:
> On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
>
>> Michael B Allen wrote:
>>
>>> I have a customer who very occasionally sees apache workers hang. I'm
>>> pretty sure this is caused by an errant module but I don't know which
>>> one.
>>>
>>> Is there any way to determine which module is causing Apache workers to
>>> hang?
>>>
>>> Can I temporarily disable that SIGTERM so that I can have enough time
>>> to attach GDB to the hanging processes?
>>>
>>> Mike
>>>
>>>
>> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb and
>> see what it hangs on?
>>
>
> If I run it in gdb like you suggest:
>
> # gdb httpd
> (gdb) run -X -k start
>
> I cannot get httpd to run module deinitialization. Meaning if I do
> apachectl stop or httpd -X -k stop or graceful-stop in another
> terminal, it just kills the whole process group. Since the problem is
> hanging during module deinitialization I don't think this is going to
> help me. How do I shutdown httpd so that it runs the module
> deinitialization routines?
>
> Otherwise does anyone have a web-svn pointer to the code that's
> calling the SIGTERM? Maybe I can find a way to disable it.
>
> Mike
>
Disabling SIGTERM for apache would be akin to leaving the landing gear
of your airplane on the ground when you take off. How are you going to
properly shutdown apache if you completely kill the SIGTERM signals?
The "deinitialization" - are you just not seeing the messages you'd
normally see? Or did apache just terminate (which is normal in gdb,
which causes the gdb session to terminate as well).
Two possibilities: attach gdb to a currently running child (and
hope you get lucky), or strace the processes.
--
Joe Lewis
Chief Nerd SILVERHAWK <http://www.silverhawk.net/>
------------------------------------------------------------------------
/The folly of intelligent people, clear-headed and narrow-visioned, has
precipitated many catastrophes.
--Alfred North Whitehead/
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Michael B Allen <io...@gmail.com>.
On Fri, Oct 16, 2009 at 1:10 PM, Joe Lewis <jo...@joe-lewis.com> wrote:
> Michael B Allen wrote:
>>
>> I have a customer who very occasionally sees apache workers hang. I'm
>> pretty sure this is caused by an errant module but I don't know which
>> one.
>>
>> Is there any way to determine which module is causing Apache workers to
>> hang?
>>
>> Can I temporarily disable that SIGTERM so that I can have enough time
>> to attach GDB to the hanging processes?
>>
>> Mike
>>
>
> Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb and
> see what it hangs on?
If I run it in gdb like you suggest:
# gdb httpd
(gdb) run -X -k start
I cannot get httpd to run module deinitialization. Meaning if I do
apachectl stop or httpd -X -k stop or graceful-stop in another
terminal, it just kills the whole process group. Since the problem is
hanging during module deinitialization I don't think this is going to
help me. How do I shut down httpd so that it runs the module
deinitialization routines?
Otherwise does anyone have a web-svn pointer to the code that's
calling the SIGTERM? Maybe I can find a way to disable it.
Mike
Re: Debugging: child process 14446 still did not exit, sending a
SIGTERM
Posted by Joe Lewis <jo...@joe-lewis.com>.
Michael B Allen wrote:
> I have a customer who very occasionally sees apache workers hang. I'm
> pretty sure this is caused by an errant module but I don't know which
> one.
>
> Is there any way to determine which module is causing Apache workers to hang?
>
> Can I temporarily disable that SIGTERM so that I can have enough time
> to attach GDB to the hanging processes?
>
> Mike
>
Perhaps run it in a non-forking mode (httpd -X -k start) inside of gdb
and see what it hangs on?
Joe
--
Joe Lewis
Chief Nerd SILVERHAWK <http://www.silverhawk.net/> (801) 660-1900
------------------------------------------------------------------------
/"Computers in the future may weigh no more than 1.5 tons."
--Popular Mechanics, forecasting the relentless march of science,
1949/