Posted to modules-dev@httpd.apache.org by miim <xx...@yahoo.com.INVALID> on 2021/10/23 00:49:13 UTC

Chasing a segfault

I have a relatively simple module which is nonetheless causing Apache to intermittently segfault.

I've added debugging trace messages to be sent to the error log, but the lack of anything in the log at the time of the segfault leads me to think that the error log is not flushed when a message is sent. For example, a segfault occurs at 00:18:04 and the previous request was at 00:15:36, so clearly the new request caused the segfault. But not even the "Here I am at the handler entry point" message (see below) gets into the logfile before the server log reports a segfault taking down Apache.


  /* Retrieve the per-server configuration */
  mod_bc_config *bc_scfg = ap_get_module_config(r->server->module_config,
                                          &bridcheck_module);
  if (bc_scfg->bc_logdebug & 0x0020000000000)
     ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                   "mod_bridcheck: Enter bridcheck_handler");


I could turn on core dumping but (a) I am no expert at decoding core dumps and (b) I don't want to dump this problem on somebody else.

So ... is there a way to force Apache to flush the error log before proceeding?


Re: Chasing a segfault

Posted by Eric Covener <co...@gmail.com>.
On Fri, Oct 22, 2021 at 8:49 PM miim <xx...@yahoo.com.invalid> wrote:
> [...]
> I could turn on core dumping but (a) I am no expert at decoding core dumps and (b) I don't want to dump this problem on somebody else.

I think this is probably the best way to go, even if you go no further
than looking at the stack trace, which is not much work. It is a good
skill to build anyway.
An alternative is to get backtraces in the error_log:
https://emptyhammock.com/projects/httpd/diag/

> So ... is there a way to force Apache to flush the error log before proceeding?

It is probably memory corruption that clobbers some memory used by a
different module on a later request. I haven't really seen httpd buffer
these kinds of messages at all.

On Linux, running the server with the environment variable MALLOC_CHECK_=2
might make it crash earlier, closer to where the memory is mismanaged.
Or run it under valgrind, if you can do that in a low-load environment.

-- 
Eric Covener
covener@gmail.com

Re: Chasing a segfault

Posted by Sorin Manolache <so...@gmail.com>.
On 23/10/2021 02.49, miim wrote:
> [...]
> So ... is there a way to force Apache to flush the error log before proceeding?

Hello,

I don't think this is a log-flushing problem. When a segfault occurs,
death is sudden: the OS kills the process, which has little chance to
handle the error itself.

I am very confident, almost 100% sure, that if you don't see the message
in the log, then execution simply never reached it; the segfault
happened earlier.
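If you want to rule buffering out completely, you could bypass httpd's logging for the trace and append each line to a file with a single write(2) call. Once write() has returned, the data is in the kernel's hands, so it survives the process being killed by SIGSEGV. This is a minimal sketch; trace_line() and any trace file path you pass to it are hypothetical, and the file must be writable by the user the httpd children run as.

```c
#define _XOPEN_SOURCE 700   /* POSIX/XSI declarations under strict C modes */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append msg (caller supplies the trailing newline)
 * to the file at path using one write(2) call. Returns 0 on success,
 * -1 on any error. */
static int trace_line(const char *path, const char *msg)
{
    /* O_APPEND: every write goes to the current end of the file. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    /* No user-space buffer is involved: once write() returns, the bytes
     * belong to the kernel and survive this process dying of SIGSEGV. */
    ssize_t written = write(fd, msg, len);
    close(fd);
    return written == (ssize_t)len ? 0 : -1;
}
```

For example, calling trace_line() with a line such as "enter bridcheck_handler\n" at the top of the handler. Opening the file on every call is slow, but that hardly matters during a debugging session.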

In my opinion it is easier to learn four or five gdb commands than to
try to do anything at the moment the segfault occurs. There's only one
way of preventing the death of the process, and that is to install a
handler for the SIGSEGV signal in your module (see "man signal" or
"man sigaction"). But there's not much you can do in the signal handler.
As I said, it is much, much easier to enable core dumps and learn a few
commands.
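To illustrate how little the handler can safely do, here is a minimal sketch that calls only async-signal-safe functions: it writes a fixed message with write(2) and then re-raises the signal, so the process still dies with the default action (and dumps core, if enabled). The names segv_report() and install_segv_report() are hypothetical. Note that, as far as I know, httpd's Unix MPMs already install their own SIGSEGV handler (it changes to CoreDumpDirectory before re-raising the signal), so replacing it from a module is only worth doing as a quick experiment.

```c
#define _XOPEN_SOURCE 700   /* sigaction, SA_RESETHAND, SA_NODEFER */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical last-gasp handler. Only async-signal-safe calls are used:
 * ap_log_error(), fprintf() and malloc() must NOT be called here. */
static void segv_report(int sig)
{
    static const char msg[] = "mod_bridcheck: caught SIGSEGV\n";
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)rc;   /* nothing sensible to do if the write fails */
    /* SA_RESETHAND restored SIG_DFL on entry and SA_NODEFER leaves the
     * signal unblocked, so this terminates the process immediately. */
    raise(sig);
}

/* Returns 0 on success, -1 on failure (errno set by sigaction). */
static int install_segv_report(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_report;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

In httpd's children, stderr normally ends up in the main server's error log, so the message should appear there.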

Here's how I do it typically:

In Debian/Ubuntu distributions, they put a file named envvars in 
/etc/apache2. If you have such a distribution edit it as I show below. 
If not, then make sure you get the same effects with other means.

I put the following two lines:

ulimit -c unlimited
echo 1 > /proc/sys/kernel/core_uses_pid

The first line is a shell builtin that removes the size limit on core
files. If you don't have /etc/apache2/envvars, then run this command in
the shell from which you launch apache, so that the apache process
inherits the setting.
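For reference, `ulimit -c` adjusts the RLIMIT_CORE resource limit, which a process can also change for itself with setrlimit(2). A minimal sketch (enable_core_dumps() is a hypothetical name) that raises the soft limit as far as the hard limit allows; an unprivileged process cannot go beyond the hard limit, so this only amounts to `ulimit -c unlimited` when the hard limit is already unlimited.

```c
#define _XOPEN_SOURCE 700   /* getrlimit/setrlimit under strict C modes */
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_CORE to the hard limit.
 * Returns 0 on success, -1 on error (errno set by getrlimit/setrlimit). */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    /* Unprivileged processes may raise the soft limit up to, but not
     * beyond, the hard limit; RLIM_INFINITY here means "unlimited". */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

One more caveat, as I understand the httpd documentation for CoreDumpDirectory: when httpd starts as root and switches to another user, the Linux kernel disables core dumps for the process, and httpd only re-enables them if a CoreDumpDirectory is explicitly configured. That directory must also be writable by the user the children run as.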

The second command instructs the kernel to append the process id to the
name of the core file. Thus, if two apache children dump core at the
same time, you get two different core files instead of a single file
into which the kernel writes both cores, which makes it unusable. If you
don't have /etc/apache2/envvars, you can run this command in any shell;
you just need root privileges to write to
/proc/sys/kernel/core_uses_pid.
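Whether a file named core.12345 appears at all depends on /proc/sys/kernel/core_pattern (see core(5)). core_uses_pid only matters when the pattern does not already contain %p, and if the pattern starts with '|', the kernel pipes the dump to a helper program (for example systemd-coredump or Ubuntu's apport) instead of writing a file. A small sketch of a check, with hypothetical helper names:

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>

enum core_dest { CORE_TO_FILE, CORE_TO_PIPE };

/* A leading '|' means the kernel pipes the dump to a helper program. */
static enum core_dest classify_core_pattern(const char *pattern)
{
    return pattern[0] == '|' ? CORE_TO_PIPE : CORE_TO_FILE;
}

/* True if the pattern contains the %p (pid) specifier; "%%" is a literal
 * percent sign, so "core%%p" does not count. When this returns true,
 * core_uses_pid has no effect. */
static int pattern_has_pid(const char *pattern)
{
    for (const char *s = pattern; *s != '\0'; s++) {
        if (*s != '%')
            continue;
        if (s[1] == 'p')
            return 1;
        if (s[1] == '\0')
            break;              /* trailing lone '%' */
        s++;                    /* skip the specifier character */
    }
    return 0;
}

/* Read the current pattern into buf without its trailing newline.
 * Returns 0 on success, -1 if the file cannot be read. */
static int read_core_pattern(char *buf, size_t len)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

If the pattern turns out to be a pipe to systemd-coredump, `coredumpctl list` should show the crashed apache2 processes.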

Let us assume you now have the core file and that its name is
core.12345, where 12345 is the process id of the apache child process
that died.

Then I start gdb and I execute the following gdb commands at the gdb prompt:

file /usr/sbin/apache2
core-file core.12345
thread apply all bt

The first command loads the apache executable.
The second command loads the core file.
The third command displays the call stacks of all threads of the
process (bt = backtrace).

You can switch between threads with the command
thread N

where N is the numerical id of the thread you want to switch to.

Once you're in a thread, you can move up and down the call stack with
the commands "up" and "down". If you compiled your module with debug
symbols, you can inspect variables with the "print" command, e.g.
"print bc_scfg". If, for example, the segfault occurred somewhere in a
libc function such as malloc, free or strcpy, you can move up the call
chain to the caller of the libc function and inspect the arguments it
passed.

Besides the necessary "-g" compiler switch for adding debugging symbols,
I typically add the "-fno-inline -O0" switches, which disable code
optimisation. When I execute step by step in a debugger (a live program,
obviously, not a core file), the instructions are then executed in the
order written in the program rather than rearranged for speed.

You may also debug a live program. "Normal" programs, when debugging, 
are typically launched directly in the debugger. This is not really 
advisable in apache, because it forks. What I do is to let apache start 
normally ("apache2ctl start" or "systemctl start apache2") and then 
attach the debugger to a live apache child process. I launch gdb, then I 
execute the following commands at the gdb prompt:

attach N (where N is the process id of the apache child)
break my_handler (set a breakpoint at one of my functions)
cont (let the process continue its execution until it reaches the 
breakpoint and I get the command prompt back)

When the breakpoint is reached I can inspect variables ("print 
variable") and execute step by step ("step" and "next").
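With several children running, it can be hard to know in advance which one to attach to. One common trick, sketched here with hypothetical names (wait_for_debugger(), bc_wait_for_gdb and the BC_WAIT_FOR_GDB environment variable), is to have the module print its process id and park the process until you attach and clear a flag from gdb with "set var bc_wait_for_gdb = 0".

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical flag; clear it from gdb with: set var bc_wait_for_gdb = 0
 * volatile makes the loop below re-read it on every iteration. */
static volatile int bc_wait_for_gdb = 1;

/* If the hypothetical environment variable BC_WAIT_FOR_GDB is set, print
 * this process's pid to stderr and sleep until bc_wait_for_gdb is cleared.
 * Returns the number of one-second sleeps performed. */
static unsigned wait_for_debugger(void)
{
    unsigned naps = 0;
    if (getenv("BC_WAIT_FOR_GDB") == NULL)
        return 0;                       /* normal operation: no-op */
    fprintf(stderr, "pid %ld waiting for debugger\n", (long)getpid());
    while (bc_wait_for_gdb) {
        sleep(1);
        naps++;
    }
    return naps;
}
```

While debugging, you might call wait_for_debugger() at the top of your handler, attach to the reported pid, set your breakpoints, clear the flag and "cont". Leave the environment variable unset in normal operation.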

HTH,
Sorin