Posted to users@spamassassin.apache.org by Martin Gregorie <ma...@gregorie.org> on 2009/10/03 01:42:53 UTC
SIGCHLD query
What causes a spamd 3.2.5 child process to be terminated by receiving a
SIGCHLD signal?
I've looked at the spamc and spamd manpages but there's no mention of
them there. I can't remember seeing them discussed on this maillist
either.
My last month's logs show 7 of them and I can't work out what caused
them to be sent. However, Jose Luis Marin Perez' system is seeing a lot
of them - on the order of 10% of messages scanned are getting hit by
them, though his seem to be connected with very long running scans.
So, what do these signals mean and what should I do to my SA
configuration to get rid of them?
Martin
Re: SIGCHLD query
Posted by Martin Gregorie <ma...@gregorie.org>.
On Wed, 2009-10-07 at 14:31 +0200, Per Jessen wrote:
> Okay, I ran a check on my logs since midnight - yes, I also see a lot of
> child processes running for less than 10secs, in fact slightly more
> than 50%. Interesting issue.
>
Here's the results of a scan across all my mail logs:
Processing file /var/log/maillog*
  3544 Messages found
  3538 Results (99.8%)
     6 SIGCHLDs caught (0.2%)

                      min     avg      max
Message size:         353    7340   496682
Scan time (secs):     0.5     2.3     34.5
I've checked all the SIGCHLD log lines. The previous scans by those
children were all in the range 1.- to 3.1 seconds. I'm using the default
child population and the default --timeout-child of 300 secs.
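
A scan like the one summarised above can be approximated with a short script. This is only a sketch, not the script Martin used: the pattern is copied from the "handled cleanup" log line quoted elsewhere in this thread, the function name `find_sigchld_cleanups` is hypothetical, and other spamd versions may word the message differently.

```python
import re
import sys

# Pattern based on the log line quoted in this thread (assumption: other
# spamd versions or syslog setups may format it differently).
CLEANUP_RE = re.compile(
    r"spamd\[(\d+)\]: spamd: handled cleanup of child pid (\d+) due to SIGCHLD"
)


def find_sigchld_cleanups(lines):
    """Return (parent_pid, child_pid) for every SIGCHLD cleanup line."""
    hits = []
    for line in lines:
        match = CLEANUP_RE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


if __name__ == "__main__":
    # Usage: python count_sigchld.py /var/log/maillog*
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            hits = find_sigchld_cleanups(fh)
        print(f"{path}: {len(hits)} SIGCHLD cleanups")
```

Returning the child pid makes it possible to look up that child's earlier log lines by hand, which is how the scan times of the affected children were checked here.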
Martin
Re: SIGCHLD query
Posted by Per Jessen <pe...@computer.org>.
Per Jessen wrote:
> Martin Gregorie wrote:
>>> Yeah - maybe there is some indication in the log? I think there is
>>> a switch that determines how many emails a child will process before
>>> needing restart. (just looked it up: --max-conn-per-child)
>>> I just checked my logs, during the last 9 hours I have 6016 of
>>> these:
>>>
>>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to
>>> SIGCHLD
>>>
>>> Is that the one you mean?
>>>
>> That's the only log message I've seen. Sometimes you can associate it
>> with a scan that exceeded --timeout-child seconds and sometimes, much
>> more rarely, it happens after a scan taking two or three seconds.
>
> I don't know if that is happening on my systems too, I haven't
> checked.
Okay, I ran a check on my logs since midnight - yes, I also see a lot of
child processes running for less than 10secs, in fact slightly more
than 50%. Interesting issue.
/Per Jessen, Zürich
Re: SIGCHLD query
Posted by Per Jessen <pe...@computer.org>.
Martin Gregorie wrote:
>> Yeah - maybe there is some indication in the log? I think there is a
>> switch that determines how many emails a child will process before
>> needing restart. (just looked it up: --max-conn-per-child)
>> I just checked my logs, during the last 9 hours I have 6016 of these:
>>
>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
>>
>> Is that the one you mean?
>>
> That's the only log message I've seen. Sometimes you can associate it
> with a scan that exceeded --timeout-child seconds and sometimes, much
> more rarely, it happens after a scan taking two or three seconds.
I don't know if that is happening on my systems too, I haven't checked.
I wonder if the latter could be caused by the maintenance of spare
child processes?
>> There are also arguments for controlling minimum/maximum number of spare
>> child processes - if your load varies, and you have a significant
>> difference between min and max, I could see that leading to more child
>> processes stopping and starting.
>>
> Does the parent or the child determine whether the child stays alive
> after completing a scan or whether it should terminate?
It's the child that determines that "Uh, I've done X scans, all done".
It's just a for-loop:
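for (i = 0; i < maxscansperchild; i++) {
    wait for work    /* block until the parent hands over a message */
    do work          /* scan it and return the result */
}
/* loop finished: the child exits and the parent gets a SIGCHLD */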
If it's about pruning idle child processes, the parent is no doubt doing it.
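
The for-loop Per describes can be written out as a small runnable sketch. This is not spamd's actual code (spamd is written in Perl); `child_loop`, `work_queue` and `scan` are hypothetical names, and an in-memory queue stands in for waiting on a client connection.

```python
from collections import deque


def child_loop(work_queue, max_scans_per_child, scan):
    """Handle at most max_scans_per_child items, then return.

    Returning corresponds to the child exiting on its own; the parent
    would then be notified with SIGCHLD and start a replacement.
    """
    handled = 0
    for _ in range(max_scans_per_child):
        if not work_queue:
            break  # a real child would block here waiting for work
        scan(work_queue.popleft())
        handled += 1
    return handled


if __name__ == "__main__":
    queue = deque(["msg1", "msg2", "msg3", "msg4"])
    done = child_loop(queue, 3, lambda msg: print("scanned", msg))
    print(done, "scans handled,", len(queue), "left for the next child")
```

With a limit of this kind, a child that exits right after a scan of two or three seconds may simply have reached its quota, which would match the short-running cases mentioned in this thread.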
/Per
Re: SIGCHLD query
Posted by Martin Gregorie <ma...@gregorie.org>.
> Yeah - maybe there is some indication in the log? I think there is a
> switch that determines how many emails a child will process before
> needing restart. (just looked it up: --max-conn-per-child)
> I just checked my logs, during the last 9 hours I have 6016 of these:
>
> spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
>
> Is that the one you mean?
>
That's the only log message I've seen. Sometimes you can associate it
with a scan that exceeded --timeout-child seconds and sometimes, much
more rarely, it happens after a scan taking two or three seconds. Tuning
would be easier if there was some indication about why a scan had
terminated - maybe it could be added to the statistics list in the
'results' log line.
> There are also arguments for controlling minimum/maximum number of spare
> child processes - if your load varies, and you have a significant
> difference between min and max, I could see that leading to more child
> processes stopping and starting.
>
Does the parent or the child determine whether the child stays alive
after completing a scan or whether it should terminate?
Martin
Re: SIGCHLD query
Posted by Per Jessen <pe...@computer.org>.
Martin Gregorie wrote:
> On Tue, 2009-10-06 at 23:16 +0200, Per Jessen wrote:
>> Martin, generally speaking, the parent can only report the signal and
>> that the child has gone away. The child would have to report on why.
>>
> OK, rephrase that to "a pity the child doesn't say why it's generating a
> SIGCHLD signal".
>
Yeah - maybe there is some indication in the log? I think there is a
switch that determines how many emails a child will process before
needing restart. (just looked it up: --max-conn-per-child)
I just checked my logs, during the last 9 hours I have 6016 of these:
spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
Is that the one you mean?
There are also arguments for controlling minimum/maximum number of spare
child processes - if your load varies, and you have a significant
difference between min and max, I could see that leading to more child
processes stopping and starting.
/Per
Re: SIGCHLD query
Posted by Per Jessen <pe...@computer.org>.
Martin Gregorie wrote:
> On Tue, 2009-10-06 at 16:46 +0200, Per Jessen wrote:
>> Martin Gregorie wrote:
>>
>> > What causes a spamd 3.2.5 child process to be terminated by
>> > receiving a SIGCHLD signal?
>> >
>>
>> A timeout in the child perhaps?
>>
> I thought that might be the reason. It certainly seems to apply when
> a child runs longer than the time set by --timeout-child but there are
> a few cases where a SIGCHLD is sent when the child has only run for a
> second or two. It's a pity the log message doesn't include the reason
> why the SIGCHLD was sent.
Martin, generally speaking, the parent can only report the signal and
that the child has gone away. The child would have to report on why.
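
To illustrate how little the parent learns, here is a minimal sketch for a Unix-like system. The helper names `describe_exit` and `run_child` are made up for this example and have nothing to do with spamd's internals.

```python
import os
import signal


def describe_exit(status):
    """Decode a wait status: this is all the parent can find out."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return "stopped or continued"


def run_child(action):
    """Fork a child that runs action(), wait for it, describe how it ended."""
    pid = os.fork()
    if pid == 0:
        try:
            action()
        finally:
            os._exit(0)  # never fall back into the parent's code
    _, status = os.waitpid(pid, 0)
    return describe_exit(status)


if __name__ == "__main__":
    print(run_child(lambda: None))
    print(run_child(lambda: os._exit(3)))
    print(run_child(lambda: os.kill(os.getpid(), signal.SIGKILL)))
```

The status distinguishes a normal exit from death by a signal, but it cannot say whether a normal exit was due to a scan limit, a timeout or anything else. That would have to be logged by the child itself.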
/Per Jessen, Zürich
Re: SIGCHLD query
Posted by Martin Gregorie <ma...@gregorie.org>.
On Tue, 2009-10-06 at 16:46 +0200, Per Jessen wrote:
> Martin Gregorie wrote:
>
> > What causes a spamd 3.2.5 child process to be terminated by receiving
> > a SIGCHLD signal?
> >
>
> A timeout in the child perhaps?
>
I thought that might be the reason. It certainly seems to apply when a
child runs longer than the time set by --timeout-child but there are a
few cases where a SIGCHLD is sent when the child has only run for a
second or two. It's a pity the log message doesn't include the reason why
the SIGCHLD was sent.
Martin
Re: SIGCHLD query
Posted by Per Jessen <pe...@computer.org>.
Martin Gregorie wrote:
> What causes a spamd 3.2.5 child process to be terminated by receiving
> a SIGCHLD signal?
>
A parent process receives a SIGCHLD when a child process terminates.
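
In other words, the signal is delivered to the parent after the child has already gone; it is not what terminates the child. A minimal sketch of this on a Unix-like system follows. The names `on_sigchld`, `spawn_and_wait` and `reaped` are hypothetical, and the code must run in the main thread because Python only allows signal handlers to be installed there.

```python
import os
import signal
import time

reaped = []  # (pid, wait status) of every child collected by the handler


def on_sigchld(signum, frame):
    """Runs in the parent when a child terminates; reap all that are done."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append((pid, status))


def spawn_and_wait(timeout=5.0):
    """Fork a child that exits at once; wait until the handler has reaped it."""
    signal.signal(signal.SIGCHLD, on_sigchld)
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # the child terminates by itself
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if any(p == pid for p, _ in reaped):
            break
        time.sleep(0.01)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return pid


if __name__ == "__main__":
    child = spawn_and_wait()
    print("parent", os.getpid(), "handled cleanup of child pid", child)
```
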
> My last month's logs show 7 of them and I can't work out what caused
> them to be sent. However, Jose Luis Marin Perez' system is seeing a
> lot of them - on the order of 10% of messages scanned are getting hit
> by them, though his seem to be connected with very long running scans.
A timeout in the child perhaps?
/Per Jessen, Zürich