Posted to users@spamassassin.apache.org by Martin Gregorie <ma...@gregorie.org> on 2009/10/03 01:42:53 UTC

SIGCHLD query

What causes a spamd 3.2.5 child process to be terminated by receiving a
SIGCHLD signal? 

I've looked at the spamc and spamd manpages but there's no mention of
them there. I can't remember seeing them discussed on this maillist
either.

My last month's logs show 7 of them and I can't work out what caused
them to be sent. However, Jose Luis Marin Perez' system is seeing a lot
of them - on the order of 10% of messages scanned are getting hit by
them, though his seem to be connected with very long running scans.

So, what do these signals mean and what should I do to my SA
configuration to get rid of them?


Martin

Re: SIGCHLD query

Posted by Martin Gregorie <ma...@gregorie.org>.

On Wed, 2009-10-07 at 14:31 +0200, Per Jessen wrote:
> Okay, I ran a check on my logs since midnight - yes, I also see a lot of
> child processes running for less than 10secs, in fact slightly more
> than 50%.  Interesting issue.  
> 
Here's the results of a scan across all my mail logs:

Processing file /var/log/maillog*
 3544 Messages found
 3538 Results         (99.8%)
    6 SIGCHLDs caught (0.2%)
                     min    avg    max
Message size:        353   7340 496682
Scan time (secs):    0.5    2.3   34.5

I've checked all the SIGCHLD log lines. The previous scans by those
children were all in the range 1.- to 3.1 seconds. I'm using the default
child population and the default --timeout-child of 300 secs.
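
A summary like the one above could be produced along these lines. This
is only a sketch, not the script actually used here. The SIGCHLD pattern
is taken from the log line quoted in this thread; the "spamd: result:"
pattern with scantime= and size= fields is an assumption about the log
format and may need adjusting. It also assumes, as the figures above
suggest, that messages = results + SIGCHLDs.

```python
import re

# Pattern taken from the log line quoted in this thread.
SIGCHLD_RE = re.compile(
    r"spamd: handled cleanup of child pid \d+ due to SIGCHLD")
# ASSUMPTION: result lines look like
#   "... spamd: result: ... scantime=2.3,size=7340,..."
RESULT_RE = re.compile(r"spamd: result:.*?scantime=([\d.]+),size=(\d+)")


def _min_avg_max(values):
    """Return (min, avg, max) of a list, or None if it is empty."""
    if not values:
        return None
    return (min(values), sum(values) / len(values), max(values))


def summarize(lines):
    """Count result and SIGCHLD lines and collect scan statistics."""
    scantimes, sizes, sigchlds = [], [], 0
    for line in lines:
        if SIGCHLD_RE.search(line):
            sigchlds += 1
            continue
        match = RESULT_RE.search(line)
        if match:
            scantimes.append(float(match.group(1)))
            sizes.append(int(match.group(2)))
    results = len(scantimes)
    messages = results + sigchlds
    return {
        "messages": messages,
        "results": results,
        "sigchlds": sigchlds,
        "sigchld_pct": 100.0 * sigchlds / messages if messages else 0.0,
        "scantime": _min_avg_max(scantimes),
        "size": _min_avg_max(sizes),
    }
```

It accepts any iterable of lines, so an open log file can be passed in
directly, e.g. summarize(open("/var/log/maillog")).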


Martin

Re: SIGCHLD query

Posted by Per Jessen <pe...@computer.org>.

Per Jessen wrote:

> Martin Gregorie wrote:
>>> Yeah - maybe there is some indication in the log?  I think there is
>>> a switch that determines how many emails a child will process before
>>> needing restart. (just looked it up:  --max-conn-per-child)
>>> I just checked my logs, during the last 9 hours I have 6016 of
>>> these:
>>>
>>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to
>>> SIGCHLD
>>>
>>> Is that the one you mean?
>>>
>> That's the only log message I've seen. Sometimes you can associate it
>> with a scan that exceeded --timeout-child seconds and sometimes, much
>> more rarely, it happens after a scan taking two or three seconds.
> 
> I don't know if that is happening on my systems too, I haven't
> checked.

Okay, I ran a check on my logs since midnight - yes, I also see a lot of
child processes running for less than 10secs, in fact slightly more
than 50%.  Interesting issue.  


/Per Jessen, Zürich

Re: SIGCHLD query

Posted by Per Jessen <pe...@computer.org>.

Martin Gregorie wrote:
>> Yeah - maybe there is some indication in the log?  I think there is a 
>> switch that determines how many emails a child will process before 
>> needing restart. (just looked it up:  --max-conn-per-child)
>> I just checked my logs, during the last 9 hours I have 6016 of these:
>>
>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
>>
>> Is that the one you mean?
>>
> That's the only log message I've seen. Sometimes you can associate it
> with a scan that exceeded --timeout-child seconds and sometimes, much
> more rarely, it happens after a scan taking two or three seconds. 

I don't know if that is happening on my systems too, I haven't checked.
I wonder if the latter could be caused by the maintenance of spare
child processes?

>> There are also arguments for controlling minimum/maximum number of spare 
>> child processes - if your load varies, and you have a significant 
>> difference between min and max, I could see that leading to more child 
>> processes stopping and starting.
>>
> Does the parent or the child determine whether the child stays alive
> after completing a scan or whether it should terminate?

It's the child that determines that "Uh, I've done X scans, all done". 
It's just a for-loop:

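for (i = 0; i < maxscansperchild; i++) {
   wait for work      /* block until there is a message to scan */
   do work            /* scan it */
}
exit                  /* child ends; the parent then gets a SIGCHLD */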

If it's about pruning idle child processes, the parent is no doubt doing it.
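
The for-loop above can be tried out with a small sketch like this one.
It is not spamd's code (spamd is written in Perl); the function names
and the pipe used to hand over "work" are made up for illustration.
The child handles at most max_conn_per_child jobs, then exits by
itself, and the parent only learns that it has gone.

```python
import os


def child_loop(read_fd, max_conn_per_child):
    """Child: handle at most max_conn_per_child jobs, then exit."""
    handled = 0
    with os.fdopen(read_fd) as jobs:
        for _ in range(max_conn_per_child):
            line = jobs.readline()   # "wait for work"
            if not line:
                break                # parent closed the pipe: no more work
            handled += 1             # "do work" (placeholder)
    # For the demo, the exit code reports how many jobs were handled.
    os._exit(handled)


def run_one_child(jobs, max_conn_per_child):
    """Parent: fork one child, feed it jobs, return its exit code."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(write_fd)
            child_loop(read_fd, max_conn_per_child)  # never returns
        finally:
            os._exit(255)            # only reached if the child failed
    os.close(read_fd)
    # One small write, so all jobs are queued before the child can exit.
    os.write(write_fd, "".join(job + "\n" for job in jobs).encode())
    os.close(write_fd)
    _, status = os.waitpid(pid, 0)   # the child has terminated
    return os.WEXITSTATUS(status)
```

With five jobs and a limit of three, run_one_child(list("abcde"), 3)
returns 3: the child stops on its own after its third job, which is
what a connection limit like --max-conn-per-child would produce.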


/Per

Re: SIGCHLD query

Posted by Martin Gregorie <ma...@gregorie.org>.

> Yeah - maybe there is some indication in the log?  I think there is a 
> switch that determines how many emails a child will process before 
> needing restart. (just looked it up:  --max-conn-per-child)
> I just checked my logs, during the last 9 hours I have 6016 of these:
> 
> spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
> 
> Is that the one you mean?
> 
That's the only log message I've seen. Sometimes you can associate it
with a scan that exceeded --timeout-child seconds and sometimes, much
more rarely, it happens after a scan taking two or three seconds. Tuning
would be easier if there was some indication about why a scan had
terminated - maybe it could be added to the statistics list in the
'results' log line.

> There are also arguments for controlling minimum/maximum number of spare 
> child processes - if your load varies, and you have a significant 
> difference between min and max, I could see that leading to more child 
> processes stopping and starting.
> 
Does the parent or the child determine whether the child stays alive
after completing a scan or whether it should terminate?


Martin

Re: SIGCHLD query

Posted by Per Jessen <pe...@computer.org>.

Martin Gregorie wrote:
> On Tue, 2009-10-06 at 23:16 +0200, Per Jessen wrote:
>> Martin, generally speaking, the parent can only report the signal and
>> that the child has gone away.  The child would have to report on why. 
>>
> OK, rephrase that to "a pity the child doesn't say why it's generating
> a SIGCHLD signal".
> 

Yeah - maybe there is some indication in the log?  I think there is a 
switch that determines how many emails a child will process before 
needing restart. (just looked it up:  --max-conn-per-child)
I just checked my logs, during the last 9 hours I have 6016 of these:

spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD

Is that the one you mean?

There are also arguments for controlling minimum/maximum number of spare 
child processes - if your load varies, and you have a significant 
difference between min and max, I could see that leading to more child 
processes stopping and starting.

/Per

Re: SIGCHLD query

Posted by Per Jessen <pe...@computer.org>.

Martin Gregorie wrote:

> On Tue, 2009-10-06 at 16:46 +0200, Per Jessen wrote:
>> Martin Gregorie wrote:
>> 
>> > What causes a spamd 3.2.5 child process to be terminated by
>> > receiving a SIGCHLD signal?
>> > 
>> 
>> A timeout in the child perhaps?
>> 
> I thought that might be the reason. It certainly seems to apply when
> a child runs longer than the time set by --timeout-child, but there
> are a few cases where a SIGCHLD is sent when the child has only run
> for a second or two. It's a pity the log message doesn't include the
> reason why the SIGCHLD was sent.

Martin, generally speaking, the parent can only report the signal and
that the child has gone away.  The child would have to report on why. 
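
To illustrate: on a POSIX system the wait status is all the parent
gets. It shows an exit code or the number of the signal that killed
the child, not the reason behind either. A generic sketch, not spamd's
code; describe_status and run_and_describe are names invented here.

```python
import os
import signal


def describe_status(status):
    """Turn a raw wait status into what the parent can know."""
    if os.WIFEXITED(status):
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        return f"killed by signal {os.WTERMSIG(status)}"
    return "stopped or continued"


def run_and_describe(child_action):
    """Fork, run child_action in the child, describe how it ended."""
    pid = os.fork()
    if pid == 0:
        try:
            child_action()
        finally:
            os._exit(0)              # normal end if child_action returns
    _, status = os.waitpid(pid, 0)   # parent: collect the wait status
    return describe_status(status)
```

For example, run_and_describe(lambda: os._exit(3)) reports an exit
code of 3, and a child that sends itself SIGKILL with
os.kill(os.getpid(), signal.SIGKILL) is reported as killed by that
signal. Why the child did either is not part of the status.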


/Per Jessen, Zürich

Re: SIGCHLD query

Posted by Martin Gregorie <ma...@gregorie.org>.

On Tue, 2009-10-06 at 16:46 +0200, Per Jessen wrote:
> Martin Gregorie wrote:
> 
> > What causes a spamd 3.2.5 child process to be terminated by receiving
> > a SIGCHLD signal?
> > 
> 
> A timeout in the child perhaps?
> 
I thought that might be the reason. It certainly seems to apply when a
child runs longer than the time set by --timeout-child, but there are a
few cases where a SIGCHLD is sent when the child has only run for a
second or two. It's a pity the log message doesn't include the reason
why the SIGCHLD was sent.


Martin

Re: SIGCHLD query

Posted by Per Jessen <pe...@computer.org>.

Martin Gregorie wrote:

> What causes a spamd 3.2.5 child process to be terminated by receiving
> a SIGCHLD signal?
> 

A parent process receives a SIGCHLD when a child process terminates. 
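
A minimal sketch of that, assuming a POSIX system (this is generic
behaviour, not spamd's code, and the names are made up). The signal
handler runs in the parent and cleans up after a child that has
already terminated.

```python
import os
import signal
import time

reaped = []  # (pid, raw wait status) pairs collected by the handler


def on_sigchld(signum, frame):
    """Runs in the PARENT: reap every child that has exited so far."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return               # no children left at all
        if pid == 0:
            return               # children exist, but none has exited
        reaped.append((pid, status))


def run_demo():
    """Fork one child that exits at once; return (child pid, reaped)."""
    reaped.clear()
    signal.signal(signal.SIGCHLD, on_sigchld)
    child = os.fork()
    if child == 0:
        os._exit(0)              # the child simply terminates
    # Parent: give the handler up to 5 seconds to reap the child.
    deadline = time.monotonic() + 5
    while not reaped and time.monotonic() < deadline:
        time.sleep(0.01)
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    return child, list(reaped)
```

So the child is not terminated by SIGCHLD. The signal goes the other
way: it tells the parent that a child has ended.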

> My last month's logs show 7 of them and I can't work out what caused
> them to be sent. However, Jose Luis Marin Perez' system is seeing a
> lot of them - on the order of 10% of messages scanned are getting hit
> by them, though his seem to be connected with very long running scans.

A timeout in the child perhaps?


/Per Jessen, Zürich