You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/04/13 03:37:46 UTC
[Bug 4258] New: spamd locks up after varying load
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
Summary: spamd locks up after varying load
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: critical
Priority: P3
Component: spamc/spamd
AssignedTo: dev@spamassassin.apache.org
ReportedBy: quinlan@pathname.com
When running spamd in a varying load environment ranging from very
loaded to idle, it will eventually stop responding to requests.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From quinlan@pathname.com 2005-04-13 14:37 -------
Created an attachment (id=2790)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=2790&action=view)
strace
strace of parent process as zipped text file
strace of last child returned this:
read(8, <unfinished ...>
--- SIGINT (Interrupt) ---
--- SIGINT (Interrupt) ---
--- SIGINT (Interrupt) ---
(ended when I control-C'ed the parent)
spamd is always dead when the last ": I" status is printed
ps auxww:
root 66218 1.2 0.8 26988 25700 p2 S+ 2:26PM 0:05.72 perl -T ./spamd
-A 172.16.105.1 -i -C ../rules --max-children=5 --min-children=1 --max-spare=1
root 66217 1.1 0.0 1064 644 p2 S+ 2:26PM 0:05.20 strace -o
/tmp/strace.out perl -T ./spamd -A 172.16.105.1 -i -C ../rules --max-children=5
--min-children=1 --max-spare=1
root 66443 0.1 0.8 27728 26400 p2 I+ 2:27PM 0:00.38 spamd child
(perl)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
jm@jmason.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Additional Comments From jm@jmason.org 2005-05-04 00:00 -------
OK, after some investigation offline, I've figured this out --
<jmason> I think we have a perl IO issue
<jmason> 5664 perl 1115186325.449041 CALL read(0x8,0x8c77000,0x2000)
<jmason> 5664 perl 1115186325.449053 GIO fd 8 read 12 bytes
<jmason> "B5682
<jmason> I5682
<jmason> "
<jmason> note the two lines in the one read
<jmason> it totally misses the second one...
<jmason> the problem is this
<jmason> 1. parent calls select(), which says "channel N is active"
<jmason> 2. it reads from channel N
<jmason> 3. perl calls read(2) and gets those 2 lines
<jmason> 4. perl returns one of those 2 lines to our code
<jmason> 5. parent later calls select() again, and gets no activity -- although
there is a line that wasn't processed sitting in the stdio buffer
<jmason> we call $socket->getline()
<quinlan> you're not mixing sysread/syswrite/etc. with read/write, etc. ?
<jmason> nope. but perhaps select() doesn't mix with readline()
<jmason> at least not on freebsd ;)
Moving to a new protocol that uses fixed-size buffers instead of variable-length
"lines" fixes the bug -- since we can then use the (unbuffered) sysread()
instead of (buffered) getline().
r168094.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
quinlan@pathname.com changed:
What |Removed |Added
----------------------------------------------------------------------------
OS/Version|other |FreeBSD
Platform|Other |PC
Target Milestone|Undefined |3.1.0
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From jm@jmason.org 2005-05-04 23:11 -------
sooo.... any luck with this fix?
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From quinlan@pathname.com 2005-05-04 23:23 -------
Subject: Re: spamd locks up after varying load
> sooo.... any luck with this fix?
Yes! It fixed the lock-up. I considered it closed once you resolved
the bug. I would have reopened if it didn't. ;-)
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From jm@jmason.org 2005-04-14 18:56 -------
wierd -- not seeing any problem here :( check it out...
let's start halfway through.
cast of characters:
fd 5 = accept fd
fd 6 = control channel to child 66451
fd 7 = " " 66443
write(2, "prefork: child states: IB\n", 26) = 26
one child is processing, the other one's idle
select(16, [5 6 7], NULL, [5 6 7], {2, 0}) = 1 (in [6])
read(6, "I66451\n", 8192) = 7
child 66451 says "I'm idle"
write(2, "prefork: child states: II\n", 26) = 26
prefork master says, "ok, we don't need two kids, we're idle"
kill(66451, SIGINT <unfinished ...>
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
<... kill resumed> ) = 0
sigreturn(0x9ebff7a0) = 0
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<22>spamd[66218]: server hit by "..., 41) = 41
wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGINT], WNOHANG, NULL) = 66451
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<22>spamd[66218]: handled cleanu"..., 54) = 54
wait4(-1, 0x9ebff7dc, WNOHANG, NULL) = 0
select(16, [5 6 7], NULL, [5 6 7], {2, 0}) = 1 (in [6])
read(6, "", 8192) = 0
close(-1) = -1 EBADF (Bad file descriptor)
close(6) = 0
write(2, "prefork: child states: I\n", 25) = 25
and there's 1 idle child.
select(16, [5 7], NULL, [5 7], {2, 0}) = 1 (in [5])
write(7, "A\n", 2) = 2
activity on the accept socket -- tells the child on fd 7 to accept...
read(7, "B66443\n", 8192) = 7
child says "I'm busy"
select(16, [5 7], NULL, [5 7], {2, 0}) = 1 (in [7])
read(7, "I66443\n", 8192) = 7
child finishes processing and says "I'm idle".
write(2, "prefork: child states: I\n", 25) = 25
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0}) = 0 (Timeout)
and repeat until eventually killed.
So in other words, the master spamd is not getting hung -- it really is
selecting on the accept fd. But there's no activity arriving on that fd, which
is obviously incorrect?!
right now, I'm stumped as to how a select on the accept() fd could *fail*
to notice activity, if there is activity there, and that select() call looks
100% fine to me (unless BSD has changed its semantics somewhere since BSD
4.3...)
Could you retry again, using strace -ft (follow forks, add time)? maybe there's
something in the kid's straces that'll help.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From jm@jmason.org 2005-04-12 19:15 -------
is it still running? can you get a ptrace/strace of both procs?
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From quinlan@pathname.com 2005-04-12 18:41 -------
Created an attachment (id=2785)
--> (http://bugzilla.spamassassin.org/attachment.cgi?id=2785&action=view)
log of spamd running
At the end of this, spamd won't even respond to a PING or allow you to
connect to the port.
It seems like spamd believes there are idle clients when there are none.
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
------- Additional Comments From quinlan@pathname.com 2005-04-12 18:38 -------
*** Bug 4257 has been marked as a duplicate of this bug. ***
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 4258] spamd locks up after varying load
Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258
newsuser@famdijkstra.org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |newsuser@famdijkstra.org
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.