You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/04/13 03:37:46 UTC

[Bug 4258] New: spamd locks up after varying load

http://bugzilla.spamassassin.org/show_bug.cgi?id=4258

           Summary: spamd locks up after varying load
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: critical
          Priority: P3
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: quinlan@pathname.com


When running spamd in a varying load environment ranging from very
loaded to idle, it will eventually stop responding to requests.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From quinlan@pathname.com  2005-04-13 14:37 -------
Created an attachment (id=2790)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2790&action=view)
strace

strace of parent process as zipped text file

strace of last child returned this:

read(8,  <unfinished ...>
--- SIGINT (Interrupt) ---
--- SIGINT (Interrupt) ---
--- SIGINT (Interrupt) ---

(ended when I control-C'ed the parent)

spamd is always dead when the last ": I" status is printed

ps auxww:

root	66218  1.2  0.8 26988 25700  p2  S+    2:26PM	0:05.72 perl -T ./spamd
-A 172.16.105.1 -i -C ../rules --max-children=5 --min-children=1 --max-spare=1
root	66217  1.1  0.0  1064  644  p2	S+    2:26PM   0:05.20 strace -o
/tmp/strace.out perl -T ./spamd -A 172.16.105.1 -i -C ../rules --max-children=5
--min-children=1 --max-spare=1
root	66443  0.1  0.8 27728 26400  p2  I+    2:27PM	0:00.38 spamd child
(perl)




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2005-05-04 00:00 -------
OK, after some investigation offline, I've figured this out --

<jmason> I think we have a perl IO issue
<jmason>   5664 perl     1115186325.449041 CALL  read(0x8,0x8c77000,0x2000)
<jmason>   5664 perl     1115186325.449053 GIO   fd 8 read 12 bytes
<jmason>        "B5682
<jmason>         I5682
<jmason>        "
<jmason> note the two lines in the one read
<jmason> it totally misses the second one...
<jmason> the problem is this
<jmason> 1. parent calls select(), which says "channel N is active"
<jmason> 2. it reads from channel N
<jmason> 3. perl calls read(2) and gets those 2 lines
<jmason> 4. perl returns one of those 2 lines to our code
<jmason> 5. parent later calls select() again, and gets no activity -- although
there is a line that wasn't processed sitting in the stdio buffer

<jmason> we call $socket->getline()
<quinlan> you're not mixing sysread/syswrite/etc. with read/write, etc. ?
<jmason> nope.  but perhaps select() doesn't mix with readline()
<jmason> at least not on freebsd ;)

Moving to a new protocol that uses fixed-size buffers instead of variable-length
"lines" fixes the bug -- since we can then use the (unbuffered) sysread()
instead of (buffered) getline().

r168094.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258


quinlan@pathname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|other                       |FreeBSD
           Platform|Other                       |PC
   Target Milestone|Undefined                   |3.1.0






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From jm@jmason.org  2005-05-04 23:11 -------
sooo.... any luck with this fix?



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From quinlan@pathname.com  2005-05-04 23:23 -------
Subject: Re:  spamd locks up after varying load

> sooo.... any luck with this fix?

Yes!  It fixed the lock-up.  I considered it closed once you resolved
the bug.  I would have reopened if it didn't.  ;-)





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From jm@jmason.org  2005-04-14 18:56 -------
wierd -- not seeing any problem here :(  check it out...
let's start halfway through.

cast of characters:

    fd 5 = accept fd
    fd 6 = control channel to child 66451
    fd 7 = "                      " 66443


write(2, "prefork: child states: IB\n", 26) = 26

        one child is processing, the other one's idle

select(16, [5 6 7], NULL, [5 6 7], {2, 0}) = 1 (in [6])
read(6, "I66451\n", 8192)               = 7

        child 66451 says "I'm idle"

write(2, "prefork: child states: II\n", 26) = 26

        prefork master says, "ok, we don't need two kids, we're idle"

kill(66451, SIGINT <unfinished ...>
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
<... kill resumed> )                    = 0
sigreturn(0x9ebff7a0)                   = 0
select(8, [4], NULL, [4], {0, 0})       = 0 (Timeout)
write(4, "<22>spamd[66218]: server hit by "..., 41) = 41
wait4(-1, [WIFSIGNALED(s) && WTERMSIG(s) == SIGINT], WNOHANG, NULL) = 66451
select(8, [4], NULL, [4], {0, 0})       = 0 (Timeout)
write(4, "<22>spamd[66218]: handled cleanu"..., 54) = 54
wait4(-1, 0x9ebff7dc, WNOHANG, NULL)    = 0
select(16, [5 6 7], NULL, [5 6 7], {2, 0}) = 1 (in [6])
read(6, "", 8192)                       = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
close(6)                                = 0
write(2, "prefork: child states: I\n", 25) = 25

        and there's 1 idle child.

select(16, [5 7], NULL, [5 7], {2, 0})  = 1 (in [5])
write(7, "A\n", 2)                      = 2

        activity on the accept socket -- tells the child on fd 7 to accept...

read(7, "B66443\n", 8192)               = 7

        child says "I'm busy"

select(16, [5 7], NULL, [5 7], {2, 0})  = 1 (in [7])
read(7, "I66443\n", 8192)               = 7

        child finishes processing and says "I'm idle".

write(2, "prefork: child states: I\n", 25) = 25
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)
select(16, [5 7], NULL, [5 7], {2, 0})  = 0 (Timeout)

        and repeat until eventually killed.


So in other words, the master spamd is not getting hung -- it really is
selecting on the accept fd.  But there's no activity arriving on that fd, which
is obviously incorrect?!

right now, I'm stumped as to how a select on the accept() fd could *fail*
to notice activity, if there is activity there, and that select() call looks
100% fine to me (unless BSD has changed its semantics somewhere since BSD
4.3...)

Could you retry again, using strace -ft (follow forks, add time)? maybe there's
something in the kid's straces that'll help.





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From jm@jmason.org  2005-04-12 19:15 -------
is it still running?  can you get a ptrace/strace of both procs?



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From quinlan@pathname.com  2005-04-12 18:41 -------
Created an attachment (id=2785)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2785&action=view)
log of spamd running

At the end of this, spamd won't even respond to a PING or allow you to
connect to the port.

It seems like spamd believes there are idle clients when there are none.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258





------- Additional Comments From quinlan@pathname.com  2005-04-12 18:38 -------
*** Bug 4257 has been marked as a duplicate of this bug. ***



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4258] spamd locks up after varying load

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4258


newsuser@famdijkstra.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |newsuser@famdijkstra.org






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.