You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2007/03/23 20:29:37 UTC

[Bug 5388] New: socket errors in spamd, both ipv4 and ipv6

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388

           Summary: socket errors in spamd, both ipv4 and ipv6
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: blocker
          Priority: P5
         Component: spamc/spamd
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: mattw@nmgi.com


latest SVN build of spamassassin on latest everything on debian Etch, as of
yesterday: 

with debug enabled, spamd is reporting the following errors very often.  

Fri Mar 23 14:21:54 2007 [480] info: prefork: child states: II
Fri Mar 23 14:22:00 2007 [486] warn: spamd: error: failed to obtain port and ip
from socket at /usr/bin/spamd line 1196.
Fri Mar 23 14:22:00 2007 [486] warn:  Transport endpoint is not connected,
continuing at /usr/bin/spamd line 1071.

I have tried 'apt-get remove libio-socket-inet6-perl', which causes it to report
a different error.  Here is something similar to the error I see, recently
reported on spamassassin-users.

> Mar 23 11:50:48 spamfilter5 spamd[28398]: Use of uninitialized value 
> in subroutine entry at 
> /usr/lib/perl5/5.8.5/i386-linux-thread-multi/Socket.pm line 370.
> Mar 23 11:50:48 spamfilter5 spamd[28398]: Bad arg length for 
> Socket::unpack_sockaddr_in, length is 0, should be 16 at 
> /usr/lib/perl5/5.8.5/i386-linux-thread-multi/Socket.pm line 370.
> Mar 23 11:50:48 spamfilter5 spamd[28398]: spamd: error: Bad arg length 
> for Socket::unpack_sockaddr_in, length is 0, should be 16 at 
> /usr/lib/perl5/5.8.5/i386-linux-thread-multi/Socket.pm
> line 370.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388





------- Additional Comments From mattw@nmgi.com  2007-03-27 10:06 -------
err oops, that
$orders == PF_PING_ORDER
should be
$orders eq PF_PING_ORDER



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


mattheww@nmgi.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |5416






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388





------- Additional Comments From mattw@nmgi.com  2007-03-27 09:42 -------
Created an attachment (id=3888)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3888&action=view)
fix for several interrelated parent/child/socket problems.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388





------- Additional Comments From jm@jmason.org  2007-04-13 03:59 -------
(In reply to comment #7 on bug 5416, which I'd prefer to keep here to keep the
thread in one place ;)

> FWIW/HTH: it's my theory that this is more likely to happen on slower machines,
> or spamd installations with lots of large custom rulesets (are custom rulesets
> loaded during the tests?), b/c from what I understand, the first spamc
> connection attempt from those test scripts *might* be accepted by the server in
> the very short period of time between the instant the last child forks and the
> instant the first forked child finishes reporting that it's idle (ready to
accept).
> 
> So what my patch was trying to do was give SpamdForkScaling.pm a way to know if
> it was still in "startup" phase, which would have overriden the trigger for
> "overloaded" mode.  But I botched the implementation as you can tell from
> Justin's  commentary...  

yeah -- you see, what I would expect to happen in that situation is that the
parent spamd would just fork *more* children.  if the load then wasn't there to
keep that many kids active, they'd exit, and it'd settle down to a steady state
quickly enough.

You can fake the problem situation described above, by adding this JMD line to
update_child_status_idle() in SpamdForkScaling.pm:

sub update_child_status_idle {
  my ($self) = @_;
  warn "JMD faking slow machine with a sleep"; sleep 10;
  # "I  b1 b2 b3 b4 \n "
  $self->report_backchannel_socket("I".pack("N",$self->{pid})."\n");
}

even with this, the spamd continues to work ok.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388





------- Additional Comments From mattw@nmgi.com  2007-03-23 12:46 -------
I should note that under both scenarios, spamd is accepting connections and
processing mail.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388





------- Additional Comments From jm@jmason.org  2007-03-27 12:12 -------
hmm.

@@ -178,10 +178,11 @@
   foreach my $p (@pids) {
     if ($l > $p) { $l = $p };
   }
-  $self->{lowest_idle_pid} = $l;
+  $self->{lowest_idle_pid} = $l if defined $l;
 
+  # make sure not to auto-vivify lowest_idle_pid, so the parent knows no child
has ever gone idle.
   dbg("prefork: new lowest idle kid: ".
-            ($self->{lowest_idle_pid} ? $self->{lowest_idle_pid} : 'none'));
+            ((exists $self->{lowest_idle_pid} && $self->{lowest_idle_pid}) ?
$self->{lowest_idle_pid} : 'none'));
 }

If lowest_idle_pid is set to a value, then this is called and no child is idle,
the *previous* value will *be left intact*.  that's a bad idea, since that
child probably isn't idle anymore....



@@ -296,7 +297,8 @@
     if (!$self->order_idle_child_to_accept()) {
       # dbg("prefork: no idle kids, noting overloaded");
       # there are no idle kids!  we're overloaded, mark that
-      $self->{overloaded} = 1;
+      # Don't mark the server as overloaded if no child has ever gone idle.
+      $self->{overloaded} = 1 if exists $self->{lowest_idle_pid};
     }
     return;
   }

So if the spamd server accepts a connection before any of the initial
children reports "idle", it starts another child?  I'm inclined
to consider that a feature, to be honest ;)  if load is that high,
another child is probably going to be needed.


@@ -532,7 +534,7 @@
     chomp $line;
     if (index ($line, "P") == 0) {  # string starts with "P" = ping
       dbg("prefork: periodic ping from spamd parent");
-      next;
+      return PF_PING_ORDER;
     }
     if (index ($line, "A") == 0) {  # string starts with "A" = accept
       return PFORDER_ACCEPT;
Index: spamd/spamd.raw
===================================================================
--- spamd/spamd.raw	(revision 522978)
+++ spamd/spamd.raw	(working copy)
@@ -1059,7 +1059,12 @@
       if ($scaling) {
         $scaling->update_child_status_idle();
         $orders = $scaling->wait_for_orders(); # and sleep...
-
+        
+        if ($orders eq PF_PING_ORDER) {
+          $i--; #don't count this connection against the max; it's just a ping.
+          next;
+        }
+        
         if ($orders != PFORDER_ACCEPT) {
           info("spamd: unknown order: $orders");
         }

I don't get this either.  AFAICS this just defers the skipping of the 
ping request, up a level in the function call tree, and a few no-ops
later.


@@ -1165,7 +1170,7 @@
   }
 
   # Bah!
-  if ( !$client ) {
+  if ( !$client || !$client->connected() ) {
 
     # this can happen when interrupted by SIGCHLD on Solaris,
     # perl 5.8.0, and some other platforms with -m.

OK, on the other hand, I'm happy to apply this change!   It's in trunk now.





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


mattw@nmgi.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #3888 is|0                           |1
           obsolete|                            |




------- Additional Comments From mattw@nmgi.com  2007-03-27 10:30 -------
Created an attachment (id=3889)
 --> (http://issues.apache.org/SpamAssassin/attachment.cgi?id=3889&action=view)
corrected patch

Yes, those errors no longer occur because the children aren't continually told
anymore to accept non-existent connections.  See a few comments above.

Here's the corrected patch.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388





------- Additional Comments From jm@jmason.org  2007-03-27 10:25 -------
interesting patch!  I'm a bit leery of applying it so late in the 3.2.0 dev
cycle though :(

does that fix have anything to do with the original 'failed to obtain port and ip
from socket' error issue?  what are the symptoms of what it's fixing, if not?



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2007-08-17 09:30 -------
I think this is fixed nowadays; please reopen if not



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


mattw@nmgi.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID




------- Additional Comments From mattw@nmgi.com  2007-03-24 23:32 -------
Figured out the issue with the ipv4 version - I have an additional local
loopback address that is part of an LVS/keepalived scheme, and that address was
accept()ing connections on tcp783 every second as part of LVS' tcpcheck.  At the
initiation of these connections, the socket object returns a null ->peername()
in peer_info_from_socket(), which causes the error messages I listed above.  As
a workaround, we will configure LVS/keepalived to perform health status checks
via the correct interface at both ends, but also teach it to ping spamd with a
proper ping header.
I have not tested it, but I assume the ipv6 version error messages are caused
similarly.
I would also like to take this opportunity to request an additional flag for
spamd (--bind-ipv4-only) that would override/condition the eval{} in the second
BEGIN{}, since in our case, I would like to keep the debian package
libio-socket-inet6-perl installed, but still force spamd to bind to ipv4 only.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


mattw@nmgi.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P5                          |P2




------- Additional Comments From mattw@nmgi.com  2007-03-23 12:40 -------
to clarify, here is the exact error I see when removing ipv6 capability:


Fri Mar 23 14:39:25 2007 [606] info: logger: removing stderr method
Fri Mar 23 14:39:32 2007 [608] info: spamd: server started on port 783/tcp
(running version 3.2.0-pre3)
Fri Mar 23 14:39:32 2007 [608] info: spamd: server pid: 608
Fri Mar 23 14:39:32 2007 [608] info: spamd: server successfully spawned child
process, pid 612
Fri Mar 23 14:39:32 2007 [608] info: spamd: server successfully spawned child
process, pid 613
Fri Mar 23 14:39:32 2007 [612] warn: Use of uninitialized value in subroutine
entry at /usr/lib/perl/5.8/Socket.pm line 198.
Fri Mar 23 14:39:32 2007 [612] warn: spamd: error: Bad arg length for
Socket::unpack_sockaddr_in, length is 0, should be 16 at
/usr/lib/perl/5.8/Socket.pm line 198.
Fri Mar 23 14:39:32 2007 [612] warn:  , continuing at /usr/bin/spamd line 1071.
Fri Mar 23 14:39:32 2007 [608] info: prefork: child states: BI
Fri Mar 23 14:39:32 2007 [608] info: prefork: child states: II
Fri Mar 23 14:39:34 2007 [612] warn: Use of uninitialized value in subroutine
entry at /usr/lib/perl/5.8/Socket.pm line 198.
Fri Mar 23 14:39:34 2007 [612] warn: spamd: error: Bad arg length for
Socket::unpack_sockaddr_in, length is 0, should be 16 at
/usr/lib/perl/5.8/Socket.pm line 198.
Fri Mar 23 14:39:34 2007 [612] warn:  , continuing at /usr/bin/spamd line 1071.
Fri Mar 23 14:39:34 2007 [608] info: prefork: child states: II
Fri Mar 23 14:39:40 2007 [612] warn: Use of uninitialized value in subroutine
entry at /usr/lib/perl/5.8/Socket.pm line 198.
Fri Mar 23 14:39:40 2007 [612] warn: spamd: error: Bad arg length for
Socket::unpack_sockaddr_in, length is 0, should be 16 at
/usr/lib/perl/5.8/Socket.pm line 198.
Fri Mar 23 14:39:40 2007 [612] warn:  , continuing at /usr/bin/spamd line 1071.
Fri Mar 23 14:39:40 2007 [608] info: prefork: child states: II




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


mattw@nmgi.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |




------- Additional Comments From mattw@nmgi.com  2007-03-27 09:44 -------
This patch solves several problems with spamd parent/child relations and
sockets.  Two problems: 
1. On initial startup, server was being marked as overloaded b/c no children had
started up yet (so there were no idle children), which caused the parent to
continually direct the children to accept non-existent connections.
2. The "periodic ping" from the parent to all the children was leaving the
listening child socket in a state that sometimes caused the child to commit suicide.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 5388] socket errors in spamd, both ipv4 and ipv6

Posted by bu...@bugzilla.spamassassin.org.
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5388


Bug 5388 depends on bug 5416, which changed state.

Bug 5416 Summary: make test fails spamc_cf.t, spamd_plugin.t, spamd_unix.t, and spamd_unix_and_tcp.t
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5416

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.