Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/04/23 00:30:39 UTC

[Bug 4278] New: dnsbl test hangs in Windows, both Cygwin and WIN32

http://bugzilla.spamassassin.org/show_bug.cgi?id=4278

           Summary: dnsbl test hangs in Windows, both Cygwin and WIN32
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Regression Tests
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: sidney@sidney.com


Running make test hangs in dnsbl.t and in debug.t (which runs the same test case
as dnsbl.t). The log files don't show anything, and even using TEST_VERBOSE=1
seems not to show how far it gets before hanging because the last output stays
buffered. I'm attaching a log that was generated by running spamassassin from
the command line directly.

The last things in the logs are debug messages from dns looking up records, so I
suspect it has to do with the new dns single socket code.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From sidney@sidney.com  2005-04-28 08:25 -------
checked in rev 16149





[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278





------- Additional Comments From sidney@sidney.com  2005-04-24 15:44 -------
The hang appears to be in DnsResolver->search() where it is calling
poll_results(-1) which loops forever with no timeout. Just changing it to call
poll_results with a timeout doesn't help because the loop that calls it just
keeps going forever with poll_results timing out and then being called again.

I notice that the comment for search() in DnsResolver says that it is emulating
Net::DNS::Resolver->search(), but it is calling bgsend to do that. There is an
important difference between what Net::DNS::Resolver does and using bgsend. The
former does an immediate query that uses the searchlist to go through all the
nameservers. bgsend can only use the first nameserver on the searchlist, because
it backgrounds the query. I suspect that the first nameserver on my PC's list is
dodgy in some way, causing the bgsend not to work all the time. There is no
provision for a timeout and no provision for using the searchlist, hence the
call to search() never returns.
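
A minimal sketch of the kind of bounded polling loop that is missing here. It
does not use Net::DNS or the real poll_results; wait_for_answer and its
arguments are hypothetical names, and the "nameservers" are plain code refs
that stand in for a background query.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: call $poll (a code ref that returns an answer or
# undef) until it yields a defined value or $timeout seconds have passed.
sub wait_for_answer {
    my ($poll, $timeout, $interval) = @_;
    $interval = 0.05 unless defined $interval;
    my $deadline = time() + $timeout;
    while (1) {
        my $answer = $poll->();
        return $answer if defined $answer;
        return if time() >= $deadline;    # give up instead of looping forever
        sleep($interval);
    }
}

# A "nameserver" that never answers: the call gives up after about 0.2s.
my $never = wait_for_answer(sub { undef }, 0.2);

# A "nameserver" that answers on the third poll.
my $calls = 0;
my $late  = wait_for_answer(sub { ++$calls >= 3 ? 'answer' : undef }, 1);
```

The deadline is checked in the same loop that polls, so a timed-out poll
cannot simply be retried forever by an outer loop.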





[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278





------- Additional Comments From sidney@sidney.com  2005-04-28 08:24 -------
Created an attachment (id=2815)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2815&action=view)
patch to remove unresponsive nameservers from the beginning of the list when testing
for DNS availability

This patch assumes a code change in DnsResolver->search that times out an
unresponsive query, as checked in for bug 4260.

Perl experts: I am not happy with the two lines

my @nameservers = $self->{resolver}->get_resolver->nameservers();
   and
$self->{resolver}->get_resolver->nameservers(@nameservers);

They should call a new wrapper function DnsResolver->nameservers instead of
DnsResolver->get_resolver->nameservers, but my Perl-fu was not up to getting
the use of arrays and references correct to write the simple wrapper function.
Can someone else fix that please?

I checked this in as is to rev 16149, since it does work.
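
A rough sketch of what such a wrapper could look like. This is not the code in
the tree: StubResolver only imitates the get/set behaviour of
Net::DNS::Resolver->nameservers so that the example runs on its own, and
SketchDnsResolver is a hypothetical stand-in for DnsResolver.

```perl
use strict;
use warnings;

# Stand-in for Net::DNS::Resolver so the sketch runs without the module;
# only the get/set behaviour of nameservers() is imitated.
package StubResolver;
sub new { my ($class, @ns) = @_; return bless { ns => [@ns] }, $class }
sub nameservers {
    my $self = shift;
    $self->{ns} = [@_] if @_;
    return @{ $self->{ns} };
}

# Hypothetical wrapper class; the {res} field name is an assumption.
package SketchDnsResolver;
sub new { my ($class, $res) = @_; return bless { res => $res }, $class }

# With arguments, replace the list; always return the current list.
sub nameservers {
    my $self = shift;
    $self->{res}->nameservers(@_) if @_;
    return $self->{res}->nameservers();
}

package main;

my $dns = SketchDnsResolver->new(StubResolver->new('192.0.2.1', '192.0.2.2'));
my @nameservers = $dns->nameservers();   # ('192.0.2.1', '192.0.2.2')
shift @nameservers;                      # drop the unresponsive first entry
$dns->nameservers(@nameservers);         # write the shortened list back
my @after = $dns->nameservers();
```

Passing the list flat in @_ avoids the array-reference handling entirely. One
limitation of this form is that calling it with an empty list reads the list
rather than clearing it.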





[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |major




------- Additional Comments From sidney@sidney.com  2005-04-24 13:14 -------
Results from some debugging:

Here is a debug log fragment from when it works ok:

dbg: dns: is DNS available? 1
dbg: dns: IP is private, not looking up PTR: 127.0.0.1
dbg: received-header: parsed as [ ip=127.0.0.1 rdns= helo=internal.example.com by=localhost ident= envfrom= intl=0 id= auth= ]
dbg: received-header: relay 127.0.0.1 trusted? yes internal? yes
dbg: dns: looking up PTR record for '150.51.53.1'
dbg: dns: PTR for '150.51.53.1': ''
dbg: received-header: parsed as [ ip=150.51.53.1 rdns= helo=dmz.example.com by=internal.example.com ident= envfrom= intl=0 id= auth= ]
dbg: received-header: relay 150.51.53.1 trusted? yes internal? yes

When it hangs, the output stops after the line:

dbg: dns: PTR for '150.51.53.1': ''

However, if I run the test again immediately it does not hang there. It might
hang after the next "dbg: dns: PTR for 'xxx.xxx.xxx.xxx'".

It looks like when the query is cached (on the PC? on DNS server used by
the PC?) the reply comes back right away, but when it is not cached it always
hangs forever.

Once processing gets through this group of DNS queries and gets to the various
background requests in the URIDNSBL processing, it never hangs.





[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278





------- Additional Comments From sidney@sidney.com  2005-04-22 15:31 -------
Created an attachment (id=2803)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2803&action=view)
log from running spamassassin command line that is used in the test





[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Mark.Martinec@ijs.si




------- Additional Comments From sidney@sidney.com  2005-04-30 17:17 -------
*** Bug 4295 has been marked as a duplicate of this bug. ***




[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278





------- Additional Comments From sidney@sidney.com  2005-04-24 14:06 -------
It may be more of a random (timing? race condition?) thing than a caching thing.
Running repeated tests does not produce as consistent results as I indicated in
the last comment.





[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278





------- Additional Comments From sidney@sidney.com  2005-04-27 14:37 -------
I have more information and a patch. Just using Net::DNS::Resolver->search is
not the right answer.

The immediate cause of the problem was in Net::DNS in the Win32 and Cygwin
versions. They get a list of nameservers to try from the Windows registry, which
contains a list of network interfaces and the nameservers for each one. I was
using a WiFi PC card on my laptop which also has a wired ethernet port that was
not plugged in. The registry entry for the wired interface still contained the
ip address of the nameserver that was provided by DHCP the last time it had been
plugged in. That nameserver ip address was first in the list of nameservers put
together by Net::DNS.

Net::DNS::Resolver->search tries each nameserver on the list in turn, with
settable options for number of retries and progressively longer timeouts. Thus it
was immune to the bad nameserver address at the head of the list.

Background queries have to use only the first nameserver, so they fail.
DnsResolver->search is written to use a background query and to poll for a
result with no timeout. With the first nameserver being bad, that caused an
endless loop.

NOTE: This bug is not specific to Win32 and Cygwin as far as I can tell. The
same problem should occur on any system if Net::DNS has a list of nameservers
and the first one is down when you run SpamAssassin. If Net::DNS on Unix is
smart about initializing the nameserver list it might not be likely, but still
could happen if access to the first nameserver crashes after it has been put on
the list and before the DNS availability test in Dns.pm.

Here are the fixes I'll upload as patches when I get a chance. I've already
written and mostly tested them:

Net::DNS::Resolver Win32 and Cygwin modules should be fixed to ignore the
adaptors whose registry entries specify an ip address of 0.0.0.0. I'll submit
patches upstream, but we should make our code proof against similar problems
that can occur without the bug in Net::DNS, such as if the primary nameserver is
down.

Mail::SpamAssassin::Dns.pm should be changed so that the test for DNS
availability is done on each nameserver in Net::DNS::Resolver->nameservers, with
an appropriate timeout and removing unresponsive nameservers until a good one is
found or the list is emptied. This will allow the use of background queries
during the rest of the processing.

DnsResolver->search should continue to use background queries, but be changed to
have a timeout. That was in the TODO comments for that anyway, and the patch was
pretty simple. With the timeout it will have the correct behaviour to be used
by the DNS availability test in Dns.pm, as described in the previous paragraph.
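
The per-nameserver availability test described above could look roughly like
this. It is only a sketch: prune_dead_nameservers and the $probe callback are
hypothetical names, and the probe here is a lookup in a hash rather than a
real query with a timeout.

```perl
use strict;
use warnings;

# Hypothetical helper: $probe is a code ref that returns true when a test
# query sent to the given nameserver is answered before its timeout.
# Unresponsive servers are dropped from the front of the list until one
# answers; an empty return value means DNS is unavailable.
sub prune_dead_nameservers {
    my ($probe, @nameservers) = @_;
    while (@nameservers) {
        last if $probe->($nameservers[0]);
        shift @nameservers;
    }
    return @nameservers;
}

# Pretend only 192.0.2.53 answers; the stale first entry gets removed.
my %alive  = ('192.0.2.53' => 1);
my @usable = prune_dead_nameservers(
    sub { $alive{ $_[0] } },
    '192.0.2.1', '192.0.2.53', '192.0.2.54',
);
# @usable now starts with the first responsive server.
```

Because background queries only ever go to the first entry, it is enough to
stop at the first server that answers; the rest of the list is left alone.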

Patches to be attached after I spend some time on work and school.




[Bug 4278] dnsbl test hangs in Windows, both Cygwin and WIN32

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4278





------- Additional Comments From sidney@sidney.com  2005-04-24 22:09 -------
If DnsResolver->search is rewritten to simply

 return $self->{res}->search($name, $type, $class);

all seems to work with no hang.

Justin, what was the reason for writing it the way you did instead? Does this
way break anything?




