You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/04/29 17:22:27 UTC

[Bug 4290] New: Test t/dnsbl fails on a slow machine

http://bugzilla.spamassassin.org/show_bug.cgi?id=4290

           Summary: Test t/dnsbl fails on a slow machine
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Building & Packaging
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: schulz@adi.com


When running make test on a very slow machine (a SPARCstation-20) the dnsbl
test fails most of the time (about 90% of runs).  When run on an Ultra 10, the
test always passes.  Both machines are running Solaris 8 with the same patch
level and the same build of perl (v5.8.5).  Both machines are using the same
dns server. The error message is:

t/dnsbl.....................ok 2/22     Not found: P_4 =
  <dns:14.35.17.212.dnsbltest.spamassassin.org> [127.0.0.1, 127.0.0.1] 
t/dnsbl.....................NOK 3# Failed test 3 in t/SATest.pm at line 552  
t/dnsbl.....................ok 15/22    Not found: P_9 =  DNSBL_TEST_WHITELIST 
t/dnsbl.....................NOK 16# Failed test 16 in t/SATest.pm at line 552
 fail #2
t/dnsbl.....................FAILED tests 3, 16                               
        Failed 2/22 tests, 90.91% okay

Note that the machine idle except for the make test.  If there is anything
else running, there are more failures.  This happens with svn versions
20050425163422 and 20050429104320.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-05-01 22:12 -------
I think that the only thing that doesn't work when both host names end in "." is
the test for [127.0.0.1, 127.0.0.1] and that is a test case error, not a bug in
the code. Or rather, the two different forms of the host name produced the two
instances of 127.0.0.1.

The right place to canonicalize the host name is in EvalTests.pm. Fixing that
and changing the test match string should do it.

I should be able to get to it in four or five hours if nobody else does it first.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From quinlan@pathname.com  2005-04-30 15:31 -------
Subject: Re:  Test t/dnsbl fails on a slow machine

bugzilla-daemon@bugzilla.spamassassin.org writes:

> My question is if this is a totally artificial aspect of the test or
> if in real life we might have a reason to use one RBL in two different
> ways like this? If it can arise in real use then we have a bug we have
> to fix. Otherwise we can just change the test, perhaps by using a
> different domain for one of those.

We do use that style of test, but perhaps we should canonicalize the
period at the end vs. not when reading in the cf files.  Does always
having a period (or always not having it) fix the issue?

Daniel





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From felicity@kluge.net  2005-04-29 14:14 -------
Subject: Re:  Test t/dnsbl fails on a slow machine

On Fri, Apr 29, 2005 at 02:06:37PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> I like easy fixes like that. Unless someone sees this as being a symptom of a
> real error, let's go for that.

I wonder -- is there a problem with the NS dealing with TXT records?  That's
really the only difference between the SB stuff and the other records in the
zone.

I don't think removing the sb bit is going to solve anything on its own, but
why do work that we don't need to do? ;)





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From felicity@kluge.net  2005-04-29 13:50 -------
Subject: Re:  Test t/dnsbl fails on a slow machine

On Fri, Apr 29, 2005 at 01:46:54PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> dbg: dns: launching DNS TXT query for
> 14.35.17.212.sb.dnsbltest.spamassassin.org. in background

Interesting.

134.88.73.210.sb

is the only entry in the sb subdomain.  Everything else should error out.

Anyway, since we don't use SenderBase at all, we should probably just nix out
the tests that involve it.





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 13:46 -------
Murphy's Law is at work here. I can see the error every time right now at home
with my laptop on my DSL under Cygwin, but when I run the same test with -D to
see debug logs it works every time. I guess there's some timing issue involved.
What I find confusing is that DNS queries are cached, isn't that right? BTW, I
think that both errors are the same one query, just testing for two different
strings that should result from 1) getting the response, then 2) the rule match
that it causes.

dbg: dns: launching DNS TXT query for
14.35.17.212.sb.dnsbltest.spamassassin.org. in background

That's the query that times out when this error occurs





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-05-01 14:43 -------
I looked over the code and it is written to do the right thing if the two rules
used the same host name (with the "." at the end). So it will take some digging
to see why that doesn't work, or to verify that I got mixed up during testing
and it really does work.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 18:29 -------
The missing "." is in the configuration entries in t/dnsbl.t

Adding the "." causes the rule to fire ok, making the second test work, but the
[127.0.0.1, 127.0.0.1] test fails because only one A record is found.

Is that test actually checking some behaviour that is supposed to happen, or was
it only put in because that's what the output looked like?

In other words, is the fix to put in the "." and change what the test is looking
for, or to do something so that the two A queries can be made and the results
still kept separate? 




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         OS/Version|Solaris                     |All
           Platform|Sun                         |All






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-05-02 08:29 -------
Created an attachment (id=2824)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2824&action=view)
patch to force the final dot and to change the test result string

Here is a patch to EvalTests.pm and dnsbl.t which seems to fix the problem.

I'm not real comfortable with this. It works, but may not be the best way to
express this in perl. Please someone with more perl-fu (Daniel?) take a look at
this and fix it up if the code is sloppy, or the test is overkill, or can be
written in better perl.

I am going ahead and checking this in just to keep things moving.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 14:06 -------
> we should probably just nix out the tests that involve it

I like easy fixes like that. Unless someone sees this as being a symptom of a
real error, let's go for that.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From sidney@sidney.com  2005-05-02 08:33 -------
fix committed to rev 165643




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From lwilton@earthlink.net  2005-04-30 15:52 -------
> We do use that style of test, but perhaps we should canonicalize the
> period at the end vs. not when reading in the cf files.  Does always
> having a period (or always not having it) fix the issue?

See Sidney's comment #10.  The answer is 'yes'.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 12:34 -------
I'm seeing it too on my laptop depending on where I am connected to the Internet
and which primary nameserver I am using. Looking at logs it appears that most of
the DNS queries work and those two to the test nameserver for
dbsbltest.spamassassin.org time out. I don't know why it is always those two
when the other queries for the same domain work fine.

Can someone with DNS expertise comment whether there is something about those
two tests that hit the spamassassin.org nameserver directly rather than the
usual caching through the local nameserver?




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 17:43 -------
> why do work that we don't need to do?

Umm, because I can't read? I looked at the wrong test. The one that is not
working is looking for

 Not found: P_4 =  <dns:14.35.17.212.dnsbltest.spamassassin.org> [127.0.0.1,
127.0.0.1]
# Failed test 3 in t/SATest.pm at line 552
 Not found: P_9 =  DNSBL_TEST_WHITELIST
# Failed test 16 in t/SATest.pm at line 552 fail #2

Here's a question: What's the difference between
14.35.17.212.dnsbltest.spamassassin.org and
226.148.120.193.dnsbltest.spamassassin.org? Why does one result in [127.0.0.1,
127.0.0.1] in the log and the other [127.0.0.1]?





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 18:05 -------
Ok, I see in the debug log that there is one test that only looks at
14.35.17.212.dnsbltest.spamassassin.org and it launches a second query of the A
record. There is supposed to be code that prevents multiple queries of the same
thing. Perhaps it is not doing the right thing about caching the answer. Or
perhaps the code is handling this but there is a race condition.

First

dbg: dns: checking RBL dnsbltest.spamassassin.org., set test
dbg: dns: IPs found: full-external: 212.17.35.14, 193.120.149.226, 61.119.13.18,
210.73.88.134, 144.137.3.98 unt\
 212.17.35.14, 193.120.149.226, 61.119.13.18, 210.73.88.134, 144.137.3.98
originating:
dbg: dns: only inspecting the following IPs: 144.137.3.98, 210.73.88.134,
61.119.13.18, 193.120.149.226, 212.17.\

dbg: dns: launching DNS A query for 98.3.137.144.dnsbltest.spamassassin.org. in
background
dbg: dns: launching DNS A query for 134.88.73.210.dnsbltest.spamassassin.org. in
background
dbg: dns: launching DNS A query for 18.13.119.61.dnsbltest.spamassassin.org. in
background
dbg: dns: launching DNS A query for 226.149.120.193.dnsbltest.spamassassin.org.
in background
dbg: dns: launching DNS A query for 14.35.17.212.dnsbltest.spamassassin.org. in
background

and then

dbg: dns: checking RBL dnsbltest.spamassassin.org, set white-firsttrusted
dbg: dns: IPs found: full-external: 212.17.35.14, 193.120.149.226, 61.119.13.18,
210.73.88.134, 144.137.3.98 unt\
 212.17.35.14, 193.120.149.226, 61.119.13.18, 210.73.88.134, 144.137.3.98
originating:
dbg: dns: only inspecting the following IPs: 212.17.35.14
dbg: dns: launching DNS A query for 14.35.17.212.dnsbltest.spamassassin.org in
background




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From schulz@adi.com  2005-04-29 13:28 -------
Well, in our case the same name server is used by both the slow and the
less slow machine (I can't call an Ultra 10 fast :-).  I first ran into the
problem when I was adapting the patch for Bug 4260 to SA 3.0.2 and noticed
that the dnsbl test failed about half the time.  I decided to see what the
current svn version did and found that it had the same failure but with a
80 to 90% failure rate.  That is, on about 10 to 20% of the times that I
ran make test, the test passes with no errors.  As the patch for Bug 4260
was then being changed, I decided to wait a few days and try the now current
svn.  Same problem. The unpatched SA 3.0.2 passes every time as does the
svn version when run on a faster machine (well, my patched 3.0.2 is also fine
on the faster machine).



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-30 07:41 -------
I've run out of time for a while to spend on tracking this down. It appears that
something doesn't work right, in a way that depends on timing, when two
different RBL queries end up using identical DNS queries. This is caused by
dnsbl.t containing both of these two rules:

header DNSBL_TEST_TOP   eval:check_rbl('test', 'dnsbltest.spamassassin.org.')
describe DNSBL_TEST_TOP DNSBL A record match
tflags DNSBL_TEST_TOP   net

header DNSBL_TEST_WHITELIST     eval:check_rbl('white-firsttrusted',
'dnsbltest.spamassassin.org', '127.0.0.1')
describe DNSBL_TEST_WHITELIST   DNSBL whitelist match
tflags DNSBL_TEST_WHITELIST     net nice

Note that they are made to be different by one saying "org." and the other "org"
without the ".", causing there to be two queries that sometimes result in the
second query not getting a response that is seen by code that is looking at the
IO::Socket. If both have the "." then the second query is never made and the
code doesn't handle that either.

My question is if this is a totally artificial aspect of the test or if in real
life we might have a reason to use one RBL in two different ways like this? If
it can arise in real use then we have a bug we have to fix. Otherwise we can
just change the test, perhaps by using a different domain for one of those.

Can anyone chime in with an answer to that?




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 14:07 -------
Oh, but we should make sure that we do not break the 3.0.3 tests by removing
anything they use from the DNS.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-30 17:41 -------
> See Sidney's comment #10.  The answer is 'yes'

With further analysis the answer is "not quite". I was fooled by the fact that
behavior is dependent on timing and my firsts tests seemed to work.

If both or neither have the "." then if the second query is done while the first
is still pending, the code treats the second query as a duplicate and never
sends it. The first query is associated only with the rule that requested it, so
the second rule never matches. There is no provision in the code for two DNSRBL
rules to use the same query.

If we do want to be able to do that, then we have some coding to do in addition
to canonicalizing the host name either when reading the cf file or in the hash
key used for identifying duplicates.

If I remember correctly, what we do now is generate the background query and
save an entry for it on a pending list, where the entry identifies the query and
the rule that is requesting it. When the reply arrives, we find the entry for it
and process the rule. When the second query is requested nothing is done if it
is requesting duplicate information. Instead we could still add an entry to the
pending list that is the same as the one that is already there, except referring
to the second rule. When a reply packet is read, it would be processed against
all the matching entries for it on the pending list.

I don't have time to code this for at least the next 24 hours unless I get some
homework done remarkably early, but if nobody has got to yet when I have time I
will.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4290] Test t/dnsbl fails on a slow machine

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4290





------- Additional Comments From sidney@sidney.com  2005-04-29 18:13 -------
Aha! The first one is for 14.35.17.212.dnsbltest.spamassassin.org. and the
second is for 14.35.17.212.dnsbltest.spamassassin.org without the "." at the
end. Otherwise the code would only do hte actual query once and use the result
the second time. If the first query is fully processed before the second query
is made, then there is no problem, otherwise the two queries collide.

Now to think about what is the right way to fix this...




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.