You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@kluge.net> on 2005/04/24 02:00:19 UTC

"uridnsbl: bogus rr" run ...

Ok, I assume someone wants this:

On Sat, Apr 23, 2005 at 04:18:15PM -0400, Cron Daemon wrote:
> uridnsbl: bogus rr: domain=usafreemerchantsource.com, zone=multi.surbl.org., rr=dns7.hichina.com.	142476	IN	A	218.30.103.250 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 634.
> uridnsbl: bogus rr: domain=netian.com, zone=multi.surbl.org., rr=nis.blueweb.co.kr.	100	IN	A	210.205.6.4 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 634.
> status:  20% ham: 0      spam: 16104  date: 2005-04-06   now: 2005-04-23 08:05:51
> uridnsbl: bogus rr for domain=com.cn, rule=URIBL_SBL, rr=ns2.zoneedit.com.	6023	IN	A	69.72.158.226 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=com.cn, rule=URIBL_SBL, rr=dns5.acosm.com.	172653	IN	A	4.79.212.11 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=bestfriends.co.kr, rule=URIBL_SBL, rr=ns2.dnsm.net.	59020	IN	A	218.7.120.70 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=210.114.220.52, rule=URIBL_SBL, rr=ns3.texasdns.com.	172790	IN	A	69.56.15.2 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=refi-with-us.com, rule=URIBL_SBL, rr=ns2.premiumhealthnow.info.	3600	IN	A	69.228.160.52 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
[...]


I have ~300K of them.  http://www.kluge.net/~felicity/set1.txt

I'm +1 on making this a non-warn, BTW.  There's nothing a user can do about
it, so it's just really annoying.  It's ok for now since we want to know when
it happens, but ...

-- 
Randomly Generated Tagline:
"Derive the Big Bang Theory from F=ma."    - Michelle about her Physics class

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Loren Wilton wrote:
> Depending on the value of the parameter
> that Perl is deducing from that statement, you may or may not be getting the
> results you expect.

>From the doc:

srand
    Sets the random number seed for the rand operator. If EXPR is omitted,
uses a semi-random value based on the current time and process ID, among
other things.


Since this call is only done when the process id is different (i.e., in a
fork) then srand with no arguments is correct for initializing rand for the
process.

 -- sidney


Re: "uridnsbl: bogus rr" run ...

Posted by Loren Wilton <lw...@earthlink.net>.
>       $self->{_initted} = $$;
>       srand;
>       $self->{resolver}->reinit_post_fork();

In the C/C++ world, srand has a parameter.  I suspect the C srand() function
underlys perl's srand statement.  Depending on the value of the parameter
that Perl is deducing from that statement, you may or may not be getting the
results you expect.

srand() C documentation includes:

The srand function sets the starting point for generating a series of
pseudorandom integers. To reinitialize the generator, use 1 as the seed
argument. Any other value for seed sets the generator to a random starting
point. rand retrieves the pseudorandom numbers that are generated. Calling
rand before any call to srand generates the same sequence as calling srand
with seed passed as 1.

        Loren




Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Matt Sergeant wrote:
> May be a problem with forking

Do you think that this code fragment I see in SpamAssassin.pm should work as
well as your fork code, or could relying on this be part of the problem?

sub init {
  my ($self, $use_user_pref) = @_;

  # Allow init() to be called multiple times, but only run once.
  if (defined $self->{_initted}) {
    # If the PID changes, reseed the PRNG and the DNS ID counter
    if ($self->{_initted} != $$) {
      $self->{_initted} = $$;
      srand;
      $self->{resolver}->reinit_post_fork();
    }
    return;
  }

  # Note that this PID has run init()
  $self->{_initted} = $$;


 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Matt Sergeant <ms...@messagelabs.com>.
On 24 Apr 2005, at 17:01, Sidney Markowitz wrote:

> This could happen if the random ID isn't random enough

May be a problem with forking. Here's part of the fork replacement I 
use in my code that uses the single-packet-DNS stuff:

sub _fork {
     my $pid = fork;
     if (!defined($pid)) { die "Cannot fork: $!" }
     return $pid if $pid;

     # Fixup Net::DNS randomness after fork
     srand($$ ^ time);

     local $^W;
     delete $INC{'Net/DNS/Header.pm'};
     require Net::DNS::Header;

     # cope with different versions of Net::DNS
     eval {
         $Net::DNS::Resolver::global{id} = 1;
         $Net::DNS::Resolver::global{id} = 
int(rand(Net::DNS::Resolver::MAX_ID()));
         # print "Next DNS ID: $Net::DNS::Resolver::global{id}\n";
     };
     if ($@) {
         # print "Next DNS ID: " . Net::DNS::Header::nextid() . "\n";
     }

    ...
}


______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
______________________________________________________________________

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
I haven't been running mass-checks until now, but I just tried it with svn
trunk and got a couple of bogus rr warnings so far between the 50% and 60%
marks so far. It's taken two and a half hours to get that far, so this is a
very slow process. I just shut down the vmware session that was running the
Windows and Cygwin botslaves on that machine, and I hope that speeds things
up a bit. It's a 1Ghz Athlon machine running Fedora core 3.

In any case, it looks like I'll be able to run my own painfully slow tests
to try things out.

 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Sidney Markowitz wrote:
> use something that combines the pid and id

Brain fade... This patch works by matching information that is in the reply
packet to information in the query packet, which means it has to use the
host name and the packet ID. Duh! Sorry.

Still, we could try some debug log output to determine if the different
processes are using the same source ports. I don't see how we could have
collisions in the ID unless the source ports are the same. If that's it, we
would not have to use the host name to ensure that the reply matches the
query if we had a way of making the source ports different across processes.

Could you run with debug output that shows the pid, packet ID and source
port for the packets that are created in DnsResolver in a run that
demonstrates the bug?

 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> The patch does make things *much* slower though, around 3x:
[...]
> Without the patch, lots of issues starting after 80%.

I don't claim that the patch is the most efficient way of dealing with it...
I just wanted to use SHA1 to ensure that there was no chance of an ID
collision. I think we have now verified that ID collision is the likely
proximate cause of the problem.

Still, can three SHA1 calculations compare to the time it takes for a DNS
query? I don't see how the computation would slow things down by a factor of
three.

Perhaps what you are seeing is the difference in wall clock time between
processing a reply to an old packet when it arrives right away vs rejecting
those packets and waiting for the actual reply. If that's what's happening
you are not going to get the faster time when everything works, as it is the
nameserver's response time that is slowing down the run. The faster time is
perhaps just a symptom of the bug?

There is still the question of where the collisions are coming from. Here's
another idea -- Instead of using substr(sha1_base64($host . $id), -7) use
something that combines the pid and id into a six byte string in the three
places in the code where sha1 is used. That will be faster than using SHA1,
which will let you know if the slowdown is due to computation or waiting for
good packets to arrive, and it will let you know if the problem is with
different processes using the same source port for sending the UDP queries.
If it is the latter, we may be able to avoid the collisions if we are better
about picking the source ports.

Are you up for some more few-hour tests? :-)

 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Apr 25, 2005 at 09:54:12PM -0400, Theo Van Dinter wrote:
> I've done 1 run at 14 days with the patch, and no problems found.  I'll run
> another time or two and see what happens.  It takes a good couple of hours to
> do a single 14 day run though. <sigh>

The second run had no errors either.  I have not verified that I get the
same (or near same) results though.  I'm rerunning without the patch as
well, just to see if it was a random occurance before.

The patch does make things *much* slower though, around 3x:

With (~3h):
status: starting run stage                               now: 2005-04-25 08:55:58 PM
status:  10% ham: 337    spam: 2232   date: 2005-04-13   now: 2005-04-25 09:15:27 PM
status:  20% ham: 674    spam: 4464   date: 2005-04-14   now: 2005-04-25 09:34:16 PM
status:  30% ham: 1011   spam: 6696   date: 2005-04-15   now: 2005-04-25 09:53:43 PM
status:  40% ham: 1347   spam: 8929   date: 2005-04-17   now: 2005-04-25 10:14:25 PM
status:  50% ham: 1684   spam: 11161  date: 2005-04-18   now: 2005-04-25 10:33:53 PM
status:  60% ham: 2021   spam: 13393  date: 2005-04-20   now: 2005-04-25 10:53:15 PM
status:  70% ham: 2358   spam: 15625  date: 2005-04-21   now: 2005-04-25 11:12:21 PM
status:  80% ham: 2695   spam: 17857  date: 2005-04-22   now: 2005-04-25 11:31:29 PM
status:  90% ham: 3031   spam: 20090  date: 2005-04-23   now: 2005-04-25 11:50:28 PM
status: 100% ham: 3367   spam: 22314  date: 2005-04-25   now: 2005-04-26 12:09:24 AM
status: completed run stage                              now: 2005-04-26 12:09:29 AM

Without (~1h):
status: starting run stage                               now: 2005-04-25 04:28:05 PM
status:  10% ham: 338    spam: 2231   date: 2005-04-12   now: 2005-04-25 04:34:47 PM
status:  20% ham: 675    spam: 4463   date: 2005-04-14   now: 2005-04-25 04:40:59 PM
status:  30% ham: 1010   spam: 6697   date: 2005-04-14   now: 2005-04-25 04:47:13 PM
status:  40% ham: 1348   spam: 8928   date: 2005-04-17   now: 2005-04-25 04:53:57 PM
status:  50% ham: 1685   spam: 11160  date: 2005-04-18   now: 2005-04-25 04:59:41 PM
status:  60% ham: 2022   spam: 13392  date: 2005-04-20   now: 2005-04-25 05:05:41 PM
status:  70% ham: 2358   spam: 15625  date: 2005-04-21   now: 2005-04-25 05:11:14 PM
status:  80% ham: 2695   spam: 17857  date: 2005-04-22   now: 2005-04-25 05:16:55 PM
status:  90% ham: 3032   spam: 20089  date: 2005-04-23   now: 2005-04-25 05:22:06 PM
status: 100% ham: 3367   spam: 22314  date: 2005-04-25   now: 2005-04-25 05:27:35 PM
status: completed run stage                              now: 2005-04-25 05:27:36 PM


Without the patch, lots of issues starting after 80%.

I'm thinking that a per-request ID isn't necessary, but a per instance
ID is -- either initialized at check() or whenever the module gets a
new instance setup.

-- 
Randomly Generated Tagline:
A writer thinks of critics as a tree feels about dogs.

Re: "uridnsbl: bogus rr" run ...

Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Apr 26, 2005 at 01:39:04AM +1200, Sidney Markowitz wrote:
> It is written a bit crudely, but if Theo or someone else who is seeing the
> problem would try this in a mass test it would demonstrate whether the
> problem has anything to do with this:

Well, the first 2 times I ran the current trunk code w/ messages from 3 days
and 7 days, I didn't see any problem.  When I ran a third time at 14 days, I
got a bunch between 90-100%.

I've done 1 run at 14 days with the patch, and no problems found.  I'll run
another time or two and see what happens.  It takes a good couple of hours to
do a single 14 day run though. <sigh>

-- 
Randomly Generated Tagline:
"I instigated Linus's first shooting expedition in a long while a few months
 back (I can report that he is a steady, competent shot with a 9mm semi)."
                   - Eric Raymond

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
This is the corrected patch that ensures that IDs are not colliding by
including the host name in an SHA1 hash with the 16 bit ID counter.

It is written a bit crudely, but if Theo or someone else who is seeing the
problem would try this in a mass test it would demonstrate whether the
problem has anything to do with this:

 -- sidney

Index: lib/Mail/SpamAssassin/Dns.pm
===================================================================
--- lib/Mail/SpamAssassin/Dns.pm        (revision 164570)
+++ lib/Mail/SpamAssassin/Dns.pm        (working copy)
@@ -22,6 +22,7 @@
 use Mail::SpamAssassin::Conf;
 use Mail::SpamAssassin::PerMsgStatus;
 use Mail::SpamAssassin::Constants qw(:ip);
+use Digest::SHA1 qw(sha1_base64);
 use File::Spec;
 use IO::Socket;
 use IPC::Open2;
@@ -145,7 +146,10 @@

   return $self->{resolver}->bgsend($host, $type, undef, sub {
           my $pkt = shift;
-          $self->{dnsfinished}->{$pkt->header->id} = $pkt;
+          my $h = $host;
+          $h =~ s/\.$//;
+          my $id = substr(sha1_base64($h . $pkt->header->id), -8);
+          $self->{dnsfinished}->{$id} = $pkt;
         });
 }

Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm        (revision 164570)
+++ lib/Mail/SpamAssassin/DnsResolver.pm        (working copy)
@@ -45,7 +45,7 @@
 use Mail::SpamAssassin::Logger;

 use IO::Socket::INET;
-
+use Digest::SHA1 qw(sha1_base64);
 our @ISA = qw();

 # a counter value to use for DNS ID numbers in new_dns_packet().
@@ -243,8 +243,8 @@
   return if $self->{no_resolver};

   my $pkt = $self->new_dns_packet($host, $type, $class);
-
-  my $id = $pkt->header->id;
+  $host =~ s/\.$//;
+  my $id = substr(sha1_base64($host . $pkt->header->id), -8);
   my $data = $pkt->data;
   my $dest = $self->{dest};
   if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) {
@@ -291,8 +291,11 @@
       defined $packet->answer)
   {
     my $header = $packet->header;
-    my $id = $header->id;
-
+    my @questions = $packet->question;
+    my $ques = $questions[0];
+    my $host = $ques->qname;
+    my $nid = $header->id;
+    my $id = substr(sha1_base64($host . $nid), -8);
     # dbg("dns: reply id=$id");

     my $cb = delete $self->{id_to_callback}->{$id};

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Loren Wilton wrote:
> How about a simple debug printout of the id value sent and the id value
> received?  Maybe it is as simple as the id matching code is failing.

That's definitely a better idea considering that there is a bug in the patch
I posted that prevents any of the DNS stuff from working :-).

On the other hand, it does look like the id matching code is working and it
is difficult to see just from looking at tons of debug logs if IDs are
getting reused across processes and getting mixed up through use of the same
port. I'll see if I can get the sha1 version working better in case Theo is
inclined to try it to see what it does.

 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Loren Wilton <lw...@earthlink.net>.
How about a simple debug printout of the id value sent and the id value
received?  Maybe it is as simple as the id matching code is failing.

        Loren


Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> It looks like 4260 was
> committed as r161778, the nightly run was r164362.

Do people think we should reopen 4260?

This could happen if the random ID isn't random enough or 16 bits isn't
large enough to avoid collisions. I don't see how that would happen if
different processes choose different ports to listen on, as there should be
no way then for queries to collide across processes and with the ID being
incremented each time there should be no collision within the same process.

If somehow the IDs are colliding, the fix would be to include some
information from the question along with the 16 bit ID to prevent that. I
have a small patch that will do that, but I would like to see it used in a
test to find out if it has anything to do with the problem before proposing
to use it for real.

Theo, would you be willing to run a mass test with this to see if it helps?

$ svn diff lib/Mail/SpamAssassin/DnsResolver.pm
Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm        (revision 164463)
+++ lib/Mail/SpamAssassin/DnsResolver.pm        (working copy)
@@ -45,7 +45,7 @@
 use Mail::SpamAssassin::Logger;

 use IO::Socket::INET;
-
+use Digest::SHA1 qw(sha1_base64);
 our @ISA = qw();

 # a counter value to use for DNS ID numbers in new_dns_packet().
@@ -243,8 +243,8 @@
   return if $self->{no_resolver};

   my $pkt = $self->new_dns_packet($host, $type, $class);
-
-  my $id = $pkt->header->id;
+  $host =~ s/\.$//;
+  my $id = substr(sha1_base64($host . $pkt->header->id), -8);
   my $data = $pkt->data;
   my $dest = $self->{dest};
   if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) {
@@ -291,8 +291,11 @@
       defined $packet->answer)
   {
     my $header = $packet->header;
-    my $id = $header->id;
-
+    my @questions = $packet->question;
+    my $ques = $questions[0];
+    my $host = $ques->qname;
+    my $nid = $header->id;
+    my $id = substr(sha1_base64($host . $nid), -8);
     # dbg("dns: reply id=$id");

     my $cb = delete $self->{id_to_callback}->{$id};


Re: "uridnsbl: bogus rr" run ...

Posted by Rod Begbie <ro...@gmail.com>.
On 4/23/05, Sidney Markowitz <si...@sidney.com> wrote:
> Yuck, this looks like you are still getting DNS records in the wrong order.
> Look at that first log entry. It says that a query for
> usafreemerchantsource.com.multi.surbl.org is getting the response that is
> correct for query for dns7.hichina.com. 

I got a bunch of them too in my weekly net mass-check:

uridnsbl: bogus rr for domain=gdfhgfnhg.com, rule=URIBL_SBL, rr=ns1.internet.co
.       172789  IN      A       63.236.72.133 at /Users/rod/sanightly/trunk/mas
es/../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
uridnsbl: bogus rr for domain=ing21s.com, rule=URIBL_SBL, rr=ns3.internet.com.
172789  IN      A       63.236.72.135 at /Users/rod/sanightly/trunk/masses/../l
b/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
uridnsbl: bogus rr for domain=rau.co.kr, rule=URIBL_SBL, rr=ns2.osdn.com.
6847    IN      A       66.35.250.11 at /Users/rod/sanightly/trunk/masses/../li
/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
uridnsbl: bogus rr for domain=rau.co.kr, rule=URIBL_SBL, rr=ns2.vasoftware.com.
170336  IN      A       12.152.184.136 at /Users/rod/sanightly/trunk/masses/../
ib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.    

Rod.

-- 
:: Rod Begbie :: http://groovymother.com/ ::

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> The output is from my Saturday weekly net run.  It looks like 4260 was
> committed as r161778, the nightly run was r164362.

Yuck, this looks like you are still getting DNS records in the wrong order.
Look at that first log entry. It says that a query for
usafreemerchantsource.com.multi.surbl.org is getting the response that is
correct for query for dns7.hichina.com. That's the problem that the warning
is supposed to help us catch. That should no longer be possible given that
there is a unique ID associated with each query and it is supposed to match
the ID in the response.

This is serious. It certainly proves the worth of having the warning in there.

 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Theo Van Dinter <fe...@kluge.net>.
On Sun, Apr 24, 2005 at 12:13:02PM +1200, Sidney Markowitz wrote:
> > I have ~300K of them.  http://www.kluge.net/~felicity/set1.txt
> 
> This should not be happening anymore since the patch for bug #4260 was
> committed to trunk. Are you still getting them? The warning was only there
> to help us track down that problem.

The output is from my Saturday weekly net run.  It looks like 4260 was
committed as r161778, the nightly run was r164362.

-- 
Randomly Generated Tagline:
I'm practicing assertiveness.  Do you think that's okay?

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Daniel Quinlan wrote:
> Sidney Markowitz <si...@sidney.com> writes: 
>>If we are sure that the problem has been fixed I'm also +1 on removing it.
[...]
> I'm -1 on removing them for now.  The cost is infinitesimal compared to
> the cost of doing DNS queries.

Ack, I don't know what my fingers were doing when I was typing. I meant to
say "ambivalent" not +1. Which means I have no argument with your -1.

 -- sidney

Re: "uridnsbl: bogus rr" run ...

Posted by Daniel Quinlan <qu...@pathname.com>.
Sidney Markowitz <si...@sidney.com> writes:

> If we are sure that the problem has been fixed I'm also +1 on removing it.
> It would be nice to know if it happens, but if the problem has been fixed it
> is just some extra code that will never be run.

I'm -1 on removing them for now.  The cost is infinitesimal compared to
the cost of doing DNS queries.

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

Re: "uridnsbl: bogus rr" run ...

Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> I have ~300K of them.  http://www.kluge.net/~felicity/set1.txt

This should not be happening anymore since the patch for bug #4260 was
committed to trunk. Are you still getting them? The warning was only there
to help us track down that problem.

If we are sure that the problem has been fixed I'm also +1 on removing it.
It would be nice to know if it happens, but if the problem has been fixed it
is just some extra code that will never be run.

 -- sidney