You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by Theo Van Dinter <fe...@kluge.net> on 2005/04/24 02:00:19 UTC
"uridnsbl: bogus rr" run ...
Ok, I assume someone wants this:
On Sat, Apr 23, 2005 at 04:18:15PM -0400, Cron Daemon wrote:
> uridnsbl: bogus rr: domain=usafreemerchantsource.com, zone=multi.surbl.org., rr=dns7.hichina.com. 142476 IN A 218.30.103.250 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 634.
> uridnsbl: bogus rr: domain=netian.com, zone=multi.surbl.org., rr=nis.blueweb.co.kr. 100 IN A 210.205.6.4 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 634.
> status: 20% ham: 0 spam: 16104 date: 2005-04-06 now: 2005-04-23 08:05:51
> uridnsbl: bogus rr for domain=com.cn, rule=URIBL_SBL, rr=ns2.zoneedit.com. 6023 IN A 69.72.158.226 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=com.cn, rule=URIBL_SBL, rr=dns5.acosm.com. 172653 IN A 4.79.212.11 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=bestfriends.co.kr, rule=URIBL_SBL, rr=ns2.dnsm.net. 59020 IN A 218.7.120.70 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=210.114.220.52, rule=URIBL_SBL, rr=ns3.texasdns.com. 172790 IN A 69.56.15.2 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
> uridnsbl: bogus rr for domain=refi-with-us.com, rule=URIBL_SBL, rr=ns2.premiumhealthnow.info. 3600 IN A 69.228.160.52 at /home/corpus/SA/spamassassin-corpora/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
[...]
I have ~300K of them. http://www.kluge.net/~felicity/set1.txt
I'm +1 on making this a non-warn, BTW. There's nothing a user can do about
it, so it's just really annoying. It's ok for now since we want to know when
it happens, but ...
--
Randomly Generated Tagline:
"Derive the Big Bang Theory from F=ma." - Michelle about her Physics class
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Loren Wilton wrote:
> Depending on the value of the parameter
> that Perl is deducing from that statement, you may or may not be getting the
> results you expect.
>From the doc:
srand
Sets the random number seed for the rand operator. If EXPR is omitted,
uses a semi-random value based on the current time and process ID, among
other things.
Since this call is only done when the process id is different (i.e., in a
fork) then srand with no arguments is correct for initializing rand for the
process.
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Loren Wilton <lw...@earthlink.net>.
> $self->{_initted} = $$;
> srand;
> $self->{resolver}->reinit_post_fork();
In the C/C++ world, srand has a parameter. I suspect the C srand() function
underlys perl's srand statement. Depending on the value of the parameter
that Perl is deducing from that statement, you may or may not be getting the
results you expect.
srand() C documentation includes:
The srand function sets the starting point for generating a series of
pseudorandom integers. To reinitialize the generator, use 1 as the seed
argument. Any other value for seed sets the generator to a random starting
point. rand retrieves the pseudorandom numbers that are generated. Calling
rand before any call to srand generates the same sequence as calling srand
with seed passed as 1.
Loren
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Matt Sergeant wrote:
> May be a problem with forking
Do you think that this code fragment I see in SpamAssassin.pm should work as
well as your fork code, or could relying on this be part of the problem?
sub init {
my ($self, $use_user_pref) = @_;
# Allow init() to be called multiple times, but only run once.
if (defined $self->{_initted}) {
# If the PID changes, reseed the PRNG and the DNS ID counter
if ($self->{_initted} != $$) {
$self->{_initted} = $$;
srand;
$self->{resolver}->reinit_post_fork();
}
return;
}
# Note that this PID has run init()
$self->{_initted} = $$;
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Matt Sergeant <ms...@messagelabs.com>.
On 24 Apr 2005, at 17:01, Sidney Markowitz wrote:
> This could happen if the random ID isn't random enough
May be a problem with forking. Here's part of the fork replacement I
use in my code that uses the single-packet-DNS stuff:
sub _fork {
my $pid = fork;
if (!defined($pid)) { die "Cannot fork: $!" }
return $pid if $pid;
# Fixup Net::DNS randomness after fork
srand($$ ^ time);
local $^W;
delete $INC{'Net/DNS/Header.pm'};
require Net::DNS::Header;
# cope with different versions of Net::DNS
eval {
$Net::DNS::Resolver::global{id} = 1;
$Net::DNS::Resolver::global{id} =
int(rand(Net::DNS::Resolver::MAX_ID()));
# print "Next DNS ID: $Net::DNS::Resolver::global{id}\n";
};
if ($@) {
# print "Next DNS ID: " . Net::DNS::Header::nextid() . "\n";
}
...
}
______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email
______________________________________________________________________
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
I haven't been running mass-checks until now, but I just tried it with svn
trunk and got a couple of bogus rr warnings so far between the 50% and 60%
marks so far. It's taken two and a half hours to get that far, so this is a
very slow process. I just shut down the vmware session that was running the
Windows and Cygwin botslaves on that machine, and I hope that speeds things
up a bit. It's a 1Ghz Athlon machine running Fedora core 3.
In any case, it looks like I'll be able to run my own painfully slow tests
to try things out.
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Sidney Markowitz wrote:
> use something that combines the pid and id
Brain fade... This patch works by matching information that is in the reply
packet to information in the query packet, which means it has to use the
host name and the packet ID. Duh! Sorry.
Still, we could try some debug log output to determine if the different
processes are using the same source ports. I don't see how we could have
collisions in the ID unless the source ports are the same. If that's it, we
would not have to use the host name to ensure that the reply matches the
query if we had a way of making the source ports different across processes.
Could you run with debug output that shows the pid, packet ID and source
port for the packets that are created in DnsResolver in a run that
demonstrates the bug?
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> The patch does make things *much* slower though, around 3x:
[...]
> Without the patch, lots of issues starting after 80%.
I don't claim that the patch is the most efficient way of dealing with it...
I just wanted to use SHA1 to ensure that there was no chance of an ID
collision. I think we have now verified that ID collision is the likely
proximate cause of the problem.
Still, can three SHA1 calculations compare to the time it takes for a DNS
query? I don't see how the computation would slow things down by a factor of
three.
Perhaps what you are seeing is the difference in wall clock time between
processing a reply to an old packet when it arrives right away vs rejecting
those packets and waiting for the actual reply. If that's what's happening
you are not going to get the faster time when everything works, as it is the
nameserver's response time that is slowing down the run. The faster time is
perhaps just a symptom of the bug?
There is still the question of where the collisions are coming from. Here's
another idea -- Instead of using substr(sha1_base64($host . $id), -7) use
something that combines the pid and id into a six byte string in the three
places in the code where sha1 is used. That will be faster than using SHA1,
which will let you know if the slowdown is due to computation or waiting for
good packets to arrive, and it will let you know if the problem is with
different processes using the same source port for sending the UDP queries.
If it is the latter, we may be able to avoid the collisions if we are better
about picking the source ports.
Are you up for some more few-hour tests? :-)
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Theo Van Dinter <fe...@kluge.net>.
On Mon, Apr 25, 2005 at 09:54:12PM -0400, Theo Van Dinter wrote:
> I've done 1 run at 14 days with the patch, and no problems found. I'll run
> another time or two and see what happens. It takes a good couple of hours to
> do a single 14 day run though. <sigh>
The second run had no errors either. I have not verified that I get the
same (or near same) results though. I'm rerunning without the patch as
well, just to see if it was a random occurance before.
The patch does make things *much* slower though, around 3x:
With (~3h):
status: starting run stage now: 2005-04-25 08:55:58 PM
status: 10% ham: 337 spam: 2232 date: 2005-04-13 now: 2005-04-25 09:15:27 PM
status: 20% ham: 674 spam: 4464 date: 2005-04-14 now: 2005-04-25 09:34:16 PM
status: 30% ham: 1011 spam: 6696 date: 2005-04-15 now: 2005-04-25 09:53:43 PM
status: 40% ham: 1347 spam: 8929 date: 2005-04-17 now: 2005-04-25 10:14:25 PM
status: 50% ham: 1684 spam: 11161 date: 2005-04-18 now: 2005-04-25 10:33:53 PM
status: 60% ham: 2021 spam: 13393 date: 2005-04-20 now: 2005-04-25 10:53:15 PM
status: 70% ham: 2358 spam: 15625 date: 2005-04-21 now: 2005-04-25 11:12:21 PM
status: 80% ham: 2695 spam: 17857 date: 2005-04-22 now: 2005-04-25 11:31:29 PM
status: 90% ham: 3031 spam: 20090 date: 2005-04-23 now: 2005-04-25 11:50:28 PM
status: 100% ham: 3367 spam: 22314 date: 2005-04-25 now: 2005-04-26 12:09:24 AM
status: completed run stage now: 2005-04-26 12:09:29 AM
Without (~1h):
status: starting run stage now: 2005-04-25 04:28:05 PM
status: 10% ham: 338 spam: 2231 date: 2005-04-12 now: 2005-04-25 04:34:47 PM
status: 20% ham: 675 spam: 4463 date: 2005-04-14 now: 2005-04-25 04:40:59 PM
status: 30% ham: 1010 spam: 6697 date: 2005-04-14 now: 2005-04-25 04:47:13 PM
status: 40% ham: 1348 spam: 8928 date: 2005-04-17 now: 2005-04-25 04:53:57 PM
status: 50% ham: 1685 spam: 11160 date: 2005-04-18 now: 2005-04-25 04:59:41 PM
status: 60% ham: 2022 spam: 13392 date: 2005-04-20 now: 2005-04-25 05:05:41 PM
status: 70% ham: 2358 spam: 15625 date: 2005-04-21 now: 2005-04-25 05:11:14 PM
status: 80% ham: 2695 spam: 17857 date: 2005-04-22 now: 2005-04-25 05:16:55 PM
status: 90% ham: 3032 spam: 20089 date: 2005-04-23 now: 2005-04-25 05:22:06 PM
status: 100% ham: 3367 spam: 22314 date: 2005-04-25 now: 2005-04-25 05:27:35 PM
status: completed run stage now: 2005-04-25 05:27:36 PM
Without the patch, lots of issues starting after 80%.
I'm thinking that a per-request ID isn't necessary, but a per instance
ID is -- either initialized at check() or whenever the module gets a
new instance setup.
--
Randomly Generated Tagline:
A writer thinks of critics as a tree feels about dogs.
Re: "uridnsbl: bogus rr" run ...
Posted by Theo Van Dinter <fe...@kluge.net>.
On Tue, Apr 26, 2005 at 01:39:04AM +1200, Sidney Markowitz wrote:
> It is written a bit crudely, but if Theo or someone else who is seeing the
> problem would try this in a mass test it would demonstrate whether the
> problem has anything to do with this:
Well, the first 2 times I ran the current trunk code w/ messages from 3 days
and 7 days, I didn't see any problem. When I ran a third time at 14 days, I
got a bunch between 90-100%.
I've done 1 run at 14 days with the patch, and no problems found. I'll run
another time or two and see what happens. It takes a good couple of hours to
do a single 14 day run though. <sigh>
--
Randomly Generated Tagline:
"I instigated Linus's first shooting expedition in a long while a few months
back (I can report that he is a steady, competent shot with a 9mm semi)."
- Eric Raymond
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
This is the corrected patch that ensures that IDs are not colliding by
including the host name in an SHA1 hash with the 16 bit ID counter.
It is written a bit crudely, but if Theo or someone else who is seeing the
problem would try this in a mass test it would demonstrate whether the
problem has anything to do with this:
-- sidney
Index: lib/Mail/SpamAssassin/Dns.pm
===================================================================
--- lib/Mail/SpamAssassin/Dns.pm (revision 164570)
+++ lib/Mail/SpamAssassin/Dns.pm (working copy)
@@ -22,6 +22,7 @@
use Mail::SpamAssassin::Conf;
use Mail::SpamAssassin::PerMsgStatus;
use Mail::SpamAssassin::Constants qw(:ip);
+use Digest::SHA1 qw(sha1_base64);
use File::Spec;
use IO::Socket;
use IPC::Open2;
@@ -145,7 +146,10 @@
return $self->{resolver}->bgsend($host, $type, undef, sub {
my $pkt = shift;
- $self->{dnsfinished}->{$pkt->header->id} = $pkt;
+ my $h = $host;
+ $h =~ s/\.$//;
+ my $id = substr(sha1_base64($h . $pkt->header->id), -8);
+ $self->{dnsfinished}->{$id} = $pkt;
});
}
Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm (revision 164570)
+++ lib/Mail/SpamAssassin/DnsResolver.pm (working copy)
@@ -45,7 +45,7 @@
use Mail::SpamAssassin::Logger;
use IO::Socket::INET;
-
+use Digest::SHA1 qw(sha1_base64);
our @ISA = qw();
# a counter value to use for DNS ID numbers in new_dns_packet().
@@ -243,8 +243,8 @@
return if $self->{no_resolver};
my $pkt = $self->new_dns_packet($host, $type, $class);
-
- my $id = $pkt->header->id;
+ $host =~ s/\.$//;
+ my $id = substr(sha1_base64($host . $pkt->header->id), -8);
my $data = $pkt->data;
my $dest = $self->{dest};
if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) {
@@ -291,8 +291,11 @@
defined $packet->answer)
{
my $header = $packet->header;
- my $id = $header->id;
-
+ my @questions = $packet->question;
+ my $ques = $questions[0];
+ my $host = $ques->qname;
+ my $nid = $header->id;
+ my $id = substr(sha1_base64($host . $nid), -8);
# dbg("dns: reply id=$id");
my $cb = delete $self->{id_to_callback}->{$id};
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Loren Wilton wrote:
> How about a simple debug printout of the id value sent and the id value
> received? Maybe it is as simple as the id matching code is failing.
That's definitely a better idea considering that there is a bug in the patch
I posted that prevents any of the DNS stuff from working :-).
On the other hand, it does look like the id matching code is working and it
is difficult to see just from looking at tons of debug logs if IDs are
getting reused across processes and getting mixed up through use of the same
port. I'll see if I can get the sha1 version working better in case Theo is
inclined to try it to see what it does.
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Loren Wilton <lw...@earthlink.net>.
How about a simple debug printout of the id value sent and the id value
received? Maybe it is as simple as the id matching code is failing.
Loren
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> It looks like 4260 was
> committed as r161778, the nightly run was r164362.
Do people think we should reopen 4260?
This could happen if the random ID isn't random enough or 16 bits isn't
large enough to avoid collisions. I don't see how that would happen if
different processes choose different ports to listen on, as there should be
no way then for queries to collide across processes and with the ID being
incremented each time there should be no collision within the same process.
If somehow the IDs are colliding, the fix would be to include some
information from the question along with the 16 bit ID to prevent that. I
have a small patch that will do that, but I would like to see it used in a
test to find out if it has anything to do with the problem before proposing
to use it for real.
Theo, would you be willing to run a mass test with this to see if it helps?
$ svn diff lib/Mail/SpamAssassin/DnsResolver.pm
Index: lib/Mail/SpamAssassin/DnsResolver.pm
===================================================================
--- lib/Mail/SpamAssassin/DnsResolver.pm (revision 164463)
+++ lib/Mail/SpamAssassin/DnsResolver.pm (working copy)
@@ -45,7 +45,7 @@
use Mail::SpamAssassin::Logger;
use IO::Socket::INET;
-
+use Digest::SHA1 qw(sha1_base64);
our @ISA = qw();
# a counter value to use for DNS ID numbers in new_dns_packet().
@@ -243,8 +243,8 @@
return if $self->{no_resolver};
my $pkt = $self->new_dns_packet($host, $type, $class);
-
- my $id = $pkt->header->id;
+ $host =~ s/\.$//;
+ my $id = substr(sha1_base64($host . $pkt->header->id), -8);
my $data = $pkt->data;
my $dest = $self->{dest};
if (!$self->{sock}->send ($pkt->data, 0, $self->{dest})) {
@@ -291,8 +291,11 @@
defined $packet->answer)
{
my $header = $packet->header;
- my $id = $header->id;
-
+ my @questions = $packet->question;
+ my $ques = $questions[0];
+ my $host = $ques->qname;
+ my $nid = $header->id;
+ my $id = substr(sha1_base64($host . $nid), -8);
# dbg("dns: reply id=$id");
my $cb = delete $self->{id_to_callback}->{$id};
Re: "uridnsbl: bogus rr" run ...
Posted by Rod Begbie <ro...@gmail.com>.
On 4/23/05, Sidney Markowitz <si...@sidney.com> wrote:
> Yuck, this looks like you are still getting DNS records in the wrong order.
> Look at that first log entry. It says that a query for
> usafreemerchantsource.com.multi.surbl.org is getting the response that is
> correct for query for dns7.hichina.com.
I got a bunch of them too in my weekly net mass-check:
uridnsbl: bogus rr for domain=gdfhgfnhg.com, rule=URIBL_SBL, rr=ns1.internet.co
. 172789 IN A 63.236.72.133 at /Users/rod/sanightly/trunk/mas
es/../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
uridnsbl: bogus rr for domain=ing21s.com, rule=URIBL_SBL, rr=ns3.internet.com.
172789 IN A 63.236.72.135 at /Users/rod/sanightly/trunk/masses/../l
b/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
uridnsbl: bogus rr for domain=rau.co.kr, rule=URIBL_SBL, rr=ns2.osdn.com.
6847 IN A 66.35.250.11 at /Users/rod/sanightly/trunk/masses/../li
/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
uridnsbl: bogus rr for domain=rau.co.kr, rule=URIBL_SBL, rr=ns2.vasoftware.com.
170336 IN A 12.152.184.136 at /Users/rod/sanightly/trunk/masses/../
ib/Mail/SpamAssassin/Plugin/URIDNSBL.pm line 623.
Rod.
--
:: Rod Begbie :: http://groovymother.com/ ::
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> The output is from my Saturday weekly net run. It looks like 4260 was
> committed as r161778, the nightly run was r164362.
Yuck, this looks like you are still getting DNS records in the wrong order.
Look at that first log entry. It says that a query for
usafreemerchantsource.com.multi.surbl.org is getting the response that is
correct for query for dns7.hichina.com. That's the problem that the warning
is supposed to help us catch. That should no longer be possible given that
there is a unique ID associated with each query and it is supposed to match
the ID in the response.
This is serious. It certainly proves the worth of having the warning in there.
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Theo Van Dinter <fe...@kluge.net>.
On Sun, Apr 24, 2005 at 12:13:02PM +1200, Sidney Markowitz wrote:
> > I have ~300K of them. http://www.kluge.net/~felicity/set1.txt
>
> This should not be happening anymore since the patch for bug #4260 was
> committed to trunk. Are you still getting them? The warning was only there
> to help us track down that problem.
The output is from my Saturday weekly net run. It looks like 4260 was
committed as r161778, the nightly run was r164362.
--
Randomly Generated Tagline:
I'm practicing assertiveness. Do you think that's okay?
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Daniel Quinlan wrote:
> Sidney Markowitz <si...@sidney.com> writes:
>>If we are sure that the problem has been fixed I'm also +1 on removing it.
[...]
> I'm -1 on removing them for now. The cost is infinitesimal compared to
> the cost of doing DNS queries.
Ack, I don't know what my fingers were doing when I was typing. I meant to
say "ambivalent" not +1. Which means I have no argument with your -1.
-- sidney
Re: "uridnsbl: bogus rr" run ...
Posted by Daniel Quinlan <qu...@pathname.com>.
Sidney Markowitz <si...@sidney.com> writes:
> If we are sure that the problem has been fixed I'm also +1 on removing it.
> It would be nice to know if it happens, but if the problem has been fixed it
> is just some extra code that will never be run.
I'm -1 on removing them for now. The cost is infinitesimal compared to
the cost of doing DNS queries.
Daniel
--
Daniel Quinlan
http://www.pathname.com/~quinlan/
Re: "uridnsbl: bogus rr" run ...
Posted by Sidney Markowitz <si...@sidney.com>.
Theo Van Dinter wrote:
> I have ~300K of them. http://www.kluge.net/~felicity/set1.txt
This should not be happening anymore since the patch for bug #4260 was
committed to trunk. Are you still getting them? The warning was only there
to help us track down that problem.
If we are sure that the problem has been fixed I'm also +1 on removing it.
It would be nice to know if it happens, but if the problem has been fixed it
is just some extra code that will never be run.
-- sidney