You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/04/13 19:16:28 UTC

[Bug 4260] New: rewrite DNS code to use a single socket, event-based model

http://bugzilla.spamassassin.org/show_bug.cgi?id=4260

           Summary: rewrite DNS code to use a single socket, event-based
                    model
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: jm@jmason.org


This would fix DNS answers getting mixed up (bug 3997), and be more efficient in
terms of fd usage (bug 4016).

>From bug 3997:

Getting more relevant documentation in this one place... Here is Matt Sergeant's
post to sa-dev from last January with a recipe for the single-socket solution
which would have fixed this problem if we had only understood it back then:

Matt, I have a question about this. These four lines

        my $packet = $self->{res}->make_query_packet($host);
        my $packet_data = $packet->data;
        my $h = $packet->header;
        my $id = $h->id;

seem to imply that make_query_packet generates a unique ID when it makes a
header. Is that the case? Then we don't need all the counter code we just put in
to generate unique IDs?

   ------------

OK, after a bit of a chat with Tony Finch I had a go at making Net::DNS use one
socket. Tony is working on patching Net::DNS to facilitate this, but I decided
that was too much work and decided to try doing it without a patch.

I can't give you code, for two reasons - 1) IP (sorry), but mostly it's because
of 2) I use Danga::Socket to do my asynch I/O, and my code is very tied to that
architecture.

So here's a basic recipe:

I use two objects. One which is the resolver, and one singleton which is a high
level interface onto making the DNS queries and getting responses.

A query method accepts a number of hosts/IPs to query for. This calls
make_query_packet on a Net::DNS::Resolver class to create the query packet. Out
of that query packet you get the packet_data, and the ID from the header.

Use the ID to map back to the "asker" (an object that encapsulates the query)

Here's my query() code (in the high level interface object):

sub query {
    my Danga::DNS::Resolver $self = shift;
    my ($asker, @hosts) = @_;

    foreach my $host (@hosts) {
        my $packet = $self->{res}->make_query_packet($host);
        my $packet_data = $packet->data;

        my $h = $packet->header;
        my $id = $h->id;

# dst here is:
#   sockaddr_in($self->{res}{port}, inet_aton($self->{res}{nameservers}[0]));
        if (!$self->sock->send($packet_data, 0, $self->{dst})) {
            return;
        }

        # print "Query: $host ($id)\n";

        $self->{id_to_asker}->{$id} = $asker;
    }

    return 1;
}

Then you select()/poll() on $self->sock, and when you can_read, you call:

    my $packet = $self->{res}->bgread($self->sock);
    my $err = $self->{res}->errorstring;
    my $answers = 0;
    if (defined($packet)) {
        my $header = $packet->header;
        my $id = $header->id;

        my $asker = delete $self->{id_to_asker}->{$id};
        if (!$asker) {
            warn("No asker for id: $id");
            return;
        }

        my @questions = $packet->question;
        #print STDERR "response to ", $questions[0]->string, "\n";
        foreach my $rr ($packet->answer) {
            # do something with the answers, calling into $asker
            ...
            $answers++;
        }
    }

That's all there is to it. You need to add in some timeout code of your own
(though this might be sufficient in the SA architecture to just use the select()
timeout), and it only queries one nameserver, but I'm sure you can probably
adapt it.

Let me know if you have any questions about integrating this into SA. I'm sorry
I can't just code it in for you.

Enjoy!

Matt.



------- Additional Comment #98 From Tony Finch 2005-04-13 07:34 [reply] -------

Subject: Re:  DNS answers get mixed up

>         my $packet = $self->{res}->make_query_packet($host);
>         my $packet_data = $packet->data;
>         my $h = $packet->header;
>         my $id = $h->id;
>
> seem to imply that make_query_packet generates a unique ID when it makes a
> header. Is that the case?

Yes, make_query_packet calls Net::DNS::Packet::new which calls
Net::DNS::Header::new which calls Net::DNS::Header::nextid to fill in the
id field.

Tony.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-08 12:42 -------
I'm running a test with randomised source ports now. I'll know in a few hours if
it works.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 15:49 -------
John Gardiner Myers wrote in sa-dev:
> If a new socket is created for each message, you might find the port
> wrapping, leading to collisions.

I worked out the math for that. Assuming on the order of 50 DNS queries per
message (using the test message in t/dnsbl.t as an example) right now what
happens is that the 16 bit ID wraps after about 1300 messages. The real problem
is that as long as the process is alive it is listening on the same port. All
the old replies get read and processed. If there is an ID collision, it has an
effect.

With each message having a new socket, the port number will wrap 50 times less
often than the IDs are now wrapping. If an old reply comes in it is not likely
that there is a listener active on the same port, since the lifetime of a
listener is one message. With each message having only on the order of 50 DNS
queries, the probability that two messages that have the same port number also
having their 16 bit IDs in the same range is tiny. A port collision has no bad
effect unless the IDs also collide.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pobox@rgrs.com




------- Additional Comments From sidney@sidney.com  2005-04-23 17:05 -------
*** Bug 4016 has been marked as a duplicate of this bug. ***



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From spamassassin@dostech.ca  2005-04-17 19:03 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based
 model

I haven't been able to test it yet, but the code looks good to me.  It's 
actually nearly identical to the implementation I was think of.

I can't think of a reason not to commit it.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-17 19:45 -------
It builds under Cygwin and Win32, and performs no worse than the old code does
when tested from behind a firewall that blocks access to outside nameservers :-)

I'll try a more complete test when I am outside the firewall.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From quinlan@pathname.com  2005-04-20 02:30 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

> Will there be a patch that is backward compatible with 3.0.2?

Most likely not, sorry.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-28 08:11 -------
Created an attachment (id=2814)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2814&action=view)
patch to reopen socket to get new port with each message

This oatch is on top of the previous patch, which has already been checked in.
This patch was checked in rev 16149

It also adds a timeout to DnsResolver->search which is not necessary for this
bug but is required for bug 4278 which I had to fix in order to test this patch
on my machine.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-09 01:52 -------
> aren't we already dropping the bogus packets immediately?

Those are just the ones that we can tell are bogus because it is, e.g., a reply
to a query for a domain's nameserver that arrives in response to a URIDNSBL lookup.

We don't have a way of catching collisions causing a wrong URIDNSBL result.

Actually that gives me an idea for the next several hour test: I could use a
hash and generate a warning for all dropped packets to see how much of a problem
there is that we are not seeing.

BTW, I just realized that we don't have to generate a hash explicitly. Right now
we use the ID as the key in a hash table that associates the packet with the
callback function that is supposed to process it:

  $self->{id_to_callback}->{$id} = $cb; # in bgsend
and
  my $cb = delete $self->{id_to_callback}->{$id}; # in poll_responses

If instead we keyed on $id . $host in bgsend and in poll_response get the host
name from the question section to key on $id . $host we would eliminate
collisions without any more code. The extra memory for the strings is only one
string for each backgrounded DNS query in a single message, then they are
cleaned up.

The reason I'm not $suggesting $id . $host . $type is because when I was testing
I noticed that $type sometimes is left to default in the call to bgsend and
shows up as different values in the reply, and I don't now offhand what to set
it to in bgsend. With port, id, and host name all matching I'm sure it will be fine.

Does anyone see a problem with that?



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


spamassassin@dostech.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dev@spamassassin.apache.org






------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From sidney@sidney.com  2005-04-28 08:15 -------
checked in rev 16149




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 18:54 -------
Daniel, wraparound certainly is possible in masscheck. 16 bits is 65K packets.
The message used in t/dnsbl.t by itself generates 52 DNS queries. Theo's mass
check log that he posted showed something like 22,000 messages, which is about
1,100,000 DNS queries.

The initialization of the DNS ID counter was supposed to spread small variations
in the pid by shifting the bits up. So much for home brewed seat of the pants
hash functions :-).

I suppose we could be better off just using srand the way Net::DNS does.

But actually, if we just use a different port for each message by calling
DnsResolver->connect_sock then there is nothing wrong with the counter that
Net::DNS maintains. We could remove all the code that maintains our own counter
and all should work.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From quinlan@pathname.com  2005-04-13 12:33 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

1) I think right before release is the wrong time to move to single
   socket.
2) The single socket stuff should be put (moved, copied) to a separate
   bug (enhancement request).





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jgmyers@proofpoint.com  2005-05-09 08:58 -------
Instead of $host, why not use question->data ?  It's easy to get to and much
more thorough.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-26 19:14 -------
I have a 173MB logfile here from a mass-check of 10k ham, 10k spam mails using -j=4.

: jm 1050...; grep 58866 nohup.out  | grep "new" | wc -l
     12

so it cycles 12 times in that many messages.  looks like I got bogus RRs here
too...  this log contains records of what ID numbers are used, when they're
received in a response, what the query packets contain, etc.

gzipped, it's 8.5MB -- anyone want a copy?  (private mail only, I'd prefer not
to publish it publically given the amount of detail it's logging.)




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-26 15:57 -------
bear in mind, we can pick our local socket's local port number, so ensure that
it doesn't reopen an old port within N days (or similar).   I think this would
work safely:

  - close and reopen on a new local-port at fork
  - close and reopen on a new local-port if we're about to wrap ID space and
start reusing already-used IDs



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-20 12:57 -------
Created an attachment (id=2884)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2884&action=view)
Patch to current trunk svn that uses question data as hash for uniqueness

Here is a patch thatshould take care of everything other than the current
failures in t/dnsbl.t tests. I haven't had a chance to look at that more than
to determine that this patch doesn't make it worse or better. That should be
dealt with in a separate bug.

This patch uses the data from the question section of the packet and the ID as
the hash key for ensuring that a packet reply matches the query. It doesn't
need to use SHA or anything like that because perl's own hashing takes care of
it at a memory cost of storing the query string for each DNS query made. That
makes the code simple too. The patch selects a port at random, which should
avoid what I think is the cause of most of the collisions we have been seeing,
then uses the hash to make it impossible to get a false result even if there is
a collision. The purpose of trying to reduce the collisions in the first place
is that there is a cost to receiving and discarding the packets. The purpose of
hte hash is that we can't completely eliminate collisions, especially since we
don't have control over SPF::Query's use of the DNS -- We can get collisions
from replies to packets that it sends out.

I would like to check this is and then deal with the t/dnsbl.t test failures.
My laptop died again and I don't have as good a development and testing
environment as usual. So please review and comment on this patch before I check
it in.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-17 15:02 -------
anyone got any results with these patches they'd care to share?
will I just go ahead and check in?



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 16:02 -------
Theo,

I'm concerned that the factor of 3 slowdown you saw in mass check had to do with
the listener getting the old replies and then rejecting them based on the ID not
matching. What is the overhead of closing a UDP socket and creating a new one?
Doing that once per message will make it unlikely that a listener will be
available when an old reply shows up. As for the chance that the OS will assign
the same port again, 1) when I've looked at port assignments they seem to
usually be sequential; 2) even if they are random rather than sequential (or
when the port numbers eventually wrap), the IDs are not likely to match when the
port numbers happen to.

That said, I don't see a collision problem with your scheme or Justin's
variation of it, and I agree that it is simpler than changing the socket with
each message, as the code change stays inside DnsResolver. It is worth a try,
but if your mass check is still three times slower, we may have to go to
changing the socket with each message.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From spamassassin@dostech.ca  2005-05-10 21:34 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based
 model

The eval block was to trap die's thrown by Net::DNS.  It's definitely 
needed.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |3387
              nThis|                            |






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-13 19:18 -------
"you still need to replace the current select()/bgisready model with a new
polling model"

It might not be that different. Right now you loop through the pending list
looking for a socket that is ready. When you find one you read it and compare
the results with the ID data in the object you found in the pending list.

The new way would be to first check if the one socket is ready. If it is you
read it and use the ID to find the corresponding object in the pending list.
After that the processing is pretty much the same.

Of course that is pretty much off the top of my head. What did I miss ? :-)




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2005-05-05 10:49 -------
ah, I thought it wasn't complete.  sounds good -- reclosing



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-13 13:15 -------
'The single socket stuff should be put (moved, copied) to a separate
   bug (enhancement request).'

dude -- that's *THIS* bug.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-27 03:33 -------
I have a patch almost ready, but I'm still trying to figure out what to do with
DnsResolver->search()

Here are the issues: Net::Dns::Resolver->search will try each of a list of
nameservers with a timeout and retry a certain number of times with increasing
timeouts before giving up. That's good. DnsResolver->search uses bgsend, which
can only query the first nameserver on the list, and will loop forever without
timing out if there is no response. That's bad, and it happens regularly on my
Win32 and Cygwin systems. I don't know why  I can easily add a timeout, but it
would be a lot of code to emulate the retries with increasing timeout that
Net::DNS::Resolver->search already has. It does look like a retry following a
failure is what does work on my system.

On the other hand, as near as I can tell Net::DNS::Resolver never closes a
socket. If we call Net::DNS::Resolver->search it appears to me that we will
start using up fds again.

I guess this implies that I should put in some kind of timeout and multiple
retry that at least emulates the default behavior of Net::DNS::Resolver->search.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-27 11:35 -------
I made the mistake of testing my patch on my Cygwin machine -- A mistake because
most of my effort was to get DnsResolver->search() to work without hanging (bug
#4278) and that turned out to be a bug in Net::DNS under some strange otherwise
benign condition that my laptop's network settings have ended up in. I do think
that DnsResolver->search should be written so that people in my circumstance do
not experience a hang without having to count on Net:DNS eventually getting
fixed, and I now know how to do that. I'll discuss that in bug #4278 comments.

With Net::DNS patched to avoid the hang, my patch to DnsResolver and
PerMsgStatus works fine on Fedora Core 3 with no failures, no slowdown, and no
bogus rr warnings. Under Cygwin it fails t/dnsbl.t tests 3 and 16, getting
results for 40 out of 42 backgrounded queries in the message used by t/dnsbl.t

I have to go off to work and school now. I guess I should post the patch for
others to look at when I get the chance before I have time to figure out the
problem with Cygwin and the dnsbl.t tests, but even that will have to wait a
bit, as I'm rushing off.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-09 05:32 -------
I'm getting a very strange result and can use some advice from people with more
perl expertise.

I made the following changes in DnsResolver.pm

I generalised the use of the id by adding an argument to the closure that is
passed to bgsend. Instead of it having one argument of the answer packet and
then defining it in Dns.pm to save packet->header->id, I made it have two
arguments, the second one being whatever bgsend uses as the id key. That way
when I change the key there is no change required in Dns.pm.

I then changed bgsend and poll_response to use $id . $host instead of just $id.

I'm leaving out a few details, but it all ran ok.

However, it was very slow. Investiating, I found that it was just about as slow
if I simply set $id = $id . "x", i.e., make it a nonumeric string.

The only difference was whether the $id key was an integer or a string in these
two places:

  $self->{id_to_callback}->{$id} = $cb; # in bgsend
and
  my $cb = delete $self->{id_to_callback}->{$id}; # in poll_responses

Debugging output showed that there were between 20 and 30 of these per message.

When $id is an int, the mass-check processes about one message per second.

When $id is a string, mass-check processes about one message every four seconds.

How could 40 to 60 associative hash references take that long?




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-13 16:47 -------
'I think, without looking through it very carefully, that what I just described
contains very little change to the existing code while still reducing the number
of sockets to two per process and without introducing an event model.'

yeah, it would certainly be easier -- not *quite* as easy as you think though,
since you still need to replace the current select()/bgisready model with a new
polling model that understands that when a socket is ready, it mightn't be the
desired answer that's ready to be read.  that (IMO) is the big change that we
can't avoid if we do single-socket.

But yes, that's pretty much what I was thinking of implementing for *this* bug
anyway ;)



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From sidney@sidney.com  2005-05-21 07:07 -------
I want to get this out of the way before committing any fixes to bug 4354, so...

Closing as fixed, committed rev 171210




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-13 12:16 -------
>From another comment I posted in bug #3997 :

"As a transition step it might be easy to create a IO::Socket object when we
call Net::DNS::Resolver->new, save it as an instance variable $self->{socket}
alongside $self->{res}, and in res_bgsend instead of calling bgsend call
$self->{socket}->send. Not much else would have to be changed and we would have
one socket in Dns.pm and one in URIDNSRBL.pm"

That might be simpler enough to put into 3.1.0 even if the full blown O-O
rewrite following Matt's outline is not done that soon.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-10 21:30 -------
I have a coding question:

In DnsResolver there is a sub new_dns_packet that used to do some things in
addition to calling Net::DNS::Packet->new. The extra things it did have gone
away, so I don't see any reason to keep the function, especially since it is
only called in one place in the same file.

The question I have is why new_dns_packet wrapped the call to
Net::DNS::Packet->new in an eval block, and if that would be necessary in bgsend
where new_dns_packet is called. Or is it correct to replace the call to
new_dns_packet with a call to Net::DNS::Packet->new without any eval?




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From schulz@adi.com  2005-04-22 08:51 -------
Created an attachment (id=2799)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2799&action=view)
Patch for 3.0.2

Here is a patch that applies on 3.0.2.	I have not yet run it in production,
but it passes make test.  Note that I applied all the patches in bug 3997
before applying the patch from this bug as they seemed to expected by the
patch for this bug.  Note also that the changes to spamd/spamd.raw from patch
2787 from bug 3997 were not applied as there is no such code in spamd.raw
for 3.0.2.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From felicity@kluge.net  2005-04-26 21:17 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

On Tue, Apr 26, 2005 at 07:23:00PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> Also, I just ran 3.1 w/out any patches and it took ~3h, so I'm thinking the
> timing comparison from yesterday was a fluke.

Another FYI, I just svn updated to the latest trunk w/ quinlan's rand()
change, same problem after roughly the same time, as expected:

[...]
status:  80% ham: 2674   spam: 17862  date: 2005-04-23   now: 2005-04-26 11:42:42 PM
uridnsbl: bogus rr: domain=5jb.net, zone=multi.surbl.org.,
rr=dog.ccpatoncejk.biz.      1200    IN      A    200.149.11.62 at
/home/felicity/SA/corpus/../spamassassin-head/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
line 635.
[...]

I'm trying the reinit idea now and seeing how that works.  I made it
counter==orig_counter instead of == 0 since another int isn't really a big
deal.  I'll post tomorrow when I see how it worked out.

I've been thinking about this some more -- why do we care what the
starting ID is?  Why not start at 1, then when we wrap to 0 (16-bit limit)
get a new socket and start anew at 1?  We're not sharing a single socket
between children, so the ID is really only useful to know when responses
from one message are coming in late to the same child.  If we're worried about
another child reusing another's socket, the ID doesn't really help anyway
since the IDs in both children could have been near-similar anyway.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

DNS ID initialization

Posted by Daniel Quinlan <qu...@pathname.com>.
bugzilla-daemon@bugzilla.spamassassin.org writes:

> Theo and I speculate that the problem is the 16 bit ID wrapping. I
> think that closing and reopening the socket to get a new port with
> each message will fix the problem.

16 bits is a lot of packets.  I don't see how on earth we could be
wrapping around.  Collisions, now, that I can believe.

I checked and it seems like the initialization function is very poor for
usage in mass-check (and also spamd when spamd first starts) since it
clusters the IDs together for pids that are close in number.

      pid   DNS id
       10      640
       11      704
      100     6400
      101     6464
     1000    64000
     1002    64128
    10000    50185
    10010    50825

I doubt this was the intention of it, but just in case that some
behavior difference (in this respect) between 'my' and 'our' was
expected...  the 'our' variable isn't shared between child and parent as
far as I can tell:

  perl -e 'our $foo = 1; if (!fork()) { $foo++; print "child $foo\n"; \
    exit; } print "parent $foo\n";'

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


sidney@sidney.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




------- Additional Comments From sidney@sidney.com  2005-04-26 15:29 -------
The discussion thread '"uridnsbl: bogus rr" run ...' on sa-dev is really about
problems in this bug's patch still showing up in mass check.

I'm reopening this bug to allow the final resolution to be recorded here.

Summary of the discussion so far:

Theo saw bunches of "bogus rr" warnings in a mass check. I can reproduce that. I
generated a test patch that uses sha1 to insure that there are no collisions
between the IDs of the query packets and that stopped the warnings while causing
mass check to take three times longer. Theo and I speculate that the problem is
the 16 bit ID wrapping. I think that closing and reopening the socket to get a
new port with each message will fix the problem.


Could we continue the sa-dev discussion in this bug report?



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|dev@spamassassin.apache.org |jm@jmason.org
             Status|NEW                         |ASSIGNED




------- Additional Comments From jm@jmason.org  2005-04-13 22:35 -------
Created an attachment (id=2791)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2791&action=view)
the patch




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |3924
              nThis|                            |






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-05-09 00:36 -------
'The answer may be to give up on trying to avoid
port/ID collisions and just use a hash to allow us to drop the bogus packets
immediately.'

aren't we already dropping the bogus packets immediately?  that's what the
warn() messages are.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-08 18:52 -------
Very strange results: I'm running mass-check on about 11,000 messages. Previous
results showed the first two bogus rr messages between 50% and 60%, then about
two more between 605 and 70%, and then some more (I don't have the exact number
handy right now, but I think another 8 or so) in the last deciles. This was
unchanged with the last patch.

I modified connect_sock to use a random port between 1024 and 65535 each time it
creates a socket. The currently checked in code creates a new socket with each
message. The result of the one test run I did was two bogus rr messages between
40% and 50%, and no others for the rest of the run.

I'm going to try to think of how there could have been a collision that showed
up so soon and then no others. The answer may be to give up on trying to avoid
port/ID collisions and just use a hash to allow us to drop the bogus packets
immediately.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From jm@jmason.org  2005-04-18 10:40 -------
well, after the *existing* code caused mass-check to run my server out of memory
during the weekly NET run ;)  I decided this can't be much worse than that.  so:

applied!  r161778.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-09 09:23 -------
Is question->data guaranteed to be the same in the packet after it is created
before being sent as it is in the reply packet that comes back from the server?
It isn't enough that it mean the same, it would have to be the same bits. That
isn't true of host, for example, as the host name that is sent might not have a
"." at the end but the one in the reply packet always does.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-05 01:47 -------
I don't see any issues, but then I'm the one who marked the bug closed :-)




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jgmyers@proofpoint.com  2005-05-09 10:03 -------
It's not possible to encode the difference between a final "." and no final "."
in a question section, so that's not an issue.  The only other issue would be
differences in character case.  RFC 1035 section 2.3.3 states "The data
distribution system in the domain system will ensure that consistent
representations are preserved."  Experimentation shows that BIND does preserve
case in the question section.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-13 13:26 -------
actually, I've been thinking about this. there may be a way to do this quite
simply without many code changes, keeping the hard work in one new class. 
basically it doesn't *have* to be event driven at all... let me think some more.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From felicity@kluge.net  2005-04-17 18:36 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

On Sun, Apr 17, 2005 at 03:02:08PM -0700, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> anyone got any results with these patches they'd care to share?
> will I just go ahead and check in?

We're still CTR for 3.1. ;)





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From bret.miller@wcg.org  2005-05-05 08:46 -------
I've been running the code on Windows Server 2003 for about 24 hours (as a fix
to Bug 3924) and it's been running fine here-- no problems that I can see.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 15:52 -------
Theo posted on sa-dev:

On Wed, Apr 27, 2005 at 10:12:39AM +1200, Sidney Markowitz wrote:

>> A fix would be to close and reopen the socket at each message, or as you
>> suggested when the counter wraps. But it should not be when the counter
>> wraps to zero, it should be when it wraps to its initial value.


Yeah.  But then we have to store the initial value, etc, etc.  By wrapping
at 0, we get basically the same functionality, but it occurs more
frequently than it has to.  It's "randomly" in between new socket per
message and 1 socket per child.


>> I think it would be better to create the new socket with each message. If
>> old replies are arriving as they seem to, wouldn't it be more efficient to
>> not have a listener on the socket when they arrive?


True, but 1) that's more overhead, 2) in theory the child (or a different
child) could get the same socket back after processing a single message
(depends how the OS assigns out new sockets).

I like the idea of only getting a new socket when necessary (or
thereabouts), which leaves the issue still possible but very unlikely.

-- Randomly Generated Tagline: My favorite mythical creature? The honest
politician. 




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


spamassassin@dostech.ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |3881
              nThis|                            |






------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 22:08 -------
Theo, I think the whole thing about the ID is a red herring that we fell into
when we saw that there were collisions.

Here's how I remember our getting to this point:

The original problem was that we did not make use of the ID at all and we had
persistent_udp set. The persistent_udp setting is only used by send(), not by
bg_send. It causes Net::DNS to reuse the udp socket. The only time that caused a
problem was going from the calls to send() in the first queries to the
nameservers, to the calls to bgsend used in the RBL queries. The very first call
to bgsend used the same port as the calls to send(). This resulted in sometimes
a stray reply arriving that was picked up as one of the bgsend replies, because
the ID wasn't checked.

Justin implemented the single-socket code. I didn't notice how Net::DNS
increments the ID counter and I suggested something that involved keeping track
of a counter ourselves. That's where all the ID stuff came from.

We still had collisions that show up in mass check. I think that is caused by
each process being so long lived and using just one socket that the ID numbers
wrap many times.

The solution as I see it is to not have any one socket be so long lived. If we
create a socket for each message and close it when the message is done, then the
 ID counter code that is in Net::DNS already should be sufficient. Creating a
UDP socket is an inexpensive operation compared to processing a whole message,
just allocating a data structure. We would not have to maintain our own counter,
figure out how to initialize it, etc. It could be done with a call to
DnsResolver->connect_sock in new PerMsgStatus, and a call to
DnsResolver->{sock}->close in PerMsgStatus->finish, and ripping out the new
counter code so that Net::DNS sets the packet ID when the packet is created, and
then we get the ID from the packet.

In addition, I think that it is a bug that the socket created in DnsResolver has
ReuseAddr set to 1.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From spamassassin@dostech.ca  2005-04-26 19:29 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based
 model

> I was just noticing -- when the bogus rr warnings pop up, the record
> being returned seems to always be a nameserver, not a DNSBL or something
> more common as I'd expect:

That may be the DNS server giving up and returning the authoritative NS 
along with the name server's IP.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-10 21:56 -------
Ok, then I guess it is better to leave the new_dns_packet function as is and
just change its comments to say that it wraps the call to catch dies, since it
no longer serves the purpose of the current comments.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-13 22:35 -------
OK, here's an implementation! ;)

This adds a new class, Mail::SpamAssassin::DnsResolver, which acts as our own
$self->{res} object doing the right things with respect to:

    1. not reusing ID numbers
    2. using a single socket
    3. efficiently polling and tracking responses

I didn't go the whole hog to making it event-driven throughout, however. It
*is* event-driven, internally -- but the calling code in URIDNSBL.pm and Dns.pm
doesn't yet take advantage of that, to keep down the code changes in those
packages.

This patch obsoletes all patches to date on bug 3997, and also obsoletes patch
2729 on bug 3881 -- it had to ;)

Speed results, mass-check of 30 recent spams, with this patch:

fresh:  125
repeat: 7
repeat: 10


without the patch:

fresh:  163
repeat: 25
repeat: 17


a pretty major speedup.

Note that this patch will still warn if out-of-order responses arrive--
I didn't see any in this testing (for either side) however.

--j.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From lwilton@earthlink.net  2005-05-09 00:17 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

> I modified connect_sock to use a random port between 1024 and 65535 each
time it
> creates a socket. The currently checked in code creates a new socket with
each
> message. The result of the one test run I did was two bogus rr messages
between
> 40% and 50%, and no others for the rest of the run.
>
> I'm going to try to think of how there could have been a collision that
showed
> up so soon and then no others. The answer may be to give up on trying to
avoid

Sidney, don't forget that "random" in no way insures that you won't use the
same port number twice in a row, or even 100 times in a row.  (Although the
later is bloody unlikely.)  Or use the same port number 3 requests apart.

If this is indeed the necesary solution, then you really should add some
history code to insure that you won't use the same socket number used in the
last 5 minutes or so.  Or include an id with the messages sent by the
socket, effectively increasing the number with a sequential key.  (And no,
using random there would not necessarily be 100% safe either.)





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From felicity@kluge.net  2005-04-26 19:23 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

I was just noticing -- when the bogus rr warnings pop up, the record
being returned seems to always be a nameserver, not a DNSBL or something
more common as I'd expect:

uridnsbl: bogus rr for domain=finishthesearch.com, rule=URIBL_SBL,
rr=ns3.cbsig.net.    42633   IN      A    170.20.116.136 at
/home/felicity/SA/corpus/../spamassassin-head/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
line 624.
uridnsbl: bogus rr for domain=finishthesearch.com, rule=URIBL_SBL,
rr=ns1.cbsig.net.    42633   IN      A    170.20.0.16 at
/home/felicity/SA/corpus/../spamassassin-head/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
line 624.
uridnsbl: bogus rr: domain=paylessinks.biz, zone=multi.surbl.org.,
rr=a.ns.interland.net.       19749   IN   A 64.226.28.33 at
/home/felicity/SA/corpus/../spamassassin-head/masses//../lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm
line 635.

Also, I just ran 3.1 w/out any patches and it took ~3h, so I'm thinking the
timing comparison from yesterday was a fluke.





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |3997
              nThis|                            |






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
OtherBugsDependingO|                            |2352
              nThis|                            |






------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


jm@jmason.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |
   Target Milestone|Undefined                   |3.1.0




------- Additional Comments From jm@jmason.org  2005-05-05 00:22 -------
reopening just to make sure -- what's the status on this, guys?  is it working
for everyone now?



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-13 15:35 -------
Justin, no I do mean create the socket when $self->{res} is created, and save it
in $self->{sock}. Then instead of calling bgsend, which creates a new socket
calls $sock->send and then returns $sock, you instead call $self->{sock}->send.

The objects that are pushed on the pending list will no longer have to contain
the socket, just the other information including the ID. Instead of going
through a list of sockets to check if any are ready to read, you check only
$self->{sock}. If the socket ever times out, then the pending list is cleared.
Timeout is always measured from the last send.

I think, without looking through it very carefully, that what I just described
contains very little change to the existing code while still reducing the number
of sockets to two per process and without introducing an event model.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 18:44 -------
(copied here from sa-dev, my reply in the next comment)

bugzilla-daemon@bugzilla.spamassassin.org writes:


>> Theo and I speculate that the problem is the 16 bit ID wrapping. I
>> think that closing and reopening the socket to get a new port with
>> each message will fix the problem.


16 bits is a lot of packets.  I don't see how on earth we could be
wrapping around.  Collisions, now, that I can believe.

I checked and it seems like the initialization function is very poor for
usage in mass-check (and also spamd when spamd first starts) since it
clusters the IDs together for pids that are close in number.

      pid   DNS id
       10      640
       11      704
      100     6400
      101     6464
     1000    64000
     1002    64128
    10000    50185
    10010    50825

I doubt this was the intention of it, but just in case that some
behavior difference (in this respect) between 'my' and 'our' was
expected...  the 'our' variable isn't shared between child and parent as
far as I can tell:

  perl -e 'our $foo = 1; if (!fork()) { $foo++; print "child $foo\n"; \
    exit; } print "parent $foo\n";'

-- Daniel Quinlan http://www.pathname.com/~quinlan/ 




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From quinlan@pathname.com  2005-04-13 13:52 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

> dude -- that's *THIS* bug.

Wow!  I thought this was the mix-up bug.  There are dangers to
commenting on bugs before coffee has kicked in.  :-)





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From quinlan@pathname.com  2005-04-27 01:57 -------
Subject: Re:  rewrite DNS code to use a single socket, event-based model

Sidney:

> The initialization of the DNS ID counter was supposed to spread small
> variations in the pid by shifting the bits up. So much for home brewed
> seat of the pants hash functions :-).

The main issue is that different processes can proceed at different
rates through their IDs and if they're just ~100 apart, it's easy to
collide.

But actually, if we just use a different port for each message by
calling DnsResolver->connect_sock then there is nothing wrong with the
counter that Net::DNS maintains. We could remove all the code that
maintains our own counter and all should work.

Theo:

> I was just noticing -- when the bogus rr warnings pop up, the record
> being returned seems to always be a nameserver, not a DNSBL or
> something more common as I'd expect:

Interesting.

Sidney:

> In Dns.pm in sub rbl_finish I added before the return:
>
> $self->{responder}->connect_sock() if $self-{responder};

One new socket per message?  I suppose that's okay.  I'd prefer one per
child if we can make it work, but maybe it doesn't and if your fix
works, I say let's go with it!  It's still way better than bgsock
originally was.

> I've been thinking about this some more -- why do we care what the
> starting ID is?

Paranoia.  At worst, it doesn't hurt to use rand() here.  :-)





------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-10 17:03 -------
Regarding comment #56 and comment #58 -- I would have to use only the data from
the question section. That turns out not to be so easy to get, as Question->data
takes an argument of the byte offset of the question data packet within
Packet->data. Also, looking at the Net::DNS source code, it appears that the
return value is computed on the fly from the non-binary qname, qtype, and qclass
fields. So there is little reason to try to get Question->data instead. What is
relevant about your suggestion is that Net::DNS does store its canonicalized
values in qname, etc., once the packet is created. So I can use $packet->id .
$ques->qname . $ques->qtype . $ques->qclass to get a perfect key to uniquely
match up the query and reply packets.

Regarding my comment #55, the slowdown I saw seems to have been a bug in my
code. Starting from scratch and proceeding carefully I have found that I can
simply use
 $id = $packet->id . $ques->qname . $ques->qtype . $ques->qclass
as the key, let perl do the hashing, and everything is as fast as using the
numeric packet id alone as the key.

That should completely eliminate the collisions that the bogus RRs are
indicating, as well as all those which are not picked up by the test for bogus
RRs but could still produce incorrect results. Using this I should be able to
test how many of those latter collisions are in a typical mass check run. It
will be some hours of testing to verify all this.

The nice thing about this is that it requires almost no new code to make it work.



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-04-26 19:56 -------
Here's what I'm trying right now. On my slow machine it will take a while before
I know the results for sure, but the first 40% has run about 1/3 faster than I
recall the average speed of the plain svn trunk test being. On the other hand, I
don't have those logs handy to see if the first 40% ran faster than the the
average speed for the test.

In Dns.pm in sub rbl_finish I added before the return:

 $self->{responder}->connect_sock() if $self-{responder};

In DnsResolver.pm in sub connect_sock I added before making the new socket:

 $self->{sock}->close() if $self->{sock};

and a few lines later I changed ReuseAddr => 1 to be ReuseAddr => 0.

Theo, if you want to try this three line change, I think it will make everything
work. If it does we can discuss where the right place to put the call to
connect_sock is, as I suspect that rbl_finish isn't as aesthetically correct as
something that is called at the start of processing a message.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260


felicity@kluge.net changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




------- Additional Comments From felicity@kluge.net  2005-05-07 13:11 -------
still have a problem via mass-check.  see http://www.kluge.net/~felicity/weekly.txt



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-07 14:06 -------
Yuck...

I can think of two possibilities offhand.

First, that your OS does not randomize the choice of port when a UDP IO::Socket
is created, but instead keeps picking the same one as long as there is no active
listener on it.

A second possibility is that there is an interaction with Mail::SPF::Query. It
doesn't use any of the single socket code we have for DNS queries, which means
that every message generates several DNS requests that do not follow our
conventions.

We may have to go to using a real hash as the final protection against collisions.

Theo, sorry for asking you to run a several hour test that may show nothing, but
can you try it with SPF disabled to see if it makes a difference?




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From billl@pointshare.com  2005-04-20 00:40 -------
Will there be a patch that is backward compatible with 3.0.2?



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From jm@jmason.org  2005-04-13 13:22 -------
"As a transition step it might be easy to create a IO::Socket object when we
call Net::DNS::Resolver->new, save it as an instance variable $self->{socket}
alongside $self->{res}, and in res_bgsend instead of calling bgsend call
$self->{socket}->send. Not much else would have to be changed and we would have
one socket in Dns.pm and one in URIDNSRBL.pm"

I assume you mean create the socket whenever we call res_bgsend(), not when
$self->{res} is created -- because each request would need a new socket this way.

also, that doesn't address the "94 fds in use" bug -- bug 4016... actually,
there's a few other reports that I can now mark as dups ;)



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4260] rewrite DNS code to use a single socket, event-based model

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4260





------- Additional Comments From sidney@sidney.com  2005-05-08 11:13 -------
I verified that the problem is still here by running a mass-check on Fedora Core
3. It was way too slow to test on my Cygwin system, much slower than I remember
it being when I was testing this before. But just to make sure I first verified
that mass-check on the Fedora system using an older rev ran at the same rate as
with the current svn.

I disabled SPF and got the same results, except a bit faster. No difference in
the number of bogus rr warnings. It is an easy way to save 10% of the time to
run one of these tests.

A fallacy in my thinking about the solution to this problem could be that
creating a new socket will select an unused port, randomising the port numbers,
adding another 16 bits to the port/ID space for the queries. Now that is made
explicit, I see that of course the port number will not be random unless there
is a way of ensuring that it is. The OS has no reason to pick a different port
than it has before if there is no current listener on the port. This whole
scheme will fall apart if the source ports are cycling through the same, for
example, four numbers.

I see two solutions. One is to go with a hash to ensure no collisions. The other
is to find a way to randomise the source ports that are used.

Another mistake I apparently made was to forget to spend a few hours running
mass-check after my last "fix". Somehow I thought I had checked this.




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.