Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/06/23 17:50:54 UTC

[Bug 5930] New: Shortcircuiting before DNS lookups

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5930

           Summary: Shortcircuiting before DNS lookups
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P4
         Component: Plugins
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: hege@hege.li


Created an attachment (id=4342)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4342)
Hacky test patch

Shortcircuiting ALL_TRUSTED, USER_IN_WHITELIST, etc. could use a tiny improvement.
Currently, DNS queries are sent and the message is decoded before these simple
rules are evaluated, wasting bandwidth and CPU.

I made a quick hack for myself. I set these rules at priority -10000 and run
them manually before run_rbl_eval_tests.

I couldn't come up with an elegant solution for general use. Perhaps we could
allocate a priority range that is run before DNS? But that would mean users
could accidentally prioritize rules that need the decoded message, and decoding
should be done only after launching DNS. Ideas?


-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #12 from AXB <ax...@gmail.com> ---
Depending on how many thousand q/s you make, it can make a massive difference.

The concept of shortcircuiting is to keep any other rule from applying a score.

If network lookups are NOT bypassed, network hits with high custom scores or
metas with lookup rules could "break", for example, a whitelist_from_rcvd.

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #6 from Henrik Krohns <he...@hege.li> ---
Here you go, I've been running these patches for years without hiccups.

Then you just need something like:

priority USER_IN_BLACKLIST -10050

Anything <= -10000 is shortcircuited before lookups are launched.

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #7 from Henrik Krohns <he...@hege.li> ---
For quick reference:

===================================================================
--- Check.pm    (revision 1626743)
+++ Check.pm    (working copy)
@@ -77,10 +77,8 @@
   # rbl calls.
   $pms->extract_message_metadata();

-  # Here, we launch all the DNS RBL queries and let them run while we
-  # inspect the message
-  $self->run_rbl_eval_tests($pms);
   my $needs_dnsbl_harvest_p = 1; # harvest needs to be run
+  my $rbls_running = 0;

   my $decoded = $pms->get_decoded_stripped_body_text_array();
   my $bodytext = $pms->get_decoded_body_text_array();
@@ -108,6 +106,13 @@
       last;
     }

+    # Here, we launch all the DNS RBL queries and let them run while we
+    # inspect the message
+    if (!$rbls_running && $priority > -10000) {
+      $rbls_running = 1;
+      $self->run_rbl_eval_tests($pms);
+    }
+
     my $timer = $self->{main}->time_method("tests_pri_".$priority);
     dbg("check: running tests for priority: $priority");


--- URIDNSBL.pm (revision 1626743)
+++ URIDNSBL.pm (working copy)
@@ -327,18 +327,21 @@
   return $self;
 }

-# this is just a placeholder; in fact the results are dealt with later
-sub check_uridnsbl {
-  return 0;
-}
-
 # ---------------------------------------------------------------------------

 # once the metadata is parsed, we can access the URI list.  So start off
 # the lookups here!
-sub parsed_metadata {
-  my ($self, $opts) = @_;
-  my $pms = $opts->{permsgstatus};
+
+#
+# only parse and run after first check_uridnsbl call
+#
+sub check_uridnsbl {
+  my ($self, $pms, @args) = @_;
+
+  # only parse once
+  return 0  if $pms->{uridnsbl_done};
+  $pms->{uridnsbl_done} = 1;
+
   my $conf = $pms->{conf};

   return 0  if $conf->{skip_uribl_checks};
@@ -494,7 +497,7 @@
   # and query
   $self->query_hosts_or_domains($pms, \%hostlist);

-  return 1;
+  return 0;
 }

 # Accepts argument in one of the following forms: m, n1-n2, or n/m,
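
For readers skimming the diff, the Check.pm change boils down to deferring the RBL launch until the priority loop leaves the early range. Below is a minimal Python sketch of that control flow, not SpamAssassin code: `run_checks`, `launch_rbl_lookups` and `shortcircuit_hit` are hypothetical names, and only the -10000 cutoff and the launch-once flag come from the patch.

```python
# Illustrative model only; every name here is hypothetical, not SpamAssassin's API.
RBL_LAUNCH_CUTOFF = -10000  # cutoff value taken from the patch above


def run_checks(rules_by_priority, launch_rbl_lookups, shortcircuit_hit):
    """Run rule groups in ascending priority, launching DNS lookups lazily.

    rules_by_priority: dict mapping priority (int) -> list of callables
    launch_rbl_lookups: callable that starts the DNS queries
    shortcircuit_hit: callable returning True once a rule has shortcircuited
    Returns the list of priorities that were actually run.
    """
    rbls_running = False
    ran = []
    for priority in sorted(rules_by_priority):
        # Stop before doing any further work once a shortcircuit rule has fired.
        if shortcircuit_hit():
            break
        # Launch lookups once, the first time we reach a priority above the cutoff.
        if not rbls_running and priority > RBL_LAUNCH_CUTOFF:
            rbls_running = True
            launch_rbl_lookups()
        for rule in rules_by_priority[priority]:
            rule()
        ran.append(priority)
    return ran
```

With a rule at priority -10050 that shortcircuits, `launch_rbl_lookups` is never called; without a hit it is called exactly once, just before the first group above -10000.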

[Bug 5930] Shortcircuiting before DNS lookups

Henrik Krohns <he...@hege.li> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #4342|0                           |1
        is obsolete|                            |
                 CC|                            |hege@hege.li

--- Comment #4 from Henrik Krohns <he...@hege.li> ---
Created attachment 5239
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5239&action=edit
Patch to run rbls after priority -10000

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #3 from AXB <ax...@gmail.com> ---
While testing Henrik's TLD patches with blacklist_from, etc., I noticed that net
lookups were still happening, which is not really necessary.
Could we reactivate this "discussion"?

I see quite a bit of potential to save many cycles and network traffic if we
could implement this idea.

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #8 from Kevin A. McGrail <km...@pccc.com> ---
I don't use short-circuiting, nor does priority really have good
documentation at this time, as shown by the discussion with Philip Prindeville
re: bug 7060.

What's the path forward you recommend?

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #10 from Henrik Krohns <he...@hege.li> ---
Given that these defaults already exist, we could just come up with an arbitrary
priority like -100 after which RBLs launch.

60_shortcircuit.cf:priority USER_IN_WHITELIST     -1000
60_shortcircuit.cf:priority USER_IN_DEF_WHITELIST -1000
60_shortcircuit.cf:priority USER_IN_ALL_SPAM_TO   -1000
60_shortcircuit.cf:priority SUBJECT_IN_WHITELIST  -1000
60_shortcircuit.cf:priority ALL_TRUSTED            -950
60_shortcircuit.cf:priority SUBJECT_IN_BLACKLIST   -900
60_shortcircuit.cf:priority USER_IN_BLACKLIST_TO   -900
60_shortcircuit.cf:priority USER_IN_BLACKLIST      -900
60_shortcircuit.cf:priority BAYES_99               -400
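
To see what a -100 cutoff would mean for the defaults listed above, here is a small Python sketch (illustrative only, not part of SpamAssassin). It assumes the same comparison as the attached patch, i.e. rules with priority <= cutoff run before RBLs launch; `split_by_cutoff` is a made-up helper.

```python
# Priorities copied from the 60_shortcircuit.cf lines quoted above.
DEFAULT_PRIORITIES = {
    "USER_IN_WHITELIST": -1000,
    "USER_IN_DEF_WHITELIST": -1000,
    "USER_IN_ALL_SPAM_TO": -1000,
    "SUBJECT_IN_WHITELIST": -1000,
    "ALL_TRUSTED": -950,
    "SUBJECT_IN_BLACKLIST": -900,
    "USER_IN_BLACKLIST_TO": -900,
    "USER_IN_BLACKLIST": -900,
    "BAYES_99": -400,
}

RBL_LAUNCH_CUTOFF = -100  # the arbitrary value floated in this comment


def split_by_cutoff(priorities, cutoff):
    """Hypothetical helper: split rule names into (before_rbl, after_rbl).

    Assumption: priority <= cutoff runs before RBL launch, mirroring the
    `$priority > -10000` test in the patch.
    """
    before = sorted(name for name, prio in priorities.items() if prio <= cutoff)
    after = sorted(name for name, prio in priorities.items() if prio > cutoff)
    return before, after


before_rbl, after_rbl = split_by_cutoff(DEFAULT_PRIORITIES, RBL_LAUNCH_CUTOFF)
```

Under that assumption all nine default shortcircuit rules, including BAYES_99 at -400, would run before any RBL query is sent.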

[Bug 5930] Shortcircuiting before DNS lookups

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #9 from Kevin A. McGrail <km...@pccc.com> ---
Following up on my own comment: with years of testing, it sounds like we should
get it committed, get it documented, and get a few rules that use it.  If you
can do that, it'll make 3.4.1rc1 for sure.

[Bug 5930] Shortcircuiting before DNS lookups

RW <rw...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rwmaillists@googlemail.com

--- Comment #11 from RW <rw...@googlemail.com> ---
The gratuitous DNS requests are not necessarily wasted because they bring the
results into cache. A relatively small number of slow lookups taken off the
critical path would make this a win in terms of throughput.

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #5 from Henrik Krohns <he...@hege.li> ---
Created attachment 5240
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=5240&action=edit
Patch to launch uribl lookups only after check_uridnsbl is actually called

[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #2 from Michael Parker <pa...@pobox.com>  2008-06-23 09:39:05 PST ---
You could create a new Check plugin (a subclass, really) and override
check_main to do what you want.


[Bug 5930] Shortcircuiting before DNS lookups

--- Comment #1 from Justin Mason <jm...@jmason.org>  2008-06-23 09:31:15 PST ---
(In reply to comment #0)
> I couldn't come up with any elegant solution for general use. Perhaps we could
> allocate some priority-range that are run before DNS? 

That would work for me, I think.

> But that would mean users
> could accidentally prioritize rules that need decoded message, and decoding 
> should be done only after DNS launching. Ideas?

If message decoding is "lazily evaluated" -- which it is -- then that wouldn't
really matter...
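
As an illustration of why lazy evaluation makes the ordering harmless, here is a minimal Python sketch (a hypothetical `Message` class, not SpamAssassin's implementation): decoding happens on first access only, so rules that never touch the body never pay for it, regardless of their priority.

```python
from functools import cached_property


class Message:
    """Hypothetical message wrapper that decodes its body on demand."""

    def __init__(self, raw_bytes):
        self.raw_bytes = raw_bytes
        self.decode_calls = 0  # counts how often the expensive step really runs

    @cached_property
    def decoded_body(self):
        # Runs only on first access; the result is cached on the instance.
        self.decode_calls += 1
        return self.raw_bytes.decode("utf-8", errors="replace")


def header_only_rule(msg):
    # A rule like ALL_TRUSTED would not need the body; this stand-in
    # only looks at the raw length and never triggers decoding.
    return len(msg.raw_bytes) > 0


def body_rule(msg):
    # The first body rule to run triggers the one-time decode.
    return "hello" in msg.decoded_body
```

An early rule such as `header_only_rule` leaves `decode_calls` at 0; however many body rules run afterwards, the decode happens once.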

