Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2005/02/14 11:08:53 UTC

[Bug 4136] New: mass-check --reuse should disable reused rules during mass-check --net

http://bugzilla.spamassassin.org/show_bug.cgi?id=4136

           Summary: mass-check --reuse should disable reused rules during
                    mass-check --net
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Masses
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: quinlan@pathname.com


If --reuse is on, there's absolutely no point in actually running those rules,
so some hackery should be added to mass-check to disable only those rules
that are being skipped (see the %reuse object in tmp/rules.pl).  Perhaps
add to user_prefs.cf in a machine-edited section?

Thoughts?
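
One way the machine-edited section could be maintained, as a minimal Perl
sketch. The marker comments, the function name update_reuse_section, and the
sample rule names are assumptions for illustration, not anything mass-check
does today. It relies only on a rule with a zero score not being run.

```perl
use strict;
use warnings;

# Hypothetical markers delimiting the machine-edited part of user_prefs.cf.
my $MARK_BEGIN = "# BEGIN mass-check reuse (machine-edited)";
my $MARK_END   = "# END mass-check reuse";

# Return $text with the marked section replaced (or appended) so that it
# holds one "score RULE 0" line per reused rule.
sub update_reuse_section {
  my ($text, @rules) = @_;
  my $section = join("\n", $MARK_BEGIN,
                     (map { "score $_ 0" } sort @rules), $MARK_END) . "\n";
  # Replace an existing section in place; otherwise append a new one.
  return $text
    if $text =~ s/^\Q$MARK_BEGIN\E\n.*?^\Q$MARK_END\E\n/$section/ms;
  $text .= "\n" if length($text) && $text !~ /\n\z/;
  return $text . $section;
}

# Rule names here are only examples.
my $prefs = "required_score 5\n";
$prefs = update_reuse_section($prefs, qw(URIBL_SBL RCVD_IN_XBL));
print $prefs;
```

Calling it again with a different rule list rewrites the section rather than
appending a second one, so hand-written settings outside the markers survive.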



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

[Bug 4136] mass-check --reuse should disable reused rules during mass-check --net

Posted by bu...@bugzilla.spamassassin.org.
http://bugzilla.spamassassin.org/show_bug.cgi?id=4136





------- Additional Comments From peter@unlikejam.dreamhost.com  2005-02-20 17:34 -------
The --reuse feature in mass-check is very useful.  In order to have the reused
rules available to meta rules, I have modified the message check in mass-check
to push/hit the reused rules into the message prior to checking.

This may not be the cleanest approach, since it calls into PerMsgStatus directly.
It also relates to bug 3650: finish() will be called twice on the resulting
message (when $ma and $status are cleaned up), which currently fails.

--- mass-check.orig     Wed Feb 16 16:39:55 2005
+++ mass-check  Mon Feb 21 12:11:50 2005
@@ -344,7 +341,26 @@

   } else {
     $before = time;
-    $status = $spamtest->check($ma);
+    #  If we have reuse rules, setup a message with reuse hits
+    if ($opt_reuse && grep { $reuse{$_}->{reuse} } @previous) {
+      #  Manually push/hit reuse rules onto the message
+      #  This will assist with meta rule evaluations
+      #  Structure of code taken from SpamAssassin::Check
+      local ($_);
+      $spamtest->init(1);
+      my $msg = Mail::SpamAssassin::PerMsgStatus->new($spamtest, $ma);
+      #  Add reuse hits from previous
+      for (grep { $reuse{$_}->{reuse} } @previous) {
+        $msg->Mail::SpamAssassin::PerMsgStatus::got_pattern_hit($_, "REUSE: ");
+      }
+      $msg->check();
+      #  Still require $ma and $status, but causes problems cleaning up later
+      $ma     = $msg;
+      $status = $msg;
+    }
+    else {
+      $status = $spamtest->check($ma);
+    }
     $after = time;
   }





------- Additional Comments From quinlan@pathname.com  2005-03-11 00:52 -------
Peter, probably a good idea, but that implementation is going to be pretty
darn slow (if it works?).  We can get away without it, though, because there's
only one meta test that uses a network rule and all of its components are
network rules.

Anyway, checked in code to zero the scores of reused rules, which should work.
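
Roughly, zeroing the score keeps a reused rule from running, so its hit has to
come from the earlier result instead. A minimal Perl sketch of that
bookkeeping, borrowing the %reuse and @previous names from Peter's patch. The
function names, the layout of the score table, and the sample rule names are
assumptions, not the checked-in code.

```perl
use strict;
use warnings;

# Copy the score table, giving every rule flagged for reuse a zero score
# so that the scanner skips it.
sub zero_reused_scores {
  my ($reuse, $scores) = @_;
  my %out = %$scores;
  $out{$_} = 0 for grep { $reuse->{$_}{reuse} } keys %$reuse;
  return \%out;
}

# Final hits: reused rules as recorded by the previous run, followed by the
# hits from the fresh run, without duplicates.
sub merge_hits {
  my ($reuse, $previous, $fresh) = @_;
  my @reused = grep { $reuse->{$_} && $reuse->{$_}{reuse} } @$previous;
  my %seen;
  return grep { !$seen{$_}++ } @reused, @$fresh;
}

my %reuse    = (URIBL_SBL => { reuse => 1 }, RCVD_IN_XBL => { reuse => 1 });
my @previous = qw(URIBL_SBL HTML_MESSAGE);   # hits recorded in the old headers
my @fresh    = qw(HTML_MESSAGE BAYES_99);    # hits from this run
print join(",", merge_hits(\%reuse, \@previous, \@fresh)), "\n";
# prints URIBL_SBL,HTML_MESSAGE,BAYES_99
```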

Closing as FIXED.




------- Additional Comments From quinlan@pathname.com  2005-02-14 14:49 -------
Subject: Re:  mass-check --reuse should disable reused rules during mass-check --net

> - we're only going to need this when testing out Bayes, and the time
>   consuming piece is the net checks, so only bother reusing net rules.

Yes, the code only reuses network rules (that have been present for some
time and have not changed much; allowing some small changes is probably
still better than using delayed network testing).

> - we want to cause certain rules not to run, but we want the hit to
> show up and the score to be added as if it had been run.  the problem
> is that the score will only be added if the rule actually runs, and
> the rule will run if the score is non-zero.  at the time, it looked
> like a whole lot of hackery would be needed to get this working.

The total score doesn't matter at all in mass-check or in the
perceptron, actually (it's rounded for Pete's sake).  The score is
*only* used for the FP/FN options to hit-frequencies, which are
semi-broken and rarely, if ever, used.  Now, it matters during the
perceptron run itself, but that's all computed internally.

I talked about this with Henry last night (this morning) and we even
came to agreement that --reuse was safe to always run during nightly
mass-checks that aren't using --net.  There's no reason to not show
network hits if you have them.

> - the main problem, from my pov, with the reuse approach is that what
>   you really want is to know what requests were made and what the answers
>   were at a given point in time.  reuse attempts to deduce what that
>   information is based on rule hit.  ie: it's good to know whether or
>   not spamcop/surbl/etc hit on a given rule (means it ran and hit),
>   but what's more important is to know that it didn't hit but was run.
>   If it didn't run at all, we need to run it ala our current method.

Agreed.  For the future, why don't we add an X-Spam-Status flag similar
to autolearn?  net=yes and net=no -- for now, we'll just have to ask
people to only use --reuse for the sections of the corpus that have been
tagged with network checks on.
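
A flag like that could be read back along these lines. This is a Perl sketch
only: net=yes/net=no is the proposal above and not an existing X-Spam-Status
field, the function name net_checks_ran is made up, and the sample header is
abbreviated.

```perl
use strict;
use warnings;

# Returns 1 for net=yes, 0 for net=no, and undef when the proposed flag
# is absent from the header value.
sub net_checks_ran {
  my ($header) = @_;
  # Unfold continuation lines before matching.
  (my $flat = $header) =~ s/\r?\n[ \t]+/ /g;
  return unless $flat =~ /\bnet=(yes|no)\b/;
  return $1 eq 'yes' ? 1 : 0;
}

my $hdr = "Yes, score=7.2 required=5.0 tests=URIBL_SBL\n\tautolearn=no net=yes";
print "reuse is safe for this message\n" if net_checks_ran($hdr);
```

A message whose header lacks the flag would then fall back to running the
network rules as usual.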





quinlan@pathname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dev@spamassassin.apache.org
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED




------- Additional Comments From quinlan@pathname.com  2005-03-11 00:53 -------
FIXED



------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.

------- Additional Comments From felicity@kluge.net  2005-02-14 09:21 -------
Subject: Re:   New: mass-check --reuse should disable reused rules during mass-check --net

On Mon, Feb 14, 2005 at 02:08:53AM -0800, bugzilla-daemon@bugzilla.spamassassin.org wrote:
> If --reuse is on, there's absolutely no point in actually running those rules,
> so some hackery should be added to mass-check to disable only those rules
> that are being skipped (see the %reuse object in tmp/rules.pl).  Perhaps
> add to user_prefs.cf in a machine-edited section?
> 
> Thoughts?

When I was trying to implement this several months ago:

- we're only going to need this when testing out Bayes, and the time consuming
  piece is the net checks, so only bother reusing net rules.

- we want to cause certain rules not to run, but we want the hit to show up
  and the score to be added as if it had been run.  the problem is that the
  score will only be added if the rule actually runs, and the rule will run if
  the score is non-zero.  at the time, it looked like a whole lot of hackery
  would be needed to get this working.

- the main problem, from my pov, with the reuse approach is that what
  you really want is to know what requests were made and what the answers
  were at a given point in time.  reuse attempts to deduce what that
  information is based on rule hit.  ie: it's good to know whether or
  not spamcop/surbl/etc hit on a given rule (means it ran and hit),
  but what's more important is to know that it didn't hit but was run.
  If it didn't run at all, we need to run it ala our current method.





quinlan@pathname.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|dev@spamassassin.apache.org |quinlan@pathname.com




------- Additional Comments From quinlan@pathname.com  2005-03-11 00:53 -------
boo




------- Additional Comments From peter@unlikejam.dreamhost.com  2005-02-20 17:38 -------
Created an attachment (id=2659)
 --> (http://bugzilla.spamassassin.org/attachment.cgi?id=2659&action=view)
Patch to apply reuse rules to a message prior to checking



