You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spamassassin.apache.org by bu...@issues.apache.org on 2010/11/05 14:37:11 UTC

[Bug 6510] New: Mass-check generating nightly rules updates with 0 scores for rules with dependencies

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

           Summary: Mass-check generating nightly rules updates with 0
                    scores for rules with dependencies
           Product: Spamassassin
           Version: 3.3.1
          Platform: PC
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Score Generation
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: msherman@projectile.ca


The nightly masscheck which generates the scores for production rules updates
has recently started generating 0 scores for rules which have dependencies
(leading to warnings).

Examples:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6504
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6509

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #2 from John Hardin <jh...@impsec.org> 2010-11-05 21:39:55 UTC ---
Created an attachment (id=4820)
 --> (https://issues.apache.org/SpamAssassin/attachment.cgi?id=4820)
Aviod publishing zero scores for published sandbox rules

Q'n'D fix for the symptoms, not the cause.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #3 from Karsten Bräckelmann <gu...@rudersport.de> 2010-11-05 23:49:03 UTC ---
*** Bug 6509 has been marked as a duplicate of this bug. ***

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

John Wilcock <jo...@tradoc.fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |john@tradoc.fr

--- Comment #11 from John Wilcock <jo...@tradoc.fr> 2010-11-08 15:08:08 UTC ---
(In reply to comment #9)
> No doubt a rule can be or become useless by itself but still be useful as part
> of another rule. Perhaps the fix should be to allow meta rules to use a rule
> with a score of 0 without a warning.

Or perhaps more usefully, automatically prepend them with __.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #6 from Daryl C. W. O'Shea <sp...@dostech.ca> 2010-11-06 17:38:49 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #1)
> > > +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> > > trunk/rules/72_scores.cf
> > 
> > Yeah, that won't work well.  It'll enable net rules in non-net scoresets.
> Would that cause problems given the presence of "tflags net" on net rules?
> Surely SA does not rely on the score being zero to disable net rules when run
> in local mode...

Good point, investigating now.

> > I think I'll change the script to abort the update if lint detects dependency
> > issues.  I missed tonight's update (I think there's yet another rule or more
> > that will have zeroing issues)... so I'll look at it tomorrow.
> That would keep the update from breaking things at the cost of possibly
> blocking rule and score updates for extended periods while the rescorer is
> misbehaving.

Yeah.  I'm not sure what's better.  No update or people complaining about the
update.

> If we can put a check into the rescorer that does not allow scores to go to
> zero on published rules (taking into account net vs. non-net as my hack does
> not, if that's indeed critical) then that should be done. As I said earlier, if
> a rule is performing well enough to be published then it should not be assigned
> a zero score.

Perhaps we could use the non-mutable flag to stop the GA from doing this. 
Really, though, in promoting the rules we don't look at whether there's any
point in having the rule (super-redundant, or whatever).

> Any idea why the scores are jumping around so much? That bothers me too, though
> it's not as problematic as scores going to zero for no discernible reason.

I'm not sure they're really changing that much.  A few tenths here and there
and the occasional large jumps are to be expected due to the nature of the GA.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #5 from John Hardin <jh...@impsec.org> 2010-11-06 11:53:00 UTC ---
(In reply to comment #4)
> (In reply to comment #1)
> > +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> > trunk/rules/72_scores.cf
> 
> Yeah, that won't work well.  It'll enable net rules in non-net scoresets.

Would that cause problems given the presence of "tflags net" on net rules?
Surely SA does not rely on the score being zero to disable net rules when run
in local mode...

> I think I'll change the script to abort the update if lint detects dependency
> issues.  I missed tonight's update (I think there's yet another rule or more
> that will have zeroing issues)... so I'll look at it tomorrow.

That would keep the update from breaking things at the cost of possibly
blocking rule and score updates for extended periods while the rescorer is
misbehaving.

If we can put a check into the rescorer that does not allow scores to go to
zero on published rules (taking into account net vs. non-net as my hack does
not, if that's indeed critical) then that should be done. As I said earlier, if
a rule is performing well enough to be published then it should not be assigned
a zero score.

Any idea why the scores are jumping around so much? That bothers me too, though
it's not as problematic as scores going to zero for no discernible reason.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #4 from Daryl C. W. O'Shea <sp...@dostech.ca> 2010-11-05 23:54:43 UTC ---
(In reply to comment #1)
> +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> trunk/rules/72_scores.cf

Yeah, that won't work well.  It'll enable net rules in non-net scoresets.

I think I'll change the script to abort the update if lint detects dependency
issues.  I missed tonight's update (I think there's yet another rule or more
that will have zeroing issues)... so I'll look at it tomorrow.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

John Hardin <jh...@impsec.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jhardin@impsec.org

--- Comment #1 from John Hardin <jh...@impsec.org> 2010-11-05 21:38:39 UTC ---
Quick and dirty patch suggestion, from total ignorance of the internals of the
sandbox score generation process:

===================================================================
--- build/mkupdates/mkupdate-with-scores    (revision 1031656)
+++ build/mkupdates/mkupdate-with-scores    (working copy)
@@ -150,7 +150,7 @@

   cd ..

-  cp trunk-rulesrc-scores/72_scores.cf trunk/rules/72_scores.cf
+  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
trunk/rules/72_scores.cf

   # note: one of set0 or set1 stats might be incorrect (not all of their rules
   #       are included in the update) I can't remember if we eliminate dropped

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #7 from Daryl C. W. O'Shea <sp...@dostech.ca> 2010-11-06 18:14:33 UTC ---
(In reply to comment #0)
> The nightly masscheck which generates the scores for production rules updates
> has recently started generating 0 scores for rules which have dependencies
> (leading to warnings).
> Examples:
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6504
> https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6509

OK, I've confirmed that this doesn't prevent sa-update from working, or from SA
from working.  In fact, I would say this (or at least the possibility for it)
is by design.  The GA is deciding (apparently) that rules aren't worth having
(due to overlaps/whatever that makes the rules have no real bearing on the
outcome across the entire corpus) even though they are technically good enough
to be promoted.

Historically, before sa-update, this wasn't an issue since we just wouldn't
ship those rules or rules that depended on them.  Updates via sa-update open up
the possibility of problems, it seems (although I'm not sure why yet, maybe
something is broken).

(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > (In reply to comment #1)
> > > > +  sed -e 's/0\.000/0.001/g' < trunk-rulesrc-scores/72_scores.cf >
> > > > trunk/rules/72_scores.cf
> > > 
> > > Yeah, that won't work well.  It'll enable net rules in non-net scoresets.
> > Would that cause problems given the presence of "tflags net" on net rules?
> > Surely SA does not rely on the score being zero to disable net rules when run
> > in local mode...
> Good point, investigating now.

Looks like the 'sed' is actually fine as long as net rules have their tflags
net (which, if they don't is a different problem).

I'm looking at getting the GA to not 0 them, or if it does, make sure there are
no dependencies.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #8 from Daryl C. W. O'Shea <sp...@dostech.ca> 2010-11-06 20:57:43 UTC ---
[dos@dev new-rule-score-gen]$ svn ci -m "bug 6510: temporarily force all
generated scores to be non-zero"
Sending        new-rule-score-gen/do-nightly-rescore-example
Sending        new-rule-score-gen/merge-scoresets
Transmitting file data ..
Committed revision 1032191.
[dos@dev new-rule-score-gen]$

I've temporarily added code to force all generated scores to be non-zero and
mark the scores-set# files to indicate that the score was changed from 0.000 to
0.0001 with a "# forced non-zero" notation.


I've discovered that most, but not all, of the rules that have problems have
been promoted due to "tflags publish"... which means they weren't necessarily
selected for promotion due to mass-check results (which may explain their low
or zero scores).

I've also found that the script generate-new-scores checks out a copy of
tags/spamassassin_release_3_2_0_rc_2/rules from the svn repos.  I'm not sure
yet what it uses it for and what it should use now that we've split rules out
of the base release.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #10 from Kevin A. McGrail <km...@pccc.com> 2010-11-08 11:20:42 UTC ---
(In reply to comment #9)
> > I've discovered that most, but not all, of the rules that have problems have
> > been promoted due to "tflags publish"... which means they weren't necessarily
> > selected for promotion due to mass-check results (which may explain their low
> > or zero scores).
> No doubt a rule can be or become useless by itself but still be useful as part
> of another rule. Perhaps the fix should be to allow meta rules to use a rule
> with
> a score of 0 without a warning.

If a rule has a score of 0, I've always treated that as 100% disabled and we
would need a flag to disable a rule.

KAM

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6510] Mass-check generating nightly rules updates with 0 scores for rules with dependencies

Posted by bu...@issues.apache.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6510

--- Comment #9 from Tom Schulz <sc...@adi.com> 2010-11-08 11:15:39 UTC ---
> I've discovered that most, but not all, of the rules that have problems have
> been promoted due to "tflags publish"... which means they weren't necessarily
> selected for promotion due to mass-check results (which may explain their low
> or zero scores).
No doubt a rule can be or become useless by itself but still be useful as part
of another rule. Perhaps the fix should be to allow meta rules to use a rule
with
a score of 0 without a warning.

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.