Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2008/11/04 12:23:11 UTC
[Bug 6010] New: RuleQA: default corpus for QA measurements should
ignore "high scoring" spam
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6010
Summary: RuleQA: default corpus for QA measurements should ignore
"high scoring" spam
Product: Spamassassin
Version: unspecified
Platform: Other
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P5
Component: RuleQA
AssignedTo: dev@spamassassin.apache.org
ReportedBy: jm@jmason.org
Currently, our ruleqa measurements include details of how the rules
perform against _all_ mail in the spam corpora, including the stuff
that's hitting every single rule we have. This means that great rules like:
http://ruleqa.spamassassin.org/20081103-r710024-n/PQRTW_4/detail
are hidden.
This is clearly demonstrated by my 2 corpora, "jm" and "bb-jm". "bb-jm"
is my high-scoring spam; on this corpus, PQRTW_4 hits only 0.251% of
spam. But on my low-scoring spam ("jm"), it hits 13.6118%.
Overall it hits 0.7921% of spam. But as you can see it's really good
against the low-scoring stuff.
Now, that's where we _need_ good rules... so in my opinion we should fix
the ruleqa app to highlight those rules by default. We don't need lots of
rules that hit the spam we're already catching.
I suggest the ruleqa scripts are extended to track a new subset of logs,
alongside the current set, for mass-check lines under some score threshold (10
points?). So something like this:
set 0, low-scoring spam
  MSECS    SPAM%    HAM%    S/O  RANK  SCORE  NAME
0.00000  10.6118  0.0000  1.000  0.86   1.00  PQRTW_4

set 0, in aggregate
  MSECS    SPAM%    HAM%    S/O  RANK  SCORE  NAME
0.00000   0.7921  0.0000  1.000  0.86   1.00  PQRTW_4

set 0, broken down by message age in weeks
  MSECS    SPAM%    HAM%    S/O  RANK  SCORE  NAME     WHO/AGE
0.00000   0.3335  0.0000  1.000  0.66   1.00  PQRTW_4  0-1
0.00000   0.2991  0.0000  1.000  0.63   1.00  PQRTW_4  1-2
0.00000   0.0000  0.0000  0.500  0.45   1.00  PQRTW_4  2-3
0.00000   0.7855  0.0000  1.000  0.80   1.00  PQRTW_4  3-6

set 0, broken down by contributor
[etc.]
This should be easy enough to do.
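To make the idea concrete, here is a rough sketch in Python (not the actual ruleqa code) of computing a rule's hit rate both over all spam and over only the messages under a score threshold. It assumes each mass-check log line looks like "<flag> <score> <path> <comma-separated rules> [extra key=value fields]"; that layout, the function names, and the 10-point cutoff are assumptions for illustration only.

```python
from collections import Counter

SCORE_THRESHOLD = 10.0  # assumed cutoff; the suggestion above is only "10 points?"


def parse_line(line):
    """Return (score, set of rule names) for one log line, or None to skip it.

    Assumed layout: <flag> <score> <path> <comma-separated rules> [extras...]
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 3:
        return None
    try:
        score = float(fields[1])
    except ValueError:
        return None
    rules = set()
    if len(fields) > 3:
        # Ignore key=value metadata in case the rules field is missing.
        rules = {r for r in fields[3].split(",") if r and "=" not in r}
    return score, rules


def hit_rates(lines, threshold=None):
    """Return (message count, {rule: percent of counted messages it hit}).

    With a threshold, only messages scoring below it are counted.
    """
    total = 0
    hits = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        score, rules = parsed
        if threshold is not None and score >= threshold:
            continue
        total += 1
        hits.update(rules)
    if total == 0:
        return 0, {}
    return total, {rule: 100.0 * n / total for rule, n in hits.items()}


if __name__ == "__main__":
    # Invented sample lines, not real corpus data.
    sample = [
        "Y 25 /spam/1 RULE_A,RULE_B",
        "Y 30 /spam/2 RULE_A",
        "Y 6 /spam/3 RULE_A,PQRTW_4",
        "Y 7 /spam/4 PQRTW_4",
    ]
    for label, cutoff in (("in aggregate", None), ("low-scoring spam", SCORE_THRESHOLD)):
        total, rates = hit_rates(sample, cutoff)
        print(label, total, "messages, PQRTW_4 SPAM%:", rates.get("PQRTW_4", 0.0))
```

Running the same pass twice, once with and once without the threshold, gives the "in aggregate" and "low-scoring spam" figures side by side, which is the extra subset proposed above.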
I don't think this needs to change the promotion criteria; rules like this would
still be promoted, since their SPAM% is over the very low threshold (what
is it? 0.1%? I can't recall).