Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2010/02/22 21:18:03 UTC
masscheck T_ decision criteria?
Could someone provide (or point to in the source) the criteria used for
the masscheck making the T_ or not to T_ decision?
Why is this being made a T_ rule?
  SPAM%   HAM%    S/O    RANK  SCORE  NAME
  2.5767  0.0748  0.972  0.77  0.01   T_FROM_MISSPACED
  (spam: 3552 of 137851 messages; ham: 138 of 184506 messages)
The S/O is pretty good. It's better than this rule that's not being made
T_:
  SPAM%   HAM%    S/O    RANK  SCORE  NAME
  1.3827  0.0607  0.958  0.74  1.00   FORM_FRAUD
  (spam: 1906 of 137851 messages; ham: 112 of 184506 messages)
Why?
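
For reference, the S/O figures quoted above are consistent with S/O = SPAM% / (SPAM% + HAM%), where each percentage comes from the hit counts shown. A minimal Python sketch of that arithmetic follows; the function names are illustrative and are not taken from the masses/ scripts, and it says nothing about how the T_ decision itself is made.

```python
def percent(hits: int, total: int) -> float:
    """Share of a corpus that a rule hit, as a percentage."""
    return 100.0 * hits / total


def s_over_o(spam_pct: float, ham_pct: float) -> float:
    """Spam hit rate divided by the combined spam + ham hit rate."""
    overall = spam_pct + ham_pct
    return spam_pct / overall if overall else 0.0


# Corpus sizes and hit counts as quoted in this message.
SPAM_TOTAL, HAM_TOTAL = 137851, 184506
rules = {
    "T_FROM_MISSPACED": (3552, 138),
    "FORM_FRAUD": (1906, 112),
}

for name, (spam_hits, ham_hits) in rules.items():
    spam_pct = percent(spam_hits, SPAM_TOTAL)
    ham_pct = percent(ham_hits, HAM_TOTAL)
    so = s_over_o(spam_pct, ham_pct)
    print(f"{spam_pct:.4f}  {ham_pct:.4f}  {so:.3f}  {name}")
```

Because S/O is built from percentages rather than raw counts, the different sizes of the spam and ham corpora do not skew it.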
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Your mouse has moved. Your Windows Operating System must be
relicensed due to this hardware change. Please contact Microsoft
to obtain a new activation key. If this hardware change results in
added functionality you may be subject to additional license fees.
Your system will now shut down. Thank you for choosing Microsoft.
-----------------------------------------------------------------------
Today: George Washington's 278th Birthday
Re: masscheck T_ decision criteria?
Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 22/02/2010 8:43 PM, John Hardin wrote:
> On Mon, 22 Feb 2010, Daryl C. W. O'Shea wrote:
>> I think both rules are getting bounced in and out due to the fluctuation
>> of who's been submitting results over the last week due to the bad rule
>> that got checked in (plus I don't currently submit Sunday night results
>> right now).
>
> Yeah, there _is_ quite a bit of fluctuation.
I'd imagine that should tone down this week. Although, looking at the
code that Sidney pointed out, I'm thinking that averaging the S/O over
the last 3 days could be improved upon to reduce bounce. However, I
haven't looked at the whole thing to see how it works, so maybe it's fine.
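
To illustrate the kind of bounce being discussed, here is a rough Python sketch of a decision based on the mean S/O of the last 3 nightly runs. It is not the actual rule-qa code: the function name, the 0.90 cut-off, and the S/O series are all invented for illustration.

```python
from statistics import mean


def rolling_decisions(daily_so: list[float], window: int = 3,
                      threshold: float = 0.90) -> list[bool]:
    """For each night, report whether the mean S/O of the most recent
    `window` nights (fewer at the start) meets the threshold.

    `threshold` is a hypothetical cut-off, not a value from the masscheck code.
    """
    decisions = []
    for day in range(len(daily_so)):
        recent = daily_so[max(0, day - window + 1): day + 1]
        decisions.append(mean(recent) >= threshold)
    return decisions


# Invented series: one poor night (0.73) among otherwise good results.
series = [0.97, 0.96, 0.73, 0.95, 0.97, 0.96]
for so, ok in zip(series, rolling_decisions(series)):
    print(f"{so:.2f} -> {'keep' if ok else 'T_'}")
```

In this made-up series a single poor night keeps the 3-day mean under the cut-off until that night drops out of the window, so the rule flips to T_ and back within a week.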
> Might it be a good idea to use a different prefix to indicate rules that
> the automated systems have decided don't score well, to distinguish them
> from rules that the developer has explicitly indicated are for test?
I'm indifferent. I'm not a fan of renaming the rules (I'd prefer
tflags), however I suppose it makes it easy to tell the T_ rules apart
when viewing the bare rule names without a method that could pick up on
the tflags for you.
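
As a small illustration of working from bare rule names only, the following Python sketch strips the T_ prefix and reports which rules changed status between two lists of names. The helper names are hypothetical; the example names are the ones quoted in this thread.

```python
def split_test_prefix(name: str) -> tuple[bool, str]:
    """Return (is_test, base_name) for a rule name, based only on the T_ prefix."""
    if name.startswith("T_"):
        return True, name[2:]
    return False, name


def by_base_name(names: list[str]) -> dict[str, bool]:
    """Map each base rule name to whether it carried the T_ prefix."""
    result = {}
    for name in names:
        is_test, base = split_test_prefix(name)
        result[base] = is_test
    return result


def status_changes(before: list[str], after: list[str]) -> list[str]:
    """Base names present in both lists whose T_ status differs."""
    old, new = by_base_name(before), by_base_name(after)
    return sorted(base for base in old.keys() & new.keys()
                  if old[base] != new[base])


print(status_changes(
    ["T_FROM_MISSPACED", "FORM_FRAUD"],    # names in the 20100221 results
    ["T_FROM_MISSPACED", "T_FORM_FRAUD"],  # names in the 20100222 results
))
```

A prefix check like this cannot tell a rule the developer marked as a test apart from one the automated system demoted, which is the distinction a separate prefix or a tflags entry would have to carry.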
Daryl
Re: masscheck T_ decision criteria?
Posted by John Hardin <jh...@impsec.org>.
On Mon, 22 Feb 2010, Daryl C. W. O'Shea wrote:
> On 22/02/2010 3:18 PM, John Hardin wrote:
>>
>> Could someone provide (or point to in the source) the criteria used for
>> the masscheck making the T_ or not to T_ decision?
>
> AFAIK the logic is buried somewhere in the ruleqa app.
> build/mkupdates/listpromotable, which creates the active.list file, just
> gets the info from the ruleqa app.
>
>> Why is this being made a T_ rule?
>>
>>   SPAM%   HAM%    S/O    RANK  SCORE  NAME
>>   2.5767  0.0748  0.972  0.77  0.01   T_FROM_MISSPACED
>>   (spam: 3552 of 137851 messages; ham: 138 of 184506 messages)
>>
>> The S/O is pretty good. It's better than this rule that's not being made
>> T_:
>>
>>   SPAM%   HAM%    S/O    RANK  SCORE  NAME
>>   1.3827  0.0607  0.958  0.74  1.00   FORM_FRAUD
>>   (spam: 1906 of 137851 messages; ham: 112 of 184506 messages)
>>
>> Why?
>
> What revision are these stats based on? Are both stats from the same
> revision?
They are both from nightly 20100221-r912319-n:
http://ruleqa.spamassassin.org/20100221-r912319-n?srcpath=jhardin
> 20100222-r912513-n:
>
> MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME              WHO/AGE
>     0  0.4503  0.1680  0.728  0.65  0.01   T_FROM_MISSPACED
>     0  0.2415  0.0537  0.818  0.70  0.01   T_FORM_FRAUD
>
> I think both rules are getting bounced in and out due to the fluctuation
> of who's been submitting results over the last week due to the bad rule
> that got checked in (plus I don't currently submit Sunday night results
> right now).
Yeah, there _is_ quite a bit of fluctuation.
Might it be a good idea to use a different prefix to indicate rules that
the automated systems have decided don't score well, to distinguish them
from rules that the developer has explicitly indicated are for test?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
W-w-w-w-w-where did he learn to n-n-negotiate like that?
-----------------------------------------------------------------------
Today: George Washington's 278th Birthday
Re: masscheck T_ decision criteria?
Posted by Justin Mason <jm...@jmason.org>.
On Mon, Feb 22, 2010 at 23:36, Daryl C. W. O'Shea
<sp...@dostech.ca> wrote:
> On 22/02/2010 3:18 PM, John Hardin wrote:
>>
>> Could someone provide (or point to in the source) the criteria used for
>> the masscheck making the T_ or not to T_ decision?
>
> AFAIK the logic is buried somewhere in the ruleqa app.
> build/mkupdates/listpromotable, which creates the active.list file, just
> gets the info from the ruleqa app.
Actually, it's in masses/hit-frequencies -- the "-P" switch. See also the doc
at http://wiki.apache.org/spamassassin/RulesProjPromotion.
--j.
Re: masscheck T_ decision criteria?
Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 22/02/2010 3:18 PM, John Hardin wrote:
>
> Could someone provide (or point to in the source) the criteria used for
> the masscheck making the T_ or not to T_ decision?
AFAIK the logic is buried somewhere in the ruleqa app.
build/mkupdates/listpromotable, which creates the active.list file, just
gets the info from the ruleqa app.
> Why is this being made a T_ rule?
>
>   SPAM%   HAM%    S/O    RANK  SCORE  NAME
>   2.5767  0.0748  0.972  0.77  0.01   T_FROM_MISSPACED
>   (spam: 3552 of 137851 messages; ham: 138 of 184506 messages)
>
> The S/O is pretty good. It's better than this rule that's not being made
> T_:
>
>   SPAM%   HAM%    S/O    RANK  SCORE  NAME
>   1.3827  0.0607  0.958  0.74  1.00   FORM_FRAUD
>   (spam: 1906 of 137851 messages; ham: 112 of 184506 messages)
>
> Why?
What revision are these stats based on? Are both stats from the same
revision?
20100222-r912513-n:
MSECS  SPAM%   HAM%    S/O    RANK  SCORE  NAME              WHO/AGE
    0  0.4503  0.1680  0.728  0.65  0.01   T_FROM_MISSPACED
    0  0.2415  0.0537  0.818  0.70  0.01   T_FORM_FRAUD
I think both rules are getting bounced in and out due to the fluctuation
of who's been submitting results over the last week due to the bad rule
that got checked in (plus I don't currently submit Sunday night results
right now).
Daryl
Re: masscheck T_ decision criteria?
Posted by Sidney Markowitz <si...@sidney.com>.
John Hardin wrote, On 23/02/10 9:18 AM:
>
> Could someone provide (or point to in the source) the criteria used for
> the masscheck making the T_ or not to T_ decision?
Based on a quick glance, it appears to me that the code is in
trunk/masses/rule-qa/list-bad-rules in a series of tests right after the
comment
# base most of our decisions off day 1 (last night's mass-checks).
# note: meta rules must come before their __SUBRULES in this sort;
# default lexical sort will do this.
-- sidney