Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2010/02/22 21:18:03 UTC

masscheck T_ decision criteria?

Could someone provide (or point to in the source) the criteria used for 
the masscheck making the T_ or not to T_ decision?

Why is this being made a T_ rule?

   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  2.5767   0.0748   0.972    0.77    0.01  T_FROM_MISSPACED
  (spam: 3552 of 137851 messages; ham: 138 of 184506 messages)

The S/O is pretty good. It's better than this rule that's not being made 
T_:

   SPAM%     HAM%     S/O    RANK   SCORE  NAME
  1.3827   0.0607   0.958    0.74    1.00  FORM_FRAUD
  (spam: 1906 of 137851 messages; ham: 112 of 184506 messages)

Why?
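
For reference, the S/O column in these reports is consistent with SPAM% / (SPAM% + HAM%). Here is a minimal Python sketch that reproduces the figures quoted above from the raw hit counts. The function names are made up for illustration, and this is not the hit-frequencies code itself:

```python
def hit_pct(hits: int, messages: int) -> float:
    """Percentage of messages in a corpus that a rule hit."""
    return 100.0 * hits / messages


def s_o(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's combined hit rate that comes from spam."""
    total = spam_pct + ham_pct
    if total == 0:
        # Assumption for this sketch: a rule with no hits at all reports 0.
        return 0.0
    return spam_pct / total


# (spam hits, spam corpus size), (ham hits, ham corpus size), as quoted above.
rules = {
    "T_FROM_MISSPACED": ((3552, 137851), (138, 184506)),
    "FORM_FRAUD": ((1906, 137851), (112, 184506)),
}

for name, (spam, ham) in rules.items():
    ratio = s_o(hit_pct(*spam), hit_pct(*ham))
    print(f"{name}: S/O = {ratio:.3f}")
```

Note that the ratio is taken over the percentages, not the raw counts, so the different sizes of the spam and ham corpora do not skew it.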

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Your mouse has moved. Your Windows Operating System must be
   relicensed due to this hardware change. Please contact Microsoft
   to obtain a new activation key. If this hardware change results in
   added functionality you may be subject to additional license fees.
   Your system will now shut down. Thank you for choosing Microsoft.
-----------------------------------------------------------------------
  Today: George Washington's 278th Birthday

Re: masscheck T_ decision criteria?

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 22/02/2010 8:43 PM, John Hardin wrote:
> On Mon, 22 Feb 2010, Daryl C. W. O'Shea wrote:
>> I think both rules are getting bounced in and out due to the fluctuation
>> of who's been submitting results over the last week due to the bad rule
>> that got checked in (plus I don't currently submit Sunday night results
>> right now).
> 
> Yeah, there _is_ quite a bit of fluctuation.

I'd imagine that should tone down this week.  Although, looking at the
code that Sidney pointed out, I'm thinking that averaging the S/O over
the last 3 days could be improved upon to reduce bounce.  However, I
haven't looked at the whole thing to see how it works, so maybe it's fine.
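
To illustrate the idea of reducing bounce, here is a rough Python sketch comparing a plain threshold, a 3-day average, and a threshold with hysteresis. The daily S/O values and both thresholds (0.95 and 0.90) are invented for the example and are not taken from the ruleqa or list-bad-rules code. Hysteresis is just one possible alternative, not something the existing code is known to do:

```python
from collections import deque


def rolling_mean(values, window=3):
    """Yield the mean of the most recent `window` values seen so far."""
    recent = deque(maxlen=window)
    for v in values:
        recent.append(v)
        yield sum(recent) / len(recent)


def above(values, threshold=0.95):
    """Plain single-threshold decision for each day (hypothetical threshold)."""
    return [v >= threshold for v in values]


def with_hysteresis(values, promote_at=0.95, demote_at=0.90):
    """Promote at the high threshold, demote only below the lower one."""
    promoted = False
    states = []
    for v in values:
        if promoted and v < demote_at:
            promoted = False
        elif not promoted and v >= promote_at:
            promoted = True
        states.append(promoted)
    return states


def count_flips(states):
    """Number of day-to-day changes in the promoted/not-promoted state."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Invented daily S/O values for one rule, hovering around the threshold.
daily_so = [0.98, 0.93, 0.97, 0.92, 0.98]

raw = above(daily_so)
averaged = above(list(rolling_mean(daily_so)))
sticky = with_hysteresis(daily_so)

print("raw flips:       ", count_flips(raw))
print("3-day avg flips: ", count_flips(averaged))
print("hysteresis flips:", count_flips(sticky))
```

With these made-up numbers the averaged series still crosses the threshold occasionally, while the two-threshold version stays put once the rule has been promoted.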

> Might it be a good idea to use a different prefix to indicate rules that
> the automated systems have decided don't score well, to distinguish them
> from rules that the developer has explicitly indicated are for test?

I'm indifferent.  I'm not a fan of renaming the rules (I'd prefer
tflags), however I suppose it makes it easy to tell the T_ rules apart
when viewing the bare rule names without a method that could pick up on
the tflags for you.

Daryl


Re: masscheck T_ decision criteria?

Posted by John Hardin <jh...@impsec.org>.
On Mon, 22 Feb 2010, Daryl C. W. O'Shea wrote:

> On 22/02/2010 3:18 PM, John Hardin wrote:
>>
>> Could someone provide (or point to in the source) the criteria used for
>> the masscheck making the T_ or not to T_ decision?
>
> AFAIK the logic is buried somewhere in the ruleqa app.
> build/mkupdates/listpromotable that created the active.list file just
> gets the info from the ruleqa app.
>
>> Why is this being made a T_ rule?
>>
>>    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>   2.5767   0.0748   0.972    0.77    0.01  T_FROM_MISSPACED
>>   (spam: 3552 of 137851 messages; ham: 138 of 184506 messages)
>>
>> The S/O is pretty good. It's better than this rule that's not being made
>> T_:
>>
>>    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>>   1.3827   0.0607   0.958    0.74    1.00  FORM_FRAUD
>>   (spam: 1906 of 137851 messages; ham: 112 of 184506 messages)
>>
>> Why?
>
> What revision are these stats based on?  Are both stats from the same
> revision?

They are both from nightly 20100221-r912319-n:

http://ruleqa.spamassassin.org/20100221-r912319-n?srcpath=jhardin

> 20100222-r912513-n:
>
>  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
>      0   0.4503   0.1680   0.728    0.65    0.01  T_FROM_MISSPACED
>      0   0.2415   0.0537   0.818    0.70    0.01  T_FORM_FRAUD
>
> I think both rules are getting bounced in and out due to the fluctuation
> of who's been submitting results over the last week due to the bad rule
> that got checked in (plus I don't currently submit Sunday night results
> right now).

Yeah, there _is_ quite a bit of fluctuation.

Might it be a good idea to use a different prefix to indicate rules that 
the automated systems have decided don't score well, to distinguish them 
from rules that the developer has explicitly indicated are for test?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   W-w-w-w-w-where did he learn to n-n-negotiate like that?
-----------------------------------------------------------------------
  Today: George Washington's 278th Birthday

Re: masscheck T_ decision criteria?

Posted by Justin Mason <jm...@jmason.org>.
On Mon, Feb 22, 2010 at 23:36, Daryl C. W. O'Shea
<sp...@dostech.ca> wrote:
> On 22/02/2010 3:18 PM, John Hardin wrote:
>>
>> Could someone provide (or point to in the source) the criteria used for
>> the masscheck making the T_ or not to T_ decision?
>
> AFAIK the logic is buried somewhere in the ruleqa app.
> build/mkupdates/listpromotable that created the active.list file just
> gets the info from the ruleqa app.

Actually, it's in masses/hit-frequencies -- the "-P" switch.  See also the doc
at http://wiki.apache.org/spamassassin/RulesProjPromotion.

--j.

Re: masscheck T_ decision criteria?

Posted by "Daryl C. W. O'Shea" <sp...@dostech.ca>.
On 22/02/2010 3:18 PM, John Hardin wrote:
> 
> Could someone provide (or point to in the source) the criteria used for
> the masscheck making the T_ or not to T_ decision?

AFAIK the logic is buried somewhere in the ruleqa app.
build/mkupdates/listpromotable, which creates the active.list file, just
gets the info from the ruleqa app.

> Why is this being made a T_ rule?
> 
>    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   2.5767   0.0748   0.972    0.77    0.01  T_FROM_MISSPACED
>   (spam: 3552 of 137851 messages; ham: 138 of 184506 messages)
> 
> The S/O is pretty good. It's better than this rule that's not being made
> T_:
> 
>    SPAM%     HAM%     S/O    RANK   SCORE  NAME
>   1.3827   0.0607   0.958    0.74    1.00  FORM_FRAUD
>   (spam: 1906 of 137851 messages; ham: 112 of 184506 messages)
> 
> Why?

What revision are these stats based on?  Are both stats from the same
revision?

20100222-r912513-n:

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0   0.4503   0.1680   0.728    0.65    0.01  T_FROM_MISSPACED
      0   0.2415   0.0537   0.818    0.70    0.01  T_FORM_FRAUD

I think both rules are getting bounced in and out because the set of
people submitting results has fluctuated over the last week, a side
effect of the bad rule that got checked in (plus I don't currently
submit Sunday night results).

Daryl


Re: masscheck T_ decision criteria?

Posted by Sidney Markowitz <si...@sidney.com>.
John Hardin wrote, On 23/02/10 9:18 AM:
> 
> Could someone provide (or point to in the source) the criteria used for 
> the masscheck making the T_ or not to T_ decision?

Based on a quick glance, it appears to me that the code is in
trunk/masses/rule-qa/list-bad-rules, in a series of tests right after the
comment:

# base most of our decisions off day 1 (last night's mass-checks).
# note: meta rules must come before their __SUBRULES in this sort;
# default lexical sort will do this.


 -- sidney