Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2007/10/30 13:15:31 UTC

the JM_SOUGHT rules

The JM_SOUGHT rules are body rules, extracted automatically from the
previous few days' trapped spam mail.  They typically hit about 90% of the
previous week's spam, with no FPs, according to 

  http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_1/detail
  http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_2/detail
  http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_3/detail

This is achieved with no manual steps required at all, so that's quite
nice ;)
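
To make "extracted automatically" a little more concrete, here is a minimal
Python sketch of one way candidate phrases could be picked out of trapped
spam. It is an illustration only, not the generator actually used for
JM_SOUGHT: the function names, the word n-gram approach, and the n and
min_hits parameters are all assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the set of n-word phrases occurring in one message body."""
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def candidate_phrases(spam, ham, n=4, min_hits=2):
    """Hypothetical helper: phrases of n words that occur in at least
    min_hits spam messages and in no ham message at all."""
    ham_phrases = set()
    for msg in ham:
        ham_phrases |= word_ngrams(msg, n)

    counts = Counter()
    for msg in spam:
        # word_ngrams returns a set, so each message counts a phrase once
        counts.update(word_ngrams(msg, n))

    return sorted(
        phrase for phrase, hits in counts.items()
        if hits >= min_hits and phrase not in ham_phrases
    )


if __name__ == "__main__":
    spam = [
        "we are looking for candidates today",
        "now we are looking for candidates",
    ]
    ham = ["we are looking for a venue"]
    for phrase in candidate_phrases(spam, ham):
        print(phrase)
```

Dropping every phrase that also appears in ham is what would keep the FP
count down in a scheme like this; each surviving phrase would still need
regexp escaping before it could be used in a body rule.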

On the other hand, they could potentially be used to cause false
positives; review of the generated rules happens *after* they're
published (in other words they're C-T-R, commit-then-review).

I'm currently publishing these as a separate ruleset at
sought.rules.yerp.org -- http://taint.org/2007/08/15/004348a.html

They're also checked into SVN trunk, but that's really to get an idea of
FP/FNs using the rule-QA system.

I would call it stable.

I'm wondering what to do with them now -- I see these options:

  1. leave it at sought.rules.yerp.org, effectively an unofficial side
  project to SpamAssassin.

  2. move it into SpamAssassin SVN, and publish the generated rules into
  the "core" 3.2.x rule updates, changing our rule-update generation
  criteria to support this.
  
  3. move it into SpamAssassin SVN, rename to something without the "JM"
  prefix, and publish the generated rules at a new URL like
  sought.rules.SpamAssassin.org .  This would then be the first of a new
  site of SpamAssassin-hosted add-on rulesets, which are free to use
  different promotion criteria from the default "core" set.

What do people think?

--j.

Re: the JM_SOUGHT rules

Posted by Yet Another Ninja <sa...@alexb.ch>.
On 10/30/2007 1:15 PM, Justin Mason wrote:
> The JM_SOUGHT rules are body rules, extracted automatically from the
> previous few days' trapped spam mail.  They typically hit about 90% of the
> previous week's spam, with no FPs, according to 
> 
>   http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_1/detail
>   http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_2/detail
>   http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_3/detail
> 
> This is achieved with no manual steps required at all, so that's quite
> nice ;)
> 
> On the other hand, they could potentially be used to cause false
> positives; review of the generated rules happens *after* they're
> published (in other words they're C-T-R, commit-then-review).
> 
> I'm currently publishing these as a separate ruleset at
> sought.rules.yerp.org -- http://taint.org/2007/08/15/004348a.html
> 
> They're also checked into SVN trunk, but that's really to get an idea of
> FP/FNs using the rule-QA system.
> 
> I would call it stable.
> 
> I'm wondering what to do with them now -- I see these options:
> 
>   1. leave it at sought.rules.yerp.org, effectively an unofficial side
>   project to SpamAssassin.
> 
>   2. move it into SpamAssassin SVN, and publish the generated rules into
>   the "core" 3.2.x rule updates, changing our rule-update generation
>   criteria to support this.
>   
>   3. move it into SpamAssassin SVN, rename to something without the "JM"
>   prefix, and publish the generated rules at a new URL like
>   sought.rules.SpamAssassin.org .  This would then be the first of a new
>   site of SpamAssassin-hosted add-on rulesets, which are free to use
>   different promotion criteria from the default "core" set.
> 
> What do people think?

3. : +1

Notes:

- seems some of the rules are VERY long:

body __SEEK__CF3DC  /Our Company is a privately owned and operated 
promoting and marketing firm based in UK, with offices all over the 
world\. We are currently expanding due to client needs\. We are looking 
for candidates that will assist us\. Now we offering positions at the 
entry level for marketing and management role\. We train all candidates 
in: Service Representative Promotions Communications Public Relations 
Marketing We value your goals and your career; so we will connect you 
with mentors who can offer you as much guidance as you need\. This is a 
permanent home based position, so anyone ready for a stable career 
should apply today\! /


or

body __SEEK_GSUNRB / Money Manager - GPS: Online Form \*Important 
information for former The Signature Citizens Internet Banking Users\* 
On November 1, we will be moving to a new Internet Banking system\. You 
will need to print any previous records \(statements, cancelled checks, 
Bill Pay information, etc\.\) you wish to retain since they will not 
move to the new service\. Your Internet Banking access will resume on 
Friday, November 2\. Previous merger with Signature Bank\'s parent 
company, Money Manager GPS, Completed on October 1, 2007\. Payments with 
a scheduled payment date of November 1 or before will be processed and 
should not be resubmitted\. Any payment scheduled for payment after 
November 1 will not be processed and other payment arrangements should 
be made\. If you previously had e-bills or payees setup with Bill Pay, 
Wire, Ach, etc\., you will need to re-apply for the service, and 
re-enter the bill payment information on the new system starting 
November 1\. Beginning on October 29, you can access the new Citizens 
Internet Banking system by clicking here: 
https:\/\/www\.citizensbankmoneymanagergps\.com\/ All information you 
provide to us on our web site is encrypted to ensure your privacy and 
security\./


- don't see the point of body rules containing short-lived URIs

- reduce file size massively. People could get surprised by the memory used
when adding an 80 KB rule file (as well as possibly noticeable speed issues)
(same SA list questions as with blacklist.cf .-)

- maybe run rule generation through a spam_du_jour corpus only?
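
The first two notes above (very long patterns, URIs inside body rules) could
be checked mechanically. The following is a rough Python sketch, not part of
any SpamAssassin tooling: audit_rules and its max_len threshold are
hypothetical, and it assumes each rule sits on a single line in the form
"body NAME /pattern/" (unlike the mail-wrapped examples above).

```python
import re

# Assumed layout: one rule per line, "body RULE_NAME /pattern/"
RULE_RE = re.compile(r"^\s*body\s+(\S+)\s+/(.*)/\s*$")


def audit_rules(text, max_len=200):
    """Return (name, pattern_length, has_uri) for every body rule whose
    pattern is longer than max_len characters or contains a URI scheme."""
    flagged = []
    for line in text.splitlines():
        match = RULE_RE.match(line)
        if not match:
            continue  # comments, blank lines, other directives
        name, pattern = match.group(1), match.group(2)
        # "https:\/\/" in an escaped pattern still contains "https:"
        has_uri = re.search(r"https?:", pattern) is not None
        if len(pattern) > max_len or has_uri:
            flagged.append((name, len(pattern), has_uri))
    return flagged


if __name__ == "__main__":
    sample = "\n".join([
        "body __SHORT /a short phrase/",
        "body __LONG /" + "x" * 300 + "/",
        r"body __URI /click here: https:\/\/example\.com\/ now/",
    ])
    for name, length, has_uri in audit_rules(sample):
        print(name, length, "contains URI" if has_uri else "")
```

Run over the published rule file, a check like this would give a quick
list of rules worth trimming or dropping before worrying about memory use.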

my 0.2 $preferred_currency

AXB