Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2007/10/30 13:15:31 UTC
the JM_SOUGHT rules
The JM_SOUGHT rules are body rules, extracted automatically from the
previous few days' trapped spam mail. They typically hit about 90% of the
previous week's spam, with no FPs, according to
http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_1/detail
http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_2/detail
http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_3/detail
This is achieved with no manual steps required at all, so that's quite
nice ;)
On the other hand, they could potentially be used to cause false
positives; review of the generated rules happens *after* they're
published (in other words they're C-T-R).
I'm currently publishing these as a separate ruleset at
sought.rules.yerp.org -- http://taint.org/2007/08/15/004348a.html
They're also checked into SVN trunk, but that's really to get an idea of
FP/FNs using the rule-QA system.
I would call it stable.
I'm wondering what to do with them now -- I see these options:
1. leave it at sought.rules.yerp.org, effectively an unofficial side
project to SpamAssassin.
2. move it into SpamAssassin SVN, and publish the generated rules into
the "core" 3.2.x rule updates, changing our rule-update generation
criteria to support this.
3. move it into SpamAssassin SVN, rename to something without the "JM"
prefix, and publish the generated rules at a new URL like
sought.rules.SpamAssassin.org . This would then be the first of a new
site of SpamAssassin-hosted add-on rulesets, which are free to use
different promotion criteria from the default "core" set.
What do people think?
--j.
Re: the JM_SOUGHT rules
Posted by Yet Another Ninja <sa...@alexb.ch>.
On 10/30/2007 1:15 PM, Justin Mason wrote:
> The JM_SOUGHT rules are body rules, extracted automatically from the
> previous few days' trapped spam mail. They typically hit about 90% of the
> previous week's spam, with no FPs, according to
>
> http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_1/detail
> http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_2/detail
> http://ruleqa.spamassassin.org/20071029-r589545-n/JM_SOUGHT_3/detail
>
> This is achieved with no manual steps required at all, so that's quite
> nice ;)
>
> On the other hand, they could potentially be used to cause false
> positives; review of the generated rules happens *after* they're
> published (in other words they're C-T-R).
>
> I'm currently publishing these as a separate ruleset at
> sought.rules.yerp.org -- http://taint.org/2007/08/15/004348a.html
>
> They're also checked into SVN trunk, but that's really to get an idea of
> FP/FNs using the rule-QA system.
>
> I would call it stable.
>
> I'm wondering what to do with them now -- I see these options:
>
> 1. leave it at sought.rules.yerp.org, effectively an unofficial side
> project to SpamAssassin.
>
> 2. move it into SpamAssassin SVN, and publish the generated rules into
> the "core" 3.2.x rule updates, changing our rule-update generation
> criteria to support this.
>
> 3. move it into SpamAssassin SVN, rename to something without the "JM"
> prefix, and publish the generated rules at a new URL like
> sought.rules.SpamAssassin.org . This would then be the first of a new
> site of SpamAssassin-hosted add-on rulesets, which are free to use
> different promotion criteria from the default "core" set.
>
> What do people think?
3. : +1
Notes:
- seems some of the rules are VERY long:
body __SEEK__CF3DC /Our Company is a privately owned and operated
promoting and marketing firm based in UK, with offices all over the
world\. We are currently expanding due to client needs\. We are looking
for candidates that will assist us\. Now we offering positions at the
entry level for marketing and management role\. We train all candidates
in: Service Representative Promotions Communications Public Relations
Marketing We value your goals and your career; so we will connect you
with mentors who can offer you as much guidance as you need\. This is a
permanent home based position, so anyone ready for a stable career
should apply today\! /
or
body __SEEK_GSUNRB / Money Manager - GPS: Online Form \*Important
information for former The Signature Citizens Internet Banking Users\*
On November 1, we will be moving to a new Internet Banking system\. You
will need to print any previous records \(statements, cancelled checks,
Bill Pay information, etc\.\) you wish to retain since they will not
move to the new service\. Your Internet Banking access will resume on
Friday, November 2\. Previous merger with Signature Bank\'s parent
company, Money Manager GPS, Completed on October 1, 2007\. Payments with
a scheduled payment date of November 1 or before will be processed and
should not be resubmitted\. Any payment scheduled for payment after
November 1 will not be processed and other payment arrangements should
be made\. If you previously had e-bills or payees setup with Bill Pay,
Wire, Ach, etc\., you will need to re-apply for the service, and
re-enter the bill payment information on the new system starting
November 1\. Beginning on October 29, you can access the new Citizens
Internet Banking system by clicking here:
https:\/\/www\.citizensbankmoneymanagergps\.com\/ All information you
provide to us on our web site is encrypted to ensure your privacy and
security\./
- don't see the point of body rules containing short-lived URIs
- reduce file size massively. People could get surprised by the memory
used after adding an 80 KB rule file (as well as possibly noticeable
speed issues) (same SA list questions as with blacklist.cf .-)
- maybe run rule generation against a spam_du_jour corpus only?
my 0.2 $preferred_currency
AXB
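
To make the notes above concrete, here is a minimal sketch of a check that flags generated body rules that are very long or that embed a URI. It is not part of the JM_SOUGHT tooling; the function name audit_rules, the 200-character threshold and the sample rule names are assumptions made for illustration. It also assumes one rule per line, as in a real .cf file, not the line-wrapped form shown in this mail.

```python
import re

# Start of a SpamAssassin body rule: "body NAME /pattern.../"
RULE_START = re.compile(r"^body\s+(\S+)\s+(/.*)$")
# "http://" or "https://" as it appears inside a rule regex,
# where the slashes are usually escaped as "\/".
URI_HINT = re.compile(r"https?:(?:\\?/){2}")


def audit_rules(text, max_len=200):
    """Return (name, pattern_length, has_uri) for each body rule that is
    longer than max_len characters or that embeds a URI.

    max_len is an arbitrary illustrative threshold, not a SpamAssassin limit.
    """
    flagged = []
    for line in text.splitlines():
        match = RULE_START.match(line.strip())
        if not match:
            continue  # comments, score lines, meta rules, etc.
        name, pattern = match.group(1), match.group(2)
        has_uri = bool(URI_HINT.search(pattern))
        if len(pattern) > max_len or has_uri:
            flagged.append((name, len(pattern), has_uri))
    return flagged


# Hypothetical sample rules, one per line.
sample = "\n".join([
    r"body __SEEK_A /short phrase/",
    r"body __SEEK_B /click here: https:\/\/www\.example\.com\//",
    "body __SEEK_C /" + "x" * 300 + "/",
    "# comment",
])

for name, length, has_uri in audit_rules(sample):
    print(name, length, has_uri)
```

A check like this could run before publishing, either to drop the flagged rules or to truncate them, which would also address the file-size concern.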