Posted to dev@spamassassin.apache.org by "Lawrence @ Rogers" <la...@nl.rogers.com> on 2011/02/11 19:38:21 UTC

cap on rule scores

Hello,

Like many users, we set our required score to 5.0. Lately, we've seen 
several dozen ham e-mails get mislabeled as spam due to one rule that 
(usually) has a score of 3 or more. For example, please see the attached 
e-mail (raw with headers) where TO_NO_BRKTS_DIRECT contributes over 3 to 
the overall score.

We've already had to create a custom .cf file to rescore rules like 
EXCUSE_REMOVE for this purpose.

What I propose is a limit of 2.5 for any rule that is not a network test 
(RBL, DNSBL) or a Bayesian rule (BAYES_xx). This would allow the rules 
to still effectively tag spam as they should, while reducing the 
possibility of false positives.
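
A minimal sketch of what such a cap could look like, assuming a plain mapping of rule names to scores. The example scores, the hand-written NETWORK_RULES set, and the helper names are illustrative assumptions, not SpamAssassin internals. The sketch only emits `score` override lines of the kind one would place in a custom .cf file:

```python
CAP = 2.5  # proposed ceiling for non-network, non-Bayes rules

# Hypothetical example scores; real values depend on the installed rule set.
EXAMPLE_SCORES = {
    "TO_NO_BRKTS_DIRECT": 3.1,
    "EXCUSE_REMOVE": 2.9,
    "BAYES_99": 3.5,
    "RCVD_IN_XBL": 3.0,
    "HTML_MESSAGE": 0.001,
}

# Assumption: network tests are listed by hand here. SpamAssassin marks
# them with "tflags net" in its rule files, which this sketch does not parse.
NETWORK_RULES = {"RCVD_IN_XBL"}


def is_exempt(rule: str) -> bool:
    """Bayes and network rules are left uncapped under the proposal."""
    return rule.startswith("BAYES_") or rule in NETWORK_RULES


def capped_overrides(scores: dict[str, float], cap: float = CAP) -> list[str]:
    """Return 'score RULE value' lines for every rule the cap would lower."""
    return [
        f"score {rule} {cap:.3f}"
        for rule, value in sorted(scores.items())
        if value > cap and not is_exempt(rule)
    ]


if __name__ == "__main__":
    print("\n".join(capped_overrides(EXAMPLE_SCORES)))
```

Rules already at or below the cap, and the exempt Bayes and network rules, produce no override line.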

Thoughts?

Regards,

Lawrence Williams
LCWSoft
www.lcwsoft.com



Re: cap on rule scores

Posted by John Hardin <jh...@impsec.org>.
On Sat, 12 Feb 2011, Lawrence @ Rogers wrote:

> But scoring 3+ for a rule that checks the format of the To: header is a bit 
> excessive, IMO. In an ideal world, everyone would send properly formatted 
> headers, but we don't live in a perfect world and need to account for that.

The format of the To: header _in concert with direct-to-MX_. Per our 
masscheck corpora that is a reasonably good spam sign.

It appears that the newsletters are being generated directly on a 
network-facing host and are being sent directly to subscribers' MX hosts 
rather than via an intervening dedicated MTA. This is very "spammy" 
behavior, and it's not surprising it hits a rule like this. It looks 
exactly like something generated by a spambot.

Perhaps adding "must be in ZEN" to that meta might be justified, but that 
would make it a network test, which for this sort of thing I'm reluctant 
to do.

How many of your FPs are similar?

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   After ten years (1998-2008) of draconian gun control in the State
   of Massachusetts, the results are in: firearms-related assaults up
   78%, firearms-related homicides up 67%, assault-related emergency
   room visits up 331%. Gun Control does not reduce violent crime.
-----------------------------------------------------------------------
  Today: Abraham Lincoln's and Charles Darwin's 202nd Birthdays

Re: cap on rule scores

Posted by "Lawrence @ Rogers" <la...@nl.rogers.com>.
On 12/02/2011 6:21 AM, Yet Another Ninja wrote:
> On 2011-02-11 19:38, Lawrence @ Rogers wrote:
>> Hello,
>>
>> Like many users, we set our required score to 5.0. Lately, we've seen
>> several dozen ham e-mails get mislabeled as spam due to one rule that
>> (usually) has a score of 3 or more. For example, please see the attached
>> e-mail (raw with headers) where TO_NO_BRKTS_DIRECT contributes over 3 to
>> the overall score.
>>
>> We've already had to create a custom .cf file to rescore rules like
>> EXCUSE_REMOVE for this purpose.
>>
>> What I propose is a limit of 2.5 for any rule that is not a network test
>> (RBL, DNSBL) or a Bayesian rule (BAYES_xx). This would allow the rules
>> to still effectively tag spam as they should, while reducing the
>> possibility of false positives.
>>
>> Thoughts?
>
> My thought would be to fix the HTML-only mail, header formatting, etc.
> in your sample before working around filter rules.
>
Unfortunately, that is not in my power: I did not create the e-mail; I 
only received it. I understand the scoring for HTML-only e-mail, as that 
is a good indicator of possible spamminess.

But scoring 3+ for a rule that checks the format of the To: header is a 
bit excessive, IMO. In an ideal world, everyone would send properly 
formatted headers, but we don't live in a perfect world and need to 
account for that.

- Lawrence

Re: cap on rule scores

Posted by Yet Another Ninja <ax...@gmail.com>.
On 2011-02-11 19:38, Lawrence @ Rogers wrote:
> Hello,
>
> Like many users, we set our required score to 5.0. Lately, we've seen
> several dozen ham e-mails get mislabeled as spam due to one rule that
> (usually) has a score of 3 or more. For example, please see the attached
> e-mail (raw with headers) where TO_NO_BRKTS_DIRECT contributes over 3 to
> the overall score.
>
> We've already had to create a custom .cf file to rescore rules like
> EXCUSE_REMOVE for this purpose.
>
> What I propose is a limit of 2.5 for any rule that is not a network test
> (RBL, DNSBL) or a Bayesian rule (BAYES_xx). This would allow the rules
> to still effectively tag spam as they should, while reducing the
> possibility of false positives.
>
> Thoughts?

My thought would be to fix the HTML-only mail, header formatting, etc. in 
your sample before working around filter rules.

Re: cap on rule scores

Posted by John Hardin <jh...@impsec.org>.
On Fri, 11 Feb 2011, Lawrence @ Rogers wrote:

> Hello,
>
> Like many users, we set our required score to 5.0. Lately, we've seen 
> several dozen ham e-mails get mislabeled as spam due to one rule that 
> (usually) has a score of 3 or more. For example, please see the attached 
> e-mail (raw with headers) where TO_NO_BRKTS_DIRECT contributes over 3 to 
> the overall score.
>
> We've already had to create a custom .cf file to rescore rules like 
> EXCUSE_REMOVE for this purpose.
>
> What I propose is a limit of 2.5 for any rule that is not a network test 
> (RBL, DNSBL) or a Bayesian rule (BAYES_xx). This would allow the rules 
> to still effectively tag spam as they should, while reducing the 
> possibility of false positives.
>
> Thoughts?

Unfortunately, I can't comment on modifying the scoring system to impose 
an upper limit on individual rule scores, as I'm not that familiar with 
its internal details. I personally don't see any problem with the idea, 
but it could keep actual spam that hits only a few rules from reaching 
5 points.
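
To make that concern concrete, here is a rough sketch with made-up numbers. The rule names and scores are hypothetical and not taken from the real rule set. A message that hits only two non-network rules crosses 5.0 without a cap but falls short once each rule is limited to 2.5:

```python
REQUIRED_SCORE = 5.0  # threshold mentioned in the thread
CAP = 2.5             # proposed per-rule limit


def total(hits, cap=None):
    """Sum rule scores, optionally clamping each one to `cap`."""
    return sum(s if cap is None else min(s, cap) for s in hits.values())


# Hypothetical spam hitting only two non-network, non-Bayes rules.
hits = {"HYPOTHETICAL_RULE_A": 3.2, "HYPOTHETICAL_RULE_B": 2.0}

uncapped = total(hits)
capped = total(hits, cap=CAP)

print(f"uncapped: {uncapped:.1f} spam={uncapped >= REQUIRED_SCORE}")
print(f"capped:   {capped:.1f} spam={capped >= REQUIRED_SCORE}")
```

Whether this happens often enough to matter would have to be checked against real corpora, which this sketch does not attempt.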

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   A well educated Electorate, being necessary to the liberty of a
   free State, the Right of the People to Keep and Read Books,
   shall not be infringed.
-----------------------------------------------------------------------
  Today: Abraham Lincoln's and Charles Darwin's 202nd Birthdays