Posted to users@spamassassin.apache.org by Stephan <st...@theched.org> on 2011/07/24 08:06:11 UTC

Beginner's question on rules

Good day gents,

I have been setting up a home mail server recently and it seems that I
cannot get all spam trapped correctly. Here is an example:

http://pastebin.com/EBER8iuP

This one, which I definitely consider spam, gets classified as non-spam.
It correctly hits BAYES_99, but as my threshold is still the default 5
it gets reported as non-spam, despite other rules being triggered.

So my question is: what should I do to increase the accuracy of this
detection? Should I change my thresholds? Manually create a blacklist?
Add some custom rulesets (I recently added Khopesh's one)?

Any pointers are welcome,

Thanks,

-Stephan


Re: Beginner's question on rules

Posted by John Hardin <jh...@impsec.org>.
On Sun, 24 Jul 2011, Stephan wrote:

> I have been setting up a home mail server recently and it seems that I
> cannot get all spam trapped correctly. Here is an example:
>
> http://pastebin.com/EBER8iuP
>
> This one, which I definitely consider spam, gets classified as non-spam.
> It correctly hits BAYES_99, but as my threshold is still the default 5
> it gets reported as non-spam, despite other rules being triggered.
>
> So my question is: what should I do to increase the accuracy of this
> detection? Should I change my thresholds? Manually create a blacklist?
> Add some custom rulesets (I recently added Khopesh's one)?
>
> Any pointers are welcome,

Tweaking your threshold by itself is not generally a good idea. You want 
to add more rules that detect spam that isn't scored high enough.

Here's a rule that should hit on messages generated by the bulk mailing 
service that message used:

   header      XM_EC_MESSENGER      X-Mailer =~ /\beC-Messenger\b/
   describe    XM_EC_MESSENGER      eC-Messenger bulk mail service
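
As written, the rule has no explicit score, and a rule without a "score"
line gets SpamAssassin's default of 1.0. A minimal sketch of giving it more
weight (the 2.0 is only an illustrative value, not a tested
recommendation; tune it against your own mail):

   score       XM_EC_MESSENGER      2.0

Put all three lines in your local.cf and run "spamassassin --lint" to
check the syntax.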

Neither of the domains spamvertised in that message is currently listed
at SURBL.org, but I'd still suggest considering greylisting, which gives
spamvertised domains a chance to show up in the URIBLs that SA checks. You
might also want to report news-fashion-shopping.com to SURBL.

Sadly, there are not a lot of rules in the standard corpus for languages 
other than English. If you receive a lot of legitimate mail in French, 
you may want to boost your score for BAYES_99 and ensure you train it 
carefully.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Health Care _is_ a right - the government has no business keeping
   you from getting it. But forcing somebody else to pay for your
   health care at gunpoint (i.e. through taxation) is _not_ a right.
-----------------------------------------------------------------------
  227 days since the first successful private orbital launch (SpaceX)

Re: Beginner's question on rules

Posted by da...@chaosreigns.com.
On 07/24, Stephan wrote:
> I have been setting up a home mail server recently and it seems that I
> cannot get all spam trapped correctly. Here is an example:

"All spam"?  You may have unrealistic expectations.  Although I certainly
encourage you to try to do better than what anybody else has managed.
Seriously, that's the only way we get better at this.

For example, even in the ideal case where the email you get exactly matches
the email that spamassassin was trained on, the STATISTICS-set3.txt.gz file
included with spamassassin (network and bayes tests enabled) says:

# False positives:         8  0.04%
# False negatives:       691  1.57%

1.57% of spam missed.  

> http://pastebin.com/EBER8iuP

> So my question is: what should I do to increase the accuracy of this
> detection? Should I change my thresholds? Manually create a blacklist?
> Add some custom rulesets (I recently added Khopesh's one)?

It might be useful to tell us exactly what scores you're getting for each
test you're hitting, by using "spamassassin -t".

Do not lower your threshold below 5.  All scores are generated assuming a
threshold of 5 with a target of 1 in 2,500 false positives.  Lowering your
threshold will increase your false positives.  

Sought is the only other rule set I'd recommend:
http://wiki.apache.org/spamassassin/SoughtRules

Do you have Pyzor and Razor installed?
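
If they are installed, they also have to be enabled in the configuration.
A sketch of the relevant lines, assuming a stock install where the plugin
lines live in a .pre file such as v310.pre (they may already be present
and uncommented on your system) and the use_* options go in local.cf:

   loadplugin  Mail::SpamAssassin::Plugin::Pyzor
   loadplugin  Mail::SpamAssassin::Plugin::Razor2

   use_pyzor   1
   use_razor2  1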

You could increase the score of BAYES_99 if you trust it.  You should check
the scores on all your non-spam that hits BAYES_99 and see how many of them
would be flagged as spam if you increased that score.  I wouldn't
recommend that without disabling bayes auto-training ("bayes_auto_learn 0"),
because that can go wrong (auto-training spam as non-spam and the reverse).
And keep in mind, if you only have, say, 100 non-spams to base your score
change on, you risk increasing your false positives from ~1 in 2,500 to ~1
in 101 or worse.
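
In local.cf that combination could look like the following sketch. The
4.5 is purely an illustrative number, not a recommendation; pick a value
only after checking it against your own non-spam as described above:

   bayes_auto_learn  0
   score             BAYES_99  4.5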

If this is a repeated problem, it might be useful to try coming up with
your own custom rule or two.  And if they help, please share with this
mailing list.  http://wiki.apache.org/spamassassin/WritingRules

Another possibility is to participate in the nightly mass checks -
submitting your rule hit stats (not emails) to the process which calculates
spamassassin scores:  http://wiki.apache.org/spamassassin/NightlyMassCheck
We always need more of that to increase everybody's accuracy, and of course
it'll increase your accuracy more than those who don't participate.

I've started a combination IP white + blacklist, which you're welcome to
contribute to:  http://www.chaosreigns.com/iprep/
I'm kind of excited about it, but it needs more contributors to really be
useful for non-contributors.

-- 
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com

Re: Beginner's question on rules

Posted by Axb <ax...@gmail.com>.
On 2011-07-24 8:06, Stephan wrote:
> Good day gents,
>
> I have been setting up a home mail server recently and it seems that I
> cannot get all spam trapped correctly. Here is an example:
>
> http://pastebin.com/EBER8iuP
>
> This one, which I definitely consider spam, gets classified as non-spam.
> It correctly hits BAYES_99, but as my threshold is still the default 5
> it gets reported as non-spam, despite other rules being triggered.
>
> So my question is: what should I do to increase the accuracy of this
> detection? Should I change my thresholds? Manually create a blacklist?
> Add some custom rulesets (I recently added Khopesh's one)?
>
> Any pointers are welcome,

There are lots of X- headers in that message which you could use for
header rules to tag mail from this sender.

Or if that's too drastic, a URI rule...
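
A minimal sketch of such a URI rule, using the domain John suggested
reporting to SURBL. The rule name and the score are made up for
illustration, so adjust both to taste:

   uri         LOCAL_URI_NFS   /\bnews-fashion-shopping\.com\b/i
   describe    LOCAL_URI_NFS   Link to domain seen in eC-Messenger spam
   score       LOCAL_URI_NFS   2.0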