Posted to users@spamassassin.apache.org by SA Lists <sa...@troodos.demon.co.uk> on 2007/08/16 11:09:58 UTC

Please help me to improve SA performance

Hello all,

I am only a home user, but I am trying to provide a spam-free environment for all members of my household. I have SA (3.2.2) set up with Bayesian rules and run sa-learn every night. SA's performance is "average": it has caught c. 600 of the 1,000 or so spam mails currently in my spam folder (which I use for sa-learn).

Today I had a small rash of those spam mails that have about three lines of random words. SA let about 8 through and caught 2. Here are the headers from one of each:

a) Caught:
	Return-Path: <sq...@magnacarta.nl>
	X-Spam-Flag: YES
	X-Spam-Checker-Version: SpamAssassin 3.2.2 (2007-07-23) on mydomain.org.uk
	X-Spam-Level: ******
	X-Spam-Status: Yes, score=6.2 required=5.0 tests=BAYES_60,HTML_MESSAGE, LONGWORDS,MPART_ALT_DIFF,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_SORBS_WEB,RDNS_NONE autolearn=no version=3.2.2
	X-Spam-Report:
		*  0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
		*  1.0 BAYES_60 BODY: Bayesian spam probability is 60 to 80%
		*      [score: 0.6335]
		*  0.0 HTML_MESSAGE BODY: HTML included in message
		*  0.7 MPART_ALT_DIFF BODY: HTML and text parts are different
		*  2.0 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
		*      [Blocked - see <http://www.spamcop.net/bl.shtml?207.255.180.122>]
		*  0.6 RCVD_IN_SORBS_WEB RBL: SORBS: sender is a abuseable web server
		*      [207.255.180.122 listed in dnsbl.sorbs.net]
		*  1.8 LONGWORDS Long string of long words
	Received: from mydomain.org.uk (localhost.localdomain [127.0.0.1]) by mydomain.org.uk (8.13.8/8.13.8) with ESMTP id l7G6F2ZQ011671 for <ma...@localhost>; Thu, 16 Aug 2007 07:15:02 +0100
	Received: from pop3.mail.demon.net [194.217.242.253] by mydomain.org.uk with POP3 (fetchmail-6.3.6) for <ma...@localhost> (single-drop); Thu, 16 Aug 2007 07:15:02 +0100 (BST)
	Received: from punt3.mail.demon.net by mailstore for petgord34truew@mydomain.demon.co.uk id 1ILYc0-1N2knw-03-9ut; Thu, 16 Aug 2007 06:13:40 +0000
	Received: from [194.217.242.72] (lhlo=anchor-hub.mail.demon.net) by punt3.mail.demon.net with lmtp id 1ILYc0-1N2knw-03 for petgord34truew@mydomain.demon.co.uk; Thu, 16 Aug 2007 06:13:40 +0000
	Received: from [207.255.180.122] (helo=207-255-180-122-dhcp.gsv.md.atlanticbb.net) by anchor-hub.mail.demon.net with smtp id 1ILYby-0001cr-0v for petgord34truew@mydomain.demon.co.uk; Thu, 16 Aug 2007 06:13:40 +0000
	Message-ID: <00...@depuypc>
	From: Donnie Belanger <sq...@magnacarta.nl>
	To: petgord34truew <pe...@mydomain.demon.co.uk>
	Subject: [SPAM] bugle brillouin  bucket
	Date: Thu, 16 Aug 2007 02:09:52 -0400 (07:09 BST)
	MIME-Version: 1.0
	Content-Type: multipart/alternative; boundary="----=_NextPart_000_000D_01C7DFAB.0F337260"
	X-Priority: 3
	X-MSMail-Priority: Normal
	X-Mailer: Microsoft Outlook Express 6.00.3790.1106
	X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0000
	X-Virus-Status: No
	X-Virus-Checker-Version: clamassassin 1.2.4 with clamscan / ClamAV 0.91.1/3966/Thu Aug 16 01:48:06 2007
	X-Spam-Prev-Subject: bugle brillouin  bucket
	X-Evolution-Source: imap://mark@192.168.123.101/

b) Slipped Through:
	Return-Path: <nn...@batelco.com.bh>
	X-Spam-Checker-Version: SpamAssassin 3.2.2 (2007-07-23) on mydomain.org.uk
	X-Spam-Level: ****
	X-Spam-Status: No, score=4.6 required=5.0 tests=BAYES_50,HTML_MESSAGE, LONGWORDS,MPART_ALT_DIFF,RCVD_IN_BL_SPAMCOP_NET,RDNS_NONE autolearn=no version=3.2.2
	Received: from mydomain.org.uk (localhost.localdomain [127.0.0.1]) by mydomain.org.uk (8.13.8/8.13.8) with ESMTP id l7G5v1nR011585 for <ma...@localhost>; Thu, 16 Aug 2007 06:57:01 +0100
	Received: from pop3.mail.demon.net [194.217.242.253] by mydomain.org.uk with POP3 (fetchmail-6.3.6) for <ma...@localhost> (single-drop); Thu, 16 Aug 2007 06:57:01 +0100 (BST)
	Received: from punt3.mail.demon.net by mailstore for petgord34truew@mydomain.demon.co.uk id 1ILYJn-063knw-02-AD6; Thu, 16 Aug 2007 05:54:51 +0000
	Received: from [194.217.242.210] (lhlo=lon1-hub.mail.demon.net) by punt3.mail.demon.net with lmtp id 1ILYJn-063knw-02 for petgord34truew@mydomain.demon.co.uk; Thu, 16 Aug 2007 05:54:51 +0000
	Received: from [76.19.26.227] (helo=user-73a7f5b517.hsd1.ma.comcast.net.) by lon1-hub.mail.demon.net with smtp id 1ILYJn-0002cG-Jd for petgord34truew@mydomain.demon.co.uk; Thu, 16 Aug 2007 05:54:51 +0000
	Message-ID: <00...@user73a7f5b517>
	From: Cleveland Carson <nn...@batelco.com.bh>
	To: petgord34truew <pe...@mydomain.demon.co.uk>
	Subject: blanchard consume  buoy
	Date: Thu, 16 Aug 2007 01:52:28 -0400 (06:52 BST)
	MIME-Version: 1.0
	Content-Type: multipart/alternative; boundary="----=_NextPart_000_0016_01C7DFA8.6C85DB40"
	X-Priority: 3
	X-MSMail-Priority: Normal
	X-Mailer: Microsoft Outlook Express 6.00.2600.1106
	X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.2962
	X-Virus-Status: No
	X-Virus-Checker-Version: clamassassin 1.2.4 with clamscan / ClamAV 0.91.1/3966/Thu Aug 16 01:48:06 2007
	X-Evolution-Source: imap://mark@192.168.123.101/
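
To see why one message was caught and the other was not, it can help to compare the two X-Spam-Status headers directly. The following is a minimal Python sketch, not part of SpamAssassin: the helper name parse_status is made up for illustration, and the per-rule scores are copied from the X-Spam-Report of the caught message only (the missed message's report was not posted).

```python
import re

# X-Spam-Status values copied from the two messages posted above.
CAUGHT_STATUS = (
    "Yes, score=6.2 required=5.0 tests=BAYES_60,HTML_MESSAGE, LONGWORDS,"
    "MPART_ALT_DIFF,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_SORBS_WEB,RDNS_NONE "
    "autolearn=no version=3.2.2"
)
MISSED_STATUS = (
    "No, score=4.6 required=5.0 tests=BAYES_50,HTML_MESSAGE, LONGWORDS,"
    "MPART_ALT_DIFF,RCVD_IN_BL_SPAMCOP_NET,RDNS_NONE "
    "autolearn=no version=3.2.2"
)

# Per-rule scores copied from the caught message's X-Spam-Report.
REPORT_SCORES = {
    "RDNS_NONE": 0.1,
    "BAYES_60": 1.0,
    "HTML_MESSAGE": 0.0,
    "MPART_ALT_DIFF": 0.7,
    "RCVD_IN_BL_SPAMCOP_NET": 2.0,
    "RCVD_IN_SORBS_WEB": 0.6,
    "LONGWORDS": 1.8,
}


def parse_status(value):
    """Return (score, required, set of rule names) from an X-Spam-Status value.

    Hypothetical helper for illustration; it only handles the layout
    seen in the headers above.
    """
    score = float(re.search(r"score=(-?[\d.]+)", value).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", value).group(1))
    tests_field = re.search(r"tests=(.*?)\s+autolearn=", value).group(1)
    tests = {name.strip() for name in tests_field.split(",") if name.strip()}
    return score, required, tests


caught_score, caught_required, caught_tests = parse_status(CAUGHT_STATUS)
missed_score, missed_required, missed_tests = parse_status(MISSED_STATUS)

# Rules that fired on one message but not on the other.
only_caught = caught_tests - missed_tests
only_missed = missed_tests - caught_tests

# The listed rule scores should add up to the reported total.
report_total = round(sum(REPORT_SCORES.values()), 1)

# Points contributed by the rules that only the caught message hit.
extra_points = round(sum(REPORT_SCORES[name] for name in only_caught), 1)

print("only on caught message:", sorted(only_caught))
print("only on missed message:", sorted(only_missed))
print("report total:", report_total, "extra points:", extra_points)
```

The rule scores in the report add up to the reported 6.2. The caught message differs only in BAYES_60 (instead of BAYES_50) and RCVD_IN_SORBS_WEB, which together account for 1.6 points. Subtracting those from 6.2 gives 4.6, the score of the missed message, which suggests (but does not prove) that BAYES_50 contributed close to nothing there.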


I thought perhaps that I would increase the scores on the bayes rules but then I read this on the SA wiki:

Note: Scores for "learn" rules, such as BAYES_*, that rate the probability that a message is spam, are scored using the same method. This can produce "confusing" scores, for instance, that have BAYES_80 with a higher score than BAYES_99. There are a few reasons for this. 1) The score generation system does not understand that BAYES_* are related to one another, they're separate rules that need separate scores. 2) More importantly, the higher the probability from a "learn" rule, the higher likelihood that the message also hit a bunch of other rules. This lets the score generation system lower the "learn" rule score due to the inevitable false positive, while also still marking the message as spam via the sum of all rule scores.

...and now I'm not so sure that it's a good idea to change the rules' scores. I suppose I could reduce the threshold to 4.5; but I don't know if that's a good thing either.

What's the best way for me to improve SA performance (bearing in mind that I'm really only an amateur spam fighter)?

Thanks in advance....

AD


Re: Please help me to improve SA performance

Posted by SA Lists <sa...@troodos.demon.co.uk>.
On Fri, Aug 17, 2007 at 10:09:49AM +0200, Matthias Haegele wrote:
> SA Lists wrote:
> 
> Don't think so:
> http://www.sanesecurity.co.uk/clamav/
> "Phishing and Scam Signatures for ClamAV"
> 
> >I have just now included many of the SARE rules in my sa-update. I am 
> >almost looking forward to getting some spam to see if they work! :)
> >...I presume that simply adding the rules via sa-update (as per the 
> >instructions on the wiki) is enough - they don't have to be "activated" in 
> >any way do they?
> 
> running sa-update (cron job) should do it...

Yes, I did that, manually first (that seemed to load all the rules) and then from a cron job which will run every night.

The results however are not encouraging. Even as I was compiling my last message to you one of those very same spam messages (one with about three lines of random words) came in and, even with the new rules in place (and yes, I had at that stage restarted spamd), it was scored with the lowest score yet for one of those messages (1.9!).
> 
> >Where can I find out more about the Botnet plugin? (There doesn't seem to 
> >be a reference to it on the wiki).
> 
> http://people.ucsc.edu/~jrudd/spamassassin/
> Download the latest version, untar it and read *txt and INSTALL ;-).
> You could also search this mailing list's archive for more information on it.
> "spamassassin users botnet plugin" might give some results.
> 
Yes, sorry, I should have made my request clearer. When I wrote that, I had already downloaded the plugin and read the text files it came with, and I had also looked back through some of the posts here. What I meant was: should it just be installed as-is? Are there any rules that conflict with others? What does it work well with? That sort of thing.

However, I just went ahead and installed it anyway.

So far none of the above has helped. In this morning's inbox I had 2 more of the three-line spams missed by SA (scores 2.6 and 1.8) and in my spam-filter 8 messages (mostly the same one trying to sell me watches) which SA was already picking up before these additional rules...
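
The scores quoted so far in this thread also give a rough idea of what lowering the threshold to 4.5 would achieve. The sketch below is only an illustration under stated assumptions: the function names are made up, the list holds just the five scores mentioned in these messages, and it relies on SpamAssassin flagging a message when its score is at least the required score (the required_score setting, 5.0 in the posted headers).

```python
REQUIRED_DEFAULT = 5.0  # "required=5.0" in the posted headers
REQUIRED_LOWERED = 4.5  # the threshold being considered

# Scores quoted in this thread: the caught sample, the missed sample,
# and the three later random-word spams that slipped through.
THREAD_SCORES = [6.2, 4.6, 1.9, 2.6, 1.8]


def is_flagged(score, required):
    """A message is marked as spam when its score reaches the threshold."""
    return score >= required


def newly_flagged(scores, old_required, new_required):
    """Scores that are flagged under the new threshold but not the old one."""
    return [
        s for s in scores
        if is_flagged(s, new_required) and not is_flagged(s, old_required)
    ]


gained = newly_flagged(THREAD_SCORES, REQUIRED_DEFAULT, REQUIRED_LOWERED)
still_missed = [s for s in THREAD_SCORES if not is_flagged(s, REQUIRED_LOWERED)]

print("newly flagged at 4.5:", gained)
print("still missed at 4.5:", still_missed)
```

On these five scores, a threshold of 4.5 would only pick up the 4.6 message. The ones scoring 1.9, 2.6 and 1.8 are far below either threshold, so for those the rules and Bayes scores that fire matter more than the threshold itself.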

I know on that batch of spam it's an 80% success rate, but yesterday, when the mix of spam was different, it was the other way around. Very frustrating. I must be doing something wrong, especially when other people tell me that SA catches 198 out of 200 for them (as one friend told me).

Thanks again for your help. Much appreciated.

AD


Re: Please help me to improve SA performance

Posted by Jerry Durand <jd...@interstellar.com>.
At 01:09 AM 8/17/2007, Matthias Haegele wrote:
>Don't think so:
>http://www.sanesecurity.co.uk/clamav/
>"Phishing and Scam Signatures for ClamAV"

As a reminder to people, check the ClamAV readme file (I think that's 
the one it's in) and copy the SA rules to your local.cf file.


-- 
Jerry Durand, Durand Interstellar, Inc.  www.interstellar.com
tel: +1 408 356-3886, USA toll free: 1 866 356-3886
Skype:  jerrydurand


Re: Please help me to improve SA performance

Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.
SA Lists wrote:
> Matthias +all, 
> 
> Thank you very much.
> 
> On Thu, Aug 16, 2007 at 01:02:53PM +0200, Matthias Haegele wrote:
>> You only mentioned running sa-learn on spam; you should also learn your 
>> ham messages. Both are important. Bayes performance will only be good if 
>> it is trained on both ham and spam.
> 
> Yes, thank you. For the sake of brevity I did not describe fully what I do, but I do indeed run sa-learn on the ham too (I have a nightly cron job that collects all the mail from a selection of folders and concatenates it into one mbox; I then run sa-learn --ham on that and sa-learn --spam on the spam folder).
> 
>>> ...and now I'm not so sure that it's a good idea to change the rules' 
>>> scores. I suppose I could reduce the threshold to 4.5; but I don't know if 
>>> that's a good thing either.
>> I reduced the threshold too, but also watch quarantine regularly for FPs, 
>> it works fine for me ...
> 
> I haven't yet done this (but am still thinking about it) see below...
> 
>>> What's the best way for me to improve SA performance (bearing in mind that 
>>> I'm really only an amateur spam fighter)?

Me too ;-).

>>> Thanks in advance....
>> perhaps you could use:
>> clamav sanesecurity
>> SARE Rules
>> Botnet plugin
>> too ...
> 
> Well thanks for this.
> 
> I do clam checking before SA so I guess clamav sanesecurity would be duplicating that...

Don't think so:
http://www.sanesecurity.co.uk/clamav/
"Phishing and Scam Signatures for ClamAV"

> I have just now included many of the SARE rules in my sa-update. I am almost looking forward to getting some spam to see if they work! :)
> ...I presume that simply adding the rules via sa-update (as per the instructions on the wiki) is enough - they don't have to be "activated" in any way do they?

running sa-update (cron job) should do it...

> Having added all the extra SARE rules I haven't changed the overall threshold until I see what effect they have.

Good idea, some of the rules might hit on HAM too ...

> Where can I find out more about the Botnet plugin? (There doesn't seem to be a reference to it on the wiki).

http://people.ucsc.edu/~jrudd/spamassassin/
Download the latest version, untar it and read *txt and INSTALL ;-).
You could also search this mailing list's archive for more information on it.
"spamassassin users botnet plugin" might give some results.

> Thanks again.

NP, hf.

> AD


-- 
Greetings & hth
MH


Don't send mail to: ubecatcher@linuxrocks.dyndns.org
--


Re: Please help me to improve SA performance

Posted by SA Lists <sa...@troodos.demon.co.uk>.
Matthias +all, 

Thank you very much.

On Thu, Aug 16, 2007 at 01:02:53PM +0200, Matthias Haegele wrote:
> 
> You only mentioned running sa-learn on spam; you should also learn your 
> ham messages. Both are important. Bayes performance will only be good if 
> it is trained on both ham and spam.

Yes, thank you. For the sake of brevity I did not describe fully what I do, but I do indeed run sa-learn on the ham too (I have a nightly cron job that collects all the mail from a selection of folders and concatenates it into one mbox; I then run sa-learn --ham on that and sa-learn --spam on the spam folder).
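
For anyone wanting to reproduce this kind of nightly training, here is a minimal Python sketch of a wrapper a cron job could call. It is an assumption-laden illustration, not the poster's actual script: the function names and the example paths are hypothetical, and only the sa-learn options --ham, --spam and --mbox are taken from sa-learn itself.

```python
import subprocess


def sa_learn_command(kind, path, mbox=False):
    """Build an sa-learn command line for one mail source.

    kind is "ham" or "spam"; mbox=True adds --mbox for mbox-format input.
    """
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")
    cmd.append(path)
    return cmd


def run_nightly(ham_mbox, spam_source, spam_is_mbox=False, dry_run=True):
    """Train Bayes on the concatenated ham mbox and on the spam folder.

    With dry_run=True the commands are only printed, which is useful
    for checking them before putting the script into cron.
    """
    commands = [
        sa_learn_command("ham", ham_mbox, mbox=True),
        sa_learn_command("spam", spam_source, mbox=spam_is_mbox),
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if sa-learn fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths; adjust to the local mail layout.
    run_nightly("/home/mark/mail/all-ham.mbox", "/home/mark/mail/spam")
```

Running "sa-learn --dump magic" afterwards shows how many ham and spam messages Bayes has learned, which is a quick way to confirm that both kinds of training are actually taking effect.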

> >...and now I'm not so sure that it's a good idea to change the rules' 
> >scores. I suppose I could reduce the threshold to 4.5; but I don't know if 
> >that's a good thing either.
> 
> I reduced the threshold too, but also watch quarantine regularly for FPs, 
> it works fine for me ...

I haven't yet done this (but am still thinking about it) see below...

> 
> >What's the best way for me to improve SA performance (bearing in mind that 
> >I'm really only an amateur spam fighter)?
> >
> >Thanks in advance....
> 
> perhaps you could use:
> clamav sanesecurity
> SARE Rules
> Botnet plugin
> too ...

Well thanks for this.

I do clam checking before SA so I guess clamav sanesecurity would be duplicating that...

I have just now included many of the SARE rules in my sa-update. I am almost looking forward to getting some spam to see if they work! :)
...I presume that simply adding the rules via sa-update (as per the instructions on the wiki) is enough - they don't have to be "activated" in any way do they?

Having added all the extra SARE rules I haven't changed the overall threshold until I see what effect they have.

Where can I find out more about the Botnet plugin? (There doesn't seem to be a reference to it on the wiki).

Thanks again.

AD

Re: Please help me to improve SA performance

Posted by Matthias Haegele <mh...@linuxrocks.dyndns.org>.
SA Lists wrote:
> Hello all,
> 
> I am only a home user, but I am trying to provide a spam-free environment for all members of my household. I have SA (3.2.2) set up with Bayesian rules and run sa-learn every night. SA's performance is "average": it has caught c. 600 of the 1,000 or so spam mails currently in my spam folder (which I use for sa-learn).
> 
> Today I had a small rash of those spam mails that have about three lines of random words. SA let about 8 through and caught 2. Here are the headers from one of each:

You only mentioned running sa-learn on spam; you should also learn your 
ham messages. Both are important. Bayes performance will only be good if 
it is trained on both ham and spam.

[Spam Samples removed]

> I thought perhaps that I would increase the scores on the bayes rules but then I read this on the SA wiki:
> 
> Note: Scores for "learn" rules, such as BAYES_*, that rate the probability that a message is spam, are scored using the same method. This can produce "confusing" scores, for instance, that have BAYES_80 with a higher score than BAYES_99. There are a few reasons for this. 1) The score generation system does not understand that BAYES_* are related to one another, they're separate rules that need separate scores. 2) More importantly, the higher the probability from a "learn" rule, the higher likelihood that the message also hit a bunch of other rules. This lets the score generation system lower the "learn" rule score due to the inevitable false positive, while also still marking the message as spam via the sum of all rule scores.
> 
> ...and now I'm not so sure that it's a good idea to change the rules' scores. I suppose I could reduce the threshold to 4.5; but I don't know if that's a good thing either.

I reduced the threshold too, but also watch quarantine regularly for FPs, 
it works fine for me ...


> What's the best way for me to improve SA performance (bearing in mind that I'm really only an amateur spam fighter)?
> 
> Thanks in advance....

perhaps you could use:
clamav sanesecurity
SARE Rules
Botnet plugin
too ...

> AD


-- 
Grüsse/Greetings
MH


Don't send mail to: ubecatcher@linuxrocks.dyndns.org
--