Posted to users@spamassassin.apache.org by Olaf Greve <og...@millennics.com> on 2008/02/28 15:19:28 UTC

How to properly teach SA to recognise the spam that is still getting through, despite the rules updates

Hi,

Firstly: I'm new to this list and also pretty new to SA in general. I did try to find answers to my questions in the FAQ, but couldn't find anything conclusive. I do hope, however, that I'm not flogging a dead horse with my questions (which appear at the end of this message)... :P

Secondly, I'd like to say that SA is a *great* tool, and that "Internet life" is much better with it than it used to be without it! :P

The situation:
I run a FreeBSD 5.4-RELEASE AMD64-based server on which I have installed SA (identified by pkg_info as "p5-Mail-SpamAssassin-3.2.4_2"). SA is called through amavisd-new (according to pkg_info, precisely "amavisd-new-2.5.2,1"), which is invoked after mail arrives on the receiving (RX) side of Sendmail. The RX daemon is split in two and tunnels the mail locally through amavisd-new (which uses clamd and SA). All mail that passes the tests gets delivered; the rest goes directly to the quarantine.

The problem:
The above setup worked fine (using SA 3.2.3) for several months, and virtually no spam got through. However, for about two weeks now I've suddenly been getting about 100 spam mails per day again, and these seem to include spam similar to what I have previously seen being filtered out... Most of the spam still does get filtered out, but for some reason (perhaps spammers finding ways around SA?) more and more of it is getting through again.

My approach so far:
Suspecting that SA or its rules were outdated (despite the twice-weekly sa-update run from cron), I first upgraded SA to 3.2.4 (and ran sa-update again), but to no real avail: about the same amount of spam kept getting through. I then looked into additional channels and soon came across the SARE-based ones. I added the saupdates.openprotect.com channel, but the same amount of spam still seems to get through.

I perform my updates as follows:

Cron call:
23 3 * * 2,5 /usr/local/bin/sa-update --allowplugins --gpgkeyfile /root/sa_pgp_keys --channelfile /root/sa_channels && /usr/local/etc/rc.d/sa-spamd.sh restart > /dev/null

(yes, I realise spamd is not actually used by amavisd-new, but I decided to have it running anyway)
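A note on the cron line above: sa-update exits with 0 when it has installed new rules and with 1 when no fresh updates were available, so the `&&` only restarts spamd after a real update. Because amavisd-new loads SpamAssassin and its rules when it starts, amavisd-new (not spamd) is the process that needs restarting for new rules to take effect there. The Python sketch below wraps the same sa-update call and restarts amavisd-new only on exit code 0. The rc script path `/usr/local/etc/rc.d/amavisd` and all helper names are assumptions, and any exit code other than 0 or 1 is simply treated as an error.

```python
from __future__ import annotations

import subprocess

SA_UPDATE = "/usr/local/bin/sa-update"
# Assumption: name/path of the amavisd-new rc script; check /usr/local/etc/rc.d/.
AMAVISD_RC = "/usr/local/etc/rc.d/amavisd"


def build_update_cmd(keyfile: str = "/root/sa_pgp_keys",
                     channelfile: str = "/root/sa_channels") -> list[str]:
    """Same sa-update invocation as in the cron entry."""
    return [SA_UPDATE, "--allowplugins",
            "--gpgkeyfile", keyfile,
            "--channelfile", channelfile]


def interpret_exit(code: int) -> str:
    """Map an sa-update exit code to an action label."""
    if code == 0:
        return "updated"      # new rules were downloaded and installed
    if code == 1:
        return "no-update"    # no fresh updates on any channel
    return "error"            # treat everything else as a failure


def run_update() -> str:
    """Run sa-update and restart amavisd-new only if new rules were installed."""
    result = subprocess.run(build_update_cmd(), check=False)
    status = interpret_exit(result.returncode)
    if status == "updated":
        # amavisd-new loads the SA rules at startup, so restart it to pick them up.
        subprocess.run([AMAVISD_RC, "restart"], check=True)
    return status
```

Cron would then run a small script that calls `run_update()`. Keeping the exit-code mapping in its own function makes it easy to check without touching the network.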

My /root/sa_channels file contains the following:
saupdates.openprotect.com
updates.spamassassin.org

Now, my questions are:
1-Am I doing anything wrong, or am I overlooking something major?
2-I've never tried to teach SA which messages are spam and which are ham. From what I gather from the website, I need to set up a mailbox containing only spam and feed it to sa-learn, and then do the same for a mailbox containing only ham. However, how should I best go about this? Once spam is misidentified, it gets mixed in with the ham in the live mailboxes, so I wouldn't want to classify all of it as either ham or spam... I did, however, keep the spam messages from the last few days. Can I (manually) forward those to a local mailbox and then run sa-learn on that mailbox so they are identified as spam, or will that not work because of the new headers my mail client adds when forwarding?
3-Are there other good (preferably automatic) ways to tell SA what is spam and what isn't?
4-Are there other very effective rules channels you would recommend adding (for example, the full set of SARE rules rather than the openprotect subset of it)?
5-Just a theory, but is it possible that SA somehow misidentified a spam message as ham, and that all messages similar to that particular spam are now being misidentified as ham too, and hence getting through?

Any and all feedback will be greatly appreciated, and I would like to thank you all for taking the time to read this e-mail and address the questions raised in it.

With kind regards,
Olaf Greve

Re: How to properly teach SA to recognise the spam that is still getting through, despite the rules updates

Posted by Matt Kettler <mk...@verizon.net>.
Olaf Greve wrote:
> Hi,
>  
> Firstly: I'm new to this list and also pretty new to SA in general. I 
> did try to find the answers to my questions in the FAQ, but haven't 
> succeeded beyond all doubt at doing so. I do hope, however, that I'm 
> not flogging a dead horse with my below questions (which appear at the 
> end of the message)...:P
>  
> Secondly, I'd like to say that SA is a *great* tool, and that 
> "Internet-life" is much better with it, than it used to be without it! :P
>  
> The situation:
> I run a FreeBSD 5.4-release AMD-64 based server, on which I have 
> installed SA (identified by pkg_info as: 
> "p5-Mail-SpamAssassin-3.2.4_2") through Amavisd-new (precise version, 
> according to pkg_info: "amavisd-new-2.5.2,1"), which is being invoked 
> after mail arrives on the RX side of Sendmail. The RX daemon is split 
> in two, and tunnels the mail locally through amavisd-new (using clamd 
> and SA), and all mail that passes the tests gets delivered, and the 
> rest goes directly to the quarantine.
>  
> The problem:
> The above set-up was working fine (using SA 3.2.3) for several months, 
> and virtually no spam got through. However, for about two weeks now 
> I've suddenly been getting about 100 spam mails per day again, and these 
> seem to include spam mails that I have previously seen being filtered 
> out... Still, by far most of the spam does get filtered out, but for 
> some reason (perhaps spammers finding ways around SA?) more and more 
> spam is getting through again.
>  
> My approach so far:
> Figuring SA or the rules to be outdated (despite the twice-weekly call 
> to sa-update from cron), I first updated SA to 3.2.4 (and performed 
> an sa-update too), but to no real avail: the same amount of spam 
> seemed to be getting through. I then checked into additional channels, 
> and soon came across the SARE (based) ones. I decided to add the 
> saupdates.openprotect.com channel, but still the same amount of spam 
> seems to get through.
>  
> I perform my updates as follows:
>  
> Cron call:
> 23 3 * * 2,5 /usr/local/bin/sa-update --allowplugins --gpgkeyfile 
> /root/sa_pgp_keys --channelfile /root/sa_channels && 
> /usr/local/etc/rc.d/sa-spamd.sh restart > /dev/null
> (yes, I realise spamd is not actually used by amavisd-new, but I 
> decided to have it running anyway)
>  
> My /root/sa_channels file contains the following:
> saupdates.openprotect.com
> updates.spamassassin.org
> Now, my questions are:
> 1-Am I doing anything wrong, or am I grossly overlooking something?
It's hard to say. Can you post the X-Spam-Status header from one of the 
missed messages? It's not perfect, but we can tell a lot from glancing 
at it: things like BAYES_00 or ALL_TRUSTED are signs of specific 
problems.
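To make this kind of check concrete, here is a small Python sketch that pulls the rule names out of an X-Spam-Status header and flags ALL_TRUSTED and low BAYES_xx hits. It assumes two header layouts: SpamAssassin's own (`tests=RULE1,RULE2`) and amavisd-new's (`tests=[RULE1=score, ...]`). Treating every Bayes bucket below BAYES_50 as suspicious is an assumption, as is the hint that ALL_TRUSTED on outside mail usually points at the trusted/internal network settings. The function names are hypothetical.

```python
from __future__ import annotations

import re


def parse_tests(status: str) -> dict[str, float | None]:
    """Return {rule_name: score_or_None} from an X-Spam-Status header value."""
    # Unfold continuation lines and drop spaces after commas in the rule list.
    flat = re.sub(r",\s+", ",", " ".join(status.split()))
    m = re.search(r"tests=(?:\[([^\]]*)\]|(\S+))", flat)
    if not m:
        return {}
    body = m.group(1) if m.group(1) is not None else m.group(2)
    if body == "none":  # SA writes tests=none when no rule fired
        return {}
    hits: dict[str, float | None] = {}
    for item in filter(None, body.split(",")):
        name, _, score = item.partition("=")  # amavisd-new style carries scores
        hits[name] = float(score) if score else None
    return hits


def diagnose(status: str) -> list[str]:
    """Flag the warning signs mentioned in this thread."""
    hits = parse_tests(status)
    notes = []
    if "ALL_TRUSTED" in hits:
        notes.append("ALL_TRUSTED hit: check trusted_networks/internal_networks")
    for name in sorted(hits):
        level = name[len("BAYES_"):] if name.startswith("BAYES_") else ""
        # Assumption: any Bayes bucket under 50 on a missed spam hints at mislearning.
        if level.isdigit() and int(level) < 50:
            notes.append(f"{name} hit: Bayes considers this message ham")
    return notes
```

Feeding it the header from a missed message returns a list of warnings; an empty list means neither sign discussed here showed up.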

> 2-I've never tried to teach SA about which messages are spam and which 
> are ham. From what I gather from the website, I need to set up a 
> mailbox with solely spam and feed that to sa-learn, and then do the 
> same for a mailbox containing solely ham. However, how can I best go 
> about this? Once spam is misidentified, it gets mixed in the live 
> mailboxes with ham, so I wouldn't want to classify all of it as 
> either ham or spam... Then, I did keep the spam messages from the last 
> few days. Can I perhaps (manually) forward those to a local mailbox, 
> and then run sa-learn on that mailbox, getting it successfully 
> identified as spam, or will that not work due to the new mail headers 
> added by the forward action from my mail client?
You can't forward a message and then feed it to sa-learn. When you 
forward a message, the content might look similar when rendered in a 
mail client, but it's *vastly* different when you look at the complete, 
raw message.
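The point here is that sa-learn needs the original raw message. If a mail client can forward a message *as an attachment* (a message/rfc822 part), the original headers usually travel inside that part and can be unwrapped again. The Python sketch below does that and writes the results to an mbox file for `sa-learn --spam --mbox`. It assumes the forwards were made as attachments and that the client left the attached message untouched; inline forwards cannot be recovered this way. The function names and file paths are hypothetical.

```python
from __future__ import annotations

import email
import mailbox
from email.message import Message


def extract_attached_messages(wrapper: Message) -> list[Message]:
    """Return the message/rfc822 parts found in a forwarded-as-attachment mail."""
    found: list[Message] = []
    # walk() also descends into attached messages, so nested forwards are collected too.
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding the inner Message.
            found.extend(p for p in part.get_payload() if isinstance(p, Message))
    return found


def write_mbox(messages: list[Message], path: str) -> int:
    """Append messages to an mbox file; return how many were written."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()
    return len(messages)


def unwrap_forward_files(paths: list[str], mbox_path: str) -> int:
    """Read saved forwarded mails from disk and collect their attachments into one mbox."""
    collected: list[Message] = []
    for p in paths:
        with open(p, "rb") as fh:
            collected.extend(extract_attached_messages(email.message_from_binary_file(fh)))
    return write_mbox(collected, mbox_path)
```

After something like `unwrap_forward_files(["fwd1.eml", "fwd2.eml"], "/tmp/spam.mbox")` (file names hypothetical), the result could be learned with `sa-learn --spam --mbox /tmp/spam.mbox`.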
> 3-Are there perhaps other good (preferably automatic) ways to tell SA 
> about what is spam, and what isn't?
SA has an autolearner built in and enabled by default, but it's not 
perfect.
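As a rough illustration of why the autolearner is not perfect, here is a simplified Python model of its threshold logic. It assumes SpamAssassin's default thresholds (`bayes_auto_learn_threshold_nonspam 0.1` and `bayes_auto_learn_threshold_spam 12.0`) and ignores the extra checks the real plugin applies: for example, the score used for learning leaves out the Bayes rules themselves, and learning as spam also requires some points from both header and body rules. What it shows is that a spam scoring below the ham threshold gets learned as ham, which is the mislearning asked about in question 5. The function name is hypothetical.

```python
from typing import Optional

# SpamAssassin's default autolearn thresholds.
HAM_THRESHOLD = 0.1    # bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # bayes_auto_learn_threshold_spam


def autolearn_decision(learn_score: float,
                       ham_threshold: float = HAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Simplified model: which way (if any) a message would be auto-learned.

    learn_score stands in for the score SA computes for autolearning;
    the real plugin applies further checks not modelled here.
    """
    if learn_score < ham_threshold:
        return "ham"
    if learn_score >= spam_threshold:
        return "spam"
    return None  # in-between scores are not auto-learned at all
```

For example, `autolearn_decision(-0.5)` returns `"ham"`, while a message scoring 4.0 is not learned either way.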
> 4-Are there perhaps other very efficient rules channels that you can 
> recommend me to add (like using the full set of SARE rules, rather 
> than the openprotect subset of it)?
> 5-Just a theory, but is it perhaps possible that SA somehow 
> misidentified a spam message as being ham, and that all messages that 
> are similar to that particular spam message are now being 
> misidentified as ham, hence all getting through?
Possible, although it would generally take a lot of mislearning. Seeing 
a low-scoring BAYES_XX rule (e.g. BAYES_00) in the X-Spam-Status would 
suggest this problem.
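One way to see how much the Bayes database has actually learned is `sa-learn --dump magic`. It should be run as the user amavisd-new runs as, because that user's Bayes database is the one in use. The Python sketch below parses that output, assuming the usual layout where each line ends in `non-token data: <name>` and the third column holds the value. It then checks the nspam/nham counts against SpamAssassin's default minimum of 200 learned spam and 200 learned ham before Bayes results are used. The helper names are hypothetical.

```python
from __future__ import annotations

MIN_SPAM = 200  # default minimum number of learned spam before Bayes is used
MIN_HAM = 200   # default minimum number of learned ham before Bayes is used


def parse_dump_magic(text: str) -> dict[str, int]:
    """Map each 'non-token data' name to the value in its third column."""
    counts: dict[str, int] = {}
    for line in text.splitlines():
        head, sep, name = line.partition("non-token data:")
        if not sep:
            continue  # skip token lines or anything unexpected
        cols = head.split()
        if len(cols) >= 3 and cols[2].isdigit():
            counts[name.strip()] = int(cols[2])
    return counts


def bayes_ready(counts: dict[str, int]) -> bool:
    """True once enough spam and ham have been learned for Bayes to be used."""
    return counts.get("nspam", 0) >= MIN_SPAM and counts.get("nham", 0) >= MIN_HAM
```

The input could come from `subprocess.run(["sa-learn", "--dump", "magic"], capture_output=True, text=True).stdout`; comparing nspam and nham over time also gives a rough idea of whether large amounts of spam are being learned as ham.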
>  
> Any and all feedback will be greatly appreciated, and I would like to 
> thank you all for taking the time to read this e-mail and address the 
> questions raised in it.
>