Posted to users@spamassassin.apache.org by David Ronis <ro...@ronispc.chem.mcgill.ca> on 2008/07/19 00:32:40 UTC

spam learning

I use evolution as my mail client.  Evolution supports spamassassin and
in the past I let evolution use spamassassin to filter incoming mail.
Recently, I switched to spam filtering using procmail.  The relevant
section of my .procmailrc file is:

:0fw: spamc.lock
* < 256000
| spamc

# Mails with a score of 15 or higher are almost certainly spam (with 0.05%
# false positives according to rules/STATISTICS.txt). Let's put them in a
# different mbox. (This one is optional.)
:0:
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
Inbox.Spam

# All mail tagged as spam (eg. with a score higher than the set threshold)
# is moved to "probably-spam".
:0:
* ^X-Spam-Status: Yes
Likely.Spam
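The two recipes' logic can be made concrete with a minimal Python sketch (not part of the original setup) that picks a folder the same way. SpamAssassin writes one '*' per whole point of score into X-Spam-Level, so the first recipe fires at a score of 15 or more, and the second catches anything else marked "X-Spam-Status: Yes". The name route() and the fallback folder "INBOX" are hypothetical stand-ins for procmail's default delivery.

```python
import re
from email import message_from_string
from email.message import Message

# Fifteen leading stars in X-Spam-Level means a score of at least 15.
HIGH_SCORE = re.compile(r"^\*{15}")
# procmail matches case-insensitively by default, so do the same here.
TAGGED = re.compile(r"^Yes\b", re.IGNORECASE)


def route(msg: Message, default: str = "INBOX") -> str:
    """Return the mbox the two recipes would pick; the first match wins."""
    if HIGH_SCORE.match(msg.get("X-Spam-Level", "")):
        return "Inbox.Spam"
    if TAGGED.match(msg.get("X-Spam-Status", "")):
        return "Likely.Spam"
    # Hypothetical name for procmail's default delivery target.
    return default
```

As with procmail, order matters: the 15-point check has to come before the generic X-Spam-Status test, or every tagged message would land in Likely.Spam.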

Evolution then simply reads the various mboxes.  

Here's my question.  I tell spamassassin to (re)learn the spam tagged
messages using evolution.  However, the format of the messages now has
the spamc report with the offending message as an attachment.  Is
spamassassin "smart" enough to recognize the difference between the two
parts of the message?

David



Re: spam learning

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
Back on-list. If my comment can be misunderstood, others might
misunderstand it as well.


On Sun, 2008-07-20 at 18:42 -0400, David Ronis wrote:
> On Sun, 2008-07-20 at 22:55 +0200, Karsten Bräckelmann wrote:
> [snip]
> 
> > Also, learning (Bayes training) now needs to be done server side. The
> > client's "Junk" buttons won't work.
> 
> In this case, the client and server are the same machine.  Are you
> saying that spamc won't look at my user .spamassassin directory when
> invoked by procmail (via sendmail)?

No. :)  Actually, I was not talking about spamc at all, but about Bayes
training. That generally means sa-learn.

Piping through 'spamc' as a filter in .procmailrc is just fine, and
spamd (invoked by spamc) will use the user's ~/.spamassassin/ files.

What I was talking about is using 'sa-learn' to train Bayes, which also
uses the per-user database in that very same directory. Since your server
and client are the same machine, SA will continue to use the Bayes and AWL
files it used with your previous setup. No change here. A perfectly
smooth move of your mail processing chain.

The one thing that likely *does* change, however, is the ability to use
your client's "Junk" buttons to actually train your Bayes on
misclassified mail.


Let me elaborate on this. Previously, you have been using the SA Junk
plugin in Evolution. Hitting the "Junk" or "Not Junk" button in your
client actually called SA to learn the mail.

In the case where your client is NOT the same machine as the server, these
buttons will no longer work with server side spam filtering.

In your case, where the server and client happen to be identical, it
MIGHT work. It might just as well fail miserably. Evolution supports
multiple spam filtering backends. But you most likely disabled that
plugin in Evo, because you don't want Evo to process mail in your Inbox
with SA a second time. Thus, hitting the "Junk" button will indeed still
set the IMAP flag -- but it most likely will NOT make SA learn the mail.

In a nutshell:  With server side spam filtering, client side "buttons"
will NOT make the server learn the message [1], regardless of whether
client and server happen to be the same machine. If you do server side
filtering, you need to do server side training (on error) as well.
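Server side training on error usually means collecting misclassified mail in separate folders and running sa-learn on them as the same user, so the per-user Bayes database in ~/.spamassassin/ gets updated. The Python sketch below only builds and runs `sa-learn --spam --mbox` and `sa-learn --ham --mbox` commands; the folder names Train.Spam and Train.Ham and the ~/mail location are hypothetical, not something from the thread.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical training folders (mbox files) that the user fills by hand
# with misclassified mail; the names are assumptions, not from the thread.
MAIL_DIR = Path.home() / "mail"
MISSED_SPAM = MAIL_DIR / "Train.Spam"     # spam that slipped through
FALSE_POSITIVES = MAIL_DIR / "Train.Ham"  # ham that was tagged as spam


def training_commands(spam_box: Path, ham_box: Path) -> list[list[str]]:
    """Build one sa-learn call per folder, reading each folder as an mbox."""
    return [
        ["sa-learn", "--spam", "--mbox", str(spam_box)],
        ["sa-learn", "--ham", "--mbox", str(ham_box)],
    ]


def run_training(spam_box: Path = MISSED_SPAM, ham_box: Path = FALSE_POSITIVES) -> None:
    """Run the commands as the current user, so ~/.spamassassin/ is updated."""
    if shutil.which("sa-learn") is None:
        raise RuntimeError("sa-learn not found in PATH")
    for cmd in training_commands(spam_box, ham_box):
        if Path(cmd[-1]).exists():  # skip folders that do not exist yet
            subprocess.run(cmd, check=True)
```

Calling run_training() from a cron job is one way to schedule it. sa-learn keeps track of messages it has already learned, so re-running it over the same folders should not count them twice.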

  guenther


[1] Unless you got a custom setup, in which case you know it works.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}


Re: spam learning

Posted by Karsten Bräckelmann <gu...@rudersport.de>.
On Fri, 2008-07-18 at 18:32 -0400, David Ronis wrote:
> I use evolution as my mail client.  Evolution supports spamassassin and
> in the past I let evolution use spamassassin to filter incoming mail.
> Recently, I switched to spam filtering using procmail.  The relevant
> section of my .procmailrc file is:

Good move. ;)

> :0fw: spamc.lock
> * < 256000
> | spamc
> 
> # Mails with a score of 15 or higher are almost certainly spam (with 0.05%
> # false positives according to rules/STATISTICS.txt). Let's put them in a
> # different mbox. (This one is optional.)
> :0:
> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> Inbox.Spam
> 
> # All mail tagged as spam (eg. with a score higher than the set threshold)
> # is moved to "probably-spam".
> :0:
> * ^X-Spam-Status: Yes
> Likely.Spam

Rather weird folder name, huh?

> Evolution then simply reads the various mboxes.  
> 
> Here's my question.  I tell spamassassin to (re)learn the spam tagged
> messages using evolution.  However, the format of the messages now has
> the spamc report with the offending message as an attachment.  Is
> spamassassin "smart" enough to recognize the difference between the two
> parts of the message?

As Sahil already answered:  Yes, SA will unwrap the original message and
strip its own headers before learning.


Since you've just moved from client side filtering, some additional
hints:

If you don't like the message wrapped as an attachment, the option
'report_safe 0' will prevent this. In that case the whole report goes
into the headers and the message is not otherwise altered. This pretty
much looks like what you are used to -- with the notable exception of
additional report headers.
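For a per-user setup like this one, the option would normally go into ~/.spamassassin/user_prefs as the single line `report_safe 0` (assuming spamd is allowed to read per-user config, which the thread indicates it is). A minimal Python sketch that sets it idempotently, replacing any earlier report_safe line; updated_prefs() and set_pref() are hypothetical helpers, not SpamAssassin tools.

```python
from pathlib import Path


def updated_prefs(text: str, key: str, value: str) -> str:
    """Return prefs text with any earlier `key ...` line replaced by `key value`."""
    # Comments, blank lines and other options are kept untouched.
    kept = [ln for ln in text.splitlines() if ln.split()[:1] != [key]]
    kept.append(f"{key} {value}")
    return "\n".join(kept) + "\n"


def set_pref(prefs: Path, key: str, value: str) -> None:
    """Apply updated_prefs() to a user_prefs file, creating it if needed."""
    text = prefs.read_text() if prefs.exists() else ""
    prefs.parent.mkdir(parents=True, exist_ok=True)
    prefs.write_text(updated_prefs(text, key, value))
```

Usage: `set_pref(Path.home() / ".spamassassin" / "user_prefs", "report_safe", "0")`.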

Also, learning (Bayes training) now needs to be done server side. The
client's "Junk" buttons won't work.

  guenther




Re: spam learning

Posted by Sahil Tandon <sa...@tandon.net>.
David Ronis <ro...@ronispc.chem.mcgill.ca> wrote:

> I use evolution as my mail client.  Evolution supports spamassassin and
> in the past I let evolution use spamassassin to filter incoming mail.
> Recently, I switched to spam filtering using procmail.

[...]
  
> Here's my question.  I tell spamassassin to (re)learn the spam tagged
> messages using evolution.  However, the format of the messages now has
> the spamc report with the offending message as an attachment.  Is
> spamassassin "smart" enough to recognize the difference between the two
> parts of the message?
                                            
http://wiki.apache.org/spamassassin/BayesInSpamAssassin:

"It's OK to feed emails with Spamassassin markup into the sa-learn command -- 
sa-learn will ignore any standard Spamassassin headers, and if the original 
email has been encapsulated into an attachment it will decapsulate the email. 
In other words sa-learn will undo any changes which Spamassassin has done 
before learning the spam/ham character of the email."
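The decapsulation described in that quote can be illustrated with Python's email package. This is only a rough sketch of the idea, not sa-learn's actual code. A report_safe 1 message (the default) carries the original as a message/rfc822 attachment, which is returned as-is; otherwise SA's own X-Spam-* headers are dropped. unwrap() is a hypothetical name, the check only looks at top-level parts, and report_safe 2 (original attached as text/plain) is not handled.

```python
from email import message_from_bytes
from email.message import Message


def unwrap(msg: Message) -> Message:
    """Roughly undo SpamAssassin markup before training (illustration only)."""
    # report_safe 1: the original travels as a top-level message/rfc822 part.
    if msg.is_multipart():
        for part in msg.get_payload():
            if part.get_content_type() == "message/rfc822":
                return part.get_payload(0)
    # report_safe 0: nothing is wrapped, so only SA's X-Spam-* headers go.
    for name in {k for k in msg.keys() if k.lower().startswith("x-spam-")}:
        del msg[name]  # removes every occurrence of that header
    return msg
```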

-- 
Sahil Tandon <sa...@tandon.net>