Posted to users@spamassassin.apache.org by Jo <ml...@winfix.IT> on 2005/12/04 18:13:37 UTC

Re: Learning at an MTA

Alan Gutierrez wrote:

>I'd like to install SpamAssassin in Postfix to filter spam for a
>Domino mail server. I'd like to use Bayesian filtering.
>
>How have people solved the problem of training the filter with user
>feedback when SpamAssassin is running at the MTA?
>
>The idea I'm entertaining is wrapping spam messages, delivering them
>as attachments, which SA will do already, and having a Reply-To
>address of "mistakes@mail.domain.com", or if it is a false negative,
>forward the unwrapped message to "mistakes@mail.domain.com".
>
>Then the MTA can teach SA that bit of mail.
>
>Any insight would be greatly appreciated.
>
>--
>Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer
>  
>
Hi Alan,

What you describe is exactly what we are doing here.
Postfix/Amavisd-new/SpamAssassin filters the mail before it is relayed
on to the Domino mail server. We created two extra mailboxes, one for
spam and one for ham. Two buttons were added to the user interface which
move the messages to these boxes respectively. Then we switched on POP
on the Notes server. We use KMail to fetch the mails, and they
automagically get converted to the standard mailbox format again (while
in Notes, they are stored in a proprietary Notes format). Once you
locate the KMail mailboxes it becomes trivial to write a script to
process them with sa-learn.
A manual step is involved though. Maybe it's possible to automate it 
entirely with another mail client instead of kmail. I like to have a 
look at what people tag as spam though, before it gets learned.
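
A minimal sketch of the kind of script mentioned above, assuming the fetched mail ends up in two mbox files. The paths under ~/Mail are hypothetical (they depend on how the mail client is set up), and the sketch defaults to a dry run so the commands can be reviewed before anything is learned:

```python
import subprocess
from pathlib import Path

# Hypothetical mbox locations; adjust to wherever the mail client stores them.
MAILBOXES = {
    "spam": Path.home() / "Mail" / "spam",
    "ham": Path.home() / "Mail" / "ham",
}


def build_sa_learn_cmd(kind, mbox_path):
    """Return the sa-learn command line for one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    return ["sa-learn", f"--{kind}", "--mbox", str(mbox_path)]


def learn_all(mailboxes=MAILBOXES, dry_run=True):
    """Build (and optionally run) one sa-learn command per existing mbox."""
    commands = []
    for kind, path in mailboxes.items():
        if not Path(path).is_file():
            continue  # nothing fetched for this mailbox yet
        cmd = build_sa_learn_cmd(kind, path)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in learn_all(dry_run=True):
        print(" ".join(cmd))
```

Calling learn_all(dry_run=False) would actually invoke sa-learn; keeping the dry run as the default fits the manual review step described above.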

I hope this helps,

Cheers,

Jo

Re: Learning at an MTA

Posted by mouss <us...@free.fr>.
Alan Gutierrez wrote:
> 
> Yes, it helps. I'm fortunate in that the Domino management will be
> performed by someone who's particularly good at Notes development. I
> need to get a fix on what Domino can do, and that's why I ask.
> 
> Apparently, there's already a "Spam" box on these Domino clients and
> a macro to add mail to the "Spam" box. Using your solution with
> fetchmail instead of kmail, I can automate training of SA via IMAP or POP.
> 
> But, "Ham" is confusing. I'd suspect that a user would want to
> retain control of folder names, rather than lumping everything of
> value into a "Ham" folder.

I think he was talking about a ham mailbox (one to post False positives 
to), not a ham folder.

> 
> Am I correct in assuming that the user puts mail in the "Ham" folder
> only if it has been incorrectly marked as "Spam"? Then I suppose
> you're running auto-learn maybe, and the "Ham" folder corrects?

I use 4 IMAP folders:
- Junk folder  (people can look here for false positives)
- Junk/Miss for missed spam  (I mean .Junk.Miss but let's use slashes)
- Junk/Error for false positives
- Junk/Trash for confirmed spam (can be purged quickly)

sa-learn is run on Junk/Miss (--spam) and Junk/Error (--ham). After
that, the messages may be moved (or whatever you want).

If the user didn't copy the FP (false positive) message (he just moved
it to the Junk/Error folder), then it should be "redelivered" after
sa-learn (but one must make sure it is not delivered to the Junk folder
again).

(Miss, Error and Trash may be shared folders).
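
The learning step for this folder scheme could be sketched roughly as follows. This assumes Maildir++ storage (so Junk/Miss lives in .Junk.Miss, as noted above) and a hypothetical per-user Maildir base path; it only builds the sa-learn commands, and running them is left as an explicit second step:

```python
import subprocess
from pathlib import Path

# Folder -> sa-learn flag, following the scheme described above.
LEARN_FLAGS = {"Junk/Miss": "--spam", "Junk/Error": "--ham"}


def maildir_folder(base, folder):
    """Map 'Junk/Miss' to '<base>/.Junk.Miss' (Maildir++ dot naming)."""
    return Path(base) / ("." + folder.replace("/", "."))


def build_commands(base):
    """Return one sa-learn command per existing cur/new directory."""
    commands = []
    for folder, flag in LEARN_FLAGS.items():
        root = maildir_folder(base, folder)
        for sub in ("cur", "new"):
            directory = root / sub
            if directory.is_dir():
                commands.append(["sa-learn", flag, str(directory)])
    return commands


def run_commands(commands):
    """Execute the prepared commands; raises if sa-learn fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, build_commands("/home/someuser/Maildir") (a made-up path) yields commands that can be inspected before being passed to run_commands. Moving or redelivering the messages afterwards is deliberately left out of this sketch.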


Re: Learning at an MTA

Posted by Alan Gutierrez <al...@engrm.com>.
* Jo <ml...@winfix.IT> [2005-12-04 13:13]:
> Alan Gutierrez wrote:
> 
> >I'd like to install SpamAssassin in Postfix to filter spam for a
> >Domino mail server. I'd like to use Bayesian filtering.
> >
> >How have people solved the problem of training the filter with user
> >feedback when SpamAssassin is running at the MTA?
> >
> >The idea I'm entertaining is wrapping spam messages, delivering them
> >as attachments, which SA will do already, and having a Reply-To
> >address of "mistakes@mail.domain.com", or if it is a false negative,
> >forward the unwrapped message to "mistakes@mail.domain.com".
> >
> >Then the MTA can teach SA that bit of mail.
> >
> >Any insight would be greatly appreciated.

> What you describe is exactly what we are doing here. 
> Postfix/Amavisd-new/Spamassassin filters the mail before it's relayed 
> onto the domino mail server. We created two extra mail boxes, one for 
> spam and one for ham. Two buttons were added to the user interface which 
> move the messages to these boxes respectively. Then we switched on POP 
> on the Notes server. We use kmail to fetch the mails and they 
> automagically get converted to the standard mailbox format again. (while 
> in Notes, they are stored in a proprietary Notes format). Once you 
> locate the kmail mailboxes it becomes trivial to write a script to 
> process them with sa-learn.

> A manual step is involved though. Maybe it's possible to automate it 
> entirely with another mail client instead of kmail. I like to have a 
> look at what people tag as spam though, before it gets learned.

Yes, it helps. I'm fortunate in that the Domino management will be
performed by someone who's particularly good at Notes development. I
need to get a fix on what Domino can do, and that's why I ask.

Apparently, there's already a "Spam" box on these Domino clients and
a macro to add mail to the "Spam" box. Using your solution with
fetchmail instead of kmail, I can automate training of SA via IMAP or POP.

But, "Ham" is confusing. I'd suspect that a user would want to
retain control of folder names, rather than lumping everything of
value into a "Ham" folder.

Am I correct in assuming that the user puts mail in the "Ham" folder
only if it has been incorrectly marked as "Spam"? Then I suppose
you're running auto-learn maybe, and the "Ham" folder corrects?

Thanks for the hand-holding, this is all new stuff to me.

--
Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer/

Re: Learning at an MTA

Posted by Bill Randle <bi...@neocat.org>.
On Sun, 2005-12-04 at 18:13 +0100, Jo wrote:
> Alan Gutierrez wrote:
> 
> >I'd like to install SpamAssassin in Postfix to filter spam for a
> >Domino mail server. I'd like to use Bayesian filtering.
> >
> >How have people solved the problem of training the filter with user
> >feedback when SpamAssassin is running at the MTA?
> >
> >The idea I'm entertaining is wrapping spam messages, delivering them
> >as attachments, which SA will do already, and having a Reply-To
> >address of "mistakes@mail.domain.com", or if it is a false negative,
> >forward the unwrapped message to "mistakes@mail.domain.com".
> >
> >Then the MTA can teach SA that bit of mail.
> >
> >Any insight would be greatly appreciated.
> >
> >--
> >Alan Gutierrez - alan@engrm.com - http://engrm.com/blogometer
> >  
> >
> Hi Alan,
> 
> What you describe is exactly what we are doing here. 
> Postfix/Amavisd-new/Spamassassin filters the mail before it's relayed 
> onto the domino mail server. We created two extra mail boxes, one for 
> spam and one for ham. Two buttons were added to the user interface which 
> move the messages to these boxes respectively. Then we switched on POP 
> on the Notes server. We use kmail to fetch the mails and they 
> automagically get converted to the standard mailbox format again. (while 
> in Notes, they are stored in a proprietary Notes format). Once you 
> locate the kmail mailboxes it becomes trivial to write a script to 
> process them with sa-learn.
> A manual step is involved though. Maybe it's possible to automate it 
> entirely with another mail client instead of kmail. I like to have a 
> look at what people tag as spam though, before it gets learned.

This is similar to what people do with MS Exchange servers. Check the
archives for "Exchange". To automate the process, you can use
'fetchmail' to retrieve the mail via POP or IMAP and send it to
sa-learn.

For example, in /etc/postfix/aliases, set up two aliases:

# For auto-learning spam/ham
spamlearn:      "|/usr/bin/sa-learn -u amavis --spam --no-sync"
hamlearn:       "|/usr/bin/sa-learn -u amavis --ham --no-sync"

Then in /etc/crontab:

15 * * * * root /usr/bin/fetchmail -s -f /etc/postfix/fetchmail-ham
45 * * * * root /usr/bin/fetchmail -s -f /etc/postfix/fetchmail-spam

The fetchmail scripts look like this:

# Configuration created Thu Dec  4 10:00:49 PST 2003
set no bouncemail
set no spambounce
set properties ""
defaults proto imap
poll mymail.server proto imap
 user "myuser" pass "mypasswd"
  is spamlearn here folder "spammailbox" fetchall no rewrite

This combination will run fetchmail once an hour for each mailbox,
grab mail from spammailbox on mymail.server and send it to the
spamlearn alias on the filter box, which in turn pipes it to sa-learn
via the aliases file.
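
Only the spam variant of the fetchmail file is shown above. As a rough sketch, both files could be rendered from one template so they stay in sync. The folder name "hammailbox" is an assumption made by analogy with "spammailbox", and the helper names are made up for illustration. The file is written with mode 0600 because fetchmail is generally strict about the permissions of run-control files that contain passwords:

```python
import os

# Same layout as the fetchmail file shown above, with the variable parts
# turned into placeholders.
TEMPLATE = '''set no bouncemail
set no spambounce
set properties ""
defaults proto imap
poll {server} proto imap
 user "{user}" pass "{password}"
  is {alias} here folder "{folder}" fetchall no rewrite
'''


def render_fetchmailrc(server, user, password, alias, folder):
    """Fill in the template; values containing double quotes are not escaped."""
    return TEMPLATE.format(server=server, user=user, password=password,
                           alias=alias, folder=folder)


def write_private(path, text):
    """Write text to path, readable and writable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(path, 0o600)  # also tighten a pre-existing file


if __name__ == "__main__":
    # "hammailbox" is a hypothetical folder name, not taken from the post.
    print(render_fetchmailrc("mymail.server", "myuser", "mypasswd",
                             "hamlearn", "hammailbox"))
```

The rendered text would then be saved with write_private() to the paths the crontab entries point at.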

	-Bill