Posted to users@spamassassin.apache.org by Richard Ozer <ro...@ois-online.com> on 2004/02/24 21:08:14 UTC

Ad-Hoc Bayesian learning with Microsoft Exchange Server and Outlook

I thought some participants on this list might benefit from this.  Please
feel free to comment or suggest alternatives.

RO

*********************
How to support ad-hoc Bayesian learning with Microsoft Exchange Server and
Outlook

Problem:

Many organizations use Microsoft Exchange, MS Outlook, and Outlook Express
with IMAP for their corporate e-mail. Typically, SpamAssassin is running on
a Linux box that tags the mail and forwards it to the Exchange server for
delivery. One of the challenges in implementing SpamAssassin in this
environment has been to provide a seamless mechanism for end users to train
the Bayesian filter. The difficulty is that neither Outlook nor Outlook
Express preserves the original message headers when mail is forwarded from
one mailbox to another, which makes it tedious to send the necessary
information to a spam or ham mailbox. Although this is mainly a user-training
problem, most users are unwilling to take the extra time to copy the original
headers into a new message by hand, along with the original message body; it
is simply too unwieldy. This often leaves Bayesian training to the mail
admin, who receives spam forwarded by end users (usually without the
prerequisite headers) and is expected to add the offending email to a
blacklist or to create a new rule.

Solution:

The only time Microsoft Outlook or Outlook Express properly preserves the
original headers is during a drag-and-drop operation. This suggests a
solution that takes advantage of Microsoft Exchange's public folder
capabilities: a "Spam" public folder and a "Ham" public folder can be created
on the Exchange server, allowing users to drag spam or ham into these
folders, where it waits for retrieval by the SpamAssassin host.

A key piece of this puzzle can be found on Nick Burch's web site at:

http://tirian.magd.ox.ac.uk/~nick/code/

There you will find a Perl script called imap-sa-learn.pl. This script will
log on to any server that supports IMAP, retrieve the messages in an
arbitrarily named folder, process them as either ham or spam, delete the
processed messages, and then run sa-learn --rebuild. The script is simple to
understand, and you need only prefix your public folder name with
"Public Folders/". For instance, if you create a public folder called "Spam",
you would set the script variable containing the Spam folder's path to:

my $defspamfolder = 'Public Folders/Spam';

Set the Ham folder's path the same way.
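For readers who want to see what such a script does, here is a minimal sketch of the same learn-and-delete loop. It is not Nick's code. It assumes the Mail::IMAPClient Perl module is installed and sa-learn is on the PATH; the password and the subroutine names (sa_learn_args, learn_folder, main) are hypothetical. Messages are removed from a folder only after sa-learn has accepted them.

```perl
#!/usr/bin/perl
# Minimal sketch of a learn-and-delete loop like the one imap-sa-learn.pl
# performs (not the original script). Assumes Mail::IMAPClient is installed
# and sa-learn is on the PATH; the password below is a placeholder.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $user          = 'spamassassin';
my $password      = 'secret';    # placeholder: use the real account password
my $defspamfolder = 'Public Folders/Spam';
my $defhamfolder  = 'Public Folders/Ham';

# Argument list for one sa-learn run; $kind must be 'spam' or 'ham'.
sub sa_learn_args {
    my ($kind, $path) = @_;
    die "kind must be 'spam' or 'ham'\n" unless $kind =~ /\A(?:spam|ham)\z/;
    return ('sa-learn', "--$kind", $path);
}

# Save every message in $folder to a temporary directory (one file per
# message), feed that directory to sa-learn, and delete the messages only
# if sa-learn succeeded.
sub learn_folder {
    my ($imap, $folder, $kind) = @_;
    $imap->select($folder)
        or die "cannot select $folder: ", ($imap->LastError // 'unknown error'), "\n";
    my @msgs = $imap->messages;
    return 0 unless @msgs;

    my $dir = tempdir(CLEANUP => 1);
    for my $msg (@msgs) {
        my $text = $imap->message_string($msg);
        defined $text or die "cannot fetch message $msg from $folder\n";
        open my $fh, '>', "$dir/$msg" or die "cannot write $dir/$msg: $!\n";
        print {$fh} $text;
        close $fh or die "cannot close $dir/$msg: $!\n";
    }

    system(sa_learn_args($kind, $dir)) == 0
        or die "sa-learn --$kind failed; messages left in $folder\n";
    $imap->delete_message(@msgs)
        or warn "could not flag messages in $folder for deletion\n";
    $imap->expunge;
    return scalar @msgs;
}

sub main {
    my ($server) = @_;
    require Mail::IMAPClient;    # loaded at run time so the helpers work without it
    my $imap = Mail::IMAPClient->new(
        Server   => $server,
        User     => $user,
        Password => $password,
    ) or die "IMAP login to $server failed: $@\n";

    my $spam = learn_folder($imap, $defspamfolder, 'spam');
    my $ham  = learn_folder($imap, $defhamfolder,  'ham');
    $imap->logout;

    # The original script ends with a rebuild; later SpamAssassin
    # releases use --sync for this step instead of --rebuild.
    system('sa-learn', '--rebuild') == 0 or die "sa-learn --rebuild failed\n";
    print "learned $spam spam and $ham ham message(s)\n";
}

# Run only when given a server name, e.g. exchange.example.com (hypothetical).
main(@ARGV) if @ARGV;
```

Writing each folder to a directory lets a single sa-learn run handle the whole folder, since sa-learn treats each file in a directory argument as one message. With no command-line argument the file only defines the subroutines, so it can be loaded without an Exchange server or Mail::IMAPClient present.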

On the Exchange side, create a domain user called spamassassin with minimal
rights and create an Exchange mailbox for it. The mailbox should never
receive any mail; it exists only to give the account access to the public
folders.

Using Outlook, and while logged in as an administrator, create the Spam and
Ham public folders. Right-click each folder, open its Properties, and on the
Permissions tab make the spamassassin user a folder "Owner". This gives the
spamassassin account the privileges it needs to delete processed messages.
The default permissions should allow anyone to post to the folder and to
delete only their own items.

In Nick's script, set the login and password to the spamassassin user's
account ID and password, and test. By using a non-admin account for
SpamAssassin, you avoid the risk of having a plain-text administrator name
and password sitting inside a Perl script.
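For the "and test" step, a quick read-only check that the spamassassin account can log in and open both public folders can help isolate permission problems before running the learner. This is a sketch only: it assumes Mail::IMAPClient, the password is a placeholder, and the subroutine names (public_folder_path, check_account) are hypothetical. It does not delete or move anything.

```perl
#!/usr/bin/perl
# Read-only check that the spamassassin account can log in over IMAP and
# open both public folders. Assumes Mail::IMAPClient; the password is a
# placeholder and the subroutine names are hypothetical.
use strict;
use warnings;

my $user     = 'spamassassin';
my $password = 'secret';    # placeholder: use the real account password

# Exchange exposes public folders over IMAP under the "Public Folders/" prefix.
sub public_folder_path {
    my ($name) = @_;
    return "Public Folders/$name";
}

# Returns true if every named public folder could be selected.
sub check_account {
    my ($server, @names) = @_;
    require Mail::IMAPClient;    # loaded at run time so the helper above works without it
    my $imap = Mail::IMAPClient->new(
        Server   => $server,
        User     => $user,
        Password => $password,
    ) or die "login as $user on $server failed: $@\n";

    my $all_ok = 1;
    for my $folder (map { public_folder_path($_) } @names) {
        if ($imap->select($folder)) {
            my $count = $imap->message_count($folder) // 0;
            print "$folder: ok, $count message(s) waiting\n";
        }
        else {
            print "$folder: cannot select: ", ($imap->LastError // 'unknown error'), "\n";
            $all_ok = 0;
        }
    }
    $imap->logout;
    return $all_ok;
}

# Usage: perl check-sa-account.pl exchange.example.com   (hypothetical names)
exit(check_account($ARGV[0], 'Spam', 'Ham') ? 0 : 1) if @ARGV;
```

A login failure here points at the account or password; a "cannot select" line points at the folder name or its permissions.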

This mechanism works with both Exchange 5.5 SP4 and Exchange 2000 or later.

Richard Ozer
rozer@ois-online.com