Posted to users@spamassassin.apache.org by Chluz <ch...@free.fr> on 2013/08/19 23:24:58 UTC

RE: sa-learn and exchange integration

Hi all,
I just registered to be able to post this. I have a working solution for
running sa-learn on messages that Exchange 2013 users place into a special
folder.
This works for me as I have a small number of users (this is a family
server), but it might be adapted to a more corporate infrastructure without
having to create login files for each user.

As you may know, public folders can no longer be accessed through IMAP
under Exchange 2013. (I haven't been able to, and the literature says I
shouldn't be able to, but if someone has managed it, please tell me.)
However, all mail from each root folder of each user can be downloaded to
the SpamAssassin mail gateway using offlineimap.

The idea is to create a root folder, called say 'Learn As Spam', in the
mailbox of each user in your organisation. I did this manually, but you can
probably do a bulk add for a large organisation using the info in this link:
http://careexchange.in/create-a-custom-root-folder-in-all-the-mailboxes-bulk-in-exchange-2010/

Next, set up offlineimap to download the contents of that folder only, for
all users, and store it in '/SpamLearn/' on your mail gateway. Again, I do
this manually, but I suspect people can use the script I give in post 15 of
this thread
http://www.howtoforge.com/forums/showthread.php?t=60708&page=2
to create a file containing the list of email addresses of valid users of
your Exchange organisation, and then write a script to cycle through that
list, logging in to each email account using the mail admin credentials.
Note that you need to have given the mail admin account access to all
mailboxes, as described in this link:
http://social.technet.microsoft.com/Forums/exchange/en-US/a88dfbd3-8461-4848-90a8-003044805de0/grant-full-access-to-all-mailboxes-in-particular-domain
I haven't tested this automated way, but if the admin has IMAP access it
should work.
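
For anyone who wants to try the automated route, here is a rough, untested
sketch of how the offlineimap configuration could be generated from a list of
addresses. The function name gen_offlineimap_conf, the account naming scheme
and the choice to log in with the address itself are all my own assumptions,
not something from a working setup. Logging in as the mail admin on behalf of
another mailbox needs a different remoteuser string, which I have not
verified, and the password (remotepass or similar) is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: reads one email address per line on stdin and prints
# an offlineimap configuration that syncs only the 'Learn As Spam' folder.
#   $1 = IMAP host (e.g. mail.domain.com)
#   $2 = local base directory (e.g. /SpamLearn)
gen_offlineimap_conf() {
    host=$1
    base=$2
    accounts=""
    body=""
    while IFS= read -r addr; do
        [ -n "$addr" ] || continue
        # Derive an account name from the address: anything that is not a
        # letter or digit becomes an underscore.
        name=$(printf '%s' "$addr" | tr -c 'A-Za-z0-9' '_')
        accounts="${accounts:+$accounts,}$name"
        body="$body
[Account $name]
localrepository = ${name}_local
remoterepository = ${name}_remote

[Repository ${name}_local]
type = Maildir
localfolders = $base/$addr

[Repository ${name}_remote]
type = IMAP
remotehost = $host
remoteuser = $addr
ssl = yes
folderfilter = lambda folder: folder == 'Learn As Spam'
"
    done
    printf '[general]\naccounts = %s\n%s' "$accounts" "$body"
}
```

Something like "gen_offlineimap_conf mail.domain.com /SpamLearn < users.txt"
would then print a configuration with one account per address, each stored
under /SpamLearn/<address>/, which is the layout the learning script expects.
Depending on the offlineimap version, further options (certificate settings
for example) may be required.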

Once the emails are all stored in a folder on the mail gateway, I use the
script below to learn all the mails and delete them once learnt. Credits for
the script go to Freddie Witherden. mail.domain.com is my mail server running
Exchange and gateway.domain.com is my mail gateway server running
SpamAssassin.

#!/bin/sh

[ -x /usr/bin/sa-learn ] || exit 0

# Sync the IMAP folder; enable this only if offlineimap is not started on boot
#offlineimap -o

# For every existing user folder in SpamLearn
for i in /SpamLearn/*; do
    # Maildir keeps seen mail in cur and unseen mail in new; train on both
    for sub in cur new; do
        dir="$i/Learn As Spam/$sub"
        [ -d "$dir" ] || continue
        cd "$dir" || continue
        # Get the mails to train spamassassin
        for f in *; do
            if [ -e "$f" ]; then
                # Start by removing the headers generated by exchange and co
                sed -i '/^Received: from mail.domain.com/,/^Received: by gateway.domain.com/{d}' "$f"
                sed -i '/Received:/,$!d' "$f"
                sed -i '/^X-MS-Exchange-Organization-Network-Message-Id/,/^X-MS-Exchange-Organization-AuthAs/{d}' "$f"
                sed -i '/^X-domain-MailScanner-Information/,/^X-domain-MailScanner-From/{d}' "$f"
                # Debian-exim does not have read access to the mails so we pipe them
                cat "$f" | su - -s /bin/bash mail -c "sa-learn --spam" | grep -v "Learned tokens from"
                # Move files to the Spam dir
                #mv "$f" ../../.Junk/cur/
                # Or just delete it
                rm -f "$f"
            fi
        done
    done
done

# Print completed
# echo 'SpamAssassin Learn Completed.'
exit 0

Note that I use a lot of seds to remove all the headers added when the mail
is transferred from the gateway to the Exchange server (this includes the
MailScanner header entries and the Exchange entries, as well as some
'Received: by' entries). The email is deleted after being learnt. With
offlineimap running in the background, the user's mailbox is updated and the
learnt emails are removed from the 'Learn As Spam' folder in the mailbox.
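
To make it easier to check what the seds do before pointing them at real
mail, here is a small sketch that applies the same expressions as a filter
(stdin to stdout) instead of editing files in place. The function name
strip_gateway_headers is just something I made up for illustration, and the
host names are the same placeholders as in the script.

```shell
#!/bin/sh
# Hypothetical filter: same sed expressions as the learning script, but
# reading a message on stdin and writing the cleaned message to stdout.
strip_gateway_headers() {
    # 1. Drop the Received: block added between gateway and Exchange
    sed '/^Received: from mail.domain.com/,/^Received: by gateway.domain.com/{d}' |
    # 2. Drop everything before the first remaining Received: line
    sed '/Received:/,$!d' |
    # 3. Drop the block of Exchange organisation headers
    sed '/^X-MS-Exchange-Organization-Network-Message-Id/,/^X-MS-Exchange-Organization-AuthAs/{d}' |
    # 4. Drop the block of MailScanner headers
    sed '/^X-domain-MailScanner-Information/,/^X-domain-MailScanner-From/{d}'
}
```

Running "strip_gateway_headers < some-message" on a copy of a mail from the
'Learn As Spam' folder shows what sa-learn would actually be given. One thing
to be aware of: the second expression deletes every line that comes before
the first 'Received:' match, so a message with no 'Received:' line at all
comes out empty.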

I run the sa-learn script with a cronjob every hour.
 


