Posted to users@spamassassin.apache.org by Anthony Kamau <An...@diamondkey.com> on 2007/07/17 02:40:21 UTC

OT Alert: Forward low scoring SPAM to sa-learn.

Hello all.

I'm faced with a dilemma on how to use sa-learn with mail forwarded from
a user's inbox on Exchange to the sendmail server.  Since we just
recently started using sendmail as a front end server, our bayes system
is still in its infancy and spam is getting through to user inboxes with
scores lower than our threshold of 10 and thus not being clearly
identified as spam on the subject line.  My intention is to have a user
forward spam back to sendmail server and use sa-learn to help the
scoring system get better fast.

Here's what I've done so far:
I have created two email addresses for this purpose:
spam@mail.domain.com for spam and ham@mail.domain.com for false
positives.  I have created a connector that forwards all email destined
for mail.domain.com back to the sendmail server and messages are getting
into the appropriate mailboxes.
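A minimal sketch of the dispatch step, in Python: map the recipient address to an `sa-learn` mode (`--spam` or `--ham`) and run it on a file holding the reported message. It assumes each reported message has already been saved to its own file and that `sa-learn` is on the PATH of the account that owns the Bayes database. `LEARN_MODES`, `learn_command` and `learn` are hypothetical names.

```python
import subprocess
from pathlib import Path

# Hypothetical mapping built from the two training addresses described above.
# Keys are lower-cased so the lookup is case-insensitive.
LEARN_MODES = {
    "spam@mail.domain.com": "--spam",
    "ham@mail.domain.com": "--ham",
}


def learn_command(recipient: str, message_file: Path) -> list[str]:
    """Build the sa-learn command line for a message sent to `recipient`."""
    mode = LEARN_MODES.get(recipient.strip().lower())
    if mode is None:
        raise ValueError(f"not a training address: {recipient!r}")
    return ["sa-learn", mode, str(message_file)]


def learn(recipient: str, message_file: Path) -> None:
    """Run sa-learn; raises CalledProcessError if it exits non-zero."""
    subprocess.run(learn_command(recipient, message_file), check=True)
```

The file passed in must hold the original spam or ham (for example an extracted attachment), not the forwarding wrapper Exchange generated; otherwise sa-learn learns from the wrong message.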

The next step is what has me stumped: is there a standard marker to
look out for that separates the attachment from the mail sending the
attachment?

Any help would be mightily appreciated.

Cheers,
AK.


Re: OT Alert: Forward low scoring SPAM to sa-learn.

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 17.07.07 10:40, Anthony Kamau wrote:
> I'm faced with a dilemma on how to use sa-learn with mail forwarded from
> a user's inbox on Exchange to the sendmail server.  Since we just
> recently started using sendmail as a front end server, our bayes system
> is still in its infancy and spam is getting through to user inboxes with
> scores lower than our threshold of 10 and thus not being clearly
> identified as spam on the subject line.  My intention is to have a user
> forward spam back to sendmail server and use sa-learn to help the
> scoring system get better fast.

My experience is that Exchange very often rewrites mail in such a
horrible way that mail from Exchange should never be used for SA training.

Try sending a copy of all received e-mail to a special mailbox on your
front-end server, and whenever a user reports a false positive/negative,
run sa-learn (or spamassassin -r/-k) over that copy.
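One way to locate "the copy" is to look it up by its Message-ID header. A minimal Python sketch, assuming the archive mailbox is a Maildir (messages stored as individual files under `cur/` and `new/`) and that you can obtain the Message-ID of the reported message somehow; both are assumptions, and `find_archived_copy` is a hypothetical name.

```python
from email.parser import BytesHeaderParser
from pathlib import Path
from typing import Optional


def find_archived_copy(maildir: Path, message_id: str) -> Optional[Path]:
    """Return the archived file whose Message-ID matches, or None."""
    wanted = message_id.strip().strip("<>")
    parser = BytesHeaderParser()
    for sub in ("cur", "new"):  # standard Maildir folders holding delivered mail
        folder = maildir / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            with path.open("rb") as fh:
                headers = parser.parse(fh)  # only the header block is parsed
            mid = str(headers.get("Message-ID", "")).strip().strip("<>")
            if mid == wanted:
                return path
    return None
```

The returned path can then be handed to `sa-learn --spam` or `sa-learn --ham` (or `spamassassin -r`/`-k`). A linear scan is fine for a small archive; a busy server would want an index keyed on Message-ID.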

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Linux is like a teepee: no Windows, no Gates and an apache inside...

RE: OT Alert: Forward low scoring SPAM to sa-learn.

Posted by Anthony Kamau <An...@diamondkey.com>.
> -----Original Message-----
> From: Matt Kettler [mailto:mkettler_sa@verizon.net]
> Sent: Tuesday, 17 July 2007 11:35 AM
> To: Anthony Kamau
> Cc: users@spamassassin.apache.org
> Subject: Re: OT Alert: Forward low scoring SPAM to sa-learn.
> 
> That said, if you're just doing a "forward as attachment" type
> operation, you should be able to use any standard MIME attachment
> extractor tool.
> 

Thanks Matt,

I was planning on having the users forward the spam/ham as an
attachment, but that was before I read Michael's post.  All should be
well unless I have other issues with the script...

Cheers,
AK.


Re: OT Alert: Forward low scoring SPAM to sa-learn.

Posted by Matt Kettler <mk...@verizon.net>.
Anthony Kamau wrote:
> The next step is what has me stumped: is there a standard marker to
> look out for that separates the attachment from the mail sending the
> attachment?
>
Standard? There's nothing that's standard about forwarding email.

That said, if you're just doing a "forward as attachment" type
operation, you should be able to use any standard MIME attachment
extractor tool.

e.g.: http://search.cpan.org/dist/ppt/bin/mimedecode
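In MIME terms, a message forwarded as an attachment is normally carried as a part of type message/rfc822, so that content type is the marker to look for. The sketch below uses Python's standard `email` package instead of the Perl mimedecode tool linked above; `extract_attached_messages` and the demo helper `build_sample_forward` are hypothetical names. It does not undo any rewriting Exchange did before forwarding, and the embedded message is re-serialised by Python, so its bytes may not match the original exactly.

```python
import email
from email.message import Message
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_attached_messages(raw: bytes) -> list[bytes]:
    """Return each message/rfc822 part of a forwarded-as-attachment mail."""
    outer = email.message_from_bytes(raw)  # default (compat32) policy
    found: list[bytes] = []

    def visit(part: Message) -> None:
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message; serialise it back to bytes.
            found.append(part.get_payload(0).as_bytes())
            return  # do not descend into the forwarded message itself
        if part.is_multipart():
            for sub in part.get_payload():
                visit(sub)

    visit(outer)
    return found


def build_sample_forward() -> bytes:
    """Hypothetical demo input: a spam forwarded as an attachment."""
    spam = MIMEText("Buy now!", "plain")
    spam["From"] = "spammer@example.com"
    spam["Subject"] = "Cheap pills"
    fwd = MIMEMultipart()
    fwd["From"] = "user@domain.com"
    fwd["Subject"] = "FW: Cheap pills"
    fwd.attach(MIMEText("Please learn this one.", "plain"))
    fwd.attach(MIMEMessage(spam))
    return fwd.as_bytes()
```

An ordinary inline forward contains no message/rfc822 part, so the function returns an empty list for it, which is a convenient signal to skip training on that message.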

If you're using an ordinary "forward", don't bother. The message has
been completely rebuilt and bears only a visible-text resemblance to the
original. A normal "forward" generally does the following; any one of
these makes it more or less a different message as far as SA is
concerned, and the header changes are pretty catastrophic unless you can
do major reconstruction.

1) Discard ALL of the original message headers and build new ones,
copying only a minimal amount of text:
    -The message is now From: the forwarder, not the spammer.
    -All of the Received: headers are new.
    -Any out-of-the-ordinary headers are generally gone (e.g. X-Id,
X-Originating-IP, etc.).
    -Even the subject is generally changed to include "Fwd:" or
something similar.
    -The X-Mailer and/or User-Agent header is replaced with the one
for your MUA, not the original.

2) Make significant changes to the body text:
    - For multipart/alternative messages, many mail clients will discard
the original text/plain part and build a new one from the contents of
the text/html part.
    - Most add some kind of "Forwarded message follows" text.
    - Most will redo any transfer encodings, e.g. a message that was
base64 encoded will probably not be afterwards.
    - Most will redo line wraps to suit their own tastes.
    - All will generate completely new MIME boundaries, which will
generally not be remotely similar to the originals.
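The encoding point in item 2 can be illustrated with Python's standard `email` package (nothing SpamAssassin-specific is assumed here): the same text sent base64-encoded and sent as plain 7bit decodes to identical content, yet the bytes on the wire differ. `make_message` and `raw_body` are hypothetical helper names.

```python
from email.message import EmailMessage


def make_message(text: str, cte: str) -> EmailMessage:
    """Build a one-part text/plain message using the given transfer encoding."""
    msg = EmailMessage()
    msg["Subject"] = "Cheap pills"
    msg.set_content(text, cte=cte)  # cte: '7bit', '8bit', 'quoted-printable' or 'base64'
    return msg


def raw_body(msg: EmailMessage) -> str:
    """Return the body as it would appear on the wire (still transfer-encoded)."""
    return msg.get_payload()
```

Comparing `raw_body()` of a base64 and a 7bit version of the same text shows different strings, while `get_content()` returns the same decoded text for both; a client that re-encodes on forward therefore changes what any byte-level comparison sees.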