Posted to users@spamassassin.apache.org by Michael Shiloh <mi...@makingthings.com> on 2004/08/16 20:32:25 UTC

[SPAMASSASSIN-USERS] How does training work on forwarded email?

Hello SA users and experts,

Being a long-time and very happy SA user at home, when I was asked to set
up a spam filter on our new mail server at work I knew exactly what I would
use. Having to support more than just the one user, I have had to learn
and use some new features, and have used the archives and other resources
to help, for which I thank you all.

One thing that remains a bit of a mystery is how learning occurs when a message
is forwarded. 

We use postfix, and my Linux distro is Redhat 9. 

For training, I've set up dummy email accounts for mis-identified
ham and spam. I have a very simple shell script for training, calling
sa-learn on the spam and ham, and then deleting the contents. Later, 
I'll have crond do this, but for now I'm doing it manually in order to
keep an eye on things.
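
A training pass of this kind could look roughly like the sketch below. It is
not the script described above, only an illustration in Python. The mailbox
paths are hypothetical, and it assumes the dummy accounts are stored as mbox
files, so that sa-learn's --mbox option applies. The `run` parameter is an
assumption added purely so the command can be inspected without SpamAssassin
installed.

```python
import subprocess
from pathlib import Path

# Hypothetical mailbox locations for the two dummy accounts (assumptions).
SPAM_MBOX = Path("/var/mail/spam")
HAM_MBOX = Path("/var/mail/ham")


def sa_learn_command(kind, mbox):
    """Build the sa-learn command line for one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", str(mbox)]


def train_and_empty(kind, mbox, run=subprocess.run):
    """Train on an mbox, then truncate it only if sa-learn succeeded."""
    # check=True makes subprocess.run raise CalledProcessError on a
    # non-zero exit status, so the mailbox is left alone if training fails.
    run(sa_learn_command(kind, mbox), check=True)
    # Overwrite the file with nothing, i.e. delete the trained messages.
    Path(mbox).write_bytes(b"")
```

A manual run would then call train_and_empty("spam", SPAM_MBOX) followed by
train_and_empty("ham", HAM_MBOX). Note that mail delivered between the
sa-learn run and the truncation would be lost in this sketch; a cron version
would probably want mailbox locking.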

I've told my users to forward the mis-identified email to those mailboxes.

The sequence of events is as follows:

1. SA running on our mail server inspects an incoming message and believes it
   to be spam or non-spam.
2. Mail message is added to user's mailbox 
3. User reads mail, perhaps with Netscape, Windows email client (whatever that is),
   Mac's email program or, in my case, pine.
4. User forwards mis-identified ham or spam to the appropriate user
5. User's mail client wraps the original message in some sort of header (perhaps
   in some cases it's forwarded as an attachment and not as content).
6. Postfix adds that mail message to the ham or spam mailbox
7. Once a day (or so) I run sa-learn on ham and spam with the appropriate flags.

The mystery is, that spam now has a header from my user, who is forwarding the
spam to the spam mail account. How does SA know to train on the contents, and
not my legitimate user's header?

Thanks for any insight,
Michael


Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Roland Roberts <ro...@astrofoto.org>.
-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "ms" == Michael Shiloh <mi...@makingthings.com> writes:

    ms> Yes, please post your perl script, along with the invocation

The script is attached. It takes no args and assumes that the message
is a message/rfc822 attachment which is either

(1) already labelled incorrectly as spam in which case there is a
    Content-description header which says "original message before
    SpamAssassin", or
(2) incorrectly tagged as ham in which case the Content-description
    doesn't match.

For the first case, the inner message is extracted to stdout.  For the
second case, the outermost message is extracted to stdout.  It will
also work if you feed the message in directly, e.g., in mbox format.
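
For readers without the attachment, the selection logic described above can
be sketched as follows. This is not the Perl script itself but a Python
approximation using the standard email package; the function name
extract_original and the constant SA_MARKER are made up for illustration, and
the exact matching rule (a substring test on Content-Description) is an
assumption.

```python
import email

# Text SpamAssassin puts in Content-Description when it wraps a message
# it has tagged as spam (as quoted in the description above).
SA_MARKER = "original message before SpamAssassin"


def extract_original(raw):
    """Return the message that should be handed to sa-learn.

    raw is the complete forwarded message as bytes.
    """
    msg = email.message_from_bytes(raw)
    first_attached = None
    # walk() is depth-first, so the outermost attachment is seen first.
    for part in msg.walk():
        if part.get_content_type() != "message/rfc822":
            continue
        payload = part.get_payload()
        if not isinstance(payload, list) or not payload:
            continue
        inner = payload[0]
        # Case 1: SpamAssassin wrapped the mail; take the inner original.
        if SA_MARKER in str(part.get("Content-Description", "")):
            return inner
        if first_attached is None:
            first_attached = inner
    # Case 2: plain forward-as-attachment; take the outermost attachment.
    # Otherwise the message was fed in directly, so return it unchanged.
    return first_attached if first_attached is not None else msg
```

A caller could write extract_original(data).as_bytes() to stdout and pipe
that into sa-learn. Forwards that inline the original as text rather than
attaching it are not handled by this sketch.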

I haven't yet installed it because I am still trying to understand how
to bypass the extra SA processing when I forward the mail internally
to the learning account.  All my mail goes through a procmail router,
so adding the script should be as simple as adding a procmail rule
something like this:

    :0
    * ^TO_.*notspam
    | $SOMEPATH/sa-extract.pl | sa-learn --ham


regards,

roland
- -- 
                       PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                            6818 Madeline Court
roland@astrofoto.org                           Brooklyn, NY 11220


Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Michael Shiloh <mi...@makingthings.com>.
Hi Roland,

Yes, please post your perl script, along with the invocation

Michael


On Tue, 17 Aug 2004, Roland Roberts wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> 
> >>>>> "rm" == Ryan Moore <ry...@perigee.net> writes:
> 
>     >> The mystery is, that spam now has a header from my user, who is
>     >> forwarding the spam to the spam mail account. How does SA know
>     >> to train on the contents, and not my legitimate user's header?
> 
>     rm> The answer is that it doesn't, and it will train tokens based
>     rm> on your user's forward, which is bad. You will want to strip
>     rm> out the original message (which should be sent as an
>     rm> attachment) and only learn based on that.
> 
> I also got this answer on the sa-exim list, and I have a perl script
> that will extract the message assuming it is forwarded as an
> attachment of type message/rfc822.
> 
> The problem I need to figure out is how to keep spamassassin from
> running against the message when delivered to the dummy account.  I
> don't completely understand when the local_scan gets invoked---is
> $local_user_id set so I can test based on that?
> 
> I can post the perl script if anyone is interested....
> 
> regards,
> 
> roland
> - -- 
>                        PGP Key ID: 66 BC 3B CD
> Roland B. Roberts, PhD                             RL Enterprises
> roland@rlenter.com                            6818 Madeline Court
> roland@astrofoto.org                           Brooklyn, NY 11220
> 
> -----BEGIN PGP SIGNATURE-----
> Version: 2.6.3ia
> Charset: noconv
> Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface
> 
> iQCVAwUBQSGHh+oW38lmvDvNAQH4cwP/WGD3Y5oPeeBwVQ3vrCAJgjWMa4C9I5fW
> KJu1pBg3IS1zr4mKRyXQF7+m0IkeX/PrdrqeojWw5esR+RmSYgil60T4WQmw7eTH
> 6tWS4oIyF0ZnziDIOMVQlNvVK2oAwkvoyW3SvRlx8akBGNPgMSDRzSRuaaVT5tjC
> pFZtbdLQcQo=
> =hFQs
> -----END PGP SIGNATURE-----
> 


Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Roland Roberts <ro...@astrofoto.org>.
-----BEGIN PGP SIGNED MESSAGE-----

>>>>> "rm" == Ryan Moore <ry...@perigee.net> writes:

    >> The mystery is, that spam now has a header from my user, who is
    >> forwarding the spam to the spam mail account. How does SA know
    >> to train on the contents, and not my legitimate user's header?

    rm> The answer is that it doesn't, and it will train tokens based
    rm> on your user's forward, which is bad. You will want to strip
    rm> out the original message (which should be sent as an
    rm> attachment) and only learn based on that.

I also got this answer on the sa-exim list, and I have a perl script
that will extract the message assuming it is forwarded as an
attachment of type message/rfc822.

The problem I need to figure out is how to keep spamassassin from
running against the message when delivered to the dummy account.  I
don't completely understand when the local_scan gets invoked---is
$local_user_id set so I can test based on that?

I can post the perl script if anyone is interested....

regards,

roland
- -- 
                       PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                            6818 Madeline Court
roland@astrofoto.org                           Brooklyn, NY 11220

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3ia
Charset: noconv
Comment: Processed by Mailcrypt 3.5.4, an Emacs/PGP interface

iQCVAwUBQSGHh+oW38lmvDvNAQH4cwP/WGD3Y5oPeeBwVQ3vrCAJgjWMa4C9I5fW
KJu1pBg3IS1zr4mKRyXQF7+m0IkeX/PrdrqeojWw5esR+RmSYgil60T4WQmw7eTH
6tWS4oIyF0ZnziDIOMVQlNvVK2oAwkvoyW3SvRlx8akBGNPgMSDRzSRuaaVT5tjC
pFZtbdLQcQo=
=hFQs
-----END PGP SIGNATURE-----

Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Ryan Moore <ry...@perigee.net>.
I'm sure there are better ways to do it compared to the dinky little 
bash script I use myself (http://h0b0.net/salearn.txt). I'd be surprised 
if there aren't any perl scripts or something more robust out there that 
do the same. You just need an alias in your MTA that pipes email to the 
script and let it do its thing.

Ryan Moore
----------
Perigee.net Corporation
704-849-8355 (sales)
704-849-8017 (tech)
www.perigee.net

Michael Shiloh wrote:
> Thanks for your rapid reply, Ryan. This is very interesting. I found numerous
> examples showing how to do what I described, but none discussed stripping out
> the original messages. I know this is a bit OT for spamassassin, but does
> anyone have suggestions on how to do this automatically, or a pointer to 
> some recipe to do this?
> 
> Thanks,
> Michael
> 

Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Network Administrator <ne...@mirusbio.com>.
Won't a "Re-Direct" instead of a Forward accomplish what you're trying to
do?


On 8/16/04 1:51 PM, "Michael Shiloh" <mi...@makingthings.com> wrote:

> Thanks for your rapid reply, Ryan. This is very interesting. I found numerous
> examples showing how to do what I described, but none discussed stripping out
> the original messages. I know this is a bit OT for spamassassin, but does
> anyone have suggestions on how to do this automatically, or a pointer to
> some recipe to do this?
> 
> Thanks,
> Michael
> 
> On Mon, 16 Aug 2004, Ryan Moore wrote:
> 
>> Michael Shiloh wrote:
>>> The mystery is, that spam now has a header from my user, who is forwarding
>>> the
>>> spam to the spam mail account. How does SA know to train on the contents,
>>> and
>>> not my legitimate user's header?
>>> 
>>> Thanks for any insight,
>>> Michael
>>> 
>> 
>> The answer is that it doesn't, and it will train tokens based on your
>> user's forward, which is bad. You will want to strip out the original
>> message (which should be sent as an attachment) and only learn based on
>> that.
>> 
>> Ryan Moore
>> ----------
>> Perigee.net Corporation
>> 704-849-8355 (sales)
>> 704-849-8017 (tech)
>> www.perigee.net
>> 
> 



Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Michael Shiloh <mi...@makingthings.com>.
Thanks for your rapid reply, Ryan. This is very interesting. I found numerous
examples showing how to do what I described, but none discussed stripping out
the original messages. I know this is a bit OT for spamassassin, but does
anyone have suggestions on how to do this automatically, or a pointer to 
some recipe to do this?

Thanks,
Michael

On Mon, 16 Aug 2004, Ryan Moore wrote:

> Michael Shiloh wrote:
> > The mystery is, that spam now has a header from my user, who is forwarding the
> > spam to the spam mail account. How does SA know to train on the contents, and
> > not my legitimate user's header?
> > 
> > Thanks for any insight,
> > Michael
> > 
> 
> The answer is that it doesn't, and it will train tokens based on your 
> user's forward, which is bad. You will want to strip out the original
> message (which should be sent as an attachment) and only learn based on 
> that.
> 
> Ryan Moore
> ----------
> Perigee.net Corporation
> 704-849-8355 (sales)
> 704-849-8017 (tech)
> www.perigee.net
> 


Re: [SPAMASSASSIN-USERS] How does training work on forwarded email?

Posted by Ryan Moore <ry...@perigee.net>.
Michael Shiloh wrote:
> The mystery is, that spam now has a header from my user, who is forwarding the
> spam to the spam mail account. How does SA know to train on the contents, and
> not my legitimate user's header?
> 
> Thanks for any insight,
> Michael
> 

The answer is that it doesn't, and it will train tokens based on your 
user's forward, which is bad. You will want to strip out the original
message (which should be sent as an attachment) and only learn based on 
that.

Ryan Moore
----------
Perigee.net Corporation
704-849-8355 (sales)
704-849-8017 (tech)
www.perigee.net