Posted to users@spamassassin.apache.org by Sander Holthaus - Orange XL <in...@orangexl.com> on 2005/02/03 01:59:21 UTC

Manually training SpamAssassin by forwarding mail

I've been interested in offering customers a way to manually train the
SpamAssassin Bayes filter for ham and spam (to reduce false positives and
negatives). However, I can only find documentation on this for local
mailboxes and IMAP. Most users, however, retrieve their mail through POP and
use Outlook (Express) as their mail client. Is there a way to train SpamAssassin
with such a setup (e.g. forwarding mail with Outlook (Express) using SMTP)?
 
Kind Regards,
Sander Holthaus

RE: Manually training SpamAssassin by forwarding mail

Posted by Sander Holthaus - Orange XL <in...@orangexl.com>.
> At 07:59 PM 2/2/2005, Sander Holthaus - Orange XL wrote:
> >I've been interested in offering customers a way to manually train the
> >SpamAssassin Bayes filter for ham and spam (to reduce false positives
> >and negatives). However, I can only find documentation on this for
> >local mailboxes and IMAP. Most users, however, retrieve their mail
> >through POP and use Outlook (Express) as their mail client. Is there a
> >way to train SpamAssassin with such a setup (e.g. forwarding mail with
> >Outlook (Express) using SMTP)?
> >

Matt Kettler wrote:
> 
> Only if you can somehow get the users to forward an un-mangled message,
> complete with original headers, as an attachment. You can then have a
> script strip off the attachments and feed those to sa-learn.
> 
> The fundamental problem with normal forwarding is that, from an SA
> perspective, the forwarded message looks very little like the original:
> new headers, different encoding, extra text often added to the body,
> superfluous MIME sections dropped, others added.
> 
> Since SA learns from the message headers, and some of the message
> encoding has an impact on learning, these changes cause problems.
>

Will Yardley wrote:
> There are various schemes to do this; the tricky part is getting people
> to submit emails in a consistent format. If you can get them to forward
> them as message/rfc822 attachments, it probably wouldn't be too hard to
> write a program to extract them and train... I imagine this would be too
> complicated for many users, though.
> 
> One scheme that we've used is to have specially named IMAP folders that
> users can place misclassified emails in for training. You can then
> have a server-side robot which trains the filter and then discards the
> emails.


Thanks, I figured that would be the problem. It makes it hard, if not
impossible, to create such a system for average users. I was hoping that
SpamAssassin would include a system similar to DSPAM.

On the side: if I did get such a system working (where users are able to
forward emails untouched and I am able to extract those messages for
sa-learn), could I expect problems with locally added headers? For
instance, headers added when the message passes through a local anti-spam or
anti-virus proxy. Or, in the case of IMAP, when users flag messages (or when
they are automatically flagged) before moving them to a learn-ham / learn-spam
folder?

Kind Regards,
Sander Holthaus


Re: Manually training SpamAssassin by forwarding mail

Posted by Matt Kettler <mk...@evi-inc.com>.
At 07:59 PM 2/2/2005, Sander Holthaus - Orange XL wrote:
>I've been interested in offering customers a way to manually train the
>SpamAssassin Bayes filter for ham and spam (to reduce false positives and
>negatives). However, I can only find documentation on this for local
>mailboxes and IMAP. Most users, however, retrieve their mail through POP
>and use Outlook (Express) as their mail client. Is there a way to train
>SpamAssassin with such a setup (e.g. forwarding mail with Outlook
>(Express) using SMTP)?
>

Only if you can somehow get the users to forward an un-mangled message, 
complete with original headers, as an attachment. You can then have a 
script strip off the attachments and feed those to sa-learn.

The fundamental problem with normal forwarding is that, from an SA
perspective, the forwarded message looks very little like the original:
new headers, different encoding, extra text often added to the body,
superfluous MIME sections dropped, others added.

Since SA learns from the message headers, and some of the message encoding
has an impact on learning, these changes cause problems.


Re: Manually training SpamAssassin by forwarding mail

Posted by Will Yardley <sa...@veggiechinese.net>.
On Thu, Feb 03, 2005 at 01:59:21AM +0100, Sander Holthaus - Orange XL wrote:

> I've been interested in offering customers a way to manually train the
> SpamAssassin Bayes filter for ham and spam (to reduce false positives and
> negatives). However, I can only find documentation on this for local
> mailboxes and IMAP. Most users, however, retrieve their mail through POP and
> use Outlook (Express) as their mail client. Is there a way to train
> SpamAssassin with such a setup (e.g. forwarding mail with Outlook (Express)
> using SMTP)?

There are various schemes to do this; the tricky part is getting people
to submit emails in a consistent format. If you can get them to forward
them as message/rfc822 attachments, it probably wouldn't be too hard to
write a program to extract them and train... I imagine this would be too
complicated for many users, though.

One scheme that we've used is to have specially named IMAP folders that
users can place misclassified emails in for training. You can then
have a server-side robot which trains the filter and then discards the
emails.
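
As a rough illustration of the "program to extract them" idea, here is a minimal Python sketch using only the standard library `email` package. The function name `extract_forwarded` is hypothetical, and the sketch assumes the user really did forward the mail as a message/rfc822 attachment. The result is a re-serialized copy of the attached message, not a guaranteed byte-for-byte copy of what SpamAssassin originally saw.

```python
from email import message_from_bytes
from typing import List


def extract_forwarded(raw: bytes) -> List[bytes]:
    """Return every message/rfc822 attachment in `raw` as message bytes."""
    wrapper = message_from_bytes(raw)  # compat32 policy keeps headers as parsed
    found: List[bytes] = []

    def visit(part):
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the attached message itself.
            found.append(part.get_payload(0).as_bytes())
            return  # do not descend into the attached message
        if part.is_multipart():
            for child in part.get_payload():
                visit(child)

    visit(wrapper)
    return found
```

Each returned item could then be handed to sa-learn, for example on its standard input or via a temporary file.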

RE: Manually training SpamAssassin by forwarding mail

Posted by Sander Holthaus - Orange XL <in...@orangexl.com>.
> --On 02/04/05 16:08:53 +0100 Sander Holthaus - Orange XL wrote:
> > Basically, I've got two options. All mail that is received is backed
> > up on the mail server before any headers are added. I could match
> > those copies with mail received in the spam-learn and ham-learn
> > accounts. However, mail is backed up only for a limited amount of time
> > before being moved, after which the mail server no longer has access
> > to it. So unless people report mail that found its way through the
> > filters on a very regular basis, it won't be a foolproof solution.
> 
> You don't really need a 100% solution; something which works 80% of
> the time would probably be fine.  But you may not want to do the
> programming needed to automate this.

I don't have the time for it yet, but I should be able to make something in
Perl. Personally, I'm no big fan of the 80% rule in programming, as that last
undone 20% usually forms 80% of my problems :-)
 
> > The other option sounds more viable. I would only need to strip off
> > the X-Scanned-By, X-Spam-* and X-Sanitized headers (which are ignored
> > in my setup for Bayes anyhow), BUT I have no guarantee that the
> > message is in its original format. Some MIME boundary rewriting may be
> > done by the mail server (where necessary), as may conversion from 8bit
> > to 7bit where possible. And I think that there are many client-side
> > mail-filtering engines, spam scanners and virus scanners out there
> > that may do some rewriting as well.
> 
> You'll probably find that the various changes don't affect bayes that
> much. When a re-written message is learned you may make bayes miss
> email which (in an ideal world) it would have caught, but I think it
> will tend to classify messages around 50% "I don't know if this is ham
> or spam" rather than classifying it incorrectly.  And there should be
> enough unchanged tokens in the messages to let bayes work anyways.
> 
> So I say strip off what you can but don't obsess about the rest.  Feed
> it into bayes and see how it works, and only try to fix it if you see
> bayes misclassifying email.

I'm not sure I know of a good way to check whether Bayes is
misclassifying, but I should be able to get some of that information from the
log files. Perhaps throwing away mail that has been rewritten/reformatted
would be a solution, though I don't know whether such mail can be recognized
easily. We'll see :-)

Thanks for all the help and suggestions!

Kind Regards,
Sander Holthaus


RE: Manually training SpamAssassin by forwarding mail

Posted by Kevin Sullivan <ke...@klubkev.org>.
--On 02/04/05 16:08:53 +0100 Sander Holthaus - Orange XL wrote:
> Basically, I've got two options. All mail that is received is backed up on
> the mail server before any headers are added. I could match those copies
> with mail received in the spam-learn and ham-learn accounts. However, mail
> is backed up only for a limited amount of time before being moved, after
> which the mail server no longer has access to it. So unless people
> report mail that found its way through the filters on a very regular
> basis, it won't be a foolproof solution.

You don't really need a 100% solution; something which works 80% of the 
time would probably be fine.  But you may not want to do the programming 
needed to automate this.

> The other option sounds more viable. I would only need to strip off the
> X-Scanned-By, X-Spam-* and X-Sanitized headers (which are ignored in my
> setup for Bayes anyhow), BUT I have no guarantee that the message is in
> its original format. Some MIME boundary rewriting may be done by the
> mail server (where necessary), as may conversion from 8bit to 7bit where
> possible. And I think that there are many client-side mail-filtering
> engines, spam scanners and virus scanners out there that may do some
> rewriting as well.

You'll probably find that the various changes don't affect bayes that much. 
When a re-written message is learned you may make bayes miss email which 
(in an ideal world) it would have caught, but I think it will tend to 
classify messages around 50% "I don't know if this is ham or spam" rather 
than classifying it incorrectly.  And there should be enough unchanged 
tokens in the messages to let bayes work anyways.

So I say strip off what you can but don't obsess about the rest.  Feed it 
into bayes and see how it works, and only try to fix it if you see bayes 
misclassifying email.

	-Kevin



RE: Manually training SpamAssassin by forwarding mail

Posted by Sander Holthaus - Orange XL <in...@orangexl.com>.
> --On 02/04/05 09:17:55 -0400 Peter Marshall wrote:
> > My question is the same as Henrik's. I have a bunch of email that is
> > spam (either tagged by SpamAssassin or not tagged at all). I
> > forwarded it as an attachment to a "spam" mailbox.  What do I have to
> > do now before I can get Bayes to learn the message ... I read you have
> > to remove the headers .... Could anyone give me a little more detail?
> 
> There's no 100% good way to do this; it depends on how the message was
> mangled by the client (and possibly server).  The only guaranteed way
> is (as I described) to save a copy at the same point as it is inspected
> by SpamAssassin so you can use it later.
> 
> That being said, forwarding a message as an attachment will usually
> preserve the headers pretty well.  The Perl MailTools and MIME-tools
> modules have procedures to pull out attachments and save them in the
> Unix format which sa-learn wants.
> 
> Sorry I don't have any ready-made scripts for this; my users dump
> messages into shared IMAP mailboxes which don't need any preprocessing
> before being fed to sa-learn.
> 
> 	-Kevin

Basically, I've got two options. All mail that is received is backed up on
the mail server before any headers are added. I could match those copies with
mail received in the spam-learn and ham-learn accounts. However, mail is
backed up only for a limited amount of time before being moved, after which
the mail server no longer has access to it. So unless people report mail
that found its way through the filters on a very regular basis, it won't be
a foolproof solution.
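
A minimal Python sketch of this matching step, keyed on the Message-ID header as suggested elsewhere in the thread. Everything here is an assumption for illustration: the backup is taken to be a directory with one message per file, `index_backup` and `find_original` are hypothetical names, and `reported` is assumed to be the attached copy of the original mail (an inline forward would carry a new Message-ID). Spam with a missing or reused Message-ID will not match reliably.

```python
from email import message_from_bytes
from pathlib import Path
from typing import Dict, Optional


def index_backup(backup_dir: Path) -> Dict[str, Path]:
    """Map each Message-ID found in backup_dir to the file that holds it."""
    index: Dict[str, Path] = {}
    for path in sorted(backup_dir.iterdir()):
        if not path.is_file():
            continue
        msg_id = message_from_bytes(path.read_bytes()).get("Message-ID")
        if msg_id:
            index[str(msg_id).strip()] = path
    return index


def find_original(reported: bytes, index: Dict[str, Path]) -> Optional[Path]:
    """Return the backed-up file matching the reported message, if any."""
    msg_id = message_from_bytes(reported).get("Message-ID")
    if msg_id is None:
        return None
    return index.get(str(msg_id).strip())
```

A report that arrives after the backup has been rotated away simply returns None, which is the limitation described above.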

The other option sounds more viable. I would only need to strip off the
X-Scanned-By, X-Spam-* and X-Sanitized headers (which are ignored in my
setup for Bayes anyhow), BUT I have no guarantee that the message is in its
original format. Some MIME boundary rewriting may be done by the mail server
(where necessary), as may conversion from 8bit to 7bit where possible. And I
think that there are many client-side mail-filtering engines, spam scanners
and virus scanners out there that may do some rewriting as well.
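
A minimal Python sketch of the header-stripping step, assuming the three header patterns named above are the only locally added ones. The function name `strip_local_headers` is hypothetical. The sketch cannot undo MIME boundary rewriting or 8bit-to-7bit conversion; it only removes headers.

```python
from email import message_from_bytes
from fnmatch import fnmatchcase

# Header patterns taken from the post; "*" matches any suffix.
LOCAL_HEADERS = ("X-Scanned-By", "X-Spam-*", "X-Sanitized")


def strip_local_headers(raw: bytes, patterns=LOCAL_HEADERS) -> bytes:
    """Return the message with all headers matching `patterns` removed."""
    msg = message_from_bytes(raw)
    lowered_patterns = [p.lower() for p in patterns]
    for name in set(msg.keys()):
        # Header names are case-insensitive, so compare in lower case.
        if any(fnmatchcase(name.lower(), p) for p in lowered_patterns):
            del msg[name]  # removes every occurrence of this header
    return msg.as_bytes()
```

The remaining headers and the body are written back out in their original order.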

Given the above, I'm not sure that training SpamAssassin on forwarded
messages that may or may not be in the format in which SpamAssassin first
received them is a good idea. But I don't know enough about SpamAssassin's
internal workings and its Bayes filter to be sure...

Kind Regards,
Sander Holthaus


Re: Manually training SpamAssassin by forwarding mail

Posted by Kevin Sullivan <ke...@klubkev.org>.
--On 02/04/05 09:17:55 -0400 Peter Marshall wrote:
> My question is the same as Henrik's. I have a bunch of email that is spam
> (either tagged by SpamAssassin or not tagged at all). I forwarded it as
> an attachment to a "spam" mailbox.  What do I have to do now before I
> can get Bayes to learn the message ... I read you have to remove the
> headers .... Could anyone give me a little more detail?

There's no 100% good way to do this; it depends on how the message was 
mangled by the client (and possibly server).  The only guaranteed way is 
(as I described) to save a copy at the same point as it is inspected by 
SpamAssassin so you can use it later.

That being said, forwarding a message as an attachment will usually 
preserve the headers pretty well.  The Perl MailTools and MIME-tools 
modules have procedures to pull out attachments and save them in the Unix 
format which sa-learn wants.

Sorry I don't have any ready-made scripts for this; my users dump messages 
into shared IMAP mailboxes which don't need any preprocessing before being 
fed to sa-learn.
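
For the "save them in the Unix format" step, here is a minimal Python sketch that appends already-extracted messages to an mbox file, on the assumption that "Unix format" means mbox. It uses the standard library `mailbox` module instead of the Perl modules named above, and `append_to_mbox` is a hypothetical name.

```python
import mailbox
from typing import Iterable


def append_to_mbox(mbox_path: str, messages: Iterable[bytes]) -> int:
    """Append raw messages to an mbox file and return how many were added."""
    box = mailbox.mbox(mbox_path)  # the file is created if it does not exist
    box.lock()
    count = 0
    try:
        for raw in messages:
            box.add(raw)  # mailbox adds the "From " separator line itself
            count += 1
    finally:
        box.close()  # flushes pending writes and releases the lock
    return count
```

The resulting file can be passed to sa-learn with its `--mbox` option.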

	-Kevin

RE: Manually training SpamAssassin by forwarding mail

Posted by Joe Polk <li...@javelinux.com>.
First, I had understood that Bayes can learn from previously tagged emails
without stripping the SpamAssassin tags. Has this changed?

Second, all of my users use a webmail client, though they can use OE if they
wish. It is probably best for them to use IMAP so that server-side scanning
can be set up more easily. I currently have two scripts that run nightly. The
first takes everything in the user's /home/user/mail/Spam folder, learns it as
spam, then empties the folder. The second does the same for Ham, but moves
that mail to a Cleaned folder. All the user has to do is move untagged spam
into Spam and false positives into Ham.
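
The learning half of such a nightly job could look roughly like the Python sketch below. It is not the poster's script: the function names are hypothetical, and the folders are assumed to be mbox files (hence `--mbox`). Emptying the Spam folder and moving Ham to Cleaned are left out.

```python
import subprocess
from typing import List


def sa_learn_command(kind: str, folder: str, mbox: bool = True) -> List[str]:
    """Build the sa-learn command line for one folder."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        cmd.append("--mbox")  # the folder is a single mbox file
    cmd.append(folder)
    return cmd


def learn_folder(kind: str, folder: str, mbox: bool = True) -> int:
    """Run sa-learn on one folder and return its exit status."""
    return subprocess.run(sa_learn_command(kind, folder, mbox)).returncode
```

A cron job might call `learn_folder("spam", "/home/user/mail/Spam")` and `learn_folder("ham", "/home/user/mail/Ham")` once per user, running as the account that owns the Bayes database.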

--
<<JAV>>


---------- Original Message -----------
From: "Sander Holthaus - Orange XL" <in...@orangexl.com>
To: "'SpamAssassin Users'" <us...@spamassassin.apache.org>
Cc: "'Stuart Johnston'" <st...@ebby.com>, "'Peter Marshall'"
<pe...@caris.com>
Sent: Fri, 4 Feb 2005 19:47:40 +0100
Subject: RE: Manually training SpamAssassin by forwarding mail

> > -----Original Message-----
> > From: Stuart Johnston [mailto:stuart@ebby.com] 
> > Sent: Friday, February 04, 2005 7:35 PM
> > To: Peter Marshall; SpamAssassin Users
> > Subject: Re: Manually training SpamAssassin by forwarding mail
> > 
> > Peter Marshall wrote:
> > > Stuart Johnston wrote:
> > > 
> > >> Peter Marshall wrote:
> > >>
> > >>> Kevin Sullivan wrote:
> > >>>
> > >>>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> > >>>>
> > >>>>> I've been interested in offering customers to train 
> > manually train 
> > >>>>> the SpamAssassin Bayes filter for ham and spam (to reduce false 
> > >>>>> positives and negatives). However, I can only find 
> > documentation 
> > >>>>> to this for local mailboxes and IMAP. Most users 
> > however, retrieve 
> > >>>>> their mail through POP and use Outlook (Express) as 
> > mail client. 
> > >>>>> Is there a way to train SpamAssassin with such a setup (e.g. 
> > >>>>> forwarding mail with Outlook
> > >>>>> (Express) using SMTP)?
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> If you want to do a lot of programming, you could save 
> > all incoming 
> > >>>> messages for a few days in a database somewhere.  When a user 
> > >>>> forwards a message to a special "ham" or "spam" mailbox, 
> > you pull 
> > >>>> the message-id from the message and use it to recover 
> > the original 
> > >>>> message from your database.
> > >>>>
> > >>>>     -Kevin
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> My question is the same as Henrik, I have a bunch of 
> > email that is 
> > >>> spam (either tagged by spam assassin or not tagged at all.  I 
> > >>> forwared it as an attachment to a "spam" mail box.  What 
> > do I have 
> > >>> to do now before I can get bayes to learn the message ... 
> > I read you 
> > >>> have to remove the headers .... Could anyone give me a 
> > little more 
> > >>> detail ?
> > >>
> > >>
> > >>
> > >> I use a modified version of the DMZS-sa-learn.pl from: 
> > >> http://www.dmzs.com/tools/files/spam.phtml
> > >>
> > >> When someone forwards a spam to me, I move the message to 
> > a special 
> > >> imap folder that gets processed by the script.  My additions look 
> > >> something like:
> > >>
> > >> use Email::MIME;
> > >> ...
> > >> my $msg = Email::MIME->new($raw_message_body);
> > >>
> > >> my @parts = $msg->parts;
> > >>
> > >> foreach (@parts) {
> > >>   if ($_->content_type =~ m|message/rfc822|) {
> > >>     sa_learn($_->body_raw);
> > >>   }
> > >> }
> > >>
> > >>
> > >> I've tested this with messages forwarded as attachment 
> > from Outlook 
> > >> and Thunderbird.  I'm not sure how effective it is though. 
> >  I'm sure 
> > >> that it still looses something in the translation.  All imap is 
> > >> really the way to go if you can.
> > >>
> > >>
> > >> Stuart Johnston
> > >>
> > >>
> > > But I have no imap .. only pop .. they would forwared (as 
> > attachment) 
> > > to a mailbox, and then I have to run sa-learn ... I assume as root ?
> > > 
> > > Will the stuff you posted work for this setup as well ??
> > > 
> > > Would there be big problems just running it after the forwared as 
> > > attachment. ??
> > 
> > The code I posted only shows how you can extract the attached 
> > spam from the email.  You'll need to write your own code to 
> > integrate it into your particular setup.
> > 
> > BTW, in Outlook, you can easily attach multiple spams to one 
> > message and this code should handle it.
> 
> CTRL-a, right click, "Forward Items" will indeed do the trick.
> 
> > > 
> > > Can users also forwared as attachemtn mail that was sent that was 
> > > already marked as spam ... or is there any advantage to this ?
> > 
> > If you use Bayes auto learn, I suspect that this wouldn't do much. 
> > Otherwise, it might help.
> 
> I would check the headers of the forwarded messages to see if their
> spam-score is above your auto-learning threshold. If it is,
>  relearning is is perhaps quite useless. You might wonder why they 
> received the message anyway
> (I would think that something that is good enough to autolearn is 
> good enough to refuse or discard).
> 
> Kind Regards,
> Sander Holthaus
------- End of Original Message -------


RE: Manually training SpamAssassin by forwarding mail

Posted by Sander Holthaus - Orange XL <in...@orangexl.com>.
 

> -----Original Message-----
> From: Stuart Johnston [mailto:stuart@ebby.com] 
> Sent: Friday, February 04, 2005 7:35 PM
> To: Peter Marshall; SpamAssassin Users
> Subject: Re: Manually training SpamAssassin by forwarding mail
> 
> Peter Marshall wrote:
> > Stuart Johnston wrote:
> > 
> >> Peter Marshall wrote:
> >>
> >>> Kevin Sullivan wrote:
> >>>
> >>>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> >>>>
> >>>>> I've been interested in offering customers to train 
> manually train 
> >>>>> the SpamAssassin Bayes filter for ham and spam (to reduce false 
> >>>>> positives and negatives). However, I can only find 
> documentation 
> >>>>> to this for local mailboxes and IMAP. Most users 
> however, retrieve 
> >>>>> their mail through POP and use Outlook (Express) as 
> mail client. 
> >>>>> Is there a way to train SpamAssassin with such a setup (e.g. 
> >>>>> forwarding mail with Outlook
> >>>>> (Express) using SMTP)?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> If you want to do a lot of programming, you could save 
> all incoming 
> >>>> messages for a few days in a database somewhere.  When a user 
> >>>> forwards a message to a special "ham" or "spam" mailbox, 
> you pull 
> >>>> the message-id from the message and use it to recover 
> the original 
> >>>> message from your database.
> >>>>
> >>>>     -Kevin
> >>>
> >>>
> >>>
> >>>
> >>> My question is the same as Henrik, I have a bunch of 
> email that is 
> >>> spam (either tagged by spam assassin or not tagged at all.  I 
> >>> forwared it as an attachment to a "spam" mail box.  What 
> do I have 
> >>> to do now before I can get bayes to learn the message ... 
> I read you 
> >>> have to remove the headers .... Could anyone give me a 
> little more 
> >>> detail ?
> >>
> >>
> >>
> >> I use a modified version of the DMZS-sa-learn.pl from: 
> >> http://www.dmzs.com/tools/files/spam.phtml
> >>
> >> When someone forwards a spam to me, I move the message to 
> a special 
> >> imap folder that gets processed by the script.  My additions look 
> >> something like:
> >>
> >> use Email::MIME;
> >> ...
> >> my $msg = Email::MIME->new($raw_message_body);
> >>
> >> my @parts = $msg->parts;
> >>
> >> foreach (@parts) {
> >>   if ($_->content_type =~ m|message/rfc822|) {
> >>     sa_learn($_->body_raw);
> >>   }
> >> }
> >>
> >>
> >> I've tested this with messages forwarded as attachment 
> from Outlook 
> >> and Thunderbird.  I'm not sure how effective it is though. 
>  I'm sure 
> >> that it still looses something in the translation.  All imap is 
> >> really the way to go if you can.
> >>
> >>
> >> Stuart Johnston
> >>
> >>
> > But I have no imap .. only pop .. they would forwared (as 
> attachment) 
> > to a mailbox, and then I have to run sa-learn ... I assume as root ?
> > 
> > Will the stuff you posted work for this setup as well ??
> > 
> > Would there be big problems just running it after the forwared as 
> > attachment. ??
> 
> The code I posted only shows how you can extract the attached 
> spam from the email.  You'll need to write your own code to 
> integrate it into your particular setup.
> 
> BTW, in Outlook, you can easily attach multiple spams to one 
> message and this code should handle it.

CTRL-a, right click, "Forward Items" will indeed do the trick.

> > 
> > Can users also forwared as attachemtn mail that was sent that was 
> > already marked as spam ... or is there any advantage to this ?
> 
> If you use Bayes auto learn, I suspect that this wouldn't do much. 
> Otherwise, it might help.

I would check the headers of the forwarded messages to see if their
spam score is above your auto-learning threshold. If it is, relearning is
perhaps quite useless. You might wonder why they received the message at all
(I would think that something that is good enough to autolearn is good
enough to refuse or discard).
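
A minimal Python sketch of that check, assuming the forwarded message still carries an X-Spam-Status header containing `score=` (or `hits=` in older releases). The function names are hypothetical; 12.0 is SpamAssassin's default `bayes_auto_learn_threshold_spam`, so adjust it if your configuration differs. Auto-learning is decided on a separately computed score, so the reported score is only a rough guide; where the header carries an `autolearn=` field, that is the more direct indicator.

```python
import re
from email import message_from_bytes
from typing import Optional

# Default value of bayes_auto_learn_threshold_spam.
DEFAULT_AUTOLEARN_SPAM = 12.0

_SCORE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw: bytes) -> Optional[float]:
    """Return the score recorded in X-Spam-Status, or None if absent."""
    status = message_from_bytes(raw).get("X-Spam-Status")
    if status is None:
        return None
    match = _SCORE.search(str(status))
    return float(match.group(1)) if match else None


def worth_relearning(raw: bytes,
                     threshold: float = DEFAULT_AUTOLEARN_SPAM) -> bool:
    """True if the message was untagged or scored below the threshold."""
    score = spam_score(raw)
    return score is None or score < threshold
```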

Kind Regards,
Sander Holthaus


Re: Manually training SpamAssassin by forwarding mail

Posted by Stuart Johnston <st...@ebby.com>.
Peter Marshall wrote:
> Stuart Johnston wrote:
> 
>> Peter Marshall wrote:
>>
>>> Kevin Sullivan wrote:
>>>
>>>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
>>>>
>>>>> I've been interested in offering customers to train manually train the
>>>>> SpamAssassin Bayes filter for ham and spam (to reduce false 
>>>>> positives and
>>>>> negatives). However, I can only find documentation to this for local
>>>>> mailboxes and IMAP. Most users however, retrieve their mail through 
>>>>> POP
>>>>> and use Outlook (Express) as mail client. Is there a way to train
>>>>> SpamAssassin with such a setup (e.g. forwarding mail with Outlook
>>>>> (Express) using SMTP)?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> If you want to do a lot of programming, you could save all incoming 
>>>> messages for a few days in a database somewhere.  When a user 
>>>> forwards a message to a special "ham" or "spam" mailbox, you pull 
>>>> the message-id from the message and use it to recover the original 
>>>> message from your database.
>>>>
>>>>     -Kevin
>>>
>>>
>>>
>>>
>>> My question is the same as Henrik, I have a bunch of email that is 
>>> spam (either tagged by spam assassin or not tagged at all.  I 
>>> forwared it as an attachment to a "spam" mail box.  What do I have to 
>>> do now before I can get bayes to learn the message ... I read you 
>>> have to remove the headers .... Could anyone give me a little more 
>>> detail ?
>>
>>
>>
>> I use a modified version of the DMZS-sa-learn.pl from: 
>> http://www.dmzs.com/tools/files/spam.phtml
>>
>> When someone forwards a spam to me, I move the message to a special 
>> imap folder that gets processed by the script.  My additions look 
>> something like:
>>
>> use Email::MIME;
>> ...
>> my $msg = Email::MIME->new($raw_message_body);
>>
>> my @parts = $msg->parts;
>>
>> foreach (@parts) {
>>   if ($_->content_type =~ m|message/rfc822|) {
>>     sa_learn($_->body_raw);
>>   }
>> }
>>
>>
>> I've tested this with messages forwarded as attachment from Outlook 
>> and Thunderbird.  I'm not sure how effective it is though.  I'm sure 
>> that it still looses something in the translation.  All imap is really 
>> the way to go if you can.
>>
>>
>> Stuart Johnston
>>
>>
> But I have no imap .. only pop .. they would forwared (as attachment) to 
> a mailbox, and then I have to run sa-learn ... I assume as root ?
> 
> Will the stuff you posted work for this setup as well ??
> 
> Would there be big problems just running it after the forwared as 
> attachment. ??

The code I posted only shows how you can extract the attached spam from 
the email.  You'll need to write your own code to integrate it into your 
particular setup.

BTW, in Outlook, you can easily attach multiple spams to one message and 
this code should handle it.

> 
> Can users also forwared as attachemtn mail that was sent that was 
> already marked as spam ... or is there any advantage to this ?

If you use Bayes auto learn, I suspect that this wouldn't do much. 
Otherwise, it might help.


Stuart Johnston

Re: Manually training SpamAssassin by forwarding mail

Posted by Peter Marshall <pe...@caris.com>.
Stuart Johnston wrote:
> Peter Marshall wrote:
> 
>> Kevin Sullivan wrote:
>>
>>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
>>>
>>>> I've been interested in offering customers to train manually train the
>>>> SpamAssassin Bayes filter for ham and spam (to reduce false 
>>>> positives and
>>>> negatives). However, I can only find documentation to this for local
>>>> mailboxes and IMAP. Most users however, retrieve their mail through POP
>>>> and use Outlook (Express) as mail client. Is there a way to train
>>>> SpamAssassin with such a setup (e.g. forwarding mail with Outlook
>>>> (Express) using SMTP)?
>>>
>>>
>>>
>>>
>>> If you want to do a lot of programming, you could save all incoming 
>>> messages for a few days in a database somewhere.  When a user 
>>> forwards a message to a special "ham" or "spam" mailbox, you pull the 
>>> message-id from the message and use it to recover the original 
>>> message from your database.
>>>
>>>     -Kevin
>>
>>
>>
>> My question is the same as Henrik, I have a bunch of email that is 
>> spam (either tagged by spam assassin or not tagged at all.  I forwared 
>> it as an attachment to a "spam" mail box.  What do I have to do now 
>> before I can get bayes to learn the message ... I read you have to 
>> remove the headers .... Could anyone give me a little more detail ?
> 
> 
> I use a modified version of the DMZS-sa-learn.pl from: 
> http://www.dmzs.com/tools/files/spam.phtml
> 
> When someone forwards a spam to me, I move the message to a special imap 
> folder that gets processed by the script.  My additions look something 
> like:
> 
> use Email::MIME;
> ...
> my $msg = Email::MIME->new($raw_message_body);
> 
> my @parts = $msg->parts;
> 
> foreach (@parts) {
>   if ($_->content_type =~ m|message/rfc822|) {
>     sa_learn($_->body_raw);
>   }
> }
> 
> 
> I've tested this with messages forwarded as attachment from Outlook and 
> Thunderbird.  I'm not sure how effective it is though.  I'm sure that it 
> still looses something in the translation.  All imap is really the way 
> to go if you can.
> 
> 
> Stuart Johnston
> 
> 
But I have no IMAP, only POP. Users would forward (as attachment) to
a mailbox, and then I have to run sa-learn ... I assume as root?

Will the stuff you posted work for this setup as well?

Would there be big problems just running it after the forward as
attachment?

Can users also forward, as an attachment, mail that was
already marked as spam, or is there any advantage to this?

Thanks,
Peter

RE: Manually training SpamAssassin by forwarding mail

Posted by Sander Holthaus - Orange XL <in...@orangexl.com>.
 

> -----Original Message-----
> From: Stuart Johnston [mailto:stuart@ebby.com] 
> Sent: Friday, February 04, 2005 5:20 PM
> To: users@spamassassin.apache.org
> Cc: Peter Marshall
> Subject: Re: Manually training SpamAssassin by forwarding mail
> 
> Peter Marshall wrote:
> > Kevin Sullivan wrote:
> > 
> >> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> >>
> >>> I've been interested in offering customers to train 
> manually train 
> >>> the SpamAssassin Bayes filter for ham and spam (to reduce false 
> >>> positives and negatives). However, I can only find 
> documentation to 
> >>> this for local mailboxes and IMAP. Most users however, retrieve 
> >>> their mail through POP and use Outlook (Express) as mail 
> client. Is 
> >>> there a way to train SpamAssassin with such a setup (e.g. 
> forwarding 
> >>> mail with Outlook
> >>> (Express) using SMTP)?
> >>
> >>
> >>
> >> If you want to do a lot of programming, you could save all 
> incoming 
> >> messages for a few days in a database somewhere.  When a user 
> >> forwards a message to a special "ham" or "spam" mailbox, 
> you pull the 
> >> message-id from the message and use it to recover the original 
> >> message from your database.
> >>
> >>     -Kevin
> > 
> > 
> > My question is the same as Henrik, I have a bunch of email that is 
> > spam (either tagged by spam assassin or not tagged at all.  
> I forwared 
> > it as an attachment to a "spam" mail box.  What do I have to do now 
> > before I can get bayes to learn the message ... I read you have to 
> > remove the headers .... Could anyone give me a little more detail ?
> 
> I use a modified version of the DMZS-sa-learn.pl from: 
> http://www.dmzs.com/tools/files/spam.phtml
> 
> When someone forwards a spam to me, I move the message to a 
> special imap folder that gets processed by the script.  My 
> additions look something like:
> 
> use Email::MIME;
> ...
> my $msg = Email::MIME->new($raw_message_body);
> 
> my @parts = $msg->parts;
> 
> foreach (@parts) {
>    if ($_->content_type =~ m|message/rfc822|) {
>      sa_learn($_->body_raw);
>    }
> }
> 
> 
> I've tested this with messages forwarded as attachment from 
> Outlook and Thunderbird.  I'm not sure how effective it is 
> though.  I'm sure that it still looses something in the 
> translation.  All imap is really the way to go if you can.
> 
> 
> Stuart Johnston

Would it be an idea to strip the Delivered-To header from the message, as
it has no value in distinguishing between ham and spam?

Also, I was wondering whether anybody who is using spam-learn and ham-learn
addresses has any protection built in to stop non-system users from mailing
to those addresses?

Kind Regards,
Sander Holthaus


Re: Manually training SpamAssassin by forwarding mail

Posted by Stuart Johnston <st...@ebby.com>.
Peter Marshall wrote:
> Kevin Sullivan wrote:
> 
>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
>>
>>> I've been interested in offering customers a way to manually train the
>>> SpamAssassin Bayes filter for ham and spam (to reduce false positives 
>>> and negatives). However, I can only find documentation on this for local
>>> mailboxes and IMAP. Most users however, retrieve their mail through POP
>>> and use Outlook (Express) as mail client. Is there a way to train
>>> SpamAssassin with such a setup (e.g. forwarding mail with Outlook
>>> (Express) using SMTP)?
>>
>>
>>
>> If you want to do a lot of programming, you could save all incoming 
>> messages for a few days in a database somewhere.  When a user forwards 
>> a message to a special "ham" or "spam" mailbox, you pull the 
>> message-id from the message and use it to recover the original message 
>> from your database.
>>
>>     -Kevin
> 
> 
> My question is the same as Henrik's. I have a bunch of email that is spam 
> (either tagged by SpamAssassin or not tagged at all).  I forwarded it as 
> an attachment to a "spam" mailbox.  What do I have to do now before I 
> can get Bayes to learn the message?  I read that you have to remove the 
> headers.  Could anyone give me a little more detail?

I use a modified version of the DMZS-sa-learn.pl from: 
http://www.dmzs.com/tools/files/spam.phtml

When someone forwards a spam to me, I move the message to a special IMAP 
folder that gets processed by the script.  My additions look something like:

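use Email::MIME;
...
# $raw_message_body holds the full text of the forwarded (wrapper) message.
my $msg = Email::MIME->new($raw_message_body);

# Only the top-level MIME parts are examined; nested parts are not walked.
foreach my $part ($msg->parts) {
   # A message forwarded "as attachment" arrives as a message/rfc822 part.
   if (($part->content_type || '') =~ m|^message/rfc822|i) {
     # body_raw returns the part without transfer decoding, i.e. the
     # attached message with its own headers.  sa_learn() is defined
     # elsewhere in the script.
     sa_learn($part->body_raw);
   }
}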


I've tested this with messages forwarded as attachments from Outlook and 
Thunderbird.  I'm not sure how effective it is, though; I'm sure that it 
still loses something in translation.  Going all-IMAP is really the way 
to go if you can.


Stuart Johnston
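
For readers who would rather not depend on the DMZS script, the same idea can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not Stuart's actual code: the function names are hypothetical, the attached message is re-serialised by Python and so may not be byte-identical to what was received, and feeding sa-learn on standard input with --spam or --ham is assumed to match the local installation.

```python
import email
import subprocess


def _collect(part, found):
    """Append every message/rfc822 attachment under `part` to `found`."""
    if part.get_content_type() == "message/rfc822":
        payload = part.get_payload()
        if isinstance(payload, list) and payload:
            # The parser exposes the attached message as a one-element list.
            found.append(payload[0].as_bytes())
        return  # do not descend into the attached message itself
    if part.is_multipart():
        for sub in part.get_payload():
            _collect(sub, found)


def extract_forwarded_originals(raw_wrapper: bytes) -> list:
    """Return the messages that were forwarded as attachments, as raw bytes."""
    found = []
    _collect(email.message_from_bytes(raw_wrapper), found)
    return found


def learn(original: bytes, kind: str, sa_learn_cmd=("sa-learn",)) -> None:
    """Pipe one extracted message to sa-learn (assumed to read from stdin)."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    subprocess.run([*sa_learn_cmd, "--" + kind], input=original, check=True)
```

A script watching the "spam" mailbox would call extract_forwarded_originals() on each delivered message and pass every result to learn(original, "spam"). A plain inline forward contains no message/rfc822 part, so it yields an empty list and nothing is learned.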

Re: Manually training SpamAssassin by forwarding mail

Posted by Peter Marshall <pe...@caris.com>.
Kevin Sullivan wrote:

> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
>
>> I've been interested in offering customers a way to manually train the
>> SpamAssassin Bayes filter for ham and spam (to reduce false positives 
>> and negatives). However, I can only find documentation on this for local
>> mailboxes and IMAP. Most users however, retrieve their mail through POP
>> and use Outlook (Express) as mail client. Is there a way to train
>> SpamAssassin with such a setup (e.g. forwarding mail with Outlook
>> (Express) using SMTP)?
>
>
> If you want to do a lot of programming, you could save all incoming 
> messages for a few days in a database somewhere.  When a user forwards 
> a message to a special "ham" or "spam" mailbox, you pull the 
> message-id from the message and use it to recover the original message 
> from your database.
>
>     -Kevin

My question is the same as Henrik's. I have a bunch of email that is spam 
(either tagged by SpamAssassin or not tagged at all).  I forwarded it as 
an attachment to a "spam" mailbox.  What do I have to do now before I 
can get Bayes to learn the message?  I read that you have to remove the 
headers.  Could anyone give me a little more detail?

Thanks,
Peter

Re: Manually training SpamAssassin by forwarding mail

Posted by Kevin Sullivan <ke...@klubkev.org>.
--On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> I've been interested in offering customers a way to manually train the
> SpamAssassin Bayes filter for ham and spam (to reduce false positives and
> negatives). However, I can only find documentation on this for local
> mailboxes and IMAP. Most users however, retrieve their mail through POP
> and use Outlook (Express) as mail client. Is there a way to train
> SpamAssassin with such a setup (e.g. forwarding mail with Outlook
> (Express) using SMTP)?

If you want to do a lot of programming, you could save all incoming 
messages for a few days in a database somewhere.  When a user forwards a 
message to a special "ham" or "spam" mailbox, you pull the message-id from 
the message and use it to recover the original message from your database.

	-Kevin
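
The archive-and-recover idea above could be sketched roughly as follows, in Python with SQLite. Everything here is an assumption made for illustration: the table layout, the three-day retention window, and the function names are hypothetical. In particular, how the original Message-ID shows up in a forward depends on the mail client; this sketch simply tries every <...@...> token found in the forwarded message (for example in a References header) against the archive.

```python
import email
import re
import sqlite3
import time

RETENTION_SECONDS = 3 * 24 * 3600  # "a few days"; the exact window is an assumption
ID_TOKEN = re.compile(r"<[^<>\s]+@[^<>\s]+>")


def open_archive(path=":memory:"):
    """Open (or create) the archive database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        "message_id TEXT PRIMARY KEY, received REAL NOT NULL, raw BLOB NOT NULL)"
    )
    return db


def store_incoming(db, raw, now=None):
    """Archive one incoming message under its Message-ID; False if it has none."""
    header = email.message_from_bytes(raw).get("Message-ID")
    match = ID_TOKEN.search(str(header)) if header else None
    if match is None:
        return False
    received = time.time() if now is None else now
    db.execute("INSERT OR IGNORE INTO archive VALUES (?, ?, ?)",
               (match.group(0), received, raw))
    db.commit()
    return True


def recover_original(db, forwarded_raw):
    """Return the archived original referenced by a forwarded message, or None."""
    for candidate in ID_TOKEN.findall(forwarded_raw.decode("latin-1")):
        row = db.execute("SELECT raw FROM archive WHERE message_id = ?",
                         (candidate,)).fetchone()
        if row:
            return bytes(row[0])
    return None


def purge_expired(db, now=None):
    """Delete archived messages older than the retention window."""
    cutoff = (time.time() if now is None else now) - RETENTION_SECONDS
    db.execute("DELETE FROM archive WHERE received < ?", (cutoff,))
    db.commit()
```

The point of the design is that the message handed to sa-learn is the stored copy, exactly as it arrived, so whatever the mail client did to the forward no longer matters. If the client leaves no trace of the original Message-ID in the forward, recover_original() returns None and nothing can be learned.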