Posted to users@spamassassin.apache.org by Jonn Taylor <jo...@taylortelephone.com> on 2007/10/25 16:10:31 UTC

sa-learn

Is it ok to send mail to sa-learn that has been tagged by SpamAssassin?

Jonn

Re: sa-learn

Posted by Emmanuel Seyman <es...@linagora.com>.
* Jonn Taylor :
>
> Is it ok to send mail to sa-learn that has been tagged by SpamAssassin?

Yes. SpamAssassin removes its own tags from the mails before learning
from them.
http://wiki.apache.org/spamassassin/LearningMarkedUpMessages

Emmanuel


Re: sa-learn

Posted by Matt Kettler <mk...@verizon.net>.
Jonn R Taylor wrote:
>
> I think we got away from my original question. All I was asking is whether
> headers like "X-Virus-kav-Scanned: by taylortelephone.com",
> "X-Virus-Scanned: by taylortelephone.com" or any other "X-...." header
> that I add to the email will affect how sa-learn works.
If SA generated the header, sa-learn will ignore it.

For anything else, use bayes_ignore_header on it.
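
As a rough illustration of the advice above, here is a small Python sketch that turns a list of locally added headers into bayes_ignore_header lines. The helper name ignore_header_lines is hypothetical, and the header names are simply the ones from the question; only the bayes_ignore_header directive itself comes from this thread.

```python
def ignore_header_lines(header_names):
    """Return one 'bayes_ignore_header' config line per unique header name."""
    lines = []
    seen = set()
    for name in header_names:
        # Accept names written with or without the trailing colon.
        clean = name.strip().rstrip(":")
        key = clean.lower()  # header names are case-insensitive
        if not clean or key in seen:
            continue
        seen.add(key)
        lines.append(f"bayes_ignore_header {clean}")
    return lines


if __name__ == "__main__":
    # Headers taken from the question; adjust to whatever your scanner adds.
    for line in ignore_header_lines(["X-Virus-kav-Scanned", "X-Virus-Scanned:"]):
        print(line)
```

The printed lines are meant to be pasted into a SpamAssassin configuration file such as local.cf. Headers that SpamAssassin itself generates do not need an entry.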


Re: sa-learn

Posted by Jonn R Taylor <jo...@taylortelephone.com>.
Matt Kettler wrote:
> Matus UHLAR - fantomas wrote:
>> However, he must run sa-learn on hams too, otherwise he may get false
>> positives soon...
>>   
> True. I was merely commenting on why it is a good idea to allow mail SA
> has already tagged to be trained. I did not intend to imply this should
> be your sole source of training.
>> The most effective approach is probably to run sa-learn on false
>> positives and false negatives.
>>   
> The most effective approach is to run sa-learn on both nonspam and spam.
> Don't restrict your training to FPs and FNs. (Or did you, like me, mean
> training FPs and FNs as a supplement to more general training?)
> 
> In general, it creates bias in your bayes database when you create any
> kind of artificial restrictions on what you will or will not train, so
> it is best to avoid them where possible. Your decisions should really
> just be "do I consider it spam or not?" Train accordingly. It's just
> that simple.
> 
> The only area where I might consider biasing training would be the
> spam to nonspam ratio. SpamAssassin "ideally" works best with a 50/50
> training mix, but is quite tolerant of severe deviations from this
> (99/1 is more common). If your ratio is severely off, as most folks' are,
> you might want to apply a *little* extra effort to get more nonspam
> training. But don't spend a lot of time obsessing over it; I've never
> seen one so imbalanced that it actually caused problems. In general, it's
> more important to have fresh training than well-balanced training. As
> long as there's a reasonably fresh feed of both spam and nonspam, you
> should be fine.

I think we got away from my original question. All I was asking is whether
headers like "X-Virus-kav-Scanned: by taylortelephone.com",
"X-Virus-Scanned: by taylortelephone.com" or any other "X-...." header
that I add to the email will affect how sa-learn works.

Jonn

Re: sa-learn

Posted by Matt Kettler <mk...@verizon.net>.
Matus UHLAR - fantomas wrote:
>
> However, he must run sa-learn on hams too, otherwise he may get false
> positives soon...
>   
True. I was merely commenting on why it is a good idea to allow mail SA
has already tagged to be trained. I did not intend to imply this should
be your sole source of training.
> The most effective approach is probably to run sa-learn on false
> positives and false negatives.
>   
The most effective approach is to run sa-learn on both nonspam and spam.
Don't restrict your training to FPs and FNs. (Or did you, like me, mean
training FPs and FNs as a supplement to more general training?)

In general, it creates bias in your bayes database when you create any
kind of artificial restrictions on what you will or will not train, so
it is best to avoid them where possible. Your decisions should really
just be "do I consider it spam or not?" Train accordingly. It's just
that simple.

The only area where I might consider biasing training would be the
spam to nonspam ratio. SpamAssassin "ideally" works best with a 50/50
training mix, but is quite tolerant of severe deviations from this
(99/1 is more common). If your ratio is severely off, as most folks' are,
you might want to apply a *little* extra effort to get more nonspam
training. But don't spend a lot of time obsessing over it; I've never
seen one so imbalanced that it actually caused problems. In general, it's
more important to have fresh training than well-balanced training. As
long as there's a reasonably fresh feed of both spam and nonspam, you
should be fine.
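
To make the ratio discussion concrete, here is a minimal Python sketch that reports the training mix from two message counts. The function names are hypothetical, and the counts are assumed to be known already (for example, the nspam and nham values that `sa-learn --dump magic` reports).

```python
def training_mix(nspam, nham):
    """Return (spam_share, nonspam_share) as fractions of all trained mail."""
    total = nspam + nham
    if total == 0:
        raise ValueError("no messages have been trained yet")
    return nspam / total, nham / total


def describe_mix(nspam, nham):
    """Format the mix as percentages for a quick sanity check."""
    spam_share, ham_share = training_mix(nspam, nham)
    return f"{spam_share:.0%} spam / {ham_share:.0%} nonspam"


if __name__ == "__main__":
    # Made-up counts, chosen only to mirror the 99/1 case mentioned above.
    print(describe_mix(990, 10))
```

This only measures the balance. As noted above, a lopsided result is usually tolerable as long as both kinds of mail keep being trained.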

Re: sa-learn

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Jonn Taylor wrote:
> > Is it ok to send mail to sa-learn that has been tagged by SpamAssassin?
> Yes, that's fine.

On 25.10.07 20:28, Matt Kettler wrote:
>  In fact, you *should* send them to sa-learn if they haven't triggered
> the autolearner. Even if they've already hit bayes_99 there's still
> likely to be some token information to be learned from it.
> 
>  If they have already triggered the autolearner, then sa-learn won't
> do anything. (It will resist learning the same message twice unless you
> manually wipe the bayes_seen database.)

However, he must run sa-learn on hams too, otherwise he may get false
positives soon...

The most effective approach is probably to run sa-learn on false
positives and false negatives.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
The 3 biggest disasters: Hiroshima 45, Tschernobyl 86, Windows 95

Re: sa-learn

Posted by Matt Kettler <mk...@verizon.net>.
Jonn Taylor wrote:
> Is it ok to send mail to sa-learn that has been tagged by SpamAssassin?
Yes, that's fine.

 In fact, you *should* send them to sa-learn if they haven't triggered
the autolearner. Even if they've already hit bayes_99 there's still
likely to be some token information to be learned from it.

 If they have already triggered the autolearner, then sa-learn won't
do anything. (It will resist learning the same message twice unless you
manually wipe the bayes_seen database.)
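
For anyone scripting this, a minimal Python sketch of feeding already-tagged mail to sa-learn might look like the following. The function names and the folder paths are hypothetical. The `--spam` and `--ham` options are sa-learn's standard switches for the two kinds of training, and both are shown because nonspam needs training as well.

```python
import subprocess


def sa_learn_command(kind, path):
    """Build the sa-learn argument list for a file or directory of mail."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", f"--{kind}", path]


def train(kind, path, dry_run=True):
    """Run sa-learn on path; with dry_run=True only return the command."""
    cmd = sa_learn_command(kind, path)
    if not dry_run:
        # Requires sa-learn on PATH; raises CalledProcessError on failure.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical folder names; point these at your own sorted mail.
    for kind, folder in (("spam", "Maildir/.Spam/cur"), ("ham", "Maildir/cur")):
        print(" ".join(train(kind, folder)))
```

Messages that were already learned, whether by hand or by the autolearner, should simply be skipped on a repeat run, as described above.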