Posted to users@spamassassin.apache.org by Jeffrey Lee <je...@reflex8.com> on 2005/01/26 17:47:09 UTC

sa-learn

I have been using sa-learn religiously with ALL spam and ham on my 
server. However, I keep getting repeat spam with low scores. How can I 
increase the sa-learn "points"? So that when I learn a message instead 
of increasing some point by .1 or .2 it will increase by .5 or .6?

Thanks,
Jeffrey Lee


Re: sa-learn

Posted by Jeffrey Lee <je...@reflex8.com>.
Here is an example header:

X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE, 
HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY, 
HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no  
version=3.0.2
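
A header like the one above can be checked mechanically for the rules discussed in this thread. The following is a minimal Python sketch, not part of SpamAssassin itself: the function name `parse_spam_status` is made up for illustration, and the parsing assumes the header layout shown here (a `score=`, `required=` and comma-separated `tests=` field, folded across lines).

```python
import re

# The X-Spam-Status header from the message above, folded as it appeared.
RAW_HEADER = """X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE,
 HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY,
 HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no
 version=3.0.2"""


def parse_spam_status(raw):
    """Return (score, required, tests) from an X-Spam-Status header.

    Hypothetical helper for illustration; assumes the layout shown above.
    """
    # Unfold the header: join continuation lines, then drop the whitespace
    # that folding introduced after commas in the tests= list.
    unfolded = " ".join(line.strip() for line in raw.splitlines())
    unfolded = re.sub(r",\s+", ",", unfolded)
    score = float(re.search(r"score=(-?[\d.]+)", unfolded).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", unfolded).group(1))
    tests_field = re.search(r"tests=(\S*)", unfolded).group(1)
    tests = [name for name in tests_field.split(",") if name]
    return score, required, tests


score, required, tests = parse_spam_status(RAW_HEADER)
bayes_hits = [name for name in tests if name.startswith("BAYES_")]
gap = required - score

print("Bayes rules hit:", bayes_hits or "none")
print("ALL_TRUSTED hit:", "ALL_TRUSTED" in tests)
print(f"Points short of the threshold: {gap:.1f}")
```

For this particular header the tests list contains no BAYES_* rule and no ALL_TRUSTED, and the message is 2.0 points short of the required 5.0. That suggests Bayes did not contribute to this message's score at all, so raising BAYES_* scores would only help once those rules start to hit.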

On Jan 26, 2005, at 10:59 AM, Matt Kettler wrote:

> At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>> I have been using sa-learn religiously with ALL spam and ham on my 
>> server. However, I keep getting repeat spam with low scores. How can 
>> I increase the sa-learn "points"? So that when I learn a message 
>> instead of increasing some point by .1 or .2 it will increase by .5 
>> or .6?
>
> Well, sa-learning a message doesn't really work by increasing the 
> "points" of a message, although that's more-or-less the net effect.
>
> In short, you'll want to make sure your inbound messages are hitting 
> BAYES_90 or higher, and increase the scores of those rules in your 
> local.cf.
>
> Also, while you're at it, check for spam messages matching 
> ALL_TRUSTED. If that's happening, check the archives on setting 
> trusted_networks manually. That rule should *never* match spam but 
> will if SA gets confused by your MTA config.
>
> If the spam messages are consistently hitting BAYES_99, sa-learning 
> won't increase the score of that message further, but it does help SA 
> recognize subtle changes over time in spam. So keep up the training as 
> it will keep slight deviations from driving the bayes scores down and 
> causing FN problems that way.
>
> When you sa-learn a message, SA learns that the words in that message 
> are more likely to be in spam or ham than it previously knew. When new 
> messages come in, SA looks at its database of words and calculates a 
> spam probability based on the words in that message. It then matches 
> that probability to one of the BAYES_* rules and that causes the score 
> impact.


Re: sa-learn

Posted by Matt Kettler <mk...@evi-inc.com>.
At 12:08 PM 1/26/2005, Jeffrey Lee wrote:
>I understand that. How then does SA treat messages mainly made up of images?

Hmm, in the context of what, bayes?

SA treats all messages in more-or-less the same fashion. Embedded-image 
spams are only going to wind up matching bayes through their headers or 
URIs, which bayes does learn from. SA's bayes doesn't learn from general 
HTML tags.

Really your best tools against image spams are SURBL (for ones that link 
to external websites), and DCC or razor (for ones with embedded images).

Also, the HTML percentage rules kick in here, but their scores are pretty 
low these days. 


Re: sa-learn

Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Jeff,

In that case you are best off using the URI RBLs from surbl.org.

There are also other rules in www.ruleemporium.com/rules.htm that check 
for URI/OEM type stuff, if you haven't already got them.

--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


Jeffrey Lee wrote:
> I understand that. How then does SA treat messages mainly made up of 
> images?


Re: sa-learn

Posted by Jeffrey Lee <je...@reflex8.com>.
I understand that. How then does SA treat messages mainly made up of 
images?



Re: sa-learn

Posted by Matt Kettler <mk...@evi-inc.com>.
At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>I have been using sa-learn religiously with ALL spam and ham on my server. 
>However, I keep getting repeat spam with low scores. How can I increase 
>the sa-learn "points"? So that when I learn a message instead of 
>increasing some point by .1 or .2 it will increase by .5 or .6?

Well, sa-learning a message doesn't really work by increasing the "points" 
of a message, although that's more-or-less the net effect.

In short, you'll want to make sure your inbound messages are hitting 
BAYES_90 or higher, and increase the scores of those rules in your local.cf.

Also, while you're at it, check for spam messages matching ALL_TRUSTED. If 
that's happening, check the archives on setting trusted_networks manually. 
That rule should *never* match spam but will if SA gets confused by your 
MTA config.

If the spam messages are consistently hitting BAYES_99, sa-learning won't 
increase the score of that message further, but it does help SA recognize 
subtle changes over time in spam. So keep up the training as it will keep 
slight deviations from driving the bayes scores down and causing FN 
problems that way.

When you sa-learn a message, SA learns that the words in that message are 
more likely to be in spam or ham than it previously knew. When new messages 
come in, SA looks at its database of words and calculates a spam 
probability based on the words in that message. It then matches that 
probability to one of the BAYES_* rules and that causes the score impact.
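
The learn-then-classify cycle described above can be illustrated with a toy model. The Python sketch below is a deliberately simplified naive-Bayes classifier, not SpamAssassin's implementation: SA's real tokenizer, smoothing and combining formula differ, the class name `ToyBayes` is invented for this example, and the probability cutoffs in `BUCKETS` are illustrative assumptions rather than the ranges of the shipped BAYES_* rules.

```python
import math
from collections import defaultdict


class ToyBayes:
    """Toy word database: per-token counts of spam and ham messages."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # token -> [spam, ham]
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, tokens, is_spam):
        """Rough analogue of sa-learn: record one message's tokens."""
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1
        for token in set(tokens):  # count each token once per message
            self.counts[token][0 if is_spam else 1] += 1

    def token_probability(self, token):
        """Estimate P(spam | token) with add-one smoothing."""
        spam_count, ham_count = self.counts.get(token, (0, 0))
        spam_rate = (spam_count + 1) / (self.n_spam + 2)
        ham_rate = (ham_count + 1) / (self.n_ham + 2)
        return spam_rate / (spam_rate + ham_rate)

    def probability(self, tokens):
        """Combine token probabilities, assuming tokens are independent."""
        probs = [self.token_probability(t) for t in set(tokens)]
        log_spam = sum(math.log(p) for p in probs)
        log_ham = sum(math.log(1.0 - p) for p in probs)
        # Same as prod(p) / (prod(p) + prod(1 - p)), done in log space.
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))


# Illustrative cutoffs only; the real rule ranges come from the ruleset.
BUCKETS = [(0.99, "BAYES_99"), (0.90, "BAYES_90"), (0.50, "BAYES_50")]


def bayes_rule(probability):
    """Map a spam probability to the first bucket whose cutoff it meets."""
    for cutoff, name in BUCKETS:
        if probability >= cutoff:
            return name
    return "BAYES_00"


db = ToyBayes()
db.learn(["free", "cell", "phone"], is_spam=True)
db.learn(["meeting", "notes", "phone"], is_spam=False)

message = ["free", "cell", "offer"]
before = db.probability(message)
db.learn(message, is_spam=True)   # train on the missed spam
after = db.probability(message)

print(f"before: {before:.3f} -> {bayes_rule(before)}")
print(f"after:  {after:.3f} -> {bayes_rule(after)}")
```

Training moves the message's probability upward, but the score only changes when the probability crosses into a different bucket. Once a message already lands in the top bucket, further training cannot raise its score; only the score assigned to that rule in local.cf can.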