Posted to users@spamassassin.apache.org by Jeffrey Lee <je...@reflex8.com> on 2005/01/26 17:47:09 UTC
sa-learn
I have been using sa-learn religiously with ALL spam and ham on my
server. However, I keep getting repeat spam with low scores. How can I
increase the sa-learn "points"? So that when I learn a message instead
of increasing some point by .1 or .2 it will increase by .5 or .6?
Thanks,
Jeffrey Lee
Re: sa-learn
Posted by Jeffrey Lee <je...@reflex8.com>.
Here is an example header:
X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE,
HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY,
HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no
version=3.0.2
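A header like the one above can be checked mechanically to see which rules fired and whether any BAYES_* rule is among them. The following is only a rough sketch: the function name parse_spam_status is made up for illustration, and the regular expressions assume the score=/required=/tests= layout shown in this particular header.

```python
import re


def parse_spam_status(header):
    """Return (score, required, tests) from an X-Spam-Status header string.

    Assumes the "score=... required=... tests=A,B,C" layout seen in the
    example header; other layouts are not handled.
    """
    # Unfold continuation lines into a single space-separated string.
    value = " ".join(header.split())
    score = re.search(r"score=(-?\d+(?:\.\d+)?)", value)
    required = re.search(r"required=(-?\d+(?:\.\d+)?)", value)
    # Rule names are comma separated; folding may leave a space after a comma.
    tests = re.search(r"tests=((?:\w+\s*,\s*)*\w+)", value)
    if not (score and required and tests):
        raise ValueError("unrecognized X-Spam-Status header")
    names = [name.strip() for name in tests.group(1).split(",")]
    return float(score.group(1)), float(required.group(1)), names


HEADER = """X-Spam-Status: No, score=3.0 required=5.0 tests=AWL,CELL_PHONE_FREE,
 HTML_90_100,HTML_MESSAGE,HTML_TAG_EXIST_TBODY,HTML_TEXT_AFTER_BODY,
 HTML_TEXT_AFTER_HTML,HTML_WEB_BUGS,MIME_HTML_ONLY autolearn=no
 version=3.0.2"""

score, required, tests = parse_spam_status(HEADER)
bayes_hits = [name for name in tests if name.startswith("BAYES_")]
print(score, required, len(tests), bayes_hits)
```

For this header the list of BAYES_* hits comes back empty, so no bayes rule contributed to the 3.0 score of this message.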
Re: sa-learn
Posted by Matt Kettler <mk...@evi-inc.com>.
At 12:08 PM 1/26/2005, Jeffrey Lee wrote:
>I understand that. How then does SA treat messages mainly made up of images?
Hmm, in the context of what, bayes?
SA treats all messages in more-or-less the same fashion. Embedded-image
spams are only going to wind up matching bayes if their headers or URIs
are something bayes has learned from. SA's bayes doesn't learn from general
HTML tags.
Really your best tools against image spams are SURBL (for ones that link
to external websites), and DCC or Razor (for ones with embedded images).
Also, the HTML percentage rules kick in here, but their scores are pretty
low these days.
Re: sa-learn
Posted by Martin Hepworth <ma...@solid-state-logic.com>.
Jeff
In that case you are best off using the URI RBLs from surbl.org.
There are also other rules at www.ruleemporium.com/rules.htm that check
for URI/OEM type stuff, if you haven't already got them.
--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300
Jeffrey Lee wrote:
> I understand that. How then does SA treat messages mainly made up of
> images?
Re: sa-learn
Posted by Jeffrey Lee <je...@reflex8.com>.
I understand that. How then does SA treat messages mainly made up of
images?
Re: sa-learn
Posted by Matt Kettler <mk...@evi-inc.com>.
At 11:47 AM 1/26/2005, Jeffrey Lee wrote:
>I have been using sa-learn religiously with ALL spam and ham on my server.
>However, I keep getting repeat spam with low scores. How can I increase
>the sa-learn "points"? So that when I learn a message instead of
>increasing some point by .1 or .2 it will increase by .5 or .6?
Well, sa-learning a message doesn't really work by increasing the "points"
of a message, although that's more-or-less the net effect.
In short, you'll want to make sure your inbound messages are hitting
BAYES_90 or higher, and increase the scores of those rules in your local.cf.
Also, while you're at it, check for spam messages matching ALL_TRUSTED. If
that's happening, check the archives on setting trusted_networks manually.
That rule should *never* match spam but will if SA gets confused by your
MTA config.
If the spam messages are consistently hitting BAYES_99, sa-learning won't
increase the score of those messages further, but it does help SA recognize
subtle changes over time in spam. So keep up the training, as it will keep
slight deviations from driving the bayes scores down and causing false
negative (FN) problems that way.
When you sa-learn a message, SA learns that the words in that message are
more likely to be in spam or ham than it previously knew. When new messages
come in, SA looks at its database of words and calculates a spam
probability based on the words in that message. It then matches that
probability to one of the BAYES_* rules and that causes the score impact.
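The learn-then-bucket flow described above can be illustrated with a toy model. This is a simplified sketch, not SpamAssassin's actual implementation: the ToyBayes class, the naive log-odds combination, the 0.01/0.99 clamping, and the bucket boundaries are all assumptions made for illustration. The real tokenizer, combining method, and BAYES_* ranges live in SpamAssassin itself and its shipped rule files, and differ between versions.

```python
import math
from collections import defaultdict

# Hypothetical probability buckets, for illustration only. The real
# BAYES_* rule names and ranges come from SpamAssassin's rule files.
BUCKETS = [(0.99, "BAYES_99"), (0.90, "BAYES_90"),
           (0.50, "BAYES_50"), (0.00, "BAYES_00")]


class ToyBayes:
    """Toy word database: counts how many learned spam/ham messages
    contained each token."""

    def __init__(self):
        self.spam = defaultdict(int)
        self.ham = defaultdict(int)
        self.nspam = 0
        self.nham = 0

    def learn(self, tokens, is_spam):
        """Rough analogue of sa-learn --spam / --ham for one message."""
        counts = self.spam if is_spam else self.ham
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1
        for tok in set(tokens):
            counts[tok] += 1

    def token_probability(self, tok):
        """Estimate P(spam | token), clamped away from 0 and 1."""
        spam_freq = self.spam.get(tok, 0) / self.nspam if self.nspam else 0.0
        ham_freq = self.ham.get(tok, 0) / self.nham if self.nham else 0.0
        if spam_freq + ham_freq == 0:
            return 0.5  # token never learned: neutral
        p = spam_freq / (spam_freq + ham_freq)
        return min(0.99, max(0.01, p))

    def probability(self, tokens):
        """Combine token probabilities as summed log-odds (naive Bayes)."""
        log_odds = 0.0
        for tok in set(tokens):
            p = self.token_probability(tok)
            log_odds += math.log(p) - math.log(1.0 - p)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1.0 + e)

    def rule(self, tokens):
        """Map the message probability to one bucket rule name."""
        p = self.probability(tokens)
        for threshold, name in BUCKETS:
            if p >= threshold:
                return name
        return BUCKETS[-1][1]


db = ToyBayes()
db.learn(["free", "phone", "offer"], is_spam=True)
db.learn(["free", "phone", "deal"], is_spam=True)
db.learn(["meeting", "agenda"], is_spam=False)
db.learn(["meeting", "notes"], is_spam=False)
print(db.rule(["free", "phone"]), db.rule(["meeting"]), db.rule(["unseen"]))
```

In this model, learning a message never adds points directly. It only shifts the token counts, and the score changes only when a new message's probability crosses into a different bucket. A message already in the top bucket gains nothing further from more training, which matches the point about BAYES_99 above.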