Posted to users@spamassassin.apache.org by Jack Gostl <go...@argoscomp.com> on 2007/02/03 14:12:36 UTC

Bayes resolution gettin weaker

I've been watching this for a while, and there is now a pattern to what I'm seeing.

I'm running a configuration with multiple users sharing one set of Bayes files. This is an interim move to facilitate the SpamAssassin upgrades, and like many interim moves it's been going on for a long time.

When I first built the Bayes files from my personal folders and my spam archives, things were great: 99.8% of the spam caught, or better. Then, usually after a week or so, the number starts to drop. Right now it's down to 97%, and in another day or two it will be below 95%. With the amount of spam we receive, that is a lot of missed junk mail.

So I blow away my bayes* files, rebuild, and I'm back up to darn near 100% caught. For about a week. Then the deterioration begins again.

Has anyone else encountered this? Is this an artifact of too many users sharing a spam file?

Also, I retrain each night, feeding any missed spam plus any new ham back through sa-learn. I can't see how that makes it worse, but who knows.

Thanks - Jack

Re: Bayes resolution gettin weaker

Posted by Duane Hill <d....@yournetplus.com>.

Gene Heskett wrote:
> On Monday 12 February 2007 13:35, Jim Maul wrote:
>> Jack Gostl wrote:
>>> Well... I'm convinced. I turned off autolearn a week ago, and things
>>> have never been smoother. Its a shame really, that's a nice feature,
>>> but for some reason it waters down the Bayes resolution until its
>>> almost useless.
>> Most likely because the autolearn thresholds are too generous.  The
>> possibility to autolearn spam as ham and/or ham as spam is too great.  I
>> have been running with autolearn enabled, my thresholds set to:
>>
>> bayes_auto_learn_threshold_nonspam -0.1
>> bayes_auto_learn_threshold_spam 12.0
>>
> Where are these rules located?

Those are configuration settings rather than rules. You would define 
them in local.cf. By default they are not set there (I believe).

See:
http://spamassassin.apache.org/full/3.1.x/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html

for the default settings that are taken when not specified.
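
As a rough sketch of what that looks like, the two settings quoted above would sit in local.cf along these lines. The file location is an assumption (it varies by install; /etc/mail/spamassassin/local.cf is common), and the values are simply the ones Jim posted, not a recommendation for every site. The defaults noted in the comments are the ones given in the AutoLearnThreshold documentation linked above.

```
# local.cf (location is an assumption; commonly /etc/mail/spamassassin/local.cf)

# Autolearn as ham only when a message scores below -0.1 (documented default: 0.1)
bayes_auto_learn_threshold_nonspam -0.1

# Autolearn as spam only when a message scores above 12.0 (documented default: 12.0)
bayes_auto_learn_threshold_spam 12.0
```

Note that only the nonspam threshold differs from the default here; the effect is that a message has to score slightly negative before it is ever learned as ham.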

>> without any problems for almost 3 years now.  My bayes database has
>> never been better.  I think too many people have problems with it
>> because of the defaults and instead of trying to figure out how to make
>> it work better, they just turn it off and call it "broken".
>>
>> -Jim

Re: Bayes resolution gettin weaker

Posted by Gene Heskett <ge...@verizon.net>.

On Monday 12 February 2007 13:35, Jim Maul wrote:
>Jack Gostl wrote:
>> Well... I'm convinced. I turned off autolearn a week ago, and things
>> have never been smoother. Its a shame really, that's a nice feature,
>> but for some reason it waters down the Bayes resolution until its
>> almost useless.
>
>Most likely because the autolearn thresholds are too generous.  The
>possibility to autolearn spam as ham and/or ham as spam is too great.  I
>have been running with autolearn enabled, my thresholds set to:
>
>bayes_auto_learn_threshold_nonspam -0.1
>bayes_auto_learn_threshold_spam 12.0
>
Where are these rules located?

>without any problems for almost 3 years now.  My bayes database has
>never been better.  I think too many people have problems with it
>because of the defaults and instead of trying to figure out how to make
>it work better, they just turn it off and call it "broken".
>
>-Jim

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2007 by Maurice Eugene Heskett, all rights reserved.

Re: Bayes resolution gettin weaker

Posted by Jim Maul <jm...@elih.org>.

Jack Gostl wrote:
> Well... I'm convinced. I turned off autolearn a week ago, and things 
> have never been smoother. Its a shame really, that's a nice feature, but 
> for some reason it waters down the Bayes resolution until its almost 
> useless.
> 

Most likely because the autolearn thresholds are too generous.  The 
possibility to autolearn spam as ham and/or ham as spam is too great.  I 
have been running with autolearn enabled, my thresholds set to:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 12.0

without any problems for almost 3 years now.  My bayes database has 
never been better.  I think too many people have problems with it 
because of the defaults and instead of trying to figure out how to make 
it work better, they just turn it off and call it "broken".

-Jim



Re: Bayes resolution gettin weaker

Posted by Jack Gostl <go...@argoscomp.com>.

Well... I'm convinced. I turned off autolearn a week ago, and things have 
never been smoother. It's a shame really, that's a nice feature, but for some 
reason it waters down the Bayes resolution until it's almost useless.


Re: Bayes resolution gettin weaker

Posted by Jack Gostl <go...@argoscomp.com>.

----- Original Message ----- 
From: "Anthony Peacock" <a....@chime.ucl.ac.uk>
To: "SpamAssassin" <us...@spamassassin.apache.org>
Sent: Monday, February 05, 2007 3:56 AM
Subject: Re: Bayes resolution gettin weaker


> Hi,
>
> Jack Gostl wrote:
>>>> I've been watching this for awhile, and there is now a pattern to what 
>>>> I'm seeing.
>>>>
>>>> I'm running a configuration with multiple users sharing a bayes files. 
>>>> This is an interim move to facilitate the spamassassin upgrades, and 
>>>> like many interim moves its been going on for a long time.
>>>>
>>>> When I first build the bayes files from my personal folders and my spam 
>>>> archives, things were great. 99.8% of the spam caught or better. Then, 
>>>> usually after a week or so, the number starts to drop. Right now, its 
>>>> down to 97%, in another day or two it will be down below 95%. With the 
>>>> amount of spam we receive, that is a lot of missed junk mail.
>>>>
>>>> So I blow away my bayes* files, rebuild, and I'm back up to darn near 
>>>> 100% caught. For about a week. Then the deterioration begins again.
>>>>
>>>> Has anyone else encountered this? Is this an artifact of too many users 
>>>> sharing a spam file?
>>>>
>>>> Also.... I retrain each night, feeding any missed spams plus any new 
>>>> hams received back through sa-learn. I can't see how that makes it 
>>>> worse, but who knows.
>>
>>> Do you have autolearn enabled?
>>
>> Uh... yes? You are suggesting that I turn it off? I had always assumed 
>> that if the Bayes learned something as ham that it shouldn't, sa-learn 
>> was smart enough to undo it.
>
> Change the thresholds for auto learning.  Mine are:
>
> bayes_auto_learn_threshold_nonspam -0.1
> bayes_auto_learn_threshold_spam 12.0

I'm willing to try. I made the change in my user_prefs and we'll see what 
the next week brings.

Thanks

Re: Bayes resolution gettin weaker

Posted by Anthony Peacock <a....@chime.ucl.ac.uk>.

Hi,

Jack Gostl wrote:
>>> I've been watching this for awhile, and there is now a pattern to 
>>> what I'm seeing.
>>>
>>> I'm running a configuration with multiple users sharing a bayes 
>>> files. This is an interim move to facilitate the spamassassin 
>>> upgrades, and like many interim moves its been going on for a long time.
>>>
>>> When I first build the bayes files from my personal folders and my 
>>> spam archives, things were great. 99.8% of the spam caught or better. 
>>> Then, usually after a week or so, the number starts to drop. Right 
>>> now, its down to 97%, in another day or two it will be down below 
>>> 95%. With the amount of spam we receive, that is a lot of missed junk 
>>> mail.
>>>
>>> So I blow away my bayes* files, rebuild, and I'm back up to darn near 
>>> 100% caught. For about a week. Then the deterioration begins again.
>>>
>>> Has anyone else encountered this? Is this an artifact of too many 
>>> users sharing a spam file?
>>>
>>> Also.... I retrain each night, feeding any missed spams plus any new 
>>> hams received back through sa-learn. I can't see how that makes it 
>>> worse, but who knows.
> 
>> Do you have autolearn enabled?
> 
> Uh... yes? You are suggesting that I turn it off? I had always assumed 
> that if the Bayes learned something as ham that it shouldn't, sa-learn 
> was smart enough to undo it.

Change the thresholds for auto learning.  Mine are:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 12.0


-- 
Anthony Peacock
CHIME, Royal Free & University College Medical School
WWW:    http://www.chime.ucl.ac.uk/~rmhiajp/
"If you have an apple and I have  an apple and we  exchange apples
then you and I will still each have  one apple. But  if you have an
idea and I have an idea and we exchange these ideas, then each of us
will have two ideas." -- George Bernard Shaw

Re: Bayes resolution gettin weaker

Posted by Nigel Frankcom <ni...@blue-canoe.net>.

On Sat, 3 Feb 2007 23:33:29 -0500, "Jack Gostl" <go...@argoscomp.com>
wrote:

>>>I've been watching this for awhile, and there is now a pattern to what I'm 
>>>seeing.
>>>
>>>I'm running a configuration with multiple users sharing a bayes files. 
>>>This is an interim move to facilitate the spamassassin upgrades, and like 
>>>many interim moves its been going on for a long time.
>>>
>>>When I first build the bayes files from my personal folders and my spam 
>>>archives, things were great. 99.8% of the spam caught or better. Then, 
>>>usually after a week or so, the number starts to drop. Right now, its down 
>>>to 97%, in another day or two it will be down below 95%. With the amount 
>>>of spam we receive, that is a lot of missed junk mail.
>>>
>>>So I blow away my bayes* files, rebuild, and I'm back up to darn near 100% 
>>>caught. For about a week. Then the deterioration begins again.
>>>
>>>Has anyone else encountered this? Is this an artifact of too many users 
>>>sharing a spam file?
>>>
>>>Also.... I retrain each night, feeding any missed spams plus any new hams 
>>>received back through sa-learn. I can't see how that makes it worse, but 
>>>who knows.
>
>> Do you have autolearn enabled?
>
>Uh... yes? You are suggesting that I turn it off? I had always assumed that 
>if the Bayes learned something as ham that it shouldn't, sa-learn was smart 
>enough to undo it.
>

It might be worth a try, or at least try raising and lowering your
autolearn thresholds for spam/ham. I've had issues with autolearn before
and no longer use it. My mail flow is low enough that manual learning
suffices.
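
For anyone wanting to try the same thing, a minimal sketch of switching autolearn off is a single line in the SpamAssassin configuration. Placing it in local.cf is an assumption about the setup; manual training with sa-learn (its --spam and --ham options) keeps working with this set.

```
# local.cf: disable automatic Bayes training; the Bayes rules themselves stay active
bayes_auto_learn 0
```

This only stops new messages from being learned automatically. It does not undo anything already in the database, so a rebuild or retrain may still be needed first.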

Regards

Nigel

Re: Bayes resolution gettin weaker

Posted by Jack Gostl <go...@argoscomp.com>.

>>I've been watching this for awhile, and there is now a pattern to what I'm 
>>seeing.
>>
>>I'm running a configuration with multiple users sharing a bayes files. 
>>This is an interim move to facilitate the spamassassin upgrades, and like 
>>many interim moves its been going on for a long time.
>>
>>When I first build the bayes files from my personal folders and my spam 
>>archives, things were great. 99.8% of the spam caught or better. Then, 
>>usually after a week or so, the number starts to drop. Right now, its down 
>>to 97%, in another day or two it will be down below 95%. With the amount 
>>of spam we receive, that is a lot of missed junk mail.
>>
>>So I blow away my bayes* files, rebuild, and I'm back up to darn near 100% 
>>caught. For about a week. Then the deterioration begins again.
>>
>>Has anyone else encountered this? Is this an artifact of too many users 
>>sharing a spam file?
>>
>>Also.... I retrain each night, feeding any missed spams plus any new hams 
>>received back through sa-learn. I can't see how that makes it worse, but 
>>who knows.

> Do you have autolearn enabled?

Uh... yes? You are suggesting that I turn it off? I had always assumed that 
if the Bayes learned something as ham that it shouldn't, sa-learn was smart 
enough to undo it.

Re: Bayes resolution gettin weaker

Posted by Nigel Frankcom <ni...@blue-canoe.net>.

On Sat, 3 Feb 2007 08:12:36 -0500, "Jack Gostl" <go...@argoscomp.com>
wrote:

>I've been watching this for awhile, and there is now a pattern to what I'm seeing.
>
>I'm running a configuration with multiple users sharing a bayes files. This is an interim move to facilitate the spamassassin upgrades, and like many interim moves its been going on for a long time.
>
>When I first build the bayes files from my personal folders and my spam archives, things were great. 99.8% of the spam caught or better. Then, usually after a week or so, the number starts to drop. Right now, its down to 97%, in another day or two it will be down below 95%. With the amount of spam we receive, that is a lot of missed junk mail.
>
>So I blow away my bayes* files, rebuild, and I'm back up to darn near 100% caught. For about a week. Then the deterioration begins again.
>
>Has anyone else encountered this? Is this an artifact of too many users sharing a spam file?
>
>Also.... I retrain each night, feeding any missed spams plus any new hams received back through sa-learn. I can't see how that makes it worse, but who knows.
>
>Thanks - Jack


Do you have autolearn enabled?