Posted to users@spamassassin.apache.org by Steven Scotten <vg...@gmail.com> on 2006/07/29 00:31:15 UTC

sa-learn killed, bayes not available

The Bayesian filter seems super-delicate. If I run sa-learn on a
mailbox with more than about 200 messages in it, it gets killed, and
I'm not sure why:

$ sa-learn --spam --dir Maildir/.spam/cur/
Killed
$

If sa-learn gets killed in the middle, it leaves a database that it
thinks is empty.

Before a killed process:

debug: bayes: found bayes db version 3
debug: bayes corpus size: nspam = 592, nham = 562

After a killed process:

debug: bayes: found bayes db version 3
debug: bayes: Not available for scanning, only 0 spam(s) in Bayes DB < 200

Rescanning doesn't do any good, because sa-learn still knows about the
messages it's already looked at. I have to start training all over by
deleting bayes_seen and bayes_toks. Furthermore, this kills my
Bayesian filter, and SpamAssassin lets through about 75% of my incoming
spam without it.

I've got thousands of spams and hams ready to feed to sa-learn, but
having to feed them 100 at a time is cumbersome, and I've had to start
over a dozen times in the last few days.

Other than backing up my .spamassassin directory before I run sa-learn
each time, are there any suggestions? I'm running 3.0.3, but it's a
hosted box so upgrading isn't my call.

Thanks,


Steve
-- 
Steven M. Scotten
<vg...@gmail.com>
The future will blow your mind
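
A minimal sketch of the two workarounds mentioned in the post above:
copying the .spamassassin directory before each run and feeding messages
in small batches. SA_LEARN, BAYES_DIR, BATCH_SIZE, and both function
names are made up for illustration and are not SpamAssassin settings. It
also assumes that find -print0 and xargs -0 are available on the host and
that sa-learn accepts several message files per invocation. It does not
explain or fix whatever is killing the process.

```shell
#!/bin/sh
# Illustrative variables (assumptions, not SpamAssassin options).
SA_LEARN="${SA_LEARN:-sa-learn}"
BAYES_DIR="${BAYES_DIR:-$HOME/.spamassassin}"
BATCH_SIZE="${BATCH_SIZE:-100}"

# backup_bayes: copy the whole Bayes directory to a timestamped sibling
# and print the path of the copy.
backup_bayes() {
    backup="${BAYES_DIR}.bak.$(date +%Y%m%d%H%M%S)"
    cp -Rp "$BAYES_DIR" "$backup" && printf '%s\n' "$backup"
}

# learn_in_batches --spam|--ham DIR
# Pass the regular files under DIR to sa-learn, at most BATCH_SIZE per call.
# xargs returns non-zero if any batch fails.
learn_in_batches() {
    kind="$1"
    dir="$2"
    find "$dir" -type f -print0 | xargs -0 -n "$BATCH_SIZE" "$SA_LEARN" "$kind"
}
```

Possible usage: backup_bayes && learn_in_batches --spam Maildir/.spam/cur/
If a batch dies partway, the copied directory can be moved back into
place instead of retraining from scratch.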

Re: sa-learn killed, bayes not available

Posted by Leander Koornneef <le...@ic-s.nl>.
Or perhaps there is some other form of resource control in place.
What's the output of "ulimit -a"?

Leander
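
For readers who want to narrow down the "ulimit -a" output, here is a
small sketch that prints only the soft limits most likely to end a long
sa-learn run. The function name show_limits is hypothetical, and the
-t, -v, and -d options are assumed to be supported by the login shell
(they are in bash and dash, but are not required by POSIX). Units are
those reported by the shell, typically seconds for -t and kilobytes for
-v and -d.

```shell
#!/bin/sh
# show_limits: print the CPU-time, virtual-memory, and data-segment soft
# limits of the current shell. A value of "unlimited" rules that limit out.
show_limits() {
    echo "cpu seconds (-t):       $(ulimit -t)"
    echo "virtual memory KB (-v): $(ulimit -v)"
    echo "data segment KB (-d):   $(ulimit -d)"
}
```

Running show_limits from the same shell that starts sa-learn matters,
because limits are inherited per process and may differ between a cron
job and an interactive login.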

On 29-jul-2006, at 14:22, Leander Koornneef wrote:

> It looks like the process is getting killed by an external signal.
> Maybe this is the Linux OOM killer in action? What is the memory/swap
> status of this machine? Have you tried running sa-learn with the -D
> option?
>
> Leander


Re: sa-learn killed, bayes not available

Posted by Leander Koornneef <le...@ic-s.nl>.
It looks like the process is getting killed by an external signal.
Maybe this is the Linux OOM killer in action? What is the memory/swap
status of this machine? Have you tried running sa-learn with the -D
option?

Leander
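
One way to check the "external signal" theory from an unprivileged
account is to look at the exit status, since a shell reports a
signal-terminated child as 128 plus the signal number. The sketch below
wraps any command and decodes that status with "kill -l". The function
name report_exit is hypothetical. It only identifies the signal, not who
sent it.

```shell
#!/bin/sh
# report_exit CMD [ARGS...]
# Run a command, then report whether it exited normally or was ended by a
# signal. Returns the command's own exit status.
report_exit() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        # kill -l maps an exit status above 128 back to a signal name.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
    return "$status"
}
```

Possible usage: report_exit sa-learn --spam --dir Maildir/.spam/cur/
A status of 137 decodes as SIGKILL, which is what the OOM killer sends,
but an administrator's watchdog or a hard resource limit can send the
same signal, so the memory/swap and ulimit checks are still needed.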

