Posted to users@spamassassin.apache.org by Hendrik Haddorp <he...@gmx.net> on 2018/02/11 18:09:32 UTC

sa-learn

Hi,

I have a maildir with about 20000 mails. In the past this did not seem
to be a problem, but for the past few weeks my sa-learn process has been
dying with an OOM. My server has only 1 GB of memory plus another 1 GB
of swap. sa-learn eats up pretty much all of the memory during the run
and can only finish when I stop everything else. Why is sa-learn
using more and more memory even though it already learned all those
messages in the past? Is there a way to limit the memory usage other
than making the set of messages smaller?
My problem sounds somewhat like 
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5141

regards,
Hendrik

Re: sa-learn

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
>>On 11.02.18 19:09, Hendrik Haddorp wrote:
>>>I have a maildir with about 20000 mails. In the past this does 
>>>not seem to have been a problem. But since a few weeks my 
>>>sa-learn process dies with an OOM now.

On 11.02.18 20:10, Hendrik Haddorp wrote:
>so far I was always letting it run once a week over my inbox in --ham 
>mode and over my spam folder in --spam mode. all tutorials I saw did 
>it the same way. this also worked for years but likely with less mail 
>files. I was under the impression that sa-learn would skip messages 
>that it already learned. the debug log also indicated that it 
>recognized those.

The problem with this approach is that every one of those messages must be
opened, read and parsed before sa-learn can find out that it has already
been trained on and can be skipped.

Even if there is a memory bug in sa-learn and it can be fixed, this is still
very inefficient.

Luckily, you have already been given better approaches. Good luck.
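
One way to avoid re-reading the whole folder, as a minimal sketch rather
than a tested setup: keep a marker file and hand sa-learn only the files
that are newer than the previous run. The function name train_new, the
marker file and the SA_LEARN override are made up for illustration, and
-print0, -0 and -r assume GNU (or compatible) find and xargs.

```shell
#!/bin/sh
# Sketch only: train on files newer than a marker left by the previous run.
# train_new, the marker file and SA_LEARN are hypothetical names.
SA_LEARN="${SA_LEARN:-sa-learn}"

# train_new MODE DIR STAMP
#   MODE  : --ham or --spam
#   DIR   : folder to scan, for example a maildir "cur" directory
#   STAMP : marker file whose mtime records when the last run started
train_new() {
    mode=$1
    dir=$2
    stamp=$3
    next="$stamp.next"

    # Record the start time before scanning, so mail that arrives while
    # this run is in progress is picked up by the next run.
    touch "$next" || return 1

    if [ -e "$stamp" ]; then
        find "$dir" -type f -newer "$stamp" -print0
    else
        # First run: no marker yet, so every file is a candidate.
        find "$dir" -type f -print0
    fi | xargs -0 -r $SA_LEARN "$mode" || return 1

    # Only advance the marker once the training run succeeded.
    mv "$next" "$stamp"
}
```

A call could look like: train_new --spam "$HOME/Maildir/.Junk/cur"
"$HOME/.sa-learn-spam.stamp" (both paths are assumptions). Note that a
message moved into the folder later keeps its original mtime, so a
ctime-based test may fit that workflow better.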

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I'm not interested in your website anymore.
If you need cookies, bake them yourself.

Re: sa-learn

Posted by Hendrik Haddorp <he...@gmx.net>.
Thanks, I'll give that a try.

On 11.02.2018 20:15, Reindl Harald wrote:
>
>
> On 11.02.2018 at 20:10, Hendrik Haddorp wrote:
>> so far I was always letting it run once a week over my inbox in --ham 
>> mode and over my spam folder in --spam mode. all tutorials I saw did 
>> it the same way. this also worked for years but likely with less mail 
>> files. I was under the impression that sa-learn would skip messages 
>> that it already learned. the debug log also indicated that it 
>> recognized those.
>
> but to recognize them, it needs to read them
>
> man find
> man xargs
>
> find "$SA_MILTER_HOME/training/spam/" -type f -mtime "-$TRAIN_DAYS" -print0 |
>   xargs -0 -r sa-learn --max-size=0 --no-sync --spam
> find "$SA_MILTER_HOME/training/ham/" -type f -mtime "-$TRAIN_DAYS" -print0 |
>   xargs -0 -r sa-learn --max-size=0 --no-sync --ham
>
>> On 11.02.2018 19:44, Matus UHLAR - fantomas wrote:
>>> On 11.02.18 19:09, Hendrik Haddorp wrote:
>>>> I have a maildir with about 20000 mails. In the past this does not 
>>>> seem to have been a problem. But since a few weeks my sa-learn 
>>>> process dies with an OOM now.
>>>
>>> do you run sa-learn over whole maildir all the time?
>>> why?
>>>
>>>> My server has only 1GB of memory with another GB for swap. sa-learn 
>>>> is eating up pretty much the complete memory for the run and is 
>>>> only able to finish when I stop everything else. Why is sa-learn 
>>>> using more and more memory even when it learned all those messages 
>>>> already in the past? Is there a way to limit the memory usage 
>>>> except from making the set of messages smaller?
>>>
>>> you are not supposed to repeatedly call sa-learn over huge maildir.
>>>
>>> calling over new mail (or, better, false-positives and
>>> false-negatives) is faster and won't eat all your memory
>
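
The find/xargs commands quoted above pass --no-sync, which leaves the
journal unsynchronized until a later sa-learn --sync. A minimal sketch of
the same idea with two additions: each sa-learn process gets a capped
number of files, which might keep per-process memory growth bounded, and
one sync runs at the end. The names learn_recent, sync_bayes and BATCH and
the default of 500 are assumptions, not values from this thread.

```shell
#!/bin/sh
# Sketch only: learn recent files in capped batches, then sync once.
# learn_recent, sync_bayes, SA_LEARN and BATCH are hypothetical names.
SA_LEARN="${SA_LEARN:-sa-learn}"
BATCH="${BATCH:-500}"   # files per sa-learn process; an untuned guess

# learn_recent MODE DIR DAYS
#   MODE : --ham or --spam
#   DIR  : folder with training mail
#   DAYS : only files modified within the last DAYS days are used
learn_recent() {
    mode=$1
    dir=$2
    days=$3
    # -n caps the number of files given to each sa-learn invocation;
    # -print0 / -0 keep unusual file names intact.
    find "$dir" -type f -mtime "-$days" -print0 |
        xargs -0 -r -n "$BATCH" $SA_LEARN --no-sync "$mode"
}

# Run once after all --no-sync batches to synchronize journal and database.
sync_bayes() {
    $SA_LEARN --sync
}
```

Typical use would be learn_recent --spam and learn_recent --ham on the two
training folders, followed by a single sync_bayes.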


Re: sa-learn

Posted by Hendrik Haddorp <he...@gmx.net>.
So far I have always let it run once a week over my inbox in --ham
mode and over my spam folder in --spam mode. All the tutorials I saw did it
the same way. This worked for years, but probably with fewer mail
files. I was under the impression that sa-learn would skip messages that
it had already learned. The debug log also indicated that it recognized those.

On 11.02.2018 19:44, Matus UHLAR - fantomas wrote:
> On 11.02.18 19:09, Hendrik Haddorp wrote:
>> I have a maildir with about 20000 mails. In the past this does not 
>> seem to have been a problem. But since a few weeks my sa-learn 
>> process dies with an OOM now.
>
> do you run sa-learn over whole maildir all the time?
> why?
>
>> My server has only 1GB of memory with another GB for swap. sa-learn 
>> is eating up pretty much the complete memory for the run and is only 
>> able to finish when I stop everything else. Why is sa-learn using 
>> more and more memory even when it learned all those messages already 
>> in the past? Is there a way to limit the memory usage except from 
>> making the set of messages smaller?
>
> you are not supposed to repeatedly call sa-learn over huge maildir.
>
> calling over new mail (or, better, false-positives and
> false-negatives) is faster and won't eat all your memory.
>


Re: sa-learn

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 11.02.18 19:09, Hendrik Haddorp wrote:
>I have a maildir with about 20000 mails. In the past this does not 
>seem to have been a problem. But since a few weeks my sa-learn 
>process dies with an OOM now.

do you run sa-learn over whole maildir all the time?
why?

>My server has only 1GB of memory with another GB for swap. sa-learn is
>eating up pretty much the complete memory for the run and is only able
>to finish when I stop everything else. Why is sa-learn using more and
>more memory even when it learned all those messages already in the
>past? Is there a way to limit the memory usage except from making the
>set of messages smaller?

you are not supposed to repeatedly call sa-learn over huge maildir.

calling over new mail (or, better, false-positives and false-negatives) is
faster and won't eat all your memory.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
We are but packets in the Internet of life (userfriendly.org)

Re: sa-learn

Posted by RW <rw...@googlemail.com>.
On Sun, 11 Feb 2018 19:09:32 +0100
Hendrik Haddorp wrote:

> Hi,
> 
> I have a maildir with about 20000 mails. In the past this does not
> seem to have been a problem. But since a few weeks my sa-learn
> process dies with an OOM now. My server has only 1GB of memory with
> another GB for swap. sa-learn is eating up pretty much the complete
> memory for the run and is only able to finish when I stop everything
> else. Why is sa-learn using more and more memory even when it learned
> all those messages already in the past? 

I don't know, it sounds like a bug.

This is a bit of a long shot, but try turning off auto-expiry if you are
using it.
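
Assuming auto-expiry here means the bayes_auto_expire setting, a minimal
sketch for switching it off could look like this. The helper name
disable_auto_expire is made up, and the config file location depends on
the installation.

```shell
#!/bin/sh
# Sketch only: add "bayes_auto_expire 0" to a SpamAssassin config file.
# disable_auto_expire is a hypothetical helper name.

# disable_auto_expire CONF
#   CONF : config file to change, for example a site-wide local.cf
disable_auto_expire() {
    conf=$1
    # Do not add a second, conflicting line if the option is already set.
    if grep -Eq '^[[:space:]]*bayes_auto_expire[[:space:]]' "$conf" 2>/dev/null; then
        echo "bayes_auto_expire is already set in $conf; edit it by hand" >&2
        return 1
    fi
    printf 'bayes_auto_expire 0\n' >> "$conf"
}
```

For example: disable_auto_expire /etc/mail/spamassassin/local.cf (the path
is an assumption). With auto-expiry off, old tokens are only removed when
expiry is run explicitly, for instance with sa-learn --force-expire from a
scheduled job.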

Re: sa-learn

Posted by Hendrik Haddorp <he...@gmx.net>.
It's a small hosted VM that has been running fine for years.

On 11.02.2018 19:35, Reindl Harald wrote:
>
>
> On 11.02.2018 at 19:09, Hendrik Haddorp wrote:
>> I have a maildir with about 20000 mails. In the past this does not 
>> seem to have been a problem. But since a few weeks my sa-learn 
>> process dies with an OOM now. My server has only 1GB of memory with 
>> another GB for swap. sa-learn is eating up pretty much the complete 
>> memory for the run and is only able to finish when I stop everything 
>> else. Why is sa-learn using more and more memory even when it learned 
>> all those messages already in the past? Is there a way to limit the 
>> memory usage except from making the set of messages smaller?
>
> from where did you get a machine with 1 GB in the last decade?
> below 1.5 GB i don't even deploy a golden-master VM
>
>> My problem sounds somewhat like 
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5141
> probably - but my corpus is 1500000 mails large, two bayes
> (sa-builtin and bogofilter) with 425 MB living in tmpfs and so in
> memory, rsynced at boot/shutdown to a persistent location
>
> clamav needs some hundred MB
> dns-cache needs some memory
>
> sorry, but 1 GB is not suitable in 2018