Posted to users@spamassassin.apache.org by Ingo Reinhart <i....@dung.de> on 2005/05/19 08:42:48 UTC

sa-learn and big messages

Hello!

If I feed a big mail (32 MB) to sa-learn, it takes a long time. I have to wait
50 sec., and the sa-learn process needs 332 MB of RAM.

What can I do to speed this up?

Ingo



Re: sa-learn and big messages

Posted by jdow <jd...@earthlink.net>.
From: "Ingo Reinhart" <i....@dung.de>

> Hello!
>
> If I commit a big mail (32 MB) to sa-learn it need long time. I must wait
> 50 sec. and the sa-learn process need 332 MB RAM.
>
> What can I do for faster proceed?

In procmail the incantation is something like:

:0 fw: spamassassin.lock
* < 250000
| /usr/bin/spamc -t 150

For other tools it's probably at least vaguely similar. You bypass
spamassassin for large messages. And that's a job for something outside
spamassassin itself.

{^_^}
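
The same size gate can be put in front of sa-learn instead of spamc. What
follows is only a rough sketch in Python, not something taken from the
SpamAssassin distribution: the function names and the 250000-byte cutoff are
assumptions chosen to mirror the procmail recipe, and it assumes sa-learn is
on the PATH and is given a single message file with --spam or --ham.

```python
import os
import subprocess

# Assumed cutoff in bytes, mirroring the "* < 250000" procmail condition.
MAX_LEARN_BYTES = 250_000


def small_enough(path, limit=MAX_LEARN_BYTES):
    """Return True if the file at `path` is smaller than `limit` bytes."""
    return os.path.getsize(path) < limit


def learn_if_small(path, as_spam=True, limit=MAX_LEARN_BYTES):
    """Run sa-learn on `path` only when the message is under the limit.

    Returns True if sa-learn was invoked, False if the message was skipped.
    """
    if not small_enough(path, limit):
        return False  # too big: skip it rather than wait and burn RAM
    flag = "--spam" if as_spam else "--ham"
    subprocess.run(["sa-learn", flag, path], check=True)
    return True
```

Oversized messages are skipped before sa-learn is started at all, so the long
run time and the memory spike never happen for them.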



Re: sa-learn and big messages

Posted by Matt Kettler <mk...@evi-inc.com>.
Jim Maul wrote:
> Ingo Reinhart wrote:
> 
>> Hello!
>>
>> If I commit a big mail (32 MB) to sa-learn it need long time. I must
>> wait 50 sec. and the sa-learn process need 332 MB RAM.
>>
>> What can I do for faster proceed?
>>
>> Ingo
>>
>>
>>
>>
> 
> um..since messages over 250k (default) wont be scanned by SA, why bother
> sa-learning anything over this limit?  Sa isnt going to scan it anyway.
> 
> -Jim
> 




Based on the way Bayes works, that doesn't make much sense, Jim.

Bayes doesn't learn messages; it learns tokens from within messages.

Really, you don't care if SA is going to scan messages of the same size or not.
You care if it will scan messages with some of the same content.

It's quite possible the 32 MB message is a large version of a message that's
normally short, for example logwatch output.

The only reason training the 32 MB message would be pointless would be if it only
contained content that would be in similarly large messages.
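
To make the token point concrete, here is a toy sketch in Python. It assumes a
naive word tokenizer, which is not SpamAssassin's real one, and the two
message bodies are invented for illustration.

```python
import re


def tokens(text):
    """Toy tokenizer: lowercase runs of 3+ letters/digits. Not SA's tokenizer."""
    return set(re.findall(r"[a-z0-9]{3,}", text.lower()))


# Hypothetical message bodies, made up for this example.
short_report = "logwatch summary: 2 failed logins for user root"
long_report = "logwatch summary: " + "failed logins for user root from host. " * 1000

learned = tokens(long_report)      # what training on the huge message records
seen_later = tokens(short_report)  # what a later, small message presents

# Tokens the small message shares with the huge one that was trained on.
shared = learned & seen_later
print(sorted(shared))
```

Every token in the short report was already learned from the long one, so the
size of the training message says nothing about whether the training is useful.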



Minor note of clarification: that 250k default limit applies to those who use
spamd, which admittedly Ingo does use. But it is not inherent in SpamAssassin in
general (i.e. those using the API or the spamassassin command-line tool don't
have this limit unless it is implemented elsewhere).





Re: sa-learn and big messages

Posted by Jim Maul <jm...@elih.org>.
Ingo Reinhart wrote:
> Hello!
> 
> If I commit a big mail (32 MB) to sa-learn it need long time. I must 
> wait 50 sec. and the sa-learn process need 332 MB RAM.
> 
> What can I do for faster proceed?
> 
> Ingo
> 
> 
> 
> 

Um... since messages over 250k (the default) won't be scanned by SA, why bother
sa-learning anything over this limit?  SA isn't going to scan it anyway.

-Jim

Re: sa-learn and big messages

Posted by Matt Kettler <mk...@evi-inc.com>.
Ingo Reinhart wrote:
> Hello!
> 
> If I commit a big mail (32 MB) to sa-learn it need long time. I must
> wait 50 sec. and the sa-learn process need 332 MB RAM.
> 
> What can I do for faster proceed?
> 

Don't commit such a large message to sa-learn?

Seriously, sa-learn isn't designed to handle such a huge input message.

If you're running an SA version older than 3.0.1, you can improve the memory
usage somewhat with an upgrade, but even that's not going to make things fast
with such a large input email.

See bug:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3876