Posted to users@spamassassin.apache.org by Nico Weinreich <we...@posingcrew.de> on 2007/11/18 15:15:13 UTC

sa-learn and message size

Hi there,

I'm looking for an option to limit the size of the files sa-learn learns 
from. I don't use spamd, so I cannot use spamc. When learning with 
sa-learn, it accepts a file or a directory as source. The problem is that 
when there is a message of about 3 or 4 MB, the bayes_toks DB grows and 
grows, sometimes 10 or 20 MB, so older tokens are lost to expiry. Where 
can I tell sa-learn to skip, without learning, any file (given directly 
or found in a directory) that is larger than x MB?

Greetz, Nico


Re: sa-learn and message size

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
> Matus UHLAR - fantomas wrote:
> >(you can install sa-3.2.3 from debian-volatile archive now)

On 18.11.07 21:56, Nico Weinreich wrote:
> I've upgraded to 3.2.3 from tarball (Debian testing provides only 3.2.1) 
                                              ^^^^^^^
I wrote "volatile", not "testing". See http://packages.debian.org/spamassassin 
and http://volatile.debian.org/.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
You have the right to remain silent. Anything you say will be misquoted,
then used against you. 

Re: sa-learn and message size

Posted by Nico Weinreich <we...@posingcrew.de>.
Matus UHLAR - fantomas wrote:
> On 18.11.07 19:45, Nico Weinreich wrote:
>   
>> I've noticed that messages learned by scanning a maildir are skipped if 
>> their size exceeds 256k. But IMP (from Horde) sends the message as raw 
>> text through a pipe to sa-learn, and this message is accepted no matter 
>> how big it is. I'm using SA 3.1.7-deb. I'm waiting for a response 
>> from the IMP mailing list.
>>     
>
> it accepts them, but does not process them.
>
> (you can install sa-3.2.3 from debian-volatile archive now)
>
>   

I've upgraded to 3.2.3 from tarball (Debian testing provides only 3.2.1) 
and added a "-" to the sa-learn command of imp for STDIN. Now it seems 
to work. Thanks.


Re: sa-learn and message size

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 18.11.07 19:45, Nico Weinreich wrote:
> I've noticed that messages learned by scanning a maildir are skipped if 
> their size exceeds 256k. But IMP (from Horde) sends the message as raw 
> text through a pipe to sa-learn, and this message is accepted no matter 
> how big it is. I'm using SA 3.1.7-deb. I'm waiting for a response 
> from the IMP mailing list.

it accepts them, but does not process them.

(you can install sa-3.2.3 from debian-volatile archive now)

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"One World. One Web. One Program." - Microsoft promotional advertisement
"Ein Volk, ein Reich, ein Fuhrer!" - Adolf Hitler

Re: sa-learn and message size

Posted by Nico Weinreich <we...@posingcrew.de>.
I've noticed that messages learned by scanning a maildir are skipped if 
their size exceeds 256k. But IMP (from Horde) sends the message as raw 
text through a pipe to sa-learn, and this message is accepted no matter 
how big it is. I'm using SA 3.1.7-deb. I'm waiting for a response 
from the IMP mailing list.

Nico

Re: sa-learn and message size

Posted by Matt Kettler <mk...@verizon.net>.
Nico Weinreich wrote:
> Hi there,
>
> I'm looking for an option to limit the size of the files sa-learn
> learns from. I don't use spamd, so I cannot use spamc. When learning
> with sa-learn, it accepts a file or a directory as source. The problem
> is that when there is a message of about 3 or 4 MB, the bayes_toks DB
> grows and grows, sometimes 10 or 20 MB, so older tokens are lost to
> expiry. Where can I tell sa-learn to skip, without learning, any file
> (given directly or found in a directory) that is larger than x MB?

sa-learn has no such option.

If you really want that functionality, you'd have to write a script that
checks the size before calling sa-learn.
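
A minimal sketch of such a wrapper, written in Python, might look like the
following. The 256 KB cutoff, the function names small_enough and learn,
and the batch size are illustrative assumptions, not anything sa-learn
provides; the only sa-learn features relied on are the --spam and --ham
options followed by file names. Directories are walked recursively, so any
non-message files stored under them would be picked up as well.

```python
import os
import subprocess

# Hypothetical cutoff; choose whatever limit suits your setup.
MAX_BYTES = 256 * 1024


def small_enough(paths, max_bytes=MAX_BYTES):
    """Yield regular files from paths (files or directories) of at most max_bytes."""
    for path in paths:
        if os.path.isdir(path):
            # Walk the directory tree and size-check every regular file in it.
            for root, _dirs, names in os.walk(path):
                for name in names:
                    candidate = os.path.join(root, name)
                    if (os.path.isfile(candidate)
                            and os.path.getsize(candidate) <= max_bytes):
                        yield candidate
        elif os.path.isfile(path) and os.path.getsize(path) <= max_bytes:
            yield path


def learn(paths, kind="spam", max_bytes=MAX_BYTES, batch=100):
    """Call sa-learn --spam or --ham on the files that pass the size check."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    files = list(small_enough(paths, max_bytes))
    # Pass the files in batches to keep the command line short.
    for start in range(0, len(files), batch):
        chunk = files[start:start + batch]
        subprocess.run(["sa-learn", "--" + kind] + chunk, check=True)
    return files
```

Calling learn(["/path/to/spam-folder"], kind="spam") would then hand only
the files at or below the cutoff to sa-learn and silently skip the rest;
the path here is a placeholder.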

Personally, I think a better feature, were any such thing to be added to
sa-learn, would be to limit the number of tokens it can generate per
message. This would allow large messages with only a few new tokens to
be learned, without risk of flooding the bayes DB. However, I've never
seen a message flood it myself.