Posted to users@spamassassin.apache.org by Henrique Fernandes <sf...@gmail.com> on 2010/02/23 04:27:18 UTC

Understand Scores and Learn.

Hello everyone,

I am new here. I have started using SpamAssassin as the anti-spam solution
where I work, but I still have a few questions about it. I am not a fluent
English speaker, so sorry for any mistakes I make.

I am confused about sa-learn: not really about how it works, but about how
it learns. I am using auto-learn with a MySQL backend for Bayes, in a
per-user configuration. That means auto-learn creates an entry for each user
and keeps his data there. My sa-learn command also uses the '-u' option, so
each user's database learns from the reports that user makes. My question is
which setup is more effective, and how SpamAssassin distinguishes between
users or emails. Let me try to explain.

User1 reports that a message SpamAssassin marked as spam is not spam for
him, because he might have subscribed to Viagra emails.

User2 does not want any Viagra emails, so he does nothing.

How would that affect later SpamAssassin results if the SQL database did
not distinguish between users?

In my case I use per-user SQL. Does that mean a user gets no benefit from
reports made in other accounts? For example, User1 gets a lot of email, so
his database learns a lot and SpamAssassin becomes more accurate for him,
but User2 does not get email, so his database is not trained and his results
will not be accurate?

With a per-user configuration in MySQL, is there a way to take a large set
of messages and train SpamAssassin with it, so that SpamAssassin also takes
those probabilities into account in its calculation?

Now about expirations.

Why is my bayes_expire table still empty? And will tokens expire
automatically, or do I have to use sa-learn --force-expire?

Well, I guess that will do!

Sorry again for my bad English.

And if someone has an article explaining something like this, I would be
very glad!


[]'sf.rique

Re: Understand Scores and Learn.

Posted by Kai Schaetzl <ma...@conactive.com>.
Henrique Fernandes wrote on Tue, 23 Feb 2010 00:27:18 -0300:

> In my case I use per-user SQL. Does that mean a user gets no benefit from
> reports made in other accounts?

Correct.

> For example, User1 gets a lot of email, so his database learns a lot and
> SpamAssassin becomes more accurate for him, but User2 does not get email,
> so his database is not trained and his results will not be accurate?

Correct.

> 
> With a per-user configuration in MySQL, is there a way to take a large set
> of messages and train SpamAssassin with it, so that SpamAssassin also takes
> those probabilities into account in its calculation?

That doesn't make sense. You would have to train every single user's Bayes db
with the same set of messages. In the end the databases would be nearly
identical and you would have wasted a lot of resources.

I think (and those using site-wide Bayes will agree) that user-specific Bayes
only makes sense if your users get a lot of messages. Otherwise your dbs are
not trained well and you are better off not using it. Site-wide Bayes really
works very well, even on hosting platforms where you have a lot of diverse
clients. What you need is a well-trained starter database and good automatic
training afterwards. We do that with autolearning and by learning from
spamtrap mailboxes. Users do learning only occasionally.
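
As a rough sketch of what site-wide Bayes could look like on an SQL backend
like yours: the fragment below keeps the MySQL store but makes every user read
and train one shared token set via bayes_sql_override_username. The database
name, host, credentials and the shared name "sitewide" are placeholders I made
up, so adapt them to your setup and check the option descriptions in the
Mail::SpamAssassin::Conf documentation before relying on this.

```
# local.cf: site-wide Bayes on a MySQL backend (sketch, not a tested setup)
use_bayes 1
bayes_auto_learn 1
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL

# Placeholder connection details (hypothetical database name, host, login).
bayes_sql_dsn      DBI:mysql:spamassassin:localhost
bayes_sql_username sa_user
bayes_sql_password sa_password

# Store and look up all tokens under one shared name instead of per user.
# "sitewide" is an arbitrary, made-up name.
bayes_sql_override_username sitewide
```

With the override in place, the per-user databases are no longer consulted, so
a starter database would have to be trained once under the shared name.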

> Now about expirations.
> 
> Why is my bayes_expire table still empty?

I'm not sure what it is used for, maybe for locking. Don't worry.

> And will tokens expire automatically, or do I
> have to use sa-learn --force-expire?

It will expire automatically as long as that feature is not disabled
(bayes_auto_expire) and once the db exceeds a size limit
(bayes_expiry_max_db_size, 150,000 tokens by default). There's more on this in
the Mail::SpamAssassin::Conf documentation (*) and on the wiki.
I think it's established good practice to switch automatic expiry off and run
it manually at night, once per day or per week, depending on how fast your db
grows. You may also want to raise the limit.

(*) http://spamassassin.apache.org/full/3.3.x/doc/
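
A minimal sketch of the "switch it off and expire manually" approach, assuming
the settings go into local.cf and the expiry run is started from cron. The
size value and the user name are examples I picked, not recommendations.

```
# local.cf: expiry settings (sketch; the size value is only an example)

# Do not start expiry runs automatically during scanning or learning.
bayes_auto_expire 0

# Token count above which an expiry run shrinks the database.
# 300000 is a made-up example; the documented default is 150000.
bayes_expiry_max_db_size 300000

# Then expire on a schedule instead, for example from a nightly cron job:
#   sa-learn --force-expire
# With per-user Bayes, each user's db would need its own run
# ("someuser" is a hypothetical user name):
#   sa-learn -u someuser --force-expire
```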

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com