Posted to users@spamassassin.apache.org by J Doe <ge...@nativemethods.com> on 2018/07/03 16:17:17 UTC

Question regarding auto-learning

Hello,

I have a question regarding autolearning and Bayes functionality.

From reading the documentation, it appears that to train the Bayesian filter I require a minimum of 1,000 pieces of ham and 1,000 pieces of spam.  I am currently collecting spam on one of my servers via a spam trap address and slowly reaching that number.  I was wondering, though, whether I can use autolearning (bayes_auto_learn 1) before training the database.

When autolearn fires on messages at the moment, it is correctly detecting ham and spam based on the default ham and spam thresholds:

    bayes_auto_learn_threshold_nonspam 0.1
    bayes_auto_learn_threshold_spam 12.0

Can this be used before training the database, or is it more often used to supplement, on an ongoing basis, a database that has already been trained?

Thanks,

- J



Re: Question regarding auto-learning

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.
On 03.07.18 12:17, J Doe wrote:
> From reading the documentation, it appears that to train the Bayesian
> filter I require a minimum of 1,000 pieces of ham and 1,000 pieces of
> spam.

No. You need at least 200 hams and 200 spams for Bayes to start firing, but
you can tune that by setting bayes_min_ham_num and bayes_min_spam_num.

Note that training on too few messages can result in false positives and
false negatives.
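
For illustration, a minimal local.cf sketch of that tuning. Both options
default to 200; the value 100 below is an arbitrary example, not a
recommendation, and going lower raises the risk of false positives and
negatives mentioned above:

    # local.cf: illustrative values only
    # let Bayes start scoring once 100 hams and 100 spams have been learned
    bayes_min_ham_num 100
    bayes_min_spam_num 100

Running spamassassin --lint afterwards checks the configuration for errors,
and sa-learn --dump magic shows how many spams and hams (the nspam and nham
lines) the database already holds.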

> I am currently collecting spam on one of my servers via a spam trap
> address and slowly reaching that number.  I was wondering, though, whether
> I can use autolearning (bayes_auto_learn 1) before training the database.

Autolearning does the training for you. Manual training is still faster
and more precise.

> When autolearn fires on messages at the moment, it is correctly detecting
> ham and spam based on the default ham and spam thresholds:
>
>    bayes_auto_learn_threshold_nonspam 0.1
>    bayes_auto_learn_threshold_spam 12.0
>
> Can this be used before training the database, or is it more often used to
> supplement, on an ongoing basis, a database that has already been trained?

Those don't contradict each other: you can use both manual and automatic
learning.
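
A minimal sketch of doing both, assuming the spam-trap mail ends up in a
directory on the scanning host (the path below is a placeholder).
Autolearning stays on in local.cf; bayes_auto_learn already defaults to 1,
so the line only makes that choice explicit:

    # local.cf: keep autolearning enabled (this is the default)
    bayes_auto_learn 1

The trapped spam can then be fed in manually, e.g. with
sa-learn --spam /path/to/spamtrap (and known-good mail with sa-learn --ham).
Both routes update the same Bayes database, provided sa-learn runs as the
same user as the scanner or points at the same bayes_path.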

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Chernobyl was a Windows 95 beta test site.