Posted to users@spamassassin.apache.org by Geoff Sweet <li...@whootis.com> on 2005/05/13 08:38:23 UTC

Help with Bayes auto-learn

I would like to enable the Bayes system with auto-learning. I thought
that I had my config set up correctly, but apparently I don't. My config
looks like this:

##########
# How we want to modify the email
rewrite_header subject [**SPAM**]
report_safe 0

#Bayes learning system
use_bayes 1
bayes_auto_learn 1

# Define the sensitivity level. Standard level is 5.
required_hits 6.8

# Enable SpamAssassin's RBL checking features :
skip_rbl_checks 0
rbl_timeout 3
num_check_received 3
score RCVD_IN_BL_SPAMCOP_NET 3
report_header 1
use_terse_report 1
##########

So I thought, from my reading of the FAQ and the wiki, that this would
enable Bayes and turn on auto-learning for spam that scores higher than
the default of 12. But in my logs I end up with this:

2005-05-12 23:30:33.240563500 2005-05-13 06:30:33 [88906] i: connection from localhost.whootis.com [127.0.0.1] at port 4737
2005-05-12 23:30:33.333094500 2005-05-13 06:30:33 [88906] i: processing message <7o...@k08.kdrv> for qmaild:10004.
2005-05-12 23:30:33.431814500 2005-05-13 06:30:33 [88906] i: identified spam (23.2/6.8) for qmaild:10004 in 0.2 seconds, 1311 bytes.
2005-05-12 23:30:33.432514500 2005-05-13 06:30:33 [88906] i: result: Y 23 - BAYES_99,FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MSGID_RANDY,NORMAL_HTTP_TO_IP,RCVD_BY_IP,RCVD_DOUBLE_IP_LOOSE,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SUBJ_ILLEGAL_CHARS scantime=0.2,size=1311,mid=<7o...@k08.kdrv>,bayes=0.999999999999999,autolearn=no

Does the "autolearn=no" mean that this message has not been submitted to 
bayes for auto-learn?  And if not, can someone steer me in the right 
direction for getting my config setup correctly?

Thanks very much,
Geoff Sweet

Re: Help with Bayes auto-learn

Posted by wolfgang <me...@gmx.net>.
In an older episode (Friday 13 May 2005 08:38), Geoff Sweet wrote:
> I would like to enable the Bayes system with auto-learning. I thought
> that I had my config set up correctly, but apparently I don't. My config
> looks like this:
> 
> ##########
> # How we want to modify the email
> rewrite_header subject [**SPAM**]
> report_safe 0
> 
> #Bayes learning system
> use_bayes 1
> bayes_auto_learn 1

In an older episode (Friday 13 May 2005 10:17), George Breahna wrote:
> I really recommend you research your question before asking it.

good point, anyway:

man Mail::SpamAssassin::Conf 
and
http://spamassassin.apache.org/full/3.0.x/dist/doc/Mail_SpamAssassin_Conf.html
would tell you:

bayes_min_ham_num (Default: 200)
bayes_min_spam_num (Default: 200)
    To be accurate, the Bayes system does not activate until a certain
    number of ham (non-spam) and spam have been learned. The default is
    200 of each ham and spam, but you can tune these up or down with
    these two settings.
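To check this against a real database, a minimal Python sketch could read the nspam/nham counters from `sa-learn --dump magic` and compare them with the two minimums. The sample text and its counts are made up, and the column layout is assumed from typical output, so verify it against your own installation before relying on the parser.

```python
import re

# Illustrative `sa-learn --dump magic` output (counts are made up).
# Only the nspam/nham rows matter here; other rows are simply collected.
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        150          0  non-token data: nspam
0.000          0        420          0  non-token data: nham
"""

# Assumed row layout: "<float> <int> <count> <int>  non-token data: <name>"
ROW = re.compile(r"^\s*\S+\s+\d+\s+(\d+)\s+\d+\s+non-token data: (\w+)")


def learned_counts(dump_text):
    """Return (nspam, nham) parsed from a `sa-learn --dump magic` listing."""
    counts = {}
    for line in dump_text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts.get("nspam", 0), counts.get("nham", 0)


def bayes_active(nspam, nham, min_spam=200, min_ham=200):
    """Bayes is only used once both minimums are reached (defaults: 200)."""
    return nspam >= min_spam and nham >= min_ham


nspam, nham = learned_counts(SAMPLE_DUMP)
print(nspam, nham, bayes_active(nspam, nham))  # 150 420 False
```

With these sample counts the ham side is already above 200, but the spam side is not, so Bayes would stay inactive until more spam is learned.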

for information on how to learn the needed number of mails, see

man sa-learn

regards,

wolfgang


Re: Help with Bayes auto-learn

Posted by Matt Kettler <mk...@comcast.net>.
At 02:38 AM 5/13/2005, Geoff Sweet wrote:
>2005-05-12 23:30:33.432514500 2005-05-13 06:30:33 [88906] i: result: Y 23 - BAYES_99,FORGED_MUA_THEBAT_BOUN,FORGED_THEBAT_HTML,FORGED_YAHOO_RCVD,HEAD_ILLEGAL_CHARS,HTML_MESSAGE,HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,MSGID_RANDY,NORMAL_HTTP_TO_IP,RCVD_BY_IP,RCVD_DOUBLE_IP_LOOSE,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SUBJ_ILLEGAL_CHARS scantime=0.2,size=1311,mid=<7o...@k08.kdrv>,bayes=0.999999999999999,autolearn=no
>
>Does the "autolearn=no" mean that this message has not been submitted to 
>bayes for auto-learn?  And if not, can someone steer me in the right 
>direction for getting my config setup correctly?

First, I'm assuming you're using SA 3.0.0 or higher. If not, please specify
your version and I'll correct my message (some of the details differ).

That does mean the message was not autolearned. However, it does not mean
that no messages will be autolearned. In SA 3.0, if autolearning were
disabled or failing, you would have seen "disabled" or "failed", not "no".
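If you want to check this across many log lines, here is a minimal Python sketch that splits the key=value trailer of the spamd "result:" line quoted above and looks up the autolearn value. The MEANINGS table is a hypothetical summary of the states mentioned in this thread, not an official list.

```python
# The key=value trailer from the spamd "result:" log line quoted above.
LOG_TAIL = ("scantime=0.2,size=1311,mid=<7o...@k08.kdrv>,"
            "bayes=0.999999999999999,autolearn=no")

# Hypothetical summary of the autolearn states discussed in this thread.
MEANINGS = {
    "no": "autolearning is on, but this message did not qualify",
    "spam": "the message was learned as spam",
    "ham": "the message was learned as ham",
    "disabled": "autolearning is turned off",
    "failed": "learning was attempted but failed (e.g. Bayes DB locked)",
}


def parse_tail(tail):
    """Split comma-separated key=value pairs into a dict.

    Naive: a Message-ID containing a comma would break the split.
    """
    fields = {}
    for pair in tail.split(","):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return fields


fields = parse_tail(LOG_TAIL)
status = fields.get("autolearn", "")
print(status, "->", MEANINGS.get(status, "unknown status"))
# no -> autolearning is on, but this message did not qualify
```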

The requirements for autolearning are considerably more complex than just 
"total score over xx".

The following things have to happen:

Note: ALL scores referenced below are learning scores. The learning score is
NOT the same as the final spam score. It is the score recalculated as if
Bayes were disabled, *including* switching to the non-Bayes scoreset. Also,
AWL, whitelist, and blacklist rules don't count towards this score.
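A minimal Python sketch of the learning-score idea, assuming each rule's non-Bayes score and its type (header or body) are already known. The RULES scores are made up for illustration, and EXCLUDED_PREFIXES is a hypothetical, incomplete stand-in for the AWL/whitelist/blacklist rules; it is not taken from SpamAssassin's source.

```python
# Hypothetical per-rule data: (score in the non-Bayes scoreset, rule type).
# Rule names come from the log above; the numbers are made up.
RULES = {
    "FORGED_YAHOO_RCVD": (2.0, "header"),
    "MSGID_RANDY": (1.5, "header"),
    "MIME_HTML_ONLY": (0.5, "body"),
    "BAYES_99": (3.5, "body"),
    "AWL": (-1.0, "header"),
}

# Hypothetical exclusion list: Bayes rules (score computed as if Bayes
# were off) plus AWL, whitelist and blacklist rules.
EXCLUDED_PREFIXES = ("BAYES_", "AWL", "USER_IN_WHITELIST", "USER_IN_BLACKLIST")


def learning_score(hits):
    """Return (total, header, body) learning scores for the rules that hit."""
    total = head = body = 0.0
    for name in hits:
        if name.startswith(EXCLUDED_PREFIXES):
            continue  # never counts toward the learning score
        score, kind = RULES[name]
        total += score
        if kind == "header":
            head += score
        elif kind == "body":
            body += score
    return total, head, body


print(learning_score(["FORGED_YAHOO_RCVD", "MSGID_RANDY",
                      "MIME_HTML_ONLY", "BAYES_99", "AWL"]))
# (4.0, 3.5, 0.5)
```

In this made-up case the message would score far less for learning purposes than its final score suggests, because BAYES_99 is dropped entirely.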

1) total learning score over bayes_auto_learn_threshold_spam (default 12)
2) learning score of header rules must be over 3.0
3) learning score of body rules must be over 3.0
4) existing bayes learning must not be strongly ham (i.e. don't learn as
spam anything that would otherwise get bayes_00'ed)
5) From addresses (including Return-Path, etc) must not match a
bayes_ignore_from statement
6) To addresses (including Cc, etc) must not match a bayes_ignore_to
statement
7) The bayes DB must not be locked by some other SA process (another 
learner, expiry, etc). Note: this test results in autolearn=failed.
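The checks in this list can be modelled roughly as follows. This is a simplified Python sketch of the spam side only, built from the description above rather than from SpamAssassin's source. The Candidate fields, the 0.01 cutoff for "strongly ham" (taken as the BAYES_00 range), and case-insensitive glob matching for the ignore lists are all assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Candidate:
    """Hypothetical per-message inputs. All scores are learning scores."""
    total: float
    header: float
    body: float
    bayes_prob: float  # current Bayes probability for the message
    senders: list = field(default_factory=list)     # From, Return-Path, ...
    recipients: list = field(default_factory=list)  # To, Cc, ...
    db_locked: bool = False  # another SA process holds the Bayes DB lock


def matches_any(addresses, patterns):
    """Case-insensitive glob match (assumed semantics for bayes_ignore_*)."""
    return any(fnmatchcase(a.lower(), p.lower())
               for a in addresses for p in patterns)


def spam_autolearn_result(msg, threshold_spam=12.0,
                          ignore_from=(), ignore_to=()):
    """Return the value that would follow "autolearn=" for a spam candidate."""
    if msg.total <= threshold_spam:             # 1) total learning score
        return "no"
    if msg.header <= 3.0 or msg.body <= 3.0:    # 2) and 3) header/body points
        return "no"
    if msg.bayes_prob < 0.01:                   # 4) strongly ham (assumed cutoff)
        return "no"
    if matches_any(msg.senders, ignore_from):   # 5) bayes_ignore_from
        return "no"
    if matches_any(msg.recipients, ignore_to):  # 6) bayes_ignore_to
        return "no"
    if msg.db_locked:                           # 7) DB locked by another process
        return "failed"
    return "spam"


msg = Candidate(total=14.0, header=6.0, body=2.5, bayes_prob=0.99)
print(spam_autolearn_result(msg))  # no  (body learning score not over 3.0)
```

With a body learning score of 4.0 instead, the same hypothetical message would come back as "spam", which shows how a message with a high final score can still log autolearn=no.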


See also:
http://wiki.apache.org/spamassassin/AutolearningNotWorking




RE: Help with Bayes auto-learn

Posted by George Breahna <sa...@top-consulting.net>.
I could swear I've seen this question in at least 20 different messages, not
to mention on the website.

I really recommend you research your question before asking it.

autolearn=no means that it didn't 'learn' this message.

Other possible states are 'spam', 'ham' and ... 'DISABLED'

If autolearn were to be disabled, you would see this last one.




