Posted to users@spamassassin.apache.org by "Richard E. Bewley, Jr." <rb...@parabolainc.com> on 2006/07/05 05:07:53 UTC

Bayes autolearn configuration

Hi,

I'm using SpamAssassin version 3.1.3 running on Perl version 5.8.0.  
Autolearn is enabled, and I'm getting the headers below, which according 
to the SpamAssassin documentation mean that autolearn is enabled but the 
message did not meet the criteria required for learning.  I am using the 
default thresholds.  Can anyone shed some light on why no messages are 
being autolearned?

My lint is clean. 

When I debug:
[24212] dbg: bayes: database connection established
[24212] dbg: bayes: found bayes db version 3
[24212] dbg: bayes: Using userid: 102
[24212] dbg: bayes: not available for scanning, only 12 spam(s) in bayes DB < 100
[24212] dbg: bayes: not scoring message, returning undef
[24212] dbg: bayes: DB expiry: tokens in DB: 2639, Expiry max size: 180000, Oldest atime: 1117030672, Newest atime: 1151309839, Last expire: 0, Current time: 1152068718

X-Spam-Status: Yes, score=16.6 required=5.0 tests=SARE_OEM_AND_OTHER,
        SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
        SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
        URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1

-- 
Thanks!
Richard Bewley
Parabola

www.parabolainc.com


Re: Bayes autolearn configuration

Posted by Kris Deugau <kd...@vianet.ca>.
Steven Stern wrote:
 > It appears that you do not yet have enough spam and ham in your
 > database to enable learning.  You need to use sa-learn to push some
 > spam and ham through the system.

That's not quite correct.  There are no "number of learned spam/ham" 
thresholds for autolearning;  the threshold is a combination of a basic 
score (check the Mail::SpamAssassin::Conf man page for the defaults on 
your system - IIRC it's >12 for spam, <0.1 for ham) and a requirement 
that at least 3 points come from header rules, and 3 from body rules. 
Again, check your local man page for the specific details on your local 
install.  (This doesn't seem to have changed since Bayes was introduced.)
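
As a rough illustration of the rule described above, here is a minimal 
Python sketch.  It is not SpamAssassin's own code: the function name, 
the argument names, and the exact boundary comparisons (>= versus >) are 
assumptions, and the defaults simply mirror the values recalled above 
(12 for spam, 0.1 for ham, 3 points each from header and body rules).

```python
def autolearn_decision(score, header_points, body_points,
                       spam_threshold=12.0, ham_threshold=0.1,
                       min_part_points=3.0):
    """Return 'spam', 'ham', or 'no' for a scanned message.

    Hypothetical helper: the inputs are assumed to be the autolearn
    score and the points contributed by header and body rules.
    """
    # Low-scoring mail is learned as ham.
    if score < ham_threshold:
        return "ham"
    # High-scoring mail is learned as spam only if both header rules
    # and body rules contributed enough points.
    if (score >= spam_threshold
            and header_points >= min_part_points
            and body_points >= min_part_points):
        return "spam"
    # Everything else is left alone (autolearn=no).
    return "no"


# Made-up numbers: a high score that comes almost entirely from body rules.
print(autolearn_decision(16.6, 0.5, 14.0))
```

With those made-up inputs the sketch reports "no": the total is well 
above the spam threshold, but the header rules contribute too little.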

The Bayes subsystem will not *return* a score until the "number of 
messages" thresholds are passed - by default 200 each of ham and spam.
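
The scoring gate is separate from the autolearn decision and can be 
sketched on its own.  This is a hedged Python illustration, not 
SpamAssassin code: the function and parameter names are hypothetical, 
and the defaults of 200 are the ones mentioned above.

```python
def bayes_can_score(nspam, nham, min_spam=200, min_ham=200):
    """Return True once enough spam AND ham have been learned.

    Hypothetical helper: below either minimum, Bayes contributes no
    score to a message, although learning can still take place.
    """
    return nspam >= min_spam and nham >= min_ham


# The debug output in this thread shows 12 learned spam against a
# minimum of 100; the ham count (500) is made up for the example.
print(bayes_can_score(12, 500, min_spam=100, min_ham=100))
```

This only models whether Bayes will score a message; it says nothing 
about whether a given message is autolearned.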

Manual training is still highly recommended early on, to make sure you 
get *accurate* training.  I've got a number of systems I paid fairly 
close attention to early on, when I upgraded to SA2.54 and introduced 
them to Bayes support.  I've *never* had to wipe and retrain any of 
them.  (I *do* get customer "missed-spam" reports that occasionally show 
BAYES_{00,01,10} scores, but that's pretty rare, and I feed those 
messages back ASAP to keep things on track.  Checking those messages 
afterward usually shows BAYES_50 or better.)

> Richard E. Bewley, Jr. wrote:
>>SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
>>       SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
>>       URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1

Richard, your system didn't autolearn this particular message because 
there weren't enough hits on header rules.  UNPARSEABLE_RELAY is the 
only one, I think;  network tests (e.g. URIBL_*) are also ignored when 
determining which scoreset is used to decide whether to autolearn.  The 
SARE rulesets look mostly at the message bodies IIRC.

(from man Mail::SpamAssassin::Conf)
     Note that certain tests are ignored when determining whether a
     message should be trained upon:

      - rules with tflags set to 'learn' (the Bayesian rules)
      - rules with tflags set to 'userconf' (user white/black-listing
        rules, etc)
      - rules with tflags set to 'noautolearn'

     Also note that auto-training occurs using scores from either
     scoreset 0 or 1, depending on what scoreset is used during message
     check.  It is likely that the message check and auto-train scores
     will be different.
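
A small Python sketch of the exclusion list quoted above.  It is only an 
illustration: the tuple layout, the "header"/"body" classification, and 
the rule names and scores in the example are all assumptions, and the 
scoreset switch is not modelled.

```python
# tflags that the man page excerpt says are ignored for autolearning.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_points(hits):
    """Sum points per rule kind, skipping rules with ignored tflags.

    hits: iterable of (name, score, kind, tflags) tuples, where kind is
    "header" or "body" (a hypothetical classification for this sketch).
    """
    totals = {"header": 0.0, "body": 0.0, "total": 0.0}
    for name, score, kind, tflags in hits:
        if IGNORED_TFLAGS & set(tflags):
            continue  # does not count toward the autolearn decision
        totals["total"] += score
        if kind in ("header", "body"):
            totals[kind] += score
    return totals


# Hypothetical hits; names and scores are made up for the example.
example_hits = [
    ("BAYES_99", 3.5, "body", ["learn"]),
    ("USER_IN_WHITELIST", -100.0, "header", ["userconf"]),
    ("SOME_HEADER_RULE", 1.5, "header", []),
    ("SOME_BODY_RULE", 2.0, "body", []),
]
print(autolearn_points(example_hits))
```

The point of the sketch is that the score used for autolearning can 
differ noticeably from the score reported in X-Spam-Status.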

-kgd

Re: Bayes autolearn configuration

Posted by "Richard E. Bewley, Jr." <rb...@parabolainc.com>.
Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your database
> to enable learning.  You need to use sa-learn to push some spam and ham
> through the system.
>
>   
>>  not available for scanning, only 12 spam(s) in bayes DB < 100
>>     
>
> There are only 12 spam, but your local.cf file says not to autolearn
> until there are at least 100.
>
>   
Thanks for the quick response.  I was under the impression that only 
scanning was enabled at that threshold, not both scanning and learning.  
I'll give it a shot and see what happens...

-- 
Thanks!
Richard Bewley
Parabola

www.parabolainc.com


Re: Bayes autolearn configuration

Posted by "Richard E. Bewley, Jr." <rb...@parabolainc.com>.
Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your database
> to enable learning.  You need to use sa-learn to push some spam and ham
> through the system.
>
>   
>>  not available for scanning, only 12 spam(s) in bayes DB < 100
>>     
>
> There are only 12 spam, but your local.cf file says not to autolearn
> until there are at least 100.
>
>   
Well, now the database is available for scanning, and it's still not 
autolearning.  Anything else I can check?

-- 
Thanks!
Richard Bewley
Parabola

www.parabolainc.com


Re: Bayes autolearn configuration

Posted by Steven Stern <su...@sterndata.com>.
Richard E. Bewley, Jr. wrote:
> Hi,
> 
> I'm using SpamAssassin version 3.1.3 running on Perl version 5.8.0.  My
> autolearn is enabled, and I'm getting the below headers, which according
> to spamassassin documentation means that autolearn is enabled, but not
> meeting required criteria to learn.  I am using the default thresholds. 
> Can anyone shed some light on why no messages are being autolearned?
> 
> My lint is clean.
> When I debug:
> [24212] dbg: bayes: database connection established
> [24212] dbg: bayes: found bayes db version 3
> [24212] dbg: bayes: Using userid: 102
> [24212] dbg: bayes: not available for scanning, only 12 spam(s) in bayes DB < 100
> [24212] dbg: bayes: not scoring message, returning undef
> [24212] dbg: bayes: DB expiry: tokens in DB: 2639, Expiry max size: 180000, Oldest atime: 1117030672, Newest atime: 1151309839, Last expire: 0, Current time: 1152068718
> 
> X-Spam-Status: Yes, score=16.6 required=5.0 tests=SARE_OEM_AND_OTHER,
>        SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
>        SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
>        URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1
> 

It appears that you do not yet have enough spam and ham in your database
to enable learning.  You need to use sa-learn to push some spam and ham
through the system.

>  not available for scanning, only 12 spam(s) in bayes DB < 100

There are only 12 spam, but your local.cf file says not to autolearn
until there are at least 100.

-- 

  Steve