Posted to users@spamassassin.apache.org by "Richard E. Bewley, Jr." <rb...@parabolainc.com> on 2006/07/05 05:07:53 UTC
Bayes autolearn configuration
Hi,
I'm using SpamAssassin version 3.1.3 running on Perl version 5.8.0. My
autolearn is enabled, and I'm getting the below headers, which according
to spamassassin documentation means that autolearn is enabled, but not
meeting required criteria to learn. I am using the default thresholds.
Can anyone shed some light on why no messages are being autolearned?
My lint is clean.
When I debug:
[24212] dbg: bayes: database connection established
[24212] dbg: bayes: found bayes db version 3
[24212] dbg: bayes: Using userid: 102
[24212] dbg: bayes: not available for scanning, only 12 spam(s) in bayes DB < 100
[24212] dbg: bayes: not scoring message, returning undef
[24212] dbg: bayes: DB expiry: tokens in DB: 2639, Expiry max size: 180000, Oldest atime: 1117030672, Newest atime: 1151309839, Last expire: 0, Current time: 1152068718
X-Spam-Status: Yes, score=16.6 required=5.0 tests=SARE_OEM_AND_OTHER,
SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1
--
Thanks!
Richard Bewley
Parabola
www.parabolainc.com
Re: Bayes autolearn configuration
Posted by Kris Deugau <kd...@vianet.ca>.
Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your
> database to enable learning. You need to use sa-learn to push some
> spam and ham through the system.
That's not quite correct. There are no "number of learned spam/ham"
thresholds for autolearning; the threshold is a combination of a basic
score (check the Mail::SpamAssassin::Conf man page for the defaults on
your system - IIRC it's >12 for spam, <0.1 for ham) and a requirement
that at least 3 points come from header rules, and 3 from body rules.
Again, check your local man page for the specific details on your local
install. (This doesn't seem to have changed since Bayes was introduced.)
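The checks described above can be sketched in a few lines of Python. This only illustrates the decision as described in this message and is not SpamAssassin's actual code. The function name autolearn_decision and its parameters are made up for the example, the thresholds are the defaults quoted above (12 for spam, 0.1 for ham), and applying the 3-point header/body requirement only on the spam side is an assumption, so check your own man page.

```python
def autolearn_decision(score, header_points, body_points,
                       spam_threshold=12.0, ham_threshold=0.1,
                       min_header=3.0, min_body=3.0):
    """Return 'spam', 'ham', or 'no' for a hypothetical autolearn check.

    Strict comparisons follow the wording in the message (>12, <0.1);
    the exact boundary behaviour is an assumption.
    """
    if score > spam_threshold:
        # High score alone is not enough: both header and body rules
        # must each contribute at least the minimum number of points.
        if header_points >= min_header and body_points >= min_body:
            return "spam"
        return "no"
    if score < ham_threshold:
        return "ham"
    # Anything between the two thresholds is never autolearned.
    return "no"


# Hypothetical split of points for a high-scoring, body-heavy message.
print(autolearn_decision(score=16.6, header_points=0.5, body_points=16.1))
```

With an invented split like the one above, the total is well over 12 but the header side is under 3 points, so the sketch returns "no".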
The Bayes subsystem will not *return* a score until the "number of
messages" thresholds are passed - by default 200 each of ham and spam.
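As a rough illustration, the scoring gate is a separate check from the autolearn decision. The sketch below is not SpamAssassin code; the function name bayes_can_score is hypothetical, and the minimums stand in for the bayes_min_spam_num and bayes_min_ham_num settings. The ham count in the usage line is invented, since the debug output in this thread only shows the spam count.

```python
def bayes_can_score(num_spam, num_ham, min_spam=200, min_ham=200):
    """Return True once both learned-message counts reach their minimums.

    This gate only controls whether Bayes *returns* a score; in this
    sketch it has no bearing on whether a message gets autolearned.
    """
    return num_spam >= min_spam and num_ham >= min_ham


# 12 spam with a minimum of 100, as in the debug output; ham count is made up.
print(bayes_can_score(num_spam=12, num_ham=500, min_spam=100, min_ham=100))
```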
Manual training is still highly recommended early on, to make sure you
get *accurate* training. I've got a number of systems I paid fairly
close attention to early on, when I upgraded to SA2.54 and introduced
them to Bayes support. I've *never* had to wipe and retrain any of
them. (I *do* get customer "missed-spam" reports that occasionally show
BAYES_{00,01,10} scores, but that's pretty rare, and I feed those
messages back ASAP to keep things on track. Checking those messages
afterward usually shows BAYES_50 or better.)
> Richard E. Bewley, Jr. wrote:
>>SARE_OEM_PRODS_1,SARE_OEM_PRODS_FEW,SARE_OEM_PRO_DOL,SARE_PRODUCTS_02,
>> SARE_PRODUCTS_03,UNPARSEABLE_RELAY,URIBL_JP_SURBL,URIBL_OB_SURBL,
>> URIBL_SBL,URIBL_SC_SURBL,URI_NOVOWEL autolearn=no version=3.1.1
Richard, your system didn't autolearn this particular message because
there weren't enough hits on header rules (UNPARSEABLE_RELAY is the only
one, I think; network tests (e.g., URIBL_*) are also ignored when
determining which scoreset to use to decide whether to autolearn). The
SARE rulesets look mostly at the message bodies, IIRC.
(from man Mail::SpamAssassin::Conf)
Note that certain tests are ignored when determining whether a
message should be trained upon:
- rules with tflags set to 'learn' (the Bayesian rules)
- rules with tflags set to 'userconf' (user white/black-listing
rules, etc)
- rules with tflags set to 'noautolearn'
Also note that auto-training occurs using scores from either
scoreset 0 or 1, depending on what scoreset is used during message
check. It is likely that the message check and auto-train scores
will be different.
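To make the excerpt concrete, here is a small Python sketch of dropping the ignored tflags before totalling the score used for the autolearn decision. It is only an illustration, not how SpamAssassin is implemented: the function name autolearn_score, the tuple layout, the scores, and the two generic rule names are all invented for the example.

```python
# tflags that the man page excerpt says are ignored for training decisions.
IGNORED_TFLAGS = {"learn", "userconf", "noautolearn"}


def autolearn_score(hits):
    """Sum the scores of rule hits that count toward autolearning.

    `hits` is a list of (rule_name, score, tflags) tuples; any hit that
    carries one of the ignored tflags is left out of the total.
    """
    return sum(score for _name, score, tflags in hits
               if not (set(tflags) & IGNORED_TFLAGS))


# Hypothetical hits; the scores are made up for illustration.
hits = [
    ("BAYES_99", 3.5, ["learn"]),               # Bayesian rule, ignored
    ("USER_IN_WHITELIST", -100.0, ["userconf"]),  # user list rule, ignored
    ("SOME_BODY_RULE", 2.0, []),
    ("SOME_HEADER_RULE", 1.5, []),
]
print(autolearn_score(hits))
```

Only the last two hits count here, which is one reason the autolearn score can differ from the score in X-Spam-Status.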
-kgd
Re: Bayes autolearn configuration
Posted by "Richard E. Bewley, Jr." <rb...@parabolainc.com>.
Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your database
> to enable learning. You need to use sa-learn to push some spam and ham
> through the system.
>
>
>> not available for scanning, only 12 spam(s) in bayes DB < 100
>>
>
> There are only 12 spam, but your local.cf file says not to autolearn
> until there are at least 100.
>
>
Thanks for the quick response. I was under the impression that only
scanning was enabled at that threshold, not both scanning and learning.
I'll give it a shot and see what happens...
--
Thanks!
Richard Bewley
Parabola
www.parabolainc.com
Re: Bayes autolearn configuration
Posted by "Richard E. Bewley, Jr." <rb...@parabolainc.com>.
Steven Stern wrote:
> It appears that you do not yet have enough spam and ham in your database
> to enable learning. You need to use sa-learn to push some spam and ham
> through the system.
>
>
>> not available for scanning, only 12 spam(s) in bayes DB < 100
>>
>
> There are only 12 spam, but your local.cf file says not to autolearn
> until there are at least 100.
>
>
Well, now the database is available for scanning, and it's still not
autolearning. Anything else I can check?
--
Thanks!
Richard Bewley
Parabola
www.parabolainc.com
Re: Bayes autolearn configuration
Posted by Steven Stern <su...@sterndata.com>.
It appears that you do not yet have enough spam and ham in your database
to enable learning. You need to use sa-learn to push some spam and ham
through the system.
> not available for scanning, only 12 spam(s) in bayes DB < 100
There are only 12 spam, but your local.cf file says not to autolearn
until there are at least 100.
--
Steve