Posted to users@spamassassin.apache.org by John Fleming <jo...@wa9als.com> on 2004/08/28 01:32:20 UTC

another auto_learn

I've read the FAQ about why autolearning might not appear to be working.
However, nothing seems to fit.  Bayes -IS- working, but no ham or spam is
being autolearned.  I've recently installed a new Linux distro and only have
200-some spam and ham learned.  I know there have been messages analyzed
that weren't previously learned.

auto_learn is set to 1 and the emails in question far exceed the set
thresholds.

Any thoughts about why auto_learn might appear not to be working other than
a message already having been learned?  Thanks - John



Re: another auto_learn

Posted by Theo Van Dinter <fe...@kluge.net>.
On Fri, Aug 27, 2004 at 06:32:20PM -0500, John Fleming wrote:
> Any thoughts about why auto_learn might appear not to be working other than
> a message already having been learned?  Thanks - John

There are a number of reasons, actually.  The easiest thing to do is run
spamassassin with -D and see what the debug output says.
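A minimal Python sketch of that approach follows. It assumes that `spamassassin -D` reads the message on stdin, writes the processed message to stdout and its debug output to stderr, and that the debug lines relevant to autolearning contain the word "learn". The helper names `run_spamassassin_debug` and `learn_debug_lines` are hypothetical and are not part of SpamAssassin.

```python
import subprocess
from typing import List


def run_spamassassin_debug(message_path: str) -> str:
    """Run `spamassassin -D` on one message fed via stdin; return its stderr.

    Assumption: debug output goes to stderr, the scanned message to stdout.
    """
    with open(message_path, "rb") as msg:
        result = subprocess.run(["spamassassin", "-D"], stdin=msg,
                                capture_output=True)
    return result.stderr.decode("utf-8", errors="replace")


def learn_debug_lines(debug_output: str) -> List[str]:
    """Keep only lines mentioning 'learn' (case-insensitive substring match)."""
    return [line for line in debug_output.splitlines()
            if "learn" in line.lower()]
```

For example, `learn_debug_lines(run_spamassassin_debug("sample.eml"))`, run on a message that should have been autolearned (the path is a placeholder), narrows the output to the lines worth reading. The substring match is deliberately loose, so it errs toward showing extra lines rather than filtering aggressively.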


Re: another auto_learn

Posted by John Fleming <jo...@wa9als.com>.
----- Original Message ----- 
From: "Matt Kettler" <mk...@comcast.net>
To: "John Fleming" <jo...@wa9als.com>;
<sp...@incubator.apache.org>
Sent: Friday, August 27, 2004 10:41 PM
Subject: Re: another auto_learn


> At 06:32 PM 8/27/2004 -0500, John Fleming wrote:
> >auto_learn is set to 1 and the emails in question far exceed the set
> >thresholds.
> >
> >Any thoughts about why auto_learn might appear not to be working other than
> >a message already having been learned?  Thanks - John
>
> Reasons not to learn, other than threshold and already learned:
>
>          1) Lock contention. Bayes autolearn is an opportunistic process:
> if it can't get a R/W lock on the database, autolearning is skipped. (The
> alternative is to logjam the mail queue.)
>
>          2) The score calculations are done with Bayes disabled, including
> the scoreset shift. This can make a HUGE difference: some rules score high
> in set 3, but not in set 0 or 1.
>
>          3) A message must have 3.0 points' worth of header rules and 3.0
> points' worth of body rules, regardless of threshold. Thus a message can
> match 100 points of body rules but still not autolearn if it only matches
> 1.2 points of header rules.
>
>          4) No-contradictions rule: don't autolearn as spam anything that
> Bayes would otherwise score very low. The same goes for ham and high Bayes
> scores. (This isn't 100% working for all cases in all SA versions, but it
> is a general design rule.)
>

Thanks for the nice summary, Matt.  In 24 hours I haven't autolearned
anything, whereas my server would normally autolearn 1500 or so ham and spam
in a fairly short time.  Learning nothing at all would be very unusual for my
server, and seems unlikely to result from any of the above.  I've changed
distros from Debian unstable to Debian testing, but my local.cf file was saved
and restored, so the threshold, auto_learn, etc. settings are the same.  I got
Bayes going by manually feeding it 200+ ham and spam, and Bayes is working
fine.  I just can't find the cause of the auto_learn problem.    - John



Re: another auto_learn

Posted by Matt Kettler <mk...@comcast.net>.
At 06:32 PM 8/27/2004 -0500, John Fleming wrote:
>auto_learn is set to 1 and the emails in question far exceed the set
>thresholds.
>
>Any thoughts about why auto_learn might appear not to be working other than
>a message already having been learned?  Thanks - John

Reasons not to learn, other than threshold and already learned:

         1) Lock contention. Bayes autolearn is an opportunistic process:
if it can't get a R/W lock on the database, autolearning is skipped. (The
alternative is to logjam the mail queue.)

         2) The score calculations are done with Bayes disabled, including
the scoreset shift. This can make a HUGE difference: some rules score high
in set 3, but not in set 0 or 1.

         3) A message must have 3.0 points' worth of header rules and 3.0
points' worth of body rules, regardless of threshold. Thus a message can
match 100 points of body rules but still not autolearn if it only matches
1.2 points of header rules.

         4) No-contradictions rule: don't autolearn as spam anything that
Bayes would otherwise score very low. The same goes for ham and high Bayes
scores. (This isn't 100% working for all cases in all SA versions, but it
is a general design rule.)
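A minimal Python sketch of the four checks above, for illustration only. It is not SpamAssassin's actual code. The `RuleHit` type, the function names, and the cutoffs (`MIN_HEADER_POINTS`, `MIN_BODY_POINTS`, and the Bayes contradiction limits of -1.0/+1.0) are hypothetical. The sketch assumes the scoreset numbering in which set 0 has neither network tests nor Bayes, set 1 adds network tests, set 2 adds Bayes, and set 3 has both. Under that numbering the "scoreset shift" amounts to clearing the Bayes bit. The header/body minimum is applied only to spam candidates, which is our reading of point 3.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumed scoreset numbering: bit 0 = network tests on, bit 1 = Bayes on.
# Set 0 = neither, set 1 = net only, set 2 = Bayes only, set 3 = both.
BAYES_BIT = 2

# Hypothetical cutoffs, for illustration only.
MIN_HEADER_POINTS = 3.0
MIN_BODY_POINTS = 3.0
BAYES_SAYS_HAM = -1.0   # Bayes points below this contradict a spam verdict
BAYES_SAYS_SPAM = 1.0   # Bayes points above this contradict a ham verdict


@dataclass
class RuleHit:
    name: str
    area: str                                  # "header" or "body"
    scores: Tuple[float, float, float, float]  # score in scoresets 0..3
    is_bayes: bool = False


def autolearn_set(current_set: int) -> int:
    """Same network setting as the current scoreset, with Bayes switched off."""
    return current_set & ~BAYES_BIT


def autolearn_verdict(hits: Sequence[RuleHit], current_set: int,
                      ham_threshold: float, spam_threshold: float,
                      got_lock: bool = True) -> Optional[str]:
    """Return "spam", "ham", or None when the message is not autolearned."""
    # 1) Opportunistic: no read/write lock on the Bayes DB means no learning.
    if not got_lock:
        return None

    # 2) Score with Bayes rules excluded, using the shifted (non-Bayes) set.
    s = autolearn_set(current_set)
    plain = [h for h in hits if not h.is_bayes]
    score = sum(h.scores[s] for h in plain)
    header_pts = sum(h.scores[s] for h in plain if h.area == "header")
    body_pts = sum(h.scores[s] for h in plain if h.area == "body")
    # Points the Bayes rules contributed under the scoreset actually in use.
    bayes_pts = sum(h.scores[current_set] for h in hits if h.is_bayes)

    if score >= spam_threshold:
        # 3) Spam needs enough header AND body points, whatever the total.
        if header_pts < MIN_HEADER_POINTS or body_pts < MIN_BODY_POINTS:
            return None
        # 4) No contradictions: skip if Bayes strongly says ham.
        return None if bayes_pts < BAYES_SAYS_HAM else "spam"
    if score < ham_threshold:
        # 4) No contradictions: skip if Bayes strongly says spam.
        return None if bayes_pts > BAYES_SAYS_SPAM else "ham"
    return None


if __name__ == "__main__":
    # Illustrative thresholds; pass in whatever your configuration uses.
    # Point 3: 100 body points but only 1.2 header points -> not learned.
    hits = [RuleHit("BIG_BODY", "body", (100.0, 100.0, 100.0, 100.0)),
            RuleHit("SMALL_HDR", "header", (1.2, 1.2, 1.2, 1.2))]
    print(autolearn_verdict(hits, 3, ham_threshold=0.1, spam_threshold=12.0))

    # Point 2: 13.0 points in set 3, but only 6.0 in set 1 -> not learned.
    hits = [RuleHit("HDR_RULE", "header", (4.0, 4.0, 4.0, 4.0)),
            RuleHit("BODY_RULE", "body", (2.0, 2.0, 9.0, 9.0))]
    print(autolearn_verdict(hits, 3, ham_threshold=0.1, spam_threshold=12.0))
```

Both demo calls print `None`. The first fails the header minimum. The second would look like spam in set 3, but it scores only 6.0 once the Bayes bit is cleared and the sum is redone in set 1. Every check returns early instead of raising, which mirrors the "skip, don't block" behaviour described in point 1.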