Posted to users@spamassassin.apache.org by "Chr. v. Stuckrad" <st...@mi.fu-berlin.de> on 2006/07/19 17:24:22 UTC

Re: Will bayes-db be 'skewed' by ... autolearning ham?

On Tue, 18 Jul 2006, Dirk Bonengel wrote:
> Did you investigate auto-learning? This might let your system learn ham
> as well as spam. Works fine here (same situation: gateway server to a
> Lotus Notes system, no feedback loop possible).

Maybe I should change the thresholds for autolearning
from the defaults? (I have never touched them so far.)
I just found *lots* of 'autolearn=ham' entries in my log,
and I cannot believe that so many of them are correct.

In the current log I see mail classified as:
   21805 ham
   11493 autolearned as ham   (this seems suspiciously high?)
   85963 spam
   52977 autolearned as spam
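To check figures like these, a small script can tally the autolearn verdicts. This is a minimal Python sketch, assuming spamd-style 'result:' log lines in which '.' marks ham, 'Y' marks spam, and an 'autolearn=...' field appears at the end; tally() and share() are hypothetical helpers, and the patterns may need adjusting for your own log format.

```python
import re
from collections import Counter

# Assumption: spamd-style lines such as
#   spamd: result: . -2 - ALL_TRUSTED ...,autolearn=ham
# where "." marks ham and "Y" marks spam.
RESULT_RE = re.compile(r"\bresult: ([Y.]) ")
AUTOLEARN_RE = re.compile(r"\bautolearn=(\w+)")


def tally(lines):
    """Count ham/spam results and how many of each were autolearned."""
    counts = Counter()
    for line in lines:
        result = RESULT_RE.search(line)
        if result is None:
            continue  # not a scan result line
        counts["spam" if result.group(1) == "Y" else "ham"] += 1
        learned = AUTOLEARN_RE.search(line)
        # Other verdicts ("no", "disabled", ...) mean nothing was learned.
        if learned and learned.group(1) in ("ham", "spam"):
            counts["autolearn=" + learned.group(1)] += 1
    return counts


def share(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole
```

Call it as tally(open("mail.log")) with your own log path. Assuming the autolearned counts above are subsets of the totals, share(11493, 21805) is about 52.7% of all ham and share(52977, 85963) about 61.6% of all spam.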

So I fear the 'skew' in my database comes from autolearning
spammers' 'bayes-fodder', not from 'skewed explicit learning'.

What may make it even worse is that in-house mail (which is
all ham) is never learned, because it is never spam-checked: users
complained too much about the slowdown, so only mail from
outside goes through the spam filter.

Stucki

-- 
Christoph von Stuckrad      * * |nickname |<st...@mi.fu-berlin.de>   \
Freie Universitaet Berlin   |/_*|'stucki' |Tel(days):+49 30 838-5 57 78|
Mathematik & Informatik EDV |\ *|if online|Tel(else):+49 30 77 39 66 00|
Arnimallee 6 / 14195 Berlin * * |on IRCnet|Fax(alle):+49 30 838-75 454/

Re: Will bayes-db be 'skewed' by ... autolearning ham?

Posted by Paul Boven <p....@chello.nl>.
Hi all,

Loren Wilton wrote:
>> Maybe I should change the thresholds for autolearning
>> from the defaults? (I have never touched them so far.)
> 
> Yes.  Set it to -0.1.   If you have been doing a lot of autolearning 
> without this you may have a moderately sick bayes db, and might want to 
> consider starting over.

Seconded; otherwise spam that doesn't score any points gets autolearned
as ham. I have:
bayes_auto_learn_threshold_nonspam -0.1

So really only mail that is whitelisted or hits ALL_TRUSTED (e.g.
outgoing mail) has any chance of being autolearned as ham.
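The effect of the threshold can be illustrated with a simplified model of threshold-based autolearning. This is a hedged Python sketch, not SpamAssassin's code: autolearn_verdict() is a hypothetical helper, the defaults of 0.1 (nonspam) and 12.0 (spam) are as I understand the AutoLearnThreshold documentation, and the real plugin applies extra checks that are not modelled here.

```python
# Simplified model of threshold-based autolearning. Assumptions: ham is
# learned when the score is strictly below the nonspam threshold and spam
# when it is at or above the spam threshold. The real plugin also uses a
# score computed without the Bayes rules and applies extra checks before
# learning spam; none of that is modelled here.
DEFAULT_NONSPAM = 0.1   # believed default of bayes_auto_learn_threshold_nonspam
DEFAULT_SPAM = 12.0     # believed default of bayes_auto_learn_threshold_spam


def autolearn_verdict(score, nonspam=DEFAULT_NONSPAM, spam=DEFAULT_SPAM):
    """Return "ham", "spam" or "no" for a message with the given score."""
    if score < nonspam:
        return "ham"
    if score >= spam:
        return "spam"
    return "no"
```

With the default threshold, a spam that hits no rules at all (score 0.0) comes back as "ham". With the nonspam threshold set to -0.1 the same message comes back as "no", while a message pulled below zero by a negative-scoring rule such as ALL_TRUSTED can still be learned as ham.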

Regards, Paul Boven.

Re: Will bayes-db be 'skewed' by ... autolearning ham?

Posted by Loren Wilton <lw...@earthlink.net>.
> Maybe I should change the thresholds for autolearning
> from the defaults? (I have never touched them so far.)

Yes.  Set it to -0.1.   If you have been doing a lot of autolearning without 
this you may have a moderately sick bayes db, and might want to consider 
starting over.

        Loren