Posted to users@spamassassin.apache.org by Clay Davis <cd...@avionics-specialties.com> on 2007/12/18 14:35:11 UTC

Re: [guinevere-discuss] Lint errors in 3.4

I've seen several people write this.  Can someone point me to some debate
I can review?  It seems to me that if you set the autolearn threshold
fairly high and keep an eye on your bayes scoring, it would be a good
thing.

Thanks,
Clay

>>> "Joe Zitnik" <jz...@hfcc.net> 12/18/2007 6:07 AM >>>


  NEVER use autolearn for Bayes.  Autolearn = most evil.

Re: [guinevere-discuss] Lint errors in 3.4

Posted by Joe Zitnik <JZ...@hfcc.net>.
>>> On 12/18/2007 at 10:02 AM, Matt Kettler <mk...@verizon.net> wrote:
Joe Zitnik wrote:
> >>> On 12/18/2007 at 9:00 AM, Matt Kettler <mk...@verizon.net> wrote:
> Clay Davis wrote:
> > I've seen several people write this.  Can someone point me to some debate
> > I can review?  It seems to me that if you set the autolearn threshold
> > fairly high and keep an eye on your bayes scoring, it would be a good
> > thing.
> >
> IMHO, autolearning is a good thing. However, exclusively autolearning
> without ever providing any manual training is a situation that can
> lead to a mislearning disaster. The autolearner is most vulnerable when
> it has to make judgments and there's no existing training to compare
> against.
>
> It's probably bad experience with that effect which has caused such
> gross over-reactions.
>
>
> You're exactly right, and in numerous posts on that forum, I've stated
> exactly that.  On at least three different occasions, I have had to
> scrap my bayes database, and resend all e-mail received within a given
> period because my bayes database became corrupted, either one way or
> the other.  In the years since that has happened, I have manually fed
> bayes, and between the rules I have added, and some additional
> plugins, not only have I never had that issue again, my spam catching
> is at an all time high.  All by taking a few minutes every week to
> feed the spam in that's making it past the filters.  What may be a gross
> over-reaction to you seems perfectly sensible to me.  I'm sure there
> are people who have great success with it, but for me, it was NOTHING
> but trouble.  Mine is not the only story that I have read that has had
> exactly the same results.
Well, if you had trouble exclusively autolearning with no manual
training, perhaps the solution is to start using manual training in
addition to autolearning.

Also, generally speaking, you hear about the problems, but rarely hear
about the non-problems.

I've had autolearning enabled on the same bayes database I've been using
since the bayes feature was introduced in SpamAssassin 2.50 back in
February of 2003. I've never had to scrap my bayes database. Not once.
I'm still using the same database (with a couple format conversions
during various upgrades) that I pre-initialized with several hundred
hand-picked messages.

My only variation is that somewhere around SA 3.0 (Sept 2004) I lowered
the bayes_auto_learn_threshold_nonspam from the default to -0.001, and
added some rules with -0.001 scores that key off industry keywords. This
was largely a precautionary measure, but I felt a positive-score for
this option was potentially dangerous. This is especially true if you
let your SA version get a little stale, as it becomes less effective
over time and spam is more likely to hit a 0 score. I wasn't having any
troubles prior to my change, I was just being paranoid because I knew I
was letting my SA version slip sometimes, and never switched back.

YMMV, but on an otherwise well maintained SA and bayes database,
auto-learning seems to work just fine.






I never exclusively autolearned; just getting bayes working requires
some manual feeding.  The first time my bayes blew up it had been
running fine for over six months.  During that time I manually fed in
thousands of spam and ham.  The second time it may have even been
longer.  The third time I'll take responsibility for: I had it shut off,
but an upgrade overwrote the value and turned it back on.
It's there for a reason, and much smarter men than me are responsible
for the spamassassin project, so I have to imagine large numbers of people
have had success with it.  Once again, from my vantage point, I was
burned three different times with it, so I don't use it.

Re: [guinevere-discuss] Lint errors in 3.4

Posted by Matt Kettler <mk...@verizon.net>.
Joe Zitnik wrote:
> >>> On 12/18/2007 at 9:00 AM, Matt Kettler <mk...@verizon.net>
> wrote:
> Clay Davis wrote:
> > I've seen several people write this.  Can someone point me to some debate
> > I can review?  It seems to me that if you set the autolearn threshold
> > fairly high and keep an eye on your bayes scoring, it would be a good
> > thing.
> >  
> IMHO, autolearning is a good thing. However, exclusively autolearning
> without ever providing any manual training is a situation that can
> lead to a mislearning disaster. The autolearner is most vulnerable when
> it has to make judgments and there's no existing training to compare
> against.
>
> It's probably bad experience with that effect which has caused such
> gross over-reactions.
>
>
> You're exactly right, and in numerous posts on that forum, I've stated
> exactly that.  On at least three different occasions, I have had to
> scrap my bayes database, and resend all e-mail received within a given
> period because my bayes database became corrupted, either one way or
> the other.  In the years since that has happened, I have manually fed
> bayes, and between the rules I have added, and some additional
> plugins, not only have I never had that issue again, my spam catching
> is at an all time high.  All by taking a few minutes every week to
> feed the spam in that's making it past the filters.  What may be a gross
> over-reaction to you seems perfectly sensible to me.  I'm sure there
> are people who have great success with it, but for me, it was NOTHING
> but trouble.  Mine is not the only story that I have read that has had
> exactly the same results.
Well, if you had trouble exclusively autolearning with no manual
training, perhaps the solution is to start using manual training in
addition to autolearning.

Also, generally speaking, you hear about the problems, but rarely hear
about the non-problems.

I've had autolearning enabled on the same bayes database I've been using
since the bayes feature was introduced in SpamAssassin 2.50 back in
February of 2003. I've never had to scrap my bayes database. Not once.
I'm still using the same database (with a couple format conversions
during various upgrades) that I pre-initialized with several hundred
hand-picked messages.

My only variation is that somewhere around SA 3.0 (Sept 2004) I lowered
the bayes_auto_learn_threshold_nonspam from the default to -0.001, and
added some rules with -0.001 scores that key off industry keywords. This
was largely a precautionary measure, but I felt a positive-score for
this option was potentially dangerous. This is especially true if you
let your SA version get a little stale, as it becomes less effective
over time and spam is more likely to hit a 0 score. I wasn't having any
troubles prior to my change, I was just being paranoid because I knew I
was letting my SA version slip sometimes, and never switched back.

YMMV, but on an otherwise well maintained SA and bayes database,
auto-learning seems to work just fine.
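
The threshold change described above could be sketched in local.cf
roughly as follows. This is only an illustration, not the actual
configuration from this post: the rule name LOCAL_INDUSTRY_KEYWORD and
its pattern are hypothetical placeholders, and the spam-side threshold
is simply shown at its stock default.

```
# Sketch only; adapt to your own site before using.
use_bayes         1
bayes_auto_learn  1

# Stock default is 0.1. A message must score below this value to be
# auto-learned as ham, so a negative value keeps a message that
# scores exactly 0 from being learned as ham.
bayes_auto_learn_threshold_nonspam  -0.001

# Stock default, left unchanged here.
bayes_auto_learn_threshold_spam     12.0

# Hypothetical site-specific rule with a tiny negative score that
# keys off a keyword common in legitimate mail for your industry.
body      LOCAL_INDUSTRY_KEYWORD  /\bavionics\b/i
describe  LOCAL_INDUSTRY_KEYWORD  Mentions a term common in our legitimate mail
score     LOCAL_INDUSTRY_KEYWORD  -0.001
```

SpamAssassin applies further conditions of its own before it
auto-learns a message, so treat this as a starting point and check it
with "spamassassin --lint" after editing.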






Re: [guinevere-discuss] Lint errors in 3.4

Posted by Joe Zitnik <JZ...@hfcc.net>.
>>> On 12/18/2007 at 9:00 AM, Matt Kettler <mk...@verizon.net> wrote:
Clay Davis wrote:
> I've seen several people write this.  Can someone point me to some debate
> I can review?  It seems to me that if you set the autolearn threshold
> fairly high and keep an eye on your bayes scoring, it would be a good
> thing.
>
IMHO, autolearning is a good thing. However, exclusively autolearning
without ever providing any manual training is a situation that can
lead to a mislearning disaster. The autolearner is most vulnerable when
it has to make judgments and there's no existing training to compare
against.

It's probably bad experience with that effect which has caused such
gross over-reactions.


You're exactly right, and in numerous posts on that forum, I've stated
exactly that.  On at least three different occasions, I have had to
scrap my bayes database, and resend all e-mail received within a given
period because my bayes database became corrupted, either one way or the
other.  In the years since that has happened, I have manually fed bayes,
and between the rules I have added, and some additional plugins, not
only have I never had that issue again, my spam catching is at an all
time high.  All by taking a few minutes every week to feed the spam in
that's making it past the filters.  What may be a gross over-reaction to
you seems perfectly sensible to me.  I'm sure there are people who have
great success with it, but for me, it was NOTHING but trouble.  Mine is
not the only story that I have read that has had exactly the same
results.

Re: [guinevere-discuss] Lint errors in 3.4

Posted by Matt Kettler <mk...@verizon.net>.
Clay Davis wrote:
> I've seen several people write this.  Can someone point me to some debate
> I can review?  It seems to me that if you set the autolearn threshold
> fairly high and keep an eye on your bayes scoring, it would be a good
> thing.
>   
IMHO, autolearning is a good thing. However, exclusively autolearning
without ever providing any manual training is a situation that can
lead to a mislearning disaster. The autolearner is most vulnerable when
it has to make judgments and there's no existing training to compare
against.

It's probably bad experience with that effect which has caused such
gross over-reactions.


Re: [guinevere-discuss] Lint errors in 3.4

Posted by Jim Maul <jm...@elih.org>.
Clay Davis wrote:
> I've seen several people write this.  Can someone point me to some debate
> I can review?  It seems to me that if you set the autolearn threshold
> fairly high and keep an eye on your bayes scoring, it would be a good
> thing.
> 
> Thanks,
> Clay
> 
>>>> "Joe Zitnik" <jz...@hfcc.net> 12/18/2007 6:07 AM >>>
> 
> 
>   NEVER use autolearn for Bayes.  Autolearn = most evil.
> 
> 


I never had a problem with autolearn.  I've been using it for years.  Of
course, I altered the autolearn thresholds.

-Jim