Posted to dev@spamassassin.apache.org by Giovanni Bechis <gi...@paclan.it> on 2022/02/07 17:32:18 UTC

bayes_auto_learn default value

Hi,
as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
Is anybody against changing its default value to 0/false on trunk (aka SpamAssassin 4.x) ?

 Cheers
  Giovanni

Re: bayes_auto_learn default value

Posted by Henrik K <he...@hege.li>.
On Tue, Feb 08, 2022 at 05:33:59AM -0500, Kevin A. McGrail wrote:
> 
> Since we don't seem to have consensus on changing the default, does anybody
> object to a pre-file that disables it? That would be more clearly documented,
> and people will look at the pre-file for V4.

Good grief, bundled pre-files are not meant for config clauses.  They are
supposed to load modules.  Defaults are to be changed in the _codebase_
(Conf.pm) and not in a pre-file, which might or might not be loaded by
someone.  It makes no sense to have defaults in two separate places.
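
To illustrate the distinction drawn above, here is a rough sketch of how the two kinds of file are typically used. The file names (v310.pre, local.cf) are the conventional ones from a stock install and are assumed here; nothing below is taken from trunk.

```
# v310.pre (bundled pre-file): loads plugin modules, nothing else
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold

# local.cf (site configuration): where an admin overrides a default.
# The built-in default itself lives in Conf.pm, not in either file.
bayes_auto_learn 0
```

With this split, a site that never loads a given pre-file still gets the same built-in default, which is the concern raised about keeping defaults in two places.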

-1 for default change anyway, why bother.


Re: bayes_auto_learn default value

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2022-02-08 at 07:46:17 UTC-0500 (Tue, 8 Feb 2022 13:46:17 +0100)
Axb <ax...@gmail.com>
is rumored to have said:

> On 2/8/22 11:33, Kevin A. McGrail wrote:
>> Auto learning is something that should never have existed. All it does is
>> reinforce misclassification and slowly spiral the database into making
>> wrong answers more wrong.
>
> I don't agree - I've been running autolearn for years and my Bayes results have always been solid.
> (and I'm speaking of a global bayes redis DB in a 200k user setup)

With substantially smaller systems (my own personal server and those I manage for my employer) I have the same benign experience. I don't think we should disable auto-learn by default *in any way* without actual research and hard data beyond anecdotal experience.


> Where I see potential is in optimizing auto-expiration when using a file-based DB. Very often the DB is locked and tokens cannot be expired, which leads to what you call "reinforce misclassification". If tokens are expired regularly, skewing is very improbable.
> Thankfully, using Redis, it's way more controllable.

I think that's also not a problem for systems that are not persistently loaded with in-process mail.

All we see as SA maintainers are our own systems and cases that people are having problems with. I don't think we really know whether auto-learn works well generally or why/how it breaks when it does.

>> Since we don't seem to have consensus on changing the default, does anybody
>> object to a pre-file that disables it? That would be more clearly
>> documented, and people will look at the pre-file for V4.
>
> I'm -1 for disabling, one way or another.

Same. It would substantially change how people's existing stable systems operate.

I'm less averse to tweaking default auto-learning parameters. In ALL cases where I use auto-learn I have reduced both thresholds, so I learn as ham ONLY mail with negative scores (< -0.1, so effectively at least 2 ham-signs...) and learn as spam substantially more than just the absurdly spammy stuff. This sacrifices some overall effectiveness in theory but I think it also helps make Bayes less brittle. I have NOT done rigorous testing to prove that.
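
A minimal sketch of what such a tweak might look like in local.cf, assuming the stock AutoLearnThreshold plugin is loaded. The ham threshold of -0.1 comes from the description above; the spam threshold shown (8.0) is purely a placeholder, since no value was given.

```
# local.cf
bayes_auto_learn 1

# Learn as ham only when the score is below -0.1 (stock default: 0.1)
bayes_auto_learn_threshold_nonspam -0.1

# Learn as spam at a lower score than the stock default of 12.0.
# 8.0 is an illustrative placeholder, not a value from this thread.
bayes_auto_learn_threshold_spam 8.0
```

The trade-off is the one already noted: fewer messages qualify as ham, more qualify as spam, and whether that makes Bayes more robust in general is untested.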

I believe that SA has reached the point of broad use where we should be making substantial change decisions based on hard data rather than anecdote and lore.

-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: bayes_auto_learn default value

Posted by Axb <ax...@gmail.com>.
On 2/8/22 11:33, Kevin A. McGrail wrote:
> Auto learning is something that should never have existed. All it does is
> reinforce misclassification and slowly spiral the database into making
> wrong answers more wrong.

I don't agree - I've been running autolearn for years and my Bayes
results have always been solid.
(and I'm speaking of a global bayes redis DB in a 200k user setup)

Where I see potential is in optimizing auto-expiration when using a
file-based DB. Very often the DB is locked and tokens cannot be expired,
which leads to what you call "reinforce misclassification". If tokens
are expired regularly, skewing is very improbable.
Thankfully, using Redis, it's way more controllable.
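
A rough sketch of the two setups contrasted above, assuming stock option names. The two halves are alternatives, not meant to be combined; the DSN, the TTL values, and the scheduled-job approach are illustrative assumptions, not anyone's actual configuration.

```
# Option A, file-based DB: turn off opportunistic expiry during
# scanning and expire from a scheduled job instead (for example,
# cron running `sa-learn --force-expire` at a quiet hour).
bayes_auto_expire 0

# Option B, Redis backend: tokens age out via TTLs, so no separate
# expiry run is needed. Server address, database number, and TTLs
# below are placeholders.
bayes_store_module Mail::SpamAssassin::BayesStore::Redis
bayes_sql_dsn      server=127.0.0.1:6379;database=2
bayes_token_ttl    21d
bayes_seen_ttl     8d
```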

> Since we don't seem to have consensus on changing the default, does anybody
> object to a pre-file that disables it? That would be more clearly
> documented, and people will look at the pre-file for V4.

I'm -1 for disabling, one way or another.




Re: bayes_auto_learn default value

Posted by John Hardin <jh...@impsec.org>.
On Tue, 8 Feb 2022, Kevin A. McGrail wrote:

> Since we don't seem to have consensus on changing the default, does anybody
> object to a pre-file that disables it? That would be more clearly
> documented, and people will look at the pre-file for V4.

+1 for making it explicitly disabled in the v4.0 PRE file.

I was going to respond "I have no objections" to the initial request but 
I've been recovering from a hardware failure in my mail server... :(



-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org                         pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The yardstick you should use when considering whether to support a
   given piece of legislation is "what if my worst enemy is chosen to
   administer this law?"
-----------------------------------------------------------------------
  74 more days working to pay your (average) annual US tax bill
  before you're finally working for yourself.

Re: bayes_auto_learn default value

Posted by "Kevin A. McGrail" <km...@apache.org>.
Auto learning is something that should never have existed. All it does is
reinforce misclassification and slowly spiral the database into making
wrong answers more wrong.

Since we don't seem to have consensus on changing the default, does anybody
object to a pre-file that disables it? That would be more clearly
documented, and people will look at the pre-file for V4.

Regards, KAM


Re: bayes_auto_learn default value

Posted by Giovanni Bechis <gi...@paclan.it>.
On 2/7/22 20:03, Henrik K wrote:
> 
> On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
>> Hi,
>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>> Is anybody against changing its default value to 0/false on trunk (aka SpamAssassin 4.x) ?
> 
> What is the reasoning for this proposal?
> 
IMHO, using autolearn without a correct learning process frequently poisons Bayes data. I think bayes_auto_learn should be enabled only if you know what you are doing, not by default.
I understand that changing a default value now could be a problem for users.
 Giovanni

Re: bayes_auto_learn default value

Posted by Henrik K <he...@hege.li>.
On Mon, Feb 07, 2022 at 06:32:18PM +0100, Giovanni Bechis wrote:
> Hi,
> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
> Is anybody against changing its default value to 0/false on trunk (aka SpamAssassin 4.x) ?

What is the reasoning for this proposal?


Re: bayes_auto_learn default value

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2022-02-07 at 12:32:18 UTC-0500 (Mon, 7 Feb 2022 18:32:18 +0100)
Giovanni Bechis <gi...@paclan.it>
is rumored to have said:

> Hi,
> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
> Is anybody against changing its default value to 0/false on trunk (aka SpamAssassin 4.x) ?

-1

I know that there is a broad consensus among people who pay close attention to SA that auto-learning is risky, but having it enabled has been the default for long enough that a change will be unexpected and will break systems where auto-learning is enabled, is working well, and is generally ignored.

I should probably disclose that I have auto-learn enabled on my personal system and on those of my primary employer and it has been quite harmless, although I cannot say definitively that it is making a significant difference. The biggest strength I see from it is a steady stream of ham, which is hard to obtain otherwise.




Re: bayes_auto_learn default value

Posted by Michael Peddemors <mi...@linuxmagic.com>.
On 2022-02-07 9:44 a.m., Axb wrote:
> On 2/7/22 18:32, Giovanni Bechis wrote:
>> Hi,
>> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
>> Is anybody against changing its default value to 0/false on trunk (aka 
>> SpamAssassin 4.x) ?
>>
>>   Cheers
>>    Giovanni
> 
> I'm against changing to 0 / false
> 
> Axb

For the record, we override this and set it to FALSE in our packaging.



-- 
"Catch the Magic of Linux..."
------------------------------------------------------------------------
Michael Peddemors, President/CEO LinuxMagic Inc.
Visit us at http://www.linuxmagic.com @linuxmagic
A Wizard IT Company - For More Info http://www.wizard.ca
"LinuxMagic" a Registered TradeMark of Wizard Tower TechnoServices Ltd.
------------------------------------------------------------------------
604-682-0300 Beautiful British Columbia, Canada


Re: bayes_auto_learn default value

Posted by Axb <ax...@gmail.com>.
On 2/7/22 18:32, Giovanni Bechis wrote:
> Hi,
> as per Mail::SpamAssassin::Conf(3), bayes_auto_learn defaults to 1/true.
> Is anybody against changing its default value to 0/false on trunk (aka SpamAssassin 4.x) ?
> 
>   Cheers
>    Giovanni

I'm against changing to 0 / false

Axb