Posted to users@spamassassin.apache.org by joe a <jo...@j4computers.com> on 2023/02/28 16:37:34 UTC

BAYES scores

Curious as to why these scores, apparently "stock", are what they are.
I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.

Noted in a header this morning:

*  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
*      [score: 1.0000]
*  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
*      [score: 1.0000]

Was this discussed recently?  I added a local score to mollify my sense 
of propriety.



Re: BAYES scores

Posted by Loren Wilton <lw...@earthlink.net>.
> From: "Bill Cole" <sa...@billmail.scconsult.com>
>
> It is my understanding that an automated rescoring job was run quite some 
> time ago (before I was on the PMC) to generate the Bayes scores, which 
> determined that to be the best supplemental score to give to the greater 
> certainty.

I was around in those days. My memory isn't the greatest anymore, but what I 
recall was that they did automatic rescoring, and then manually tweaked a 
few of the values, basically to make them look pretty by rounding off long 
fractions. BAYES_999 may have been scored almost completely manually, I 
can't quite recall.

        Loren


Re: BAYES scores

Posted by Bill Cole <sa...@billmail.scconsult.com>.
On 2023-02-28 at 13:38:35 UTC-0500 (Tue, 28 Feb 2023 13:38:35 -0500)
joe a <jo...@j4computers.com>
is rumored to have said:

> On 2/28/2023 12:05 PM, Jeff Mincy wrote:
>>   > From: joe a <jo...@j4computers.com>
>>   > Date: Tue, 28 Feb 2023 11:37:34 -0500
>>   >
>>   > Curious as to why these scores, apparently "stock" are what they 
>> are.
>>   > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>>   >
>>   > Noted in a header this morning:
>>   >
>>   > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
>>   > *      [score: 1.0000]
>>   > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
>>   > *      [score: 1.0000]
>>   >
>>   > Was this discussed recently?  I added a local score to mollify my 
>> sense
>>   > of propriety.
>>
>> Those two rules overlap.   A message with bayes >= 99.9% hits both
>> rules.   BAYES_99 ends at 1.00 not .999.
>> -jeff
>>
>
> I get that they overlap.  I guess my thinker gets in a knot wondering 
> why there is so little weight given to the more certain determination.

It is my understanding that an automated rescoring job was run quite 
some time ago (before I was on the PMC) to generate the Bayes scores, 
which determined that to be the best supplemental score to give to the 
greater certainty. Bayes rules are not rescored routinely in the daily 
rescoring task because those hits are inherently different at every 
site. If you wish to determine the ideal scores for YOUR mix of ham and 
spam, I believe all the tools for doing so are in the SA code tree, but 
they may not be well-documented.

That's likely to not be a satisfying answer, but as a volunteer project 
we have no funding for Customer Satisfaction, so the bare unsatisfying 
truth will have to do.

> In my narrow view, anything that is 99.9% certain is probably worth a
> 5 on its own.  Or, at least, it should equal 5 when summed with
> BAYES_99, as that is the default "SPAM flag" threshold.
>
> Appears more experienced or thoughtful persons think otherwise.

I don't know that I'd go that far. Rescoring is not done based on simple 
clear reason, but on numbers. I'm not sure whether any currently active 
SA developers are able to explain exactly how the rescoring works.

> Yes, it did snow heavily overnight.  Yes, I am looking for excuses not 
> to visit that issue.

I vehemently recommend reading all of Justin's scripts and documentation 
(I think it's all in the 'build' sub-directory) and figuring out how to 
rescore based on your own mail. That's MUCH less unpleasant than dealing 
with the snow.


-- 
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: BAYES scores

Posted by hg user <me...@gmail.com>.
From my small experience: I score BAYES_999 at 2.00, as was suggested
to me months ago.

But nowadays I'd be more careful and do some more testing: I'd check which
messages hit only BAYES_99 and which hit BAYES_999. If you are
absolutely certain that BAYES_999 hits are only and definitely spam, go
with 2 or more; if you have several false positives, keep the score low.
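
A minimal Python sketch of the check described above, assuming you have
hand-verified ham/spam labels and that your X-Spam-Status headers carry a
`tests=` list in the default layout. The function names `rules_hit` and
`tally` are hypothetical, not part of SpamAssassin.

```python
import re
from collections import Counter


def rules_hit(status_header):
    """Extract rule names from the tests= list of an X-Spam-Status value.

    Assumes the default layout, e.g.
    "Yes, score=7.2 required=5.0 tests=BAYES_99,BAYES_999 autolearn=no".
    Folded (multi-line) rule lists are handled by splitting on whitespace.
    """
    match = re.search(r"tests=([\w,\s]*?)(?=\s+\w+=|\s*$)", status_header)
    if not match:
        return set()
    return {name for name in re.split(r"[,\s]+", match.group(1)) if name}


def tally(messages):
    """Count ham and spam among BAYES_99-only and BAYES_999 messages.

    `messages` is an iterable of (is_spam, status_header) pairs, where
    is_spam is your own verified label, not SpamAssassin's verdict.
    """
    counts = Counter()
    for is_spam, header in messages:
        hits = rules_hit(header)
        if "BAYES_999" in hits:
            bucket = "BAYES_999"
        elif "BAYES_99" in hits:
            bucket = "BAYES_99 only"
        else:
            continue  # neither rule fired; not relevant to this check
        counts[(bucket, "spam" if is_spam else "ham")] += 1
    return counts
```

A nonzero count under ("BAYES_999", "ham") is the false-positive signal
that argues for keeping the BAYES_999 score low at your site.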

I learnt the hard way that BAYES depends on the corpus used to grow the
database.

On Tue, Feb 28, 2023 at 7:39 PM joe a <jo...@j4computers.com> wrote:

> On 2/28/2023 12:05 PM, Jeff Mincy wrote:
> >   > From: joe a <jo...@j4computers.com>
> >   > Date: Tue, 28 Feb 2023 11:37:34 -0500
> >   >
> >   > Curious as to why these scores, apparently "stock" are what they are.
> >   > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
> >   >
> >   > Noted in a header this morning:
> >   >
> >   > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> >   > *      [score: 1.0000]
> >   > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> >   > *      [score: 1.0000]
> >   >
> >   > Was this discussed recently?  I added a local score to mollify my
> sense
> >   > of propriety.
> >
> > Those two rules overlap.   A message with bayes >= 99.9% hits both
> > rules.   BAYES_99 ends at 1.00 not .999.
> > -jeff
> >
>
> I get that they overlap.  I guess my thinker gets in a knot wondering
> why there is so little weight given to the more certain determination.
>
> In my narrow view, anything that is 99.9% certain is probably worth a 5
> on its own.  Or, at least, it should equal 5 when summed with BAYES_99,
> as that is the default "SPAM flag" threshold.
>
> Appears more experienced or thoughtful persons think otherwise.
>
> Yes, it did snow heavily overnight.  Yes, I am looking for excuses not
> to visit that issue.
>

Re: BAYES scores

Posted by joe a <jo...@j4computers.com>.
On 2/28/2023 12:05 PM, Jeff Mincy wrote:
>   > From: joe a <jo...@j4computers.com>
>   > Date: Tue, 28 Feb 2023 11:37:34 -0500
>   >
>   > Curious as to why these scores, apparently "stock" are what they are.
>   > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
>   >
>   > Noted in a header this morning:
>   >
>   > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
>   > *      [score: 1.0000]
>   > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
>   > *      [score: 1.0000]
>   >
>   > Was this discussed recently?  I added a local score to mollify my sense
>   > of propriety.
> 
> Those two rules overlap.   A message with bayes >= 99.9% hits both
> rules.   BAYES_99 ends at 1.00 not .999.
> -jeff
> 

I get that they overlap.  I guess my thinker gets in a knot wondering 
why there is so little weight given to the more certain determination.

In my narrow view, anything that is 99.9% certain is probably worth a 5
on its own.  Or, at least, it should equal 5 when summed with BAYES_99,
as that is the default "SPAM flag" threshold.

Appears more experienced or thoughtful persons think otherwise.

Yes, it did snow heavily overnight.  Yes, I am looking for excuses not 
to visit that issue.

Re: BAYES scores

Posted by Jeff Mincy <jw...@gmail.com>.
 > From: joe a <jo...@j4computers.com>
 > Date: Tue, 28 Feb 2023 11:37:34 -0500
 > 
 > Curious as to why these scores, apparently "stock" are what they are. 
 > I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
 > 
 > Noted in a header this morning:
 > 
 > *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
 > *      [score: 1.0000]
 > *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
 > *      [score: 1.0000]
 > 
 > Was this discussed recently?  I added a local score to mollify my sense 
 > of propriety.

Those two rules overlap.   A message with bayes >= 99.9% hits both
rules.   BAYES_99 ends at 1.00 not .999.
-jeff
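
To make the overlap concrete, here is a minimal Python sketch using only
the two scores and ranges quoted in the header above. The function names
are hypothetical, and treating both range endpoints as inclusive is an
assumption, not a statement of how SpamAssassin compares the values
internally.

```python
# Stock scores as they appear in the header quoted in this thread.
SCORES = {"BAYES_99": 3.5, "BAYES_999": 0.2}

# Probability ranges from the rule descriptions. Both end at 1.0, so they
# overlap. Inclusive endpoints are an assumption made for illustration.
RANGES = {"BAYES_99": (0.99, 1.0), "BAYES_999": (0.999, 1.0)}

REQUIRED = 5.0  # default spam threshold mentioned in the thread


def bayes_hits(probability):
    """Return which of the two rules a given Bayes probability hits."""
    return [name for name, (low, high) in RANGES.items()
            if low <= probability <= high]


def bayes_total(probability, scores=SCORES):
    """Sum the scores of every rule hit (rounded to hide float noise)."""
    return round(sum(scores[name] for name in bayes_hits(probability)), 3)


def top_up_needed(scores=SCORES, required=REQUIRED):
    """BAYES_999 score at which the two rules together reach `required`."""
    return round(required - scores["BAYES_99"], 3)


print(bayes_hits(0.995))   # ['BAYES_99']
print(bayes_hits(1.0))     # ['BAYES_99', 'BAYES_999']
print(bayes_total(1.0))    # 3.7
print(top_up_needed())     # 1.5
```

With the stock values, a message at probability 1.0 collects
3.5 + 0.2 = 3.7 from these two rules, so BAYES_999 acts as a small
supplement on top of BAYES_99 and not as a separate verdict. A local
override is a `score BAYES_999 <value>` line in local.cf.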


Re: BAYES scores

Posted by Benny Pedersen <me...@junc.eu>.
joe a wrote on 2023-02-28 17:37:
> Curious as to why these scores, apparently "stock" are what they are.
> I'd expect BAYES_999 BODY to count more than BAYES_99 BODY.
> 
> Noted in a header this morning:
> 
> *  3.5 BAYES_99 BODY: Bayes spam probability is 99 to 100%
> *      [score: 1.0000]
> *  0.2 BAYES_999 BODY: Bayes spam probability is 99.9 to 100%
> *      [score: 1.0000]
> 
> Was this discussed recently?  I added a local score to mollify my
> sense of propriety.

What does it solve for you?

Maybe the rules could be changed so that their scores do not overlap,
but what should the scores change to?

The tag can be split so that the hits do not overlap, but what should
the scores then change to?