Posted to dev@spamassassin.apache.org by da...@chaosreigns.com on 2012/11/09 17:58:21 UTC

Re-use bayes in masscheck?

I think these should be added to the rules:

reuse BAYES_00
reuse BAYES_05
reuse BAYES_20
reuse BAYES_40
reuse BAYES_50
reuse BAYES_60
reuse BAYES_80
reuse BAYES_95
reuse BAYES_99

Recently playing around a little with bayes stuff, I noticed there is no
data for these in http://ruleqa.spamassassin.org/?rule=%2Fbayes

Then I realized to actually test bayes with masscheck, I needed to copy my
bayes dbs into masses/spamassassin (which possibly should've been more
obvious to me).

Then it occurred to me it would probably suck to put everybody's CPUs
through regenerating all the bayes scores, and that's pretty close to what
reuse is for.  
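
For anyone following along, here is a minimal sketch of how those lines might sit in a rules file. It assumes the Reuse plugin (Mail::SpamAssassin::Plugin::Reuse) is loaded from a .pre file and that mass-check is run with its reuse option. Which file the lines belong in is an assumption, not something settled here:

```
# Sketch only. Assumes a .pre file already contains:
#   loadplugin Mail::SpamAssassin::Plugin::Reuse
# Which .cf file these lines belong in is an open question.
ifplugin Mail::SpamAssassin::Plugin::Reuse
  # Take each hit from the message's existing X-Spam-Status header
  # instead of re-running the bayes classifier during masscheck.
  reuse BAYES_00
  reuse BAYES_05
  reuse BAYES_20
  reuse BAYES_40
  reuse BAYES_50
  reuse BAYES_60
  reuse BAYES_80
  reuse BAYES_95
  reuse BAYES_99
endif
```

Messages with no earlier X-Spam-Status header would presumably report no BAYES_* hits at all, so the ruleqa numbers would only cover mail that was already scanned with bayes enabled.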

-- 
"Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'." - The Color of Magic
http://www.ChaosReigns.com

Re: Re-use bayes in masscheck?

Posted by Axb <ax...@gmail.com>.
On 11/09/2012 09:11 PM, Kevin A. McGrail wrote:
> On 11/9/2012 2:50 PM, darxus@chaosreigns.com wrote:
>> On 11/09, Axb wrote:
>>> On 11/09/2012 05:58 PM, darxus@chaosreigns.com wrote:
>>>> I think these should be added to the rules:
>>>>
>>>> reuse BAYES_00
>>>> reuse BAYES_05
>>>> reuse BAYES_20
>>>> reuse BAYES_40
>>>> reuse BAYES_50
>>>> reuse BAYES_60
>>>> reuse BAYES_80
>>>> reuse BAYES_95
>>>> reuse BAYES_99
>>>>
>>>> Which is why I suggested using the "reuse" flag, not re-running bayes
>>>> during masscheck.
>>>>
>
> Makes sense to me.  Might also speed up masscheck considerably. There
> isn't a good place for these beyond 50_scores.cf, is there?

IMO, they should be added to the masses section so they show up in
/trunk/masses/spamassassin/ or wherever the preferences show up at
masscheck runs, and not clutter 50_scores


Re: Re-use bayes in masscheck?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.
On 11/9/2012 2:50 PM, darxus@chaosreigns.com wrote:
> On 11/09, Axb wrote:
>> On 11/09/2012 05:58 PM, darxus@chaosreigns.com wrote:
>>> I think these should be added to the rules:
>>>
>>> reuse BAYES_00
>>> reuse BAYES_05
>>> reuse BAYES_20
>>> reuse BAYES_40
>>> reuse BAYES_50
>>> reuse BAYES_60
>>> reuse BAYES_80
>>> reuse BAYES_95
>>> reuse BAYES_99
>>>
>>> Which is why I suggested using the "reuse" flag, not re-running bayes
>>> during masscheck.
>>>

Makes sense to me.  Might also speed up masscheck considerably. There 
isn't a good place for these beyond 50_scores.cf, is there?

Re: Re-use bayes in masscheck?

Posted by da...@chaosreigns.com.
On 11/09, Axb wrote:
> releases, as part of final QA but bayes scores shouldn't be mutable.

They aren't, and wouldn't be as a result of adding the "reuse" flag,
because they're not in a "<gen:mutable></gen:mutable>" block.  And I
certainly wouldn't suggest changing that without at least seeing some data
first, and a test run, and... the rescorer doesn't handle sets of rules
that should have scores in a specific order well anyway.
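
For context, a rough sketch of what that distinction looks like in a scores file. The rule name FOO_EXAMPLE and every score value below are made up for illustration, and the marker syntax is from memory, so check it against the real 50_scores.cf before relying on it:

```
# Scores between these markers may be rewritten by the rescorer.
# <gen:mutable>
# Hypothetical rule with placeholder values.
score FOO_EXAMPLE 1.0 1.0 1.0 1.0
# </gen:mutable>

# Outside the markers, scores are set by hand and left alone.
# The four columns are the four score sets:
#   0 = no bayes, no net    1 = no bayes, net
#   2 = bayes, no net       3 = bayes, net
# Placeholder values, not the shipped BAYES scores.
score BAYES_00 0 0 -1.0 -1.0
score BAYES_99 0 0  3.0  3.0
```

Adding "reuse" lines would only change where masscheck gets the hits from. It would not move the BAYES_* scores into the mutable block.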

-- 
"Some people will tell you that slow is good - and it may be, on some
days - but I am here to tell you that fast is better....
That is why God made fast motorcycles...." - Hunter S. Thompson
http://www.ChaosReigns.com

Re: Re-use bayes in masscheck?

Posted by Axb <ax...@gmail.com>.
On 11/09/2012 09:21 PM, darxus@chaosreigns.com wrote:
> On 11/09, Axb wrote:
>>> I realize some of the corpora won't have the bayes data, including most of
>>> mine.  But I don't see how that's a reason not to provide the data that has
>>> already been calculated to ruleqa.
>>
>> because the chances that it's skewed data are huge?
>>
>> imo disabling bayes at masscheck would be the safest way to go &
>> Bayes should be site dependent during production and should not be
>> used for score generation.
>
> Doesn't score generation for the sets without bayes automatically drop
> bayes from score generation?

Honestly, I'm not sure, but if it were up to me, masschecks would run
without Bayes and we'd need a dedicated bayes rescorer before releases,
as part of final QA, but bayes scores shouldn't be mutable.

> (Sorry for taking this back to the list.  I'm hoping you don't mind, and if
> there's a chance this could screw up score generation....)

no prob


Re: Re-use bayes in masscheck?

Posted by da...@chaosreigns.com.
On 11/09, Axb wrote:
> >I realize some of the corpora won't have the bayes data, including most of
> >mine.  But I don't see how that's a reason not to provide the data that has
> >already been calculated to ruleqa.
> 
> because the chances that it's skewed data are huge?
> 
> imo disabling bayes at masscheck would be the safest way to go &
> Bayes should be site dependent during production and should not be
> used for score generation.

Doesn't score generation for the sets without bayes automatically drop
bayes from score generation?  

Er... how does score generation work for the sets *with* bayes if ruleqa
isn't getting any bayes data?

(Sorry for taking this back to the list.  I'm hoping you don't mind, and if
there's a chance this could screw up score generation....)

-- 
"Begin at the beginning and go on till you come to the end; then stop."
- Lewis Carroll, Alice in Wonderland
http://www.ChaosReigns.com

Re: Re-use bayes in masscheck?

Posted by da...@chaosreigns.com.
On 11/09, Axb wrote:
> On 11/09/2012 05:58 PM, darxus@chaosreigns.com wrote:
> >I think these should be added to the rules:
> >
> >reuse BAYES_00
> >reuse BAYES_05
> >reuse BAYES_20
> >reuse BAYES_40
> >reuse BAYES_50
> >reuse BAYES_60
> >reuse BAYES_80
> >reuse BAYES_95
> >reuse BAYES_99
> >
> >Recently playing around a little with bayes stuff, I noticed there is no
> >data for these in http://ruleqa.spamassassin.org/?rule=%2Fbayes
> >
> >Then I realized to actually test bayes with masscheck, I needed to copy my
> >bayes dbs into masses/spamassassin (which possibly should've been more
> >obvious to me).
> >
> >Then it occurred to me it would probably suck to put everybody's CPUs
> >through regenerating all the bayes scores, and that's pretty close to what
> >reuse is for.
> 
> Imo, we shouldn't be using any Bayes during masschecks.
> It slows down the process, and depending on the corpus it can produce
> totally distorted results.

Which is why I suggested using the "reuse" flag, not re-running bayes
during masscheck.

-- 
"You shall know the truth, and it shall make you odd."
-- Flannery O'Connor
http://www.ChaosReigns.com

Re: Re-use bayes in masscheck?

Posted by Axb <ax...@gmail.com>.
On 11/09/2012 05:58 PM, darxus@chaosreigns.com wrote:
> I think these should be added to the rules:
>
> reuse BAYES_00
> reuse BAYES_05
> reuse BAYES_20
> reuse BAYES_40
> reuse BAYES_50
> reuse BAYES_60
> reuse BAYES_80
> reuse BAYES_95
> reuse BAYES_99
>
> Recently playing around a little with bayes stuff, I noticed there is no
> data for these in http://ruleqa.spamassassin.org/?rule=%2Fbayes
>
> Then I realized to actually test bayes with masscheck, I needed to copy my
> bayes dbs into masses/spamassassin (which possibly should've been more
> obvious to me).
>
> Then it occurred to me it would probably suck to put everybody's CPUs
> through regenerating all the bayes scores, and that's pretty close to what
> reuse is for.

Imo, we shouldn't be using any Bayes during masschecks.
It slows down the process, and depending on the corpus it can produce
totally distorted results.