
Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2014/04/05 18:42:21 UTC

Perceptron/GA logic w/r/t low-scoring high-S/O rules?

Could someone who understands the scoring logic used by the perceptron or 
GA please comment on why this rule (and others like it) is only being 
scored at 0.01?

http://ruleqa.spamassassin.org/20140404-r1584563-n/T_DX_TEXT_02/detail

I would think that a rule which hits nothing but spam (S/O 1.00), and 
70% of whose hits are on spam scoring below 5 points, would be scored at 
2 or 3 points regardless of how many actual hits it gets...
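To make those two numbers concrete, here is a minimal Python sketch of how they could be computed from a rule's hits. It assumes S/O is the rule's spam hit rate divided by the sum of its spam and ham hit rates, so that differing corpus sizes cancel out; that definition, the function names, and the (is_spam, message_score) tuple layout are assumptions for illustration, not the ruleqa or masses/ code.

```python
def so_ratio(spam_hits: int, spam_total: int, ham_hits: int, ham_total: int) -> float:
    """Assumed S/O definition: spam hit rate / (spam hit rate + ham hit rate)."""
    spam_rate = spam_hits / spam_total if spam_total else 0.0
    ham_rate = ham_hits / ham_total if ham_total else 0.0
    if spam_rate + ham_rate == 0:
        return 0.0  # the rule never hit anything
    return spam_rate / (spam_rate + ham_rate)


def low_scoring_fraction(hits, threshold: float = 5.0) -> float:
    """Fraction of a rule's spam hits whose message score stayed below threshold.

    hits: iterable of (is_spam, message_score) tuples (hypothetical layout).
    """
    spam_scores = [score for is_spam, score in hits if is_spam]
    if not spam_scores:
        return 0.0
    return sum(1 for s in spam_scores if s < threshold) / len(spam_scores)


# Hypothetical toy input: a rule that hit 10 spams (7 below 5 points) and 0 hams
hits = [(True, 3.0)] * 7 + [(True, 8.0)] * 3
print(so_ratio(10, 1000, 0, 1000))   # 1.0
print(low_scoring_fraction(hits))    # 0.7
```

Neither number depends on how many messages the rule hits, which is why a rule with only a handful of hits can still show S/O 1.00 and a high low-scoring fraction.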

Does it just take some time for the perceptron to get "primed" and start 
scoring rules once the corpora are of sufficient size? I ask because there 
are older rules with similar profiles that are being scored.

I've observed that a lot of high-S/O rules that hit well on low-scoring 
spam but that don't necessarily hit a lot of spam are assigned very low 
scores, such that they don't appear to help much in pushing those 
low-scoring spams towards the threshold. Many aren't being scored at all 
and thus aren't being published.

I haven't started digging into the scoring code yet; is there some bias 
based on the number of overall hits a rule gets, or the highest score on 
messages the rule hits, that would tend to impose a seemingly unreasonably 
low limit on the generated score?
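One way such a hit-count bias could arise, sketched under loud assumptions: if the optimizer behind the scores penalizes large weights (here, L2 regularization), a rule's pull toward a higher score grows with the number of training messages it fires on, while the penalty does not. A rule that fires on only a few spams then settles at a small score even with S/O 1.00. This is textbook L2-regularized logistic regression trained by full-batch gradient descent, swapped in purely for illustration; it is not the masses/ perceptron or GA code, and whether those apply any comparable penalty is exactly the open question. The function name and the lam, lr, and steps parameters are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_l2_logistic(messages, labels, n_rules, lam=0.01, lr=0.5, steps=2000):
    """Gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    messages: list of sets of rule indices that fired on each message.
    labels:   1 for spam, 0 for ham.
    The bias term is not regularized; only rule weights are penalized.
    """
    n = len(messages)
    w = [0.0] * n_rules
    b = 0.0
    for _ in range(steps):
        # The L2 penalty pulls every rule weight toward 0, hits or no hits
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for fired, y in zip(messages, labels):
            p = sigmoid(b + sum(w[i] for i in fired))
            err = (p - y) / n
            grad_b += err
            # Only rules that fired on this message receive a data gradient
            for i in fired:
                grad_w[i] += err
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy corpus: rule 0 fires on 40 spams, rule 1 on only 2 spams,
# and neither fires on any ham, so both have S/O 1.00.
spam = [{0}] * 40 + [{1}] * 2
ham = [set()] * 40
w, b = train_l2_logistic(spam + ham, [1] * 42 + [0] * 40, n_rules=2)
print(w, b)
```

With this toy corpus the rule that fires on 40 spams ends up with a clearly larger weight than the one that fires on 2, even though neither ever hits ham.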

I'd rather not have to resort to hitting the masscheck system over the 
head with the "tflags publish" cluebat, but I will if it keeps ignoring 
these rules.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The difference is that Unix has had thirty years of technical
   types demanding basic functionality of it. And the Macintosh has
   had fifteen years of interface fascist users shaping its progress.
   Windows has the hairpin turns of the Microsoft marketing machine
   and that's all.                                    -- Red Drag Diva
-----------------------------------------------------------------------
  8 days until Thomas Jefferson's 271st Birthday

Re: Perceptron/GA logic w/r/t low-scoring high-S/O rules?

Posted by Axb <ax...@gmail.com>.

On 04/05/2014 06:59 PM, Axb wrote:
> If Darxus sees so much of this type, why isn't he running a masschecker?

Oops, sorry: I hadn't seen that he is indeed participating.

Re: Perceptron/GA logic w/r/t low-scoring high-S/O rules?

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 4/7/2014 11:03 AM, John Hardin wrote:
> On Mon, 7 Apr 2014, Kevin A. McGrail wrote:
>
>> On 4/5/2014 12:59 PM, Axb wrote:
>>>  On 04/05/2014 06:42 PM, John Hardin wrote:
>>> >  I'd rather not have to resort to hitting the masscheck system over the
>>> >  head with the "tflags publish" cluebat, but I will if it keeps ignoring
>>> >  these rules.
>>>
>>>  this would be very unwise and would create rule bloat, as obviously the
>>>  corpus isn't seeing many spams with whatever pattern you'd want to
>>>  publish.
>>
>> According to the wiki, "tflags publish" is required to publish
>> rules: rules without an explicit "tflags publish" line are never
>> published.
>>
>> http://wiki.apache.org/spamassassin/SaUpdateBackend
>
> Unless "tflags publish" is the default, that doesn't seem to be the 
> current behavior. Many of my rules do not have an explicit "tflags 
> publish" on them yet they are being published - for example, 
> TO_NO_BRKTS_MSFT
>
Can't disagree, but I'm pointing out that I use tflags publish because, 
according to the docs, you are supposed to...