Posted to dev@spamassassin.apache.org by Axb <ax...@gmail.com> on 2012/07/04 09:32:01 UTC

high scores on HDRS_LCASE,MANY_HDRS_LCASE > FPs

from last update's 72_scores.cf

score HDRS_LCASE                            3.749 3.999 3.749 3.999
score MANY_HDRS_LCASE                       1.251 1.004 1.251 1.004
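Each `score` line above carries four values, one per SpamAssassin score set: set 0 (Bayes and network tests both off), set 1 (network tests on, Bayes off), set 2 (Bayes on, network tests off) and set 3 (both on). A single value applies to all four sets. The following is a minimal Python sketch of picking the effective score for a given setup. The helper names are hypothetical and this is not SpamAssassin's own code.

```python
def parse_score_line(line: str) -> tuple[str, list[float]]:
    """Split a 'score RULE s0 [s1 s2 s3]' line into the rule name and its four scores."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "score":
        raise ValueError(f"not a score line: {line!r}")
    name, values = fields[1], [float(v) for v in fields[2:]]
    if len(values) == 1:
        # A single value is used for every score set.
        values = values * 4
    if len(values) != 4:
        raise ValueError(f"expected 1 or 4 scores: {line!r}")
    return name, values


def score_set_index(use_bayes: bool, use_network: bool) -> int:
    """Score set 0..3: add 2 when Bayes is enabled, 1 when network tests are enabled."""
    return (2 if use_bayes else 0) + (1 if use_network else 0)


def effective_score(line: str, use_bayes: bool, use_network: bool) -> float:
    """Return the score that applies to this rule under the given setup."""
    _, values = parse_score_line(line)
    return values[score_set_index(use_bayes, use_network)]


line = "score HDRS_LCASE                            3.749 3.999 3.749 3.999"
print(effective_score(line, use_bayes=True, use_network=True))    # 3.999
print(effective_score(line, use_bayes=False, use_network=False))  # 3.749
```

So a site running with Bayes and network tests enabled would be adding nearly 4 points for a single HDRS_LCASE hit.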

Although John manually set low scores in the sandbox file, these are 
ignored (per design).

Fixed/forced scores should be set via 73_sandbox_manual_scores.cf and 
not in sandbox files.

They have the comment:
# observed in UCE 9/2009

As they are hitting lots of ham, can we please lose these?

HDRS_LCASE_IMGONLY may be another candidate to be dropped.
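The suggestion to put forced scores in 73_sandbox_manual_scores.cf presumably works because of load order. This is a minimal Python sketch, assuming that .cf files are read in sorted filename order and that a later `score` line for the same rule replaces an earlier one, so a 73_ file is read after 72_scores.cf. The `merge_scores` helper and the 0.5 override are hypothetical and are not the contents of the real files.

```python
def merge_scores(files: dict[str, str]) -> dict[str, str]:
    """Read files in sorted name order; a later 'score' line for a rule wins."""
    scores: dict[str, str] = {}
    for filename in sorted(files):
        for raw in files[filename].splitlines():
            line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
            fields = line.split()
            if len(fields) >= 3 and fields[0] == "score":
                scores[fields[1]] = " ".join(fields[2:])
    return scores


# Hypothetical file contents; listed out of order to show that sorting decides precedence.
files = {
    "73_sandbox_manual_scores.cf": "score HDRS_LCASE 0.5  # hypothetical manual override\n",
    "72_scores.cf": (
        "score HDRS_LCASE                            3.749 3.999 3.749 3.999\n"
        "score MANY_HDRS_LCASE                       1.251 1.004 1.251 1.004\n"
    ),
}
merged = merge_scores(files)
print(merged["HDRS_LCASE"])       # 0.5
print(merged["MANY_HDRS_LCASE"])  # 1.251 1.004 1.251 1.004
```

Rules with no override keep their generated scores, which is why only the rules needing a fixed value would go into the manual file.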


Thanks

Axb.

Re: high scores on HDRS_LCASE,MANY_HDRS_LCASE > FPs

Posted by Axb <ax...@gmail.com>.
On 07/04/2012 04:40 PM, John Hardin wrote:
> On Wed, 4 Jul 2012, Axb wrote:
>
>> from last update's 72_scores.cf
>>
>> score HDRS_LCASE                            3.749 3.999 3.749 3.999
>> score MANY_HDRS_LCASE                       1.251 1.004 1.251 1.004
>>
>> Although John manually set low scores in the sandbox file, these are
>> ignored (per design).
>
> They are _limits_. The generator should not exceed those scores. The
> newly limited scores may take a bit to show up in an update.

I'll watch those scores closely.

>> Fixed/forced scores should be set via 73_sandbox_manual_scores.cf and
>> not in sandbox files.
>>
>> They have the comment:
>> # observed in UCE 9/2009
>>
>> As they are hitting lots of ham, can we please lose these?
>>
>> HDRS_LCASE_IMGONLY may be another candidate to be dropped.
>
> Alex, I don't recall if you're running masschecks; if you are, can you
> include such FPs in your ham corpus? The reason they're being scored so
> highly by the rescorer is they do perform well against the masscheck
> corpus.

I am running masschecks, but the hits I'm seeing are on messages (in the 
maillog) gatewayed through $dayjob's boxes, not on mail stored locally.

I understand that a lot of rules may perform well in masschecks, but 
overall, generic patterns should be dropped if we see real-world traffic 
showing they're dangerous.

IMO, we should be able to trust our own traffic and judgement more than 
masscheck corpora, which may be highly biased.

Axb


Re: high scores on HDRS_LCASE,MANY_HDRS_LCASE > FPs

Posted by John Hardin <jh...@impsec.org>.
On Wed, 4 Jul 2012, Axb wrote:

> from last update's 72_scores.cf
>
> score HDRS_LCASE                            3.749 3.999 3.749 3.999
> score MANY_HDRS_LCASE                       1.251 1.004 1.251 1.004
>
> Although John manually set low scores in the sandbox file, these are ignored 
> (per design).

They are _limits_. The generator should not exceed those scores. The newly 
limited scores may take a bit to show up in an update.
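The statement that the sandbox scores are limits the generator should not exceed can be pictured as a cap applied to each of the four generated values. This is a minimal Python sketch under that reading. The `apply_limit` helper and the limit of 1.0 are hypothetical, and the real rescorer's handling of limits may differ.

```python
def apply_limit(generated: list[float], limit: float) -> list[float]:
    """Cap each per-score-set value so it does not exceed a positive limit.

    Hypothetical helper: assumes a spam-sign rule, so the limit is an upper bound.
    """
    if limit < 0:
        raise ValueError("this sketch only handles positive (spam-sign) limits")
    return [min(score, limit) for score in generated]


# Generated HDRS_LCASE scores from 72_scores.cf, capped at a hypothetical limit of 1.0.
generated = [3.749, 3.999, 3.749, 3.999]
print(apply_limit(generated, 1.0))  # [1.0, 1.0, 1.0, 1.0]
```

Values already below the limit pass through unchanged, so a limit only matters when the rescorer would otherwise push a rule higher.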

> Fixed/forced scores should be set via 73_sandbox_manual_scores.cf and not in 
> sandbox files.
>
> They have the comment:
> # observed in UCE 9/2009
>
> As they are hitting lots of ham, can we please lose these?
>
> HDRS_LCASE_IMGONLY may be another candidate to be dropped.

Alex, I don't recall if you're running masschecks; if you are, can you 
include such FPs in your ham corpus? The reason they're being scored so 
highly by the rescorer is they do perform well against the 
masscheck corpus.
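Why adding such FPs to the ham corpus matters can be illustrated with a spam-to-overall ratio along the lines of the S/O figure shown in masscheck hit-frequency reports. The sketch assumes it is computed as spam% / (spam% + ham%). The corpus counts are hypothetical and only show the direction of the effect: a rule that hits no ham in the corpus looks perfect to the rescorer, and a modest number of real-world FPs lowers its ratio noticeably.

```python
def so_ratio(spam_hits: int, spam_total: int, ham_hits: int, ham_total: int) -> float:
    """Spam-to-overall ratio from hit percentages: spam% / (spam% + ham%)."""
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    if spam_pct + ham_pct == 0:
        return 0.0  # the rule never hit anything
    return spam_pct / (spam_pct + ham_pct)


# Hypothetical counts: the same spam hits, before and after adding 200 FP hams.
before = so_ratio(spam_hits=2000, spam_total=100_000, ham_hits=0, ham_total=50_000)
after = so_ratio(spam_hits=2000, spam_total=100_000, ham_hits=200, ham_total=50_200)
print(round(before, 3), round(after, 3))  # 1.0 0.834
```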

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Ignorance is no excuse for a law.
-----------------------------------------------------------------------
  Today: the 236th anniversary of the Declaration of Independence