Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2004/04/16 03:37:22 UTC

Re[2]: possible HTML rules to delete

Hello Daniel,

Tuesday, April 13, 2004, 6:46:10 PM, you wrote:

DQ> Loren Wilton <lw...@earthlink.net> writes:

>> meta  LW_BIG_AND_RED   (HTML_FONT_BIG && HTML_FONTCOLOR_RED)
>> describe LW_BIG_AND_RED   BIG RED TEXT
>> score  LW_BIG_AND_RED   3

DQ> Someone with a corpus could certainly give it a shot.  It's
DQ> speculative without a corpus run, though.

Corpus run with 2.63 distribution rules plus the above rule.

S/O 0.918 (compared to global 0.813), over 6% of spam, significant ham.

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

OVERALL     SPAM      HAM     S/O    RANK   SCORE  NAME
 111528    90720    20808    0.813   0.00    0.00  (all messages)
  22646    22625       21    0.996   1.00   0.75  BIZ_TLD
  15721    15720        1    1.000   1.00   4.50  DATE_SPAMWARE_Y2K
   3923     3923        0    1.000   0.98   3.61  SUBJ_ILLEGAL_CHARS
                                                  ...
   6281     6155      126    0.918   0.76   3.00  LW_BIG_AND_RED
                                                  ...
  15427    14932      495    0.874   0.67   0.27  HTML_FONT_BIG
                                                  ...
   9750     9425      325    0.869   0.66   0.10  HTML_FONTCOLOR_RED
                                                  ...

OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
 111528    90720    20808    0.813   0.00    0.00  (all messages)
100.000  81.3428  18.6572    0.813   0.00    0.00  (all messages as %)
 20.305  24.9394   0.1009    0.996   1.00    0.75  BIZ_TLD
 14.096  17.3280   0.0048    1.000   1.00    4.50  DATE_SPAMWARE_Y2K
  3.518   4.3243   0.0000    1.000   0.98    3.61  SUBJ_ILLEGAL_CHARS
                                                   ...
  5.632   6.7846   0.6055    0.918   0.76    3.00  LW_BIG_AND_RED
                                                   ...
 13.832  16.4594   2.3789    0.874   0.67    0.27  HTML_FONT_BIG
                                                   ...
  8.742  10.3891   1.5619    0.869   0.66    0.10  HTML_FONTCOLOR_RED
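
A note on reading the S/O column: the figures above appear to be computed from the per-class percentages (SPAM% / (SPAM% + HAM%)) rather than from raw hit counts. That is an inference from the numbers in this table, not something checked against the hit-frequencies source. The raw ratio for LW_BIG_AND_RED, 6155/6281, would be about 0.980, while the percentage-based ratio gives the reported 0.918. A minimal Python sketch under that assumption (the function and variable names are made up for illustration):

```python
# Counts copied from the frequency table above.
TOTAL_SPAM = 90720
TOTAL_HAM = 20808

# rule name -> (spam hits, ham hits)
RULE_HITS = {
    "BIZ_TLD": (22625, 21),
    "DATE_SPAMWARE_Y2K": (15720, 1),
    "SUBJ_ILLEGAL_CHARS": (3923, 0),
    "LW_BIG_AND_RED": (6155, 126),
    "HTML_FONT_BIG": (14932, 495),
    "HTML_FONTCOLOR_RED": (9425, 325),
}


def spam_over_overall(spam_hits, ham_hits,
                      total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Assumed S/O formula: spam hit rate / (spam hit rate + ham hit rate).

    Using rates instead of raw counts keeps the value from being
    inflated by a corpus that contains far more spam than ham.
    """
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    if spam_rate + ham_rate == 0:
        return 0.0  # rule never hit; ratio is undefined, report 0
    return spam_rate / (spam_rate + ham_rate)


if __name__ == "__main__":
    for name, (spam_hits, ham_hits) in RULE_HITS.items():
        print(f"{name:20s} {spam_over_overall(spam_hits, ham_hits):.3f}")
```

Read this way, a rule's S/O is independent of the corpus mix, whereas the 0.813 on the "(all messages)" line is simply the share of spam in the corpus.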




Re: Re[2]: possible HTML rules to delete

Posted by Daniel Quinlan <qu...@pathname.com>.
Robert Menschel <Ro...@Menschel.net> writes:

> Corpus run with 2.63 distribution rules plus the above rule.
>
> S/O 0.918 (compared to global 0.813), over 6% of spam, significant ham.

Thanks.  Can you run hit-frequencies as follows?

  $ ./hit-frequencies -xpa -M 'HTML_MESSAGE|__MIME_HTML' -m 'LW_BIG_AND_RED|HTML_FONT_BIG|HTML_FONTCOLOR_RED'

That will show the results for just HTML messages, which is a more useful
measure of whether or not a rule is helpful.  Anything below 0.500 is pretty
much not useful.

I think a better rule would combine color with font size all at once (so
it would have to be integrated into HTML.pm).

Daniel