Posted to dev@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2004/04/16 03:37:22 UTC
Re[2]: possible HTML rules to delete
Hello Daniel,
Tuesday, April 13, 2004, 6:46:10 PM, you wrote:
DQ> Loren Wilton <lw...@earthlink.net> writes:
>> meta LW_BIG_AND_RED (HTML_FONT_BIG && HTML_FONTCOLOR_RED)
>> describe LW_BIG_AND_RED BIG RED TEXT
>> score LW_BIG_AND_RED 3
DQ> Someone with a corpus could certainly give it a shot. It's
DQ> speculative without a corpus run, though.
Corpus run with 2.63 distribution rules plus the above rule.
S/O 0.918 (compared to global 0.813), over 6% of spam, significant ham.
Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)
OVERALL    SPAM     HAM     S/O   RANK  SCORE  NAME
 111528   90720   20808   0.813   0.00   0.00  (all messages)
  22646   22625      21   0.996   1.00   0.75  BIZ_TLD
  15721   15720       1   1.000   1.00   4.50  DATE_SPAMWARE_Y2K
   3923    3923       0   1.000   0.98   3.61  SUBJ_ILLEGAL_CHARS
...
   6281    6155     126   0.918   0.76   3.00  LW_BIG_AND_RED
...
  15427   14932     495   0.874   0.67   0.27  HTML_FONT_BIG
...
   9750    9425     325   0.869   0.66   0.10  HTML_FONTCOLOR_RED
...
OVERALL%     SPAM%      HAM%     S/O   RANK  SCORE  NAME
  111528     90720     20808   0.813   0.00   0.00  (all messages)
 100.000   81.3428   18.6572   0.813   0.00   0.00  (all messages as %)
  20.305   24.9394    0.1009   0.996   1.00   0.75  BIZ_TLD
  14.096   17.3280    0.0048   1.000   1.00   4.50  DATE_SPAMWARE_Y2K
   3.518    4.3243    0.0000   1.000   0.98   3.61  SUBJ_ILLEGAL_CHARS
...
   5.632    6.7846    0.6055   0.918   0.76   3.00  LW_BIG_AND_RED
...
  13.832   16.4594    2.3789   0.874   0.67   0.27  HTML_FONT_BIG
...
   8.742   10.3891    1.5619   0.869   0.66   0.10  HTML_FONTCOLOR_RED
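
For anyone checking the numbers: the per-rule S/O values above are consistent with spam% / (spam% + ham%), i.e. the ratio is taken over per-class hit rates, not raw counts, so the corpus being 81% spam does not inflate it. The "(all messages)" row is different and is simply the spam share of the corpus. A minimal Python sketch, assuming that formula; the function name s_o_ratio is made up for illustration and this is not the hit-frequencies code itself:

```python
def s_o_ratio(spam_hits, ham_hits, spam_total, ham_total):
    """S/O from per-class hit rates: spam% / (spam% + ham%) (assumed formula)."""
    spam_pct = 100.0 * spam_hits / spam_total
    ham_pct = 100.0 * ham_hits / ham_total
    if spam_pct + ham_pct == 0:
        return 0.0  # rule never hit; the ratio is undefined, 0.0 is an arbitrary choice
    return spam_pct / (spam_pct + ham_pct)


# Corpus totals and per-rule hit counts copied from the frequencies log above.
SPAM_TOTAL, HAM_TOTAL = 90720, 20808
RULE_HITS = {
    "LW_BIG_AND_RED": (6155, 126),
    "HTML_FONT_BIG": (14932, 495),
    "HTML_FONTCOLOR_RED": (9425, 325),
}

for name, (spam_hits, ham_hits) in RULE_HITS.items():
    ratio = s_o_ratio(spam_hits, ham_hits, SPAM_TOTAL, HAM_TOTAL)
    print(f"{ratio:.3f}  {name}")
```

This reproduces the 0.918, 0.874 and 0.869 shown in the S/O column for the three rules.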
Re: Re[2]: possible HTML rules to delete
Posted by Daniel Quinlan <qu...@pathname.com>.
Robert Menschel <Ro...@Menschel.net> writes:
> Corpus run with 2.63 distribution rules plus the above rule.
>
> S/O 0.918 (compared to global 0.813), over 6% of spam, significant ham.
Thanks. Can you run hit-frequencies as follows?
$ ./hit-frequencies -xpa -M 'HTML_MESSAGE|__MIME_HTML' -m 'LW_BIG_AND_RED|HTML_FONT_BIG|HTML_FONTCOLOR_RED'
That will show the results for just HTML messages, which is a more useful
measure of whether or not a rule is helpful. Anything below 0.500 is pretty
much not useful.
I think a better rule would combine color with font size all at once (so
it would have to be integrated into HTML.pm).
Daniel
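
To illustrate the difference between the meta rule and a combined check: the meta rule fires when a big font and a red font each appear anywhere in the message, while a combined check would require both on the same tag. A rough Python sketch of the latter, not the HTML.pm code; the class and function names, the set of "red" values, and the "bigger than the default size 3" threshold are all assumptions made for illustration:

```python
from html.parser import HTMLParser

# Assumption: which color values count as "red".
RED_VALUES = {"red", "#ff0000", "#f00"}


def is_big(size):
    """Assumption: 'big' means a relative '+N' size or an absolute size above 3."""
    size = (size or "").strip()
    if size.startswith("+"):
        return size[1:].isdigit() and int(size[1:]) > 0
    return size.isdigit() and int(size) > 3


class BigRedFontFinder(HTMLParser):
    """Flags a message only if one <font> tag is both big and red."""

    def __init__(self):
        super().__init__()
        self.big_red = False

    def handle_starttag(self, tag, attrs):
        if tag != "font":
            return
        attr = dict(attrs)  # HTMLParser lowercases tag and attribute names
        color = (attr.get("color") or "").strip().lower()
        if is_big(attr.get("size")) and color in RED_VALUES:
            self.big_red = True


def has_big_red_font(html):
    finder = BigRedFontFinder()
    finder.feed(html)
    finder.close()
    return finder.big_red
```

A message with '<font size="5">' in one place and '<font color="red">' in another would satisfy the meta rule but not this check. The sketch ignores nested font tags and CSS styling, which a real implementation would have to track.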