Posted by Robert Menschel on 2004/04/16 03:44:51 UTC

Obsolete rules?

Looking at the results of the LW_BIG_AND_RED corpus run I just did, I
find a good number of rules that seem to be backfiring under SA 2.63:
they carry positive scores but hit a larger share of ham than of spam.
I know that rules are being reevaluated for 3.0, and just wanted to
make sure people were aware of these.

The corpus includes 3 years of ham and 4 months of spam.

Because they stand out like a sore thumb, I checked into the TONER ham
hits -- they're all advertisements at the bottom of YahooGroups mailing
list emails.

Bob Menschel

Section 3 -- Frequencies Log
(Raw hit counts first, followed by the same hits as percentages)

OVERALL     SPAM      HAM      S/O   RANK  SCORE  NAME
 111528    90720    20808    0.813   0.00    0.00  (all messages)
     16        8        8    0.187   0.01   0.85  EXCUSE_13
      4        2        2    0.187   0.01   0.07  REPLY_TO_EMPTY
      2        1        1    0.187   0.01   0.20  MSGID_THREESIXSIX
      2        1        1    0.187   0.01   1.87  FAKE_HELO_YAHOO
      2        1        1    0.187   0.01   2.90  VERB_UP_TO_OR_MORES
      2        1        1    0.187   0.01   1.18  MIME_BOUND_DASH_DIGIT
    369      156      213    0.144   0.00   2.00  BLANK_LINES_70_80
      5        2        3    0.133   0.00   2.12  FIND_ANYTHING
    171       64      107    0.121   0.00   1.64  BLANK_LINES_80_90
      3        1        2    0.103   0.00   2.38  FAKED_HOTMAIL_DAV
      7        2        5    0.084   0.00   2.90  FRIEND_AT_PUBLIC
    508      129      379    0.072   0.00   1.60  FROM_NO_LOWER
    339       86      253    0.072   0.00   1.12  EXTRA_MPART_TYPE
     60       14       46    0.065   0.00   0.72  FROM_AND_TO_SAME
   2320       87     2233    0.009   0.00   0.56  TONER
      9        1        8    0.028   0.00   2.70  BUGGY_CGI
    359        1      358    0.001   0.00   2.90  FAKE_HELO_BIGFOOT
     10        0       10    0.000   0.00   1.65  IDENT_NOBODY
      1        0        1    0.000   0.00   0.04  SUBJ_NOW_ONLY
      2        0        2    0.000   0.00   1.17  FAKE_HELO_HOTMAIL
     18        0       18    0.000   0.00   1.92  FAKE_HELO_AOL
     18        0       18    0.000   0.00   2.19  NO_RDNS_DOTCOM_HELO
      5        0        5    0.000   0.00   1.64  THE_FOLLOWING_FORM
      2        0        2    0.000   0.00   2.80  FAKE_HELO_USA_NET
      4        0        4    0.000   0.00   1.70  AOL_USERS_LINK

OVERALL%   SPAM%     HAM%      S/O   RANK  SCORE  NAME
 111528    90720    20808    0.813   0.00    0.00  (all messages)
100.000  81.3428  18.6572    0.813   0.00    0.00  (all messages as %)
  0.014   0.0088   0.0384    0.187   0.01    0.85  EXCUSE_13
  0.004   0.0022   0.0096    0.187   0.01    0.07  REPLY_TO_EMPTY
  0.002   0.0011   0.0048    0.187   0.01    0.20  MSGID_THREESIXSIX
  0.002   0.0011   0.0048    0.187   0.01    1.87  FAKE_HELO_YAHOO
  0.002   0.0011   0.0048    0.187   0.01    2.90  VERB_UP_TO_OR_MORES
  0.002   0.0011   0.0048    0.187   0.01    1.18  MIME_BOUND_DASH_DIGIT
  0.331   0.1720   1.0236    0.144   0.00    2.00  BLANK_LINES_70_80
  0.004   0.0022   0.0144    0.133   0.00    2.12  FIND_ANYTHING
  0.153   0.0705   0.5142    0.121   0.00    1.64  BLANK_LINES_80_90
  0.003   0.0011   0.0096    0.103   0.00    2.38  FAKED_HOTMAIL_DAV
  0.006   0.0022   0.0240    0.084   0.00    2.90  FRIEND_AT_PUBLIC
  0.455   0.1422   1.8214    0.072   0.00    1.60  FROM_NO_LOWER
  0.304   0.0948   1.2159    0.072   0.00    1.12  EXTRA_MPART_TYPE
  0.054   0.0154   0.2211    0.065   0.00    0.72  FROM_AND_TO_SAME
  2.080   0.0959  10.7314    0.009   0.00    0.56  TONER
  0.008   0.0011   0.0384    0.028   0.00    2.70  BUGGY_CGI
  0.322   0.0011   1.7205    0.001   0.00    2.90  FAKE_HELO_BIGFOOT
  0.009   0.0000   0.0481    0.000   0.00    1.65  IDENT_NOBODY
  0.001   0.0000   0.0048    0.000   0.00    0.04  SUBJ_NOW_ONLY
  0.002   0.0000   0.0096    0.000   0.00    1.17  FAKE_HELO_HOTMAIL
  0.016   0.0000   0.0865    0.000   0.00    1.92  FAKE_HELO_AOL
  0.016   0.0000   0.0865    0.000   0.00    2.19  NO_RDNS_DOTCOM_HELO
  0.004   0.0000   0.0240    0.000   0.00    1.64  THE_FOLLOWING_FORM
  0.002   0.0000   0.0096    0.000   0.00    2.80  FAKE_HELO_USA_NET
  0.004   0.0000   0.0192    0.000   0.00    1.70  AOL_USERS_LINK
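
For anyone wanting to check the ratio column against the raw counts, here
is a minimal Python sketch. It assumes the second and third columns are
spam and ham hit counts, and that the fourth column is the spam hit rate
divided by the sum of the spam and ham hit rates. That formula is inferred
from the numbers in this log, not taken from the SpamAssassin source, and
the function name is made up for illustration.

```python
def spam_over_overall(spam_hits, ham_hits, total_spam, total_ham):
    """Share of a rule's normalized hit rate that falls on spam (0.0 to 1.0).

    Hypothetical helper: the formula is an assumption inferred from the
    log above, not copied from SpamAssassin.
    """
    spam_rate = spam_hits / total_spam
    ham_rate = ham_hits / total_ham
    if spam_rate + ham_rate == 0:
        return 0.0  # rule never fired on either corpus
    return spam_rate / (spam_rate + ham_rate)


# Corpus sizes from the "(all messages)" row.
TOTAL_SPAM, TOTAL_HAM = 90720, 20808

# (spam hits, ham hits) for a few rules copied from the log.
rows = {
    "EXCUSE_13": (8, 8),
    "BLANK_LINES_70_80": (156, 213),
    "TONER": (87, 2233),
    "FAKE_HELO_BIGFOOT": (1, 358),
}

for name, (spam_hits, ham_hits) in rows.items():
    ratio = spam_over_overall(spam_hits, ham_hits, TOTAL_SPAM, TOTAL_HAM)
    print(f"{name:20s} {ratio:.3f}")
```

Under those assumptions the four rules come out at 0.187, 0.144, 0.009 and
0.001, matching the log. Normalizing by corpus size is what makes EXCUSE_13
score 0.187 rather than 0.5 even though it has 8 hits on each side: the ham
corpus is much smaller, so 8 ham hits is a far larger share of it.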