Posted to users@spamassassin.apache.org by Robert Menschel <Ro...@Menschel.net> on 2004/02/17 15:52:42 UTC
Re[2]: possible bayes poison error to use against them?
Hello Loren,
Monday, February 16, 2004, 7:42:19 AM, you wrote:
LW> body DUMB_PERIODS /(?:.*\b[a-z]{3,10}[\.\!][a-z]{3,10}\b){6,30}/i
LW> describe DUMB_PERIODS Writer doesn't put spaces after periods.
LW> score DUMB_PERIODS 2.0 # not real high, can match source code listings
LW> This is UNTESTED, but might help. You can twiddle the score higher if
LW> nobody ever sends you code listings in mail. I'd really like to run this
LW> against a corpus and see how much ham it catches before putting it in my own
LW> configuration.
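For illustration, the quoted pattern can be exercised outside SpamAssassin with a short Python sketch. This is only an approximation: it applies the regex to a single string with Python's re module, whereas SpamAssassin applies body rules to its own rendered text, so results on real mail may differ. The sample strings and the helper name hits() are made up for the example.

```python
import re

# The pattern from the quoted DUMB_PERIODS rule, transcribed for Python's re
# module. The /i flag becomes re.IGNORECASE.
DUMB_PERIODS = re.compile(
    r"(?:.*\b[a-z]{3,10}[\.\!][a-z]{3,10}\b){6,30}",
    re.IGNORECASE,
)


def hits(text: str) -> bool:
    """Return True if the pattern matches anywhere in the given string."""
    return DUMB_PERIODS.search(text) is not None


# Hypothetical sample strings, not taken from any real corpus.
spammy = "buy.now cheap.pills act.fast free.offer click.here best.deal"
too_few = "buy.now cheap.pills act.fast"
domains = ("example.com example.org example.net "
           "example.edu example.info example.biz")

print(hits(spammy), hits(too_few), hits(domains))
```

Note that the third string is just a list of domain names. Any text with six or more word.word tokens satisfies the pattern, which is one plausible (but here unverified) way ordinary mail could trigger it.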
Results against my corpus:
DUMB_PERIODS -- 5029s/1518h of 100794 corpus (82099s/18695h) 02/16/04
DUMB_PERIODS -- suggested score: 0.184 (of 5.0)
OVERALL%   SPAM%     HAM%      S/O    RANK   SCORE  NAME
 100794    82099     18695     0.815  0.00   0.00   (all messages)
100.000    81.4523   18.5477   0.815  0.00   0.00   (all messages as %)
  6.495     6.1255    8.1198   0.430  0.00   2.00   DUMB_PERIODS
It matches 8% of my ham, and only 6% of my spam.
Bob Menschel
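The percentages and the S/O figure in the table above can be reproduced from the raw hit counts. The sketch below assumes S/O is computed as spam% / (spam% + ham%), with each percentage taken within its own class. That formula is an assumption here, although it does reproduce the 0.430 shown for DUMB_PERIODS. The variable names are made up for the example.

```python
# Raw counts taken from the results above.
spam_hits, ham_hits = 5029, 1518
spam_total, ham_total = 82099, 18695

# Hit rate within each class, as percentages.
spam_pct = 100.0 * spam_hits / spam_total
ham_pct = 100.0 * ham_hits / ham_total

# Share of all messages that hit the rule.
overall_pct = 100.0 * (spam_hits + ham_hits) / (spam_total + ham_total)

# Assumed S/O formula: spam hit rate relative to the sum of both hit rates.
# A value below 0.5 means the rule hits ham proportionally more than spam.
s_o = spam_pct / (spam_pct + ham_pct)

print(f"{overall_pct:.3f} {spam_pct:.4f} {ham_pct:.4f} {s_o:.3f}")
```

Using per-class rates rather than raw counts matters because the corpus is about 81% spam: the rule hits more spam messages in absolute terms, yet a smaller fraction of the spam than of the ham.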
Re: Re[2]: possible bayes poison error to use against them?
Posted by Loren Wilton <lw...@earthlink.net>.
> It matches 8% of my ham, and only 6% of my spam.
>
> Bob Menschel
I think we can safely qualify that as a pretty bad rule! It might be
possible to tune it by twiddling the word length values some, but I doubt it
is worth the effort unless nothing better can be found. I don't have the
tools to do corpus checks with my tiny Linux machine, and I doubt that
anyone else would want to waste the hours fiddling with that to try to
improve it.
I *knew* there was a reason I didn't want to put it on my own machine...
:-)
BTW, I seem to be having some luck with a rule that checks for my email
address in the To and Cc lists and looks to see if the optional name in
front of it is correct. I only gave this a couple points since it will
obviously fail on all mailing lists, but I still generally end up with a
negative score from bayes or whitelist. And it adds a couple of points to a
whole lot of real spams. Can't really say yet how worthwhile this rule is,
and it is certainly a difficult one to implement without user-specific
rules.
Loren
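The To/Cc name check described above is not shown in the thread, so the following Python sketch is only a guess at the logic, not the actual rule. The address, the expected name, and the function name are all hypothetical. It also assumes that a missing display name is acceptable and that only a present but different name should count against the message.

```python
from email.utils import getaddresses

# Hypothetical values; a real rule would use the recipient's own details.
MY_ADDRESS = "me@example.com"
MY_NAME = "Example User"


def wrong_display_name(to_header: str, cc_header: str = "") -> bool:
    """Return True if MY_ADDRESS appears in To or Cc with an unexpected name."""
    # Skip empty headers so only real header values are parsed.
    headers = [h for h in (to_header, cc_header) if h]
    for name, addr in getaddresses(headers):
        if addr.lower() != MY_ADDRESS:
            continue
        # Assumption: an absent name is fine, only a wrong one is suspicious.
        if name and name.strip().lower() != MY_NAME.lower():
            return True
    return False


print(wrong_display_name("Cheap Pills <me@example.com>"))
```

When the address does not appear in To or Cc at all, as with mail delivered through a list, this sketch simply returns False.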