Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/11/30 01:54:22 UTC
Re: selected rulesets for better performance
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
The jm.log hits quoted below were all discussions of spam :( Doh! I'll have to remember --
*never* mark spam discussions as ham, even if you can't spot a spamsign.
- --j.
Theo Van Dinter writes:
> On Wed, Nov 24, 2004 at 01:19:49AM -0500, Matt Kettler wrote:
> > Quite frankly, I suspect corpus pollution. It really only takes 1 high
> > scoring spam in the nonspam corpus to really screw up the message scores.
>
> That's quite possible. I don't think anyone has a 100% non-polluted corpus,
> though try we might. :(
>
> > 1) DRUGS_PAIN_OBFU actually hit some nonspam? I find that odd, but it could
> > be a typo.
>
> Looking at the submitted results:
>
> dave.log:. /home/dave/corpus/cooked-ham.43366468
> jm.log:. /home/jm/Mail/deld.priv/34675
> jm.log:. /home/jm/Mail/deld.priv/34682
> jm.log:. /home/jm/Mail/deld.priv/34699
> jm.log:. /home/jm/Mail/deld.priv/34703
> quinlan.log:. /home/corpus/mail/ham/166370
> quinlan.log:. /home/corpus/mail/ham/166400
> quinlan.log:. /home/corpus/mail/ham/166430
> quinlan.log:. /home/corpus/mail/ham/166437
>
> > 2) DRUGS_SMEAR1 hit some nonspam? I find that damn near impossible. I don't
> > think any nonspam email other than one quoting spam will ever hit that
> > rule. It seems there's one drug spam, or drug spam quote in somebody's
> > corpus, and it was run in all 4 sets. (If anyone can show me the nonspam
> > matching that rule and it's not spam or a spam quote or discussion of SA's
> > rules, I'll send em $20. Really.)
>
> jm.log:. /home/jm/Mail/deld.priv/26352
>
> > 4) NIGERIAN_BODY3? could be a finance newsletter, but very unlikely.
>
> That was mine:
>
> theo.log:Y ham/misc200405-200407.33861588
>
> Unfortunately I took those misc ham mboxes and converted them to dir
> format a while ago, so I don't know what message that was.
>
> > 6) PERCENT_RANDOM? Very unlikely. What would have %rnd_x in it?
>
> jm.log:. /home/jm/Mail/deld.pub/12701
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS
iD8DBQFBq8S+MJF5cimLx9ARAqIgAJ9cvW676a9p9lliRZwZIb79xDNnqwCgstps
ie+5pylFyumlfeFwt2kTRXA=
=cb4U
-----END PGP SIGNATURE-----