Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/11/30 01:54:22 UTC

Re: selected rulesets for better performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Mine were all discussions of spam :(   Doh!  I'll have to remember:
*never* mark spam discussions as ham, even if you can't spot a spamsign.

- --j.

Theo Van Dinter writes:
> On Wed, Nov 24, 2004 at 01:19:49AM -0500, Matt Kettler wrote:
> > Quite frankly, I suspect corpus pollution. It really only takes 1 high 
> > scoring spam in the nonspam corpus to really screw up the message scores.
> 
> That's quite possible.  I don't think anyone has a 100% unpolluted corpus,
> though try we might. :(
> 
> > 1) DRUGS_PAIN_OBFU actually hit some nonspam? I find that odd, but it could 
> > be a typo.
> 
> Looking at the submitted results:
> 
> dave.log:. /home/dave/corpus/cooked-ham.43366468
> jm.log:. /home/jm/Mail/deld.priv/34675
> jm.log:. /home/jm/Mail/deld.priv/34682
> jm.log:. /home/jm/Mail/deld.priv/34699
> jm.log:. /home/jm/Mail/deld.priv/34703
> quinlan.log:. /home/corpus/mail/ham/166370
> quinlan.log:. /home/corpus/mail/ham/166400
> quinlan.log:. /home/corpus/mail/ham/166430
> quinlan.log:. /home/corpus/mail/ham/166437
> 
> > 2) DRUGS_SMEAR1 hit some nonspam? I find that damn near impossible. I don't 
> > think any nonspam email other than one quoting spam will ever hit that 
> > rule. It seems there's one drug spam, or drug spam quote in somebody's 
> > corpus, and it was run in all 4 sets. (If anyone can show me the nonspam 
> > matching that rule and it's not spam or a spam quote or discussion of SA's 
> > rules, I'll send em $20. Really.)
> 
> jm.log:. /home/jm/Mail/deld.priv/26352
> 
> > 4) NIGERIAN_BODY3? could be a finance newsletter, but very unlikely.
> 
> That was mine:
> 
> theo.log:Y ham/misc200405-200407.33861588
> 
> Unfortunately I took those misc ham mboxes and converted them to dir
> format a while ago, so I don't know what message that was.
> 
> > 6) PERCENT_RANDOM? Very unlikely. What would have %rnd_x in it?
> 
> jm.log:. /home/jm/Mail/deld.pub/12701
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBq8S+MJF5cimLx9ARAqIgAJ9cvW676a9p9lliRZwZIb79xDNnqwCgstps
ie+5pylFyumlfeFwt2kTRXA=
=cb4U
-----END PGP SIGNATURE-----
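
The hit listings quoted above all share one shape: a log file name, a colon, a one-character flag, and a message path. A minimal Python sketch for grouping such lines by submitter log follows, so each corpus owner can see which of their ham messages to recheck. The line shape is inferred only from the listings in this thread, and the names HIT_LINE and group_hits are hypothetical; nothing here is part of SpamAssassin itself.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the listings quoted in this thread:
#   <log file>:<one-character flag> <message path>
HIT_LINE = re.compile(r"^(?P<log>[^:\s]+):(?P<flag>\S) (?P<path>\S+)$")


def group_hits(lines):
    """Group (flag, path) pairs by the log file they were reported in."""
    hits = defaultdict(list)
    for line in lines:
        # Drop mail quoting ("> ") and surrounding whitespace first.
        match = HIT_LINE.match(line.lstrip("> ").strip())
        if match:
            hits[match.group("log")].append(
                (match.group("flag"), match.group("path"))
            )
    return dict(hits)


sample = [
    "> dave.log:. /home/dave/corpus/cooked-ham.43366468",
    "> jm.log:. /home/jm/Mail/deld.priv/34675",
    "> jm.log:. /home/jm/Mail/deld.priv/34682",
    "> theo.log:Y ham/misc200405-200407.33861588",
    "> That was mine:",  # prose lines are ignored
]

grouped = group_hits(sample)
for log, entries in sorted(grouped.items()):
    print(log, len(entries))
```

Lines that do not fit the assumed shape are skipped rather than reported, which keeps the sketch usable on a whole quoted message but also means a changed log format would fail silently.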