Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2004/06/23 01:07:36 UTC

Re: interesting paper on SpamAssassin (fwd)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


BTW, that ROCAC technique (i.e. area under ROC curve) for getting a useful
single figure from a pair of FP/FN figures seems very promising....  Dan,
weren't you drawing graphs along those lines at some stage last year?

- --j.

Henry Stern writes:
> The author of the paper, Gordon Cormack, has a lot of experience in the area
> of information retrieval.  It would be a good idea to carefully analyse his
> results and conclusions for ways to improve SpamAssassin and for approaches
> that we should ignore.
> 
> I've been very skeptical of the hand-wavy approaches with little theoretical
> background or improper evaluation (e.g. Dobly and Bayesian chains).  The
> results in Cormack's paper should warn us against blindly accepting "cool"
> ideas without taking the steps to ensure their validity.
> 
> Lastly, the positive results speak for themselves.  Kudos guys!
> 
> Henry
> 
> > -----Original Message-----
> > From: Justin Mason [mailto:jm@jmason.org]
> > Sent: June 22, 2004 3:42 AM
> > To: SpamAssassin-dev@incubator.apache.org
> > Subject: interesting paper on SpamAssassin (fwd)
> > 
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > 
> > http://plg.uwaterloo.ca/~gvcormac/spamcormack.html
> > 
> > A good study comparing SpamAssassin (in several configurations) and
> > several other spam filtering systems, over the course of 8 months (Aug
> > 2003 to Mar 2004).   The measurements and methodology are all pretty
> > sound, as far as I can see.
> > 
> > Well worth a read...
> > 
> > - --j.
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.2.4 (GNU/Linux)
> > Comment: Exmh CVS
> > 
> > iD8DBQFA19SiQTcbUG5Y7woRAikGAJ4ye0EFbwOC0CrMtX8wk/TiIrNVnACgxWX/
> > 4XDSllJJSiRBFIklOrF93fE=
> > =xzPv
> > -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFA2Lu4QTcbUG5Y7woRAhPIAKDaPlBLl3SU+/YNlALs/nd+owRDkwCgiY+7
TA0xk3jhomdjmsHCtJ+MRKM=
=L/9N
-----END PGP SIGNATURE-----


Re: interesting paper on SpamAssassin (fwd)

Posted by Daniel Quinlan <qu...@pathname.com>.
jm@jmason.org (Justin Mason) writes:

> BTW, that ROCAC technique (i.e. area under ROC curve) for getting a useful
> single figure from a pair of FP/FN figures seems very promising....  Dan,
> weren't you drawing graphs along those lines at some stage last year?

I've been using the FP/FN plots (for SpamAssassin as well as some work
stuff) pretty regularly, although I never actually tried calculating the
area; I just eyeballed it and the distance to the origin (FP=0,
FN=0) in the region of the lines we cared about.  (I even drew one today
for a VP here while trying to explain something.)

The best thing about it is that it allows you to compare any two filters
with tunable thresholds (and you can even plot the non-tunable ones as
a single point).
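
For anyone who wants a number instead of eyeballing, here is a minimal
sketch of the idea in Python.  It is only an illustration: the function
names, the toy scores, and the "flag as spam when score >= threshold"
convention are assumptions made for the example, not anything taken from
SpamAssassin or from Cormack's paper.  It sweeps the threshold to get the
FP/FN curve, then takes the trapezoidal area under that curve (which works
out to 1 minus the usual ROC area, so smaller is better) and the closest
approach to the origin.

```python
from math import hypot


def fp_fn_curve(scores, is_spam):
    """Return (fp_rate, fn_rate) points, one per candidate threshold.

    Assumed convention: a message is flagged as spam when score >= threshold.
    Points run from (0, 1) "flag nothing" to (1, 0) "flag everything".
    """
    n_spam = sum(1 for spam in is_spam if spam)
    n_ham = len(is_spam) - n_spam
    # Every distinct score, plus one value above the maximum so the
    # "flag nothing" end of the curve is included.  Descending order
    # makes fp_rate non-decreasing along the list.
    thresholds = sorted(set(scores) | {max(scores) + 1.0}, reverse=True)
    points = []
    for t in thresholds:
        fp = sum(1 for s, spam in zip(scores, is_spam) if s >= t and not spam)
        fn = sum(1 for s, spam in zip(scores, is_spam) if s < t and spam)
        points.append((fp / n_ham, fn / n_spam))
    return points


def area_under(points):
    """Trapezoidal area under the FP/FN curve (0 means perfect separation)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def closest_to_origin(points):
    """Smallest distance from any point on the curve to (FP=0, FN=0)."""
    return min(hypot(fp, fn) for fp, fn in points)


# Toy data (made up): two ham and two spam messages with overlapping scores.
scores = [0.1, 0.4, 0.35, 0.8]
is_spam = [False, False, True, True]

curve = fp_fn_curve(scores, is_spam)
print(curve)
print(area_under(curve), closest_to_origin(curve))
```

A filter with no tunable threshold contributes a single (FP, FN) pair, so
it can be compared by checking whether that point falls above or below
another filter's curve.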

The first time I saw the FP/FN plot was in that Microsoft presentation
in Boston.  ;-)

Daniel

-- 
Daniel Quinlan
http://www.pathname.com/~quinlan/