Posted to dev@spamassassin.apache.org by Adam Katz <an...@khopis.com> on 2010/04/24 22:22:59 UTC

Re: hardware on ruleqa ... and perceptrons

On 04/24/2010 02:36 PM, Adam Katz wrote:
> The ruleqa system is very slow to crunch its results [...] As to the
> nightly runs, while a code revamp might help, it's typically easier
> to throw more hardware at it.

Hm.  Digging through the repository brought to mind some of the
conversations I had at the MIT Spam Conference this year with some
Cisco (IronPort) developers working there under Henry Stern.

Specifically, I was challenged on the differences between SA's
genetic algorithm and perceptrons, a topic beyond my current
mathematical prowess.  After talking to my girlfriend (who knows far more
about these things than I do), we concluded that since the Cisco group
specialized in perceptrons, they likely suffered from the "when you
have a hammer, everything looks like a nail" problem and that switching
to a perceptron would probably bring a negligible gain not worth the
needed rewriting.

Today, I saw this in svn at masses/README.perceptron:
> The advantage of this program over that of the genetic algorithm
> (GA) implementation in spamassassin/masses/craig_evolve.c is that
> while the GA requires several hours to run on high-end machines, the
> perceptron requires only about 15 seconds of CPU time on an Athlon XP
> 1700+ system.

Written by Henry Stern, 2004-01-08.  If I recall correctly from my
conversation with him last month, he abandoned the project and his PhD
pursuit when offered a job at IronPort.

Henry:  I've Bcc'd you in case you're not on the dev list anymore.
Apologies if you get this twice.

Re: hardware on ruleqa ... and perceptrons

Posted by Justin Mason <jm...@jmason.org>.

On Sun, Apr 25, 2010 at 00:57, Sidney Markowitz <si...@sidney.com> wrote:
> Adam Katz wrote, On 25/04/10 8:22 AM:
>> Today, I saw this in svn at masses/README.perceptron:
>
> See this that Justin posted to sa-dev that explains the history of our
> using GA, then perceptron, then back to GA.
>
> It also links to Duncan Findlay's thesis work on using logistic
> regression as a faster algorithm that gets better results, but I don't
> know what ended up happening with that.
>
> http://mail-archives.apache.org/mod_mbox/spamassassin-dev/200707.mbox/%3C20070701224117.F1A7732D60@radish.jmason.org%3E
>
> or if that link gets garbled, also archived at
>
> http://www.mail-archive.com/dev@spamassassin.apache.org/msg21162.html

Yep.  Basically, the perceptron implementation seems to require a lot
of hand-tuning to produce decent results.  The GA is a lot more "fire
and forget", if slower.

--j.

Re: hardware on ruleqa ... and perceptrons

Posted by Sidney Markowitz <si...@sidney.com>.

Adam Katz wrote, On 25/04/10 8:22 AM:
> Today, I saw this in svn at masses/README.perceptron:

See this that Justin posted to sa-dev that explains the history of our
using GA, then perceptron, then back to GA.

It also links to Duncan Findlay's thesis work on using logistic
regression as a faster algorithm that gets better results, but I don't
know what ended up happening with that.

http://mail-archives.apache.org/mod_mbox/spamassassin-dev/200707.mbox/%3C20070701224117.F1A7732D60@radish.jmason.org%3E

or if that link gets garbled, also archived at

http://www.mail-archive.com/dev@spamassassin.apache.org/msg21162.html