Posted to users@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2007/04/26 13:15:52 UTC

Re: Score Generation for Apache SpamAssassin

Duncan Findlay writes:
> Hi everybody,
> 
> As you may already know, Steven Birk and I have been working on our
> 4th year undergraduate project in Math and Engineering at Queen's
> University.
> 
> The goal of our project was to examine the use of logistic regression
> as a potential replacement for the Perceptron/GA currently used by the
> SpamAssassin project.
> 
> It's now done, and it's available here:
> http://people.apache.org/~duncf/FindlayBirkThesis.pdf
> 
> Basically, we've found a technique that shows promise as a possible
> replacement, but requires some modifications in order to handle some
> of the restrictions the SpamAssassin project puts on scores.
> 
> I hope to try to make those modifications in the next month or so, but
> I have no idea how well it will turn out, or how easy it will be.
> 
> The paper may be an interesting read for people not too familiar with
> the way the scoring process works now, as it discusses many of the
> issues that differentiate the scoring process from most other machine
> learning problems. (Then again, it might just be boring.)

thanks Duncan -- a great read, and looks promising!

Would it help btw if we came up with a spec for what a score-generation
tool needs to generate, in terms of score ranges and so on?
This would also be useful for the future (I'm sure there'll be
more... ;)

that'd be related to
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5376 ...

--j.

Re: Score Generation for Apache SpamAssassin

Posted by Duncan Findlay <du...@debian.org>.
On Thu, Apr 26, 2007 at 12:15:52PM +0100, Justin Mason wrote:
> thanks Duncan -- a great read, and looks promising!

> Would it help btw if we came up with a spec for what a score-generation
> tool needs to generate, in terms of score ranges and so on?
> This would also be useful for the future (I'm sure there'll be
> more... ;)

Probably not to me, but it might be useful to others. (I think I
already know what needs to be done.) Also, it might limit creativity
in possible solutions. We need a score-ranges mechanism, but we don't
need the specific one we have now.


-- 
Duncan Findlay
