Posted to users@spamassassin.apache.org by Andrej Bratko <an...@ijs.si> on 2007/02/25 20:17:24 UTC
Re: [2] Google Summer of Code 2007 ...
Chris St. Pierre wrote:
>
>
> Mark Martinec wrote:
>>
>>
>> ... the following sounds promising as an additional classifier
>> to existing bayes (especially since the author comes from the same
>> organization as myself :)
>>
>> http://www.virusbtn.com/spambulletin/archive/2006/01/sb200601-trec
>>
>> ijsSPAM2 PPM-D compression model
>> Andrej Bratko (Josef Stefan Institute)
>>
>> Observations:
>> The most startling observation is that character-based compression models
>> perform outstandingly well for spam filtering. Commonly used open-source
>> filters perform well, but not nearly so well or nearly so poorly as
>> reported elsewhere.
>>
>>
>
> This looks very promising. I found a description of the ijsSPAM2 tool
> on the site:
>
> http://www.virusbtn.com/spambulletin/archive/2006/03/sb200603-compression
>
> Remarkable stuff. That would be a helluva nice plugin to have.
>
>
I've recently released a C++ library that includes an implementation of the
PPM-D algorithm, geared towards classification (or mail filtering). This is
essentially the same algorithm that appeared at TREC 2005 as `ijsSPAM2'.

It's available at:
http://ai.ijs.si/andrej/psmslib.html

There's also a Python wrapper:
http://ai.ijs.si/andrej/psmpylib.html

The C++ library and Python extension module are free for personal and for
research use, but unfortunately, I cannot disclose the source code at this
time, or release the libraries under an Apache-compatible license. Anyway,
you might want to try it out before coding your own implementation.
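
For readers unfamiliar with the idea, here is a rough sketch of
classification by compression: a message is assigned to the class whose
training text lets it compress best. This is only an illustration under
stated assumptions. It uses zlib (DEFLATE) from the Python standard library
as a stand-in compressor, not PPM-D, and it does not use or mirror the API of
the libraries linked above. The function names and the two-class spam/ham
setup are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: bytes, message: bytes) -> int:
    """Growth of the compressed corpus when `message` is appended to it.

    A small value means the message shares a lot of structure with the
    corpus, so the compressor can encode it cheaply.
    """
    return compressed_size(corpus + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Return "spam" or "ham", whichever corpus compresses the message better.

    Ties go to "ham", an arbitrary choice made here to avoid false positives.
    """
    msg = message.encode("utf-8")
    spam_cost = extra_bytes(spam_corpus.encode("utf-8"), msg)
    ham_cost = extra_bytes(ham_corpus.encode("utf-8"), msg)
    return "spam" if spam_cost < ham_cost else "ham"
```

A real filter would train on large corpora and use an adaptive statistical
model such as PPM-D rather than DEFLATE, whose 32 KB window limits how much
training text it can actually exploit.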