
Posted to users@spamassassin.apache.org by Andrej Bratko <an...@ijs.si> on 2007/02/25 20:17:24 UTC

Re: [2] Google Summer of Code 2007 ...


Chris St. Pierre wrote:
> 
> 
> Mark Martinec wrote:
>> 
>> 
>> ... the following sounds promising as an additional classifier
>> to existing bayes (especially since the author comes from the same
>> organization as myself :)
>> 
>> http://www.virusbtn.com/spambulletin/archive/2006/01/sb200601-trec
>> 
>> ijsSPAM2    PPM-D compression model
>>    Andrej Bratko (Josef Stefan Institute)
>> 
>> Observations:
>> The most startling observation is that character-based compression models
>> perform outstandingly well for spam filtering. Commonly used open-source
>> filters perform well, but not nearly so well or nearly so poorly as
>> reported elsewhere.
>> 
>> 
> 
> This looks very promising.  I found a description of the ijsSPAM2 tool
> on the site:
> 
> http://www.virusbtn.com/spambulletin/archive/2006/03/sb200603-compression
> 
> Remarkable stuff.  That would be a helluva nice plugin to have.
> 
> 

I've recently released a C++ library that includes an implementation of the
PPM-D algorithm, geared towards classification (or mail filtering). This is
essentially the same algorithm that appeared at TREC 2005 as `ijsSPAM2'.

It's available at:
http://ai.ijs.si/andrej/psmslib.html

There's also a Python wrapper:
http://ai.ijs.si/andrej/psmpylib.html

The C++ library and Python extension module are free for personal and for
research use, but unfortunately, I cannot disclose the source code at this
time, or release the libraries under an Apache-compatible license. Anyway, you
might want to try it out before coding your own implementation.
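For anyone weighing a plugin of their own, here is a rough idea of how a PPM-D model can act as a classifier. A common scheme is to train one compression model per class and assign a message to the class whose model encodes it in the fewest bits. The sketch below assumes that scheme. It is NOT the psmslib/psmpylib API, whose interface isn't described in this thread, and NOT the ijsSPAM2 code. The names PPMD and classify, the order-3 context length, scoring with a frozen model, and updating every context order are all illustrative assumptions. The PPM-D estimator itself gives a symbol seen c times in a context with n total counts and d distinct symbols probability (2c - 1) / 2n, and an escape probability of d / 2n.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 256  # byte-level model: every message is treated as raw bytes


class PPMD:
    """Hypothetical sketch of an adaptive byte-level PPM model using the
    PPM-D escape estimator, with exclusions during prediction."""

    def __init__(self, order: int = 3):
        self.order = order
        # context (bytes) -> {next byte value: occurrence count}
        self.counts = defaultdict(dict)

    def _symbol_bits(self, history: bytes, sym: int) -> float:
        """Bits needed to code `sym` after `history` under PPM-D."""
        excluded = set()
        bits = 0.0
        # Try the longest available context first, then back off to shorter ones.
        for k in range(min(self.order, len(history)), -1, -1):
            table = self.counts.get(history[len(history) - k:])
            if not table:
                continue
            # Exclusion: ignore symbols already offered by a longer context.
            live = {s: c for s, c in table.items() if s not in excluded}
            if not live:
                continue
            n, d = sum(live.values()), len(live)
            if sym in live:
                # PPM-D: a seen symbol gets probability (2c - 1) / 2n.
                return bits - math.log2((2 * live[sym] - 1) / (2 * n))
            # PPM-D escape probability is d / 2n.
            bits -= math.log2(d / (2 * n))
            excluded.update(live)
        # Order -1: uniform over the byte values not yet excluded.
        return bits + math.log2(ALPHABET_SIZE - len(excluded))

    def _update(self, history: bytes, sym: int) -> None:
        # Assumption: update every context order (no update exclusion).
        for k in range(min(self.order, len(history)) + 1):
            table = self.counts[history[len(history) - k:]]
            table[sym] = table.get(sym, 0) + 1

    def train(self, text: bytes) -> None:
        for i in range(len(text)):
            self._update(text[max(0, i - self.order):i], text[i])

    def code_length(self, text: bytes) -> float:
        """Total bits to code `text` with the model frozen (no adaptation)."""
        return sum(
            self._symbol_bits(text[max(0, i - self.order):i], text[i])
            for i in range(len(text))
        )


def classify(message: bytes, spam: PPMD, ham: PPMD) -> str:
    # Minimum code length: the model that compresses the message better wins.
    if spam.code_length(message) < ham.code_length(message):
        return "spam"
    return "ham"


spam_model, ham_model = PPMD(), PPMD()
spam_model.train(b"BUY CHEAP MEDS NOW!!! CLICK HERE FOR FREE VIAGRA")
ham_model.train(b"Hi Chris, the meeting notes from Tuesday are attached.")
print(classify(b"FREE MEDS, CLICK NOW", spam_model, ham_model))
```

The toy training strings only show the mechanics; a real filter would train each model on a full mail corpus. Scoring with the model frozen keeps the sketch short. Letting the model keep adapting while it encodes the message being classified is another option.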

-- 
View this message in context: http://www.nabble.com/Google-Summer-of-Code-2007-...-tf3240085.html#a9146893
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.