Posted to server-dev@james.apache.org by Chris Means <cm...@intfar.com> on 2002/08/19 06:11:19 UTC

First pass Token Counter mailet for ANTI SPAM mailet

Hi,

This is my first attempt at developing a mailet...so if I've made a mistake
about how best to implement this...or if I've just done something dumb in my
code (cos I'm no Java guru either), please let me know.

I'm following up on a posting I saw on /. about using word
occurrence statistics to filter out SPAM from legit messages.
Here's the original article: http://www.paulgraham.com/spam.html

I saw this as a two part development.

Part 1:
  Routines for building good/bad word-token statistics.

Part 2:
  Using the statistics to route or flag new messages as SPAM or not.
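
To make Part 1 concrete, here is a minimal sketch of per-token counting. It is not the attached mailet: the class name `TokenCounterSketch`, the token pattern, the lower-casing, and the rule that skips all-digit tokens are all assumptions made for illustration. One instance would be kept for good messages and another for SPAM.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not the attached code): counts token occurrences in message text. */
class TokenCounterSketch {
    // Assumed token definition: runs of letters, digits, apostrophes, dollar signs and hyphens.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9'$-]+");

    private final Map<String, Integer> counts = new HashMap<>();
    private int messages = 0;

    /** Adds every token found in one message body to the running counts. */
    void addMessage(String body) {
        Matcher m = TOKEN.matcher(body);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            // Assumption: tokens made only of digits carry little signal, so skip them.
            if (token.chars().allMatch(Character::isDigit)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        messages++;
    }

    /** Returns how often a token has been seen, or 0 if it never appeared. */
    int count(String token) {
        return counts.getOrDefault(token.toLowerCase(Locale.ROOT), 0);
    }

    /** Returns how many messages have been added so far. */
    int messageCount() {
        return messages;
    }

    public static void main(String[] args) {
        TokenCounterSketch good = new TokenCounterSketch();
        good.addMessage("Meeting moved to Monday, see the agenda");
        good.addMessage("Agenda attached for Monday");
        System.out.println("agenda=" + good.count("agenda")
                + " messages=" + good.messageCount());
    }
}
```

Part 2 would then compare a token's count in the good set against its count in the SPAM set, which is why the number of messages is tracked alongside the per-token counts.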

Attached is my first pass at the code for Part 1.

As I decided to get familiar with JDBC with James at the same time, I've
coded this to use JDBC as the repository...that may not be the best approach
as it introduces a time lag at startup (as it loads the existing
words/occurrences) and at shutdown, as it persists the new statistics back
into the database.
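
The load-at-startup, persist-at-shutdown pattern described above can be sketched roughly as follows. Everything here is hypothetical: `TokenStore`, `InMemoryTokenStore` and `CachedTokenCounts` are names invented for the example, and the in-memory store stands in for the JDBC repository so the sketch runs without a database. The `init()` and `destroy()` names are chosen to echo the mailet lifecycle.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical storage abstraction; a JDBC-backed version would implement the same two calls. */
interface TokenStore {
    Map<String, Integer> loadAll();
    void saveAll(Map<String, Integer> counts);
}

/** In-memory stand-in for the database so the sketch is runnable on its own. */
class InMemoryTokenStore implements TokenStore {
    private final Map<String, Integer> rows = new HashMap<>();

    public Map<String, Integer> loadAll() {
        return new HashMap<>(rows);
    }

    public void saveAll(Map<String, Integer> counts) {
        rows.clear();
        rows.putAll(counts);
    }
}

/** Keeps all counts in memory between one bulk load and one bulk save. */
class CachedTokenCounts {
    private final TokenStore store;
    private final Map<String, Integer> counts = new HashMap<>();

    CachedTokenCounts(TokenStore store) {
        this.store = store;
    }

    /** Startup: one bulk read. This is where the startup lag comes from. */
    synchronized void init() {
        counts.putAll(store.loadAll());
    }

    /** Per message: memory only, no round trip to the store. */
    synchronized void increment(String token) {
        counts.merge(token, 1, Integer::sum);
    }

    synchronized int get(String token) {
        return counts.getOrDefault(token, 0);
    }

    /** Shutdown: one bulk write. Counts added since init() are lost if this never runs. */
    synchronized void destroy() {
        store.saveAll(counts);
    }

    public static void main(String[] args) {
        TokenStore store = new InMemoryTokenStore();
        CachedTokenCounts first = new CachedTokenCounts(store);
        first.init();
        first.increment("offer");
        first.increment("offer");
        first.destroy();

        // A later run sees the counts persisted by the earlier one.
        CachedTokenCounts second = new CachedTokenCounts(store);
        second.init();
        System.out.println("offer=" + second.get("offer"));
    }
}
```

The trade-off is the one noted above: per-message work stays cheap, but the cost moves to startup and shutdown, and an unclean shutdown would drop whatever was counted since the last load.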

Let me know what you guys think of my approach...etc.

P.S.  Hopefully, there's something I don't understand about how to develop
under James.  I'm using JBuilder 4 to compile my code, then I've got to
update the James.bar (which JBuilder doesn't recognize as a jar archive)
with the new class file, then restart James.  I realize there's probably no
easy way around restarting James, but it would be nice to skip updating the
.bar all the time...is there a way to do this?

Thanks.

-Chris

Re: First pass Token Counter mailet for ANTI SPAM mailet

Posted by Stephen McConnell <mc...@apache.org>.

Chris Means wrote:

>P.S.  Hopefully, there's something I don't understand about how to develop
>under James.  I'm using JBuilder 4 to compile my code, then I've got to
>update the James.bar (which JBuilder doesn't recognize as a jar archive)
>with the new class file, then restart James.  I realize there's probably no
>easy way around restarting James, but it would be nice to skip updating the
>.bar all the time...is there a way to do this?
>

You can safely rename an xxxx.bar file to xxxx.jar.
The .bar file was an "old" format supported by Phoenix - it basically
represented a Block packaged as a jar file.  Phoenix today does not do
anything special with respect to .bar files.

Cheers, Steve.

-- 

Stephen J. McConnell

OSM SARL
digital products for a global economy
mailto:mcconnell@osm.net
http://www.osm.net




--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


config.xml : RE: First pass Token Counter mailet for ANTI SPAM mailet

Posted by Chris Means <cm...@intfar.com>.
I forgot to add the sample configuration needed in config.xml:

          <mailet match="RecipientIs=not.spam@yourhost.com" class="ToProcessor">
            <processor> notSpam </processor>
          </mailet>


        <processor name="notSpam">
          <mailet match="All" class="JDBCTokenCounter">
            <repositoryPath> db://maildb/filter_goodwords </repositoryPath>
            <statisticPath> filter_statistics/goodmessages </statisticPath>
          </mailet>
        </processor>

-Chris


