Posted to server-dev@james.apache.org by Chris Means <cm...@intfar.com> on 2002/08/19 06:11:19 UTC
First pass Token Counter mailet for ANTI SPAM mailet
Hi,
This is my first attempt at developing a mailet...so if I've made a mistake
about how best to implement this...or if I've just done something dumb in my
code (cos I'm no Java guru either) please let me know.
I'm following through with a posting I saw on /. regarding using word
occurrence statistics to be able to filter out SPAM from legit messages.
Here's the original article: http://www.paulgraham.com/spam.html
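For readers who have not seen the article, here is a rough sketch of the scoring rule it describes, written from memory of the article rather than taken from the attached code. The class and method names are made up for illustration. The constants (doubling the good count, ignoring tokens seen fewer than 5 times, 0.4 for unknown tokens, clamping to 0.01..0.99, combining the 15 most extreme tokens) are the ones the article proposes; the Math.max(1, ...) guard against empty corpora is an addition of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GrahamScore {
    /** Spam probability of one token given its counts in the good and bad corpora. */
    static double tokenProbability(int good, int bad, int goodMessages, int badMessages) {
        int g = 2 * good; // good counts are doubled to bias against false positives
        if (g + bad < 5) {
            return 0.4; // too rare to judge: treated like an unknown token
        }
        // Math.max(1, ...) avoids dividing by zero on an empty corpus (assumption of this sketch).
        double badFreq = Math.min(1.0, (double) bad / Math.max(1, badMessages));
        double goodFreq = Math.min(1.0, (double) g / Math.max(1, goodMessages));
        double p = badFreq / (goodFreq + badFreq);
        return Math.max(0.01, Math.min(0.99, p));
    }

    /** Combine the (up to) 15 probabilities that lie furthest from the neutral 0.5. */
    static double combine(List<Double> probabilities) {
        List<Double> sorted = new ArrayList<>(probabilities);
        sorted.sort(Comparator.comparingDouble((Double p) -> -Math.abs(p - 0.5)));
        double product = 1.0;
        double inverse = 1.0;
        for (double p : sorted.subList(0, Math.min(15, sorted.size()))) {
            product *= p;
            inverse *= 1.0 - p;
        }
        return product / (product + inverse);
    }

    public static void main(String[] args) {
        double spammy = tokenProbability(0, 10, 100, 100);
        double hammy = tokenProbability(10, 0, 100, 100);
        System.out.println("spammy token: " + spammy);
        System.out.println("hammy token:  " + hammy);
        System.out.println("combined:     " + combine(Arrays.asList(spammy, spammy, hammy)));
    }
}
```

The article suggests treating a message as spam when the combined value is above 0.9.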
I saw this as a two part development.
Part 1:
Routines for building good/bad word-token statistics.
Part 2:
Using the statistics to route or flag new messages as SPAM or not.
Attached is my first pass at the code for Part 1.
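Since the attachment is not reproduced in this archive, here is a minimal sketch of what the counting step in Part 1 might look like. It is not the attached JDBCTokenCounter; the class name and methods are hypothetical. The token rule (letters, digits, dashes, apostrophes and dollar signs form tokens, all-digit tokens are skipped) follows the article; lowercasing is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCounter {
    // Token characters: alphanumerics, dollar signs, apostrophes and dashes.
    private static final Pattern TOKEN = Pattern.compile("[A-Za-z0-9$'-]+");
    private static final Pattern ALL_DIGITS = Pattern.compile("[0-9]+");

    private final Map<String, Integer> counts = new HashMap<>();
    private int messages = 0;

    /** Count every token in one message body and bump the message total. */
    void addMessage(String text) {
        messages++;
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT); // assumption: case is ignored
            if (ALL_DIGITS.matcher(token).matches()) {
                continue; // tokens made only of digits carry little signal
            }
            counts.merge(token, 1, Integer::sum);
        }
    }

    int count(String token) {
        return counts.getOrDefault(token, 0);
    }

    int messageCount() {
        return messages;
    }

    public static void main(String[] args) {
        TokenCounter counter = new TokenCounter();
        counter.addMessage("Buy now, buy NOW for $99 in 2 days");
        System.out.println("buy=" + counter.count("buy") + " messages=" + counter.messageCount());
    }
}
```

One counter would be kept for known-good mail and another for known-spam mail, since Part 2 needs both the per-token counts and the number of messages in each corpus.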
As I decided to get familiar with JDBC with James at the same time, I've
coded this to use JDBC as the repository...that may not be the best approach
as it introduces a time lag at startup (as it loads the existing
words/occurrences) and at shutdown, as it persists the new statistics back
into the database.
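One possible way around the startup and shutdown lag, sketched here purely as an idea and not as what the attached code does: keep only the counts accumulated since the last write in memory and push them to the database from time to time. The class name, the table layout and the column names (token, occurrences) are all hypothetical, and the UPDATE-then-INSERT pair is a portable stand-in for whatever upsert the database offers.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

class PendingCounts {
    // Only increments made since the last flush are held in memory.
    private final Map<String, Integer> deltas = new HashMap<>();

    synchronized void increment(String token) {
        deltas.merge(token, 1, Integer::sum);
    }

    synchronized int pending(String token) {
        return deltas.getOrDefault(token, 0);
    }

    /** Return the accumulated deltas and leave the buffer empty. */
    synchronized Map<String, Integer> drain() {
        Map<String, Integer> snapshot = new HashMap<>(deltas);
        deltas.clear();
        return snapshot;
    }

    /**
     * Add the pending deltas to the given table. The table name must come from
     * trusted configuration because it is concatenated into the SQL text.
     * If a statement fails, the drained deltas are lost in this sketch.
     */
    void flush(Connection conn, String table) throws SQLException {
        Map<String, Integer> snapshot = drain();
        String update = "UPDATE " + table + " SET occurrences = occurrences + ? WHERE token = ?";
        String insert = "INSERT INTO " + table + " (token, occurrences) VALUES (?, ?)";
        try (PreparedStatement upd = conn.prepareStatement(update);
             PreparedStatement ins = conn.prepareStatement(insert)) {
            for (Map.Entry<String, Integer> e : snapshot.entrySet()) {
                upd.setInt(1, e.getValue());
                upd.setString(2, e.getKey());
                if (upd.executeUpdate() == 0) { // token not stored yet
                    ins.setString(1, e.getKey());
                    ins.setInt(2, e.getValue());
                    ins.executeUpdate();
                }
            }
        }
    }
}
```

With this shape nothing has to be loaded at startup, and shutdown only writes whatever has accumulated since the last flush. The trade-off is that Part 2 would have to query the database for counts instead of reading them from memory.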
Let me know what you guys think of my approach...etc.
P.S. Hopefully, there's something I don't understand about how to develop
under James. I'm using JBuilder 4 to compile my code, then I've got to
update the James.bar (which JBuilder doesn't recognize as a jar repository)
with the new class file, then restart James. I realize there's probably no
easy way around restarting James, but it would be nice to skip updating the
.bar all the time...is there a way to do this?
Thanks.
-Chris
Re: First pass Token Counter mailet for ANTI SPAM mailet
Posted by Stephen McConnell <mc...@apache.org>.
Chris Means wrote:
>P.S. Hopefully, there's something I don't understand about how to develop
>under James. I'm using JBuilder 4 to compile my code, then I've got to
>update the James.bar (which JBuilder doesn't recognize as a jar repository)
>with the new class file, then restart James. I realize there's probably no
>easy way around restarting James, but it would be nice to skip updating the
>.bar all the time...is there a way to do this?
>
You can safely rename an xxxx.bar file to xxxx.jar.
The .bar file was an "old" format supported by Phoenix - it basically
represented a Block packaged as a jar file. Phoenix today does not do
anything special with respect to .bar files.
Cheers, Steve.
--
Stephen J. McConnell
OSM SARL
digital products for a global economy
mailto:mcconnell@osm.net
http://www.osm.net
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
config.xml : RE: First pass Token Counter mailet for ANTI SPAM mailet
Posted by Chris Means <cm...@intfar.com>.
I forgot to add the sample configuration needed in config.xml:
<mailet match="RecipientIs=not.spam@yourhost.com" class="ToProcessor">
  <processor> notSpam </processor>
</mailet>

<processor name="notSpam">
  <mailet match="All" class="JDBCTokenCounter">
    <repositoryPath> db://maildb/filter_goodwords </repositoryPath>
    <statisticPath> filter_statistics/goodmessages </statisticPath>
  </mailet>
</processor>
-Chris