Posted to dev@spamassassin.apache.org by jaysheel bhavsar <jb...@mindbridge.com> on 2004/06/11 02:04:06 UTC

spamassassin in java work with JAMES

Hi guys, 
  I am very new to spam assassin. I have been working for about a week or two 
on implementing spam assassin with JAMES (Java Apache Mail Enterprise 
Server). I am trying to write spam assassin in java, using the already 
written spam assassin rules (which is written in perl). At this point I am 
able to break the email up and make some regex comparisons.

  My question is how spam assassin breaks up an email and makes its 
comparisons. When I get an email I break it up into its respective parts 
(i.e. headers, MIME type, etc.) and I can save the body of the message into a 
file, whether it is text/html or another content type (I am not worried about 
attachments). But if I save the body (content) into a file and then take the 
thousands of rules already written and apply them, it can take quite some 
time.

  I want to know how spam assassin does this. Does spam assassin take the 
content, store it in memory, and then apply all the rules? Does it take 
various regexes and combine them into one? 

  Is there any documentation on the core workings of spam assassin? I also 
want to know how spam assassin stores the rules (what type of data 
structure).
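
One way to avoid the file round trip described above is to keep the decoded 
body in memory as a String and compile each rule's regex once, up front, 
rather than once per message. The sketch below only illustrates that idea in 
Java; it is not a description of SpamAssassin's internals. The class name, 
rule names, patterns, and scores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: names, patterns, and scores are illustrative only,
// not taken from the SpamAssassin rule set.
final class BodyRuleSketch {

    // One body rule: a name, a regex compiled once, and a score.
    static final class Rule {
        final String name;
        final Pattern pattern;
        final double score;

        Rule(String name, String regex, double score) {
            this.name = name;
            this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
            this.score = score;
        }
    }

    // Returns the names of the rules whose pattern occurs anywhere in the body.
    static List<String> hits(List<Rule> rules, String body) {
        List<String> matched = new ArrayList<>();
        for (Rule rule : rules) {
            if (rule.pattern.matcher(body).find()) {
                matched.add(rule.name);
            }
        }
        return matched;
    }

    // Sums the scores of every rule that matches the body.
    static double score(List<Rule> rules, String body) {
        double total = 0.0;
        for (Rule rule : rules) {
            if (rule.pattern.matcher(body).find()) {
                total += rule.score;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("DEMO_FREE_MONEY", "free\\s+money", 2.5));
        rules.add(new Rule("DEMO_CLICK_HERE", "click\\s+here", 1.0));

        // The body stays in memory; nothing is written to disk.
        String body = "Get FREE money now, just click here!";
        System.out.println(hits(rules, body) + " total=" + score(rules, body));
    }
}
```

The rule list is built once at startup and reused for every message, so the 
per-message cost is the matching itself rather than regex compilation or 
file I/O.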

I will make the code available once I have completed the project.

thanks for your help. Let me know if you are at all confused by what I am asking :)

jay


Re: spamassassin in java work with JAMES

Posted by Sidney Markowitz <si...@sidney.com>.
jaysheel bhavsar wrote:
> I am trying to write spam assassin in java, using the already 
> written spam assassin rules (which is written in perl)

My immediate reaction is "Why?" I could see the motivation of the person 
who once asked a similar question about writing a version in C, 
expecting a big speed increase, but that doesn't apply to Java. My 
second reaction is to wonder how one person can expect to duplicate the 
work of the entire group of developers who are still working as hard as 
they can on the task of completing SpamAssassin version 3.0 starting 
from a working version 2.63. SpamAssassin is not just a set of regexps 
that you can translate to a Java regexp parser. Will you want to use the 
same class structure for objects in Java as makes sense in perl? Are you 
going to write or find your own versions of the CPAN modules that the SA 
developers did not have to spend time writing themselves?

It might be a fun project to rewrite SpamAssassin in Java, but I can't 
see any rationale for only allowing 100% Java programs on an enterprise 
server unless you are a Sun executive. It seems it would be a whole lot 
more useful to the JAMES project if you found an efficient way to pipe 
its mail through a SpamAssassin daemon running in perl.
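
As a rough illustration of the daemon approach, the Java sketch below builds 
a CHECK request for spamd and reads the verdict from the reply. It assumes 
the spamd wire format as commonly documented (a "CHECK SPAMC/1.2" request 
line, a Content-length header, and a "Spam: True/False ; score / threshold" 
reply header); verify this against the protocol notes shipped with your 
SpamAssassin version. The class and method names are hypothetical, and the 
host and port passed to check() are up to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical sketch of a spamd client; the wire format is an assumption
// based on the commonly documented spamc/spamd protocol.
final class SpamdCheckSketch {

    // Builds a CHECK request: request line, Content-length, blank line, message.
    static byte[] buildCheckRequest(byte[] message) {
        String head = "CHECK SPAMC/1.2\r\n"
                + "Content-length: " + message.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] request = new byte[headBytes.length + message.length];
        System.arraycopy(headBytes, 0, request, 0, headBytes.length);
        System.arraycopy(message, 0, request, headBytes.length, message.length);
        return request;
    }

    // Reads the verdict from a reply header such as "Spam: True ; 15.0 / 5.0".
    static boolean isSpam(String response) {
        for (String line : response.split("\r?\n")) {
            if (line.regionMatches(true, 0, "Spam:", 0, 5)) {
                String verdict = line.substring(5).trim().toLowerCase(Locale.ROOT);
                return verdict.startsWith("true");
            }
        }
        throw new IllegalArgumentException("no Spam: header in reply");
    }

    // Sends the message to a running daemon and returns its verdict.
    static boolean check(String host, int port, byte[] message) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildCheckRequest(message));
            out.flush();
            socket.shutdownOutput(); // tell the daemon the request is complete

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                reply.write(buf, 0, n);
            }
            return isSpam(new String(reply.toByteArray(), StandardCharsets.US_ASCII));
        }
    }

    public static void main(String[] args) {
        // No network access here: show the request and parse a canned reply.
        byte[] msg = "Subject: hi\r\n\r\nhello\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.print(new String(buildCheckRequest(msg), StandardCharsets.US_ASCII));
        System.out.println(isSpam("SPAMD/1.1 0 EX_OK\r\nSpam: True ; 15.0 / 5.0\r\n\r\n"));
    }
}
```

A JAMES mailet could call something like check() for each incoming message 
and act on the result, leaving rule parsing and scoring to the existing perl 
code.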

> I will make the code available once I have completed the project.

That's admirable, but I can't imagine how it would be useful to us 
unless you came up with basic improvements that apply across languages. 
Of course, anyone who really does have a problem running perl but no 
problem running Java would appreciate the work.

> thanks for your help. Let me know if you are at all confused by what I am asking :)

Perhaps I am confused. I don't mean to be negative, and of course you 
are free to do what you want with your project, but I don't see anything 
here that would induce me to take away from the little time I have to 
volunteer helping get SpamAssassin written to try anything so massive as 
educating someone in the nitty gritty details of SpamAssassin's code.

Please note that I am just one volunteer here and I do not speak for 
everyone. However, if nobody else sees a benefit of this to the 
SpamAssassin development project, then this will quickly become a topic 
not suitable for the spamassassin-dev mailing list. I am not asking you 
to go away. I'm just saying that if the thread goes off into a discussion of 
how to port the code to Java with no benefits to the original project, I 
don't think it will be relevant to this list.

  -- sidney