Posted to dev@spamassassin.apache.org by jaysheel bhavsar <jb...@mindbridge.com> on 2004/06/11 02:04:06 UTC
spamassassin in java work with JAMES
Hi guys,
I am very new to SpamAssassin. I have been working for about a week or two
on implementing SpamAssassin with JAMES (Java Apache Mail Enterprise
Server). I am trying to write SpamAssassin in Java, using the already
written SpamAssassin rules (which are written in Perl). At this point I am
able to break the email up and make some regex comparisons.
My question is how SpamAssassin breaks up an email and makes its comparisons. My
problem is that when I get an email I break it up into its respective parts
(i.e. headers, MIME type, etc.) and I can save the body of the message into a
file, whether it is text/html or another content type (I am not worried about
attachments). Now if I save the body (content) into a file and then take the
thousands of rules already written and apply them, it can take quite some
time.
I want to know how SpamAssassin does this. Does SpamAssassin take the
content, store it in memory, and then apply all the rules? Does it take various
regexes and combine them into one?
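To make the "store it in memory and then apply all the rules" option concrete, here is a minimal Java sketch. It is not a description of how SpamAssassin itself works. The class names (BodyRule, InMemoryScanner), the rule names, and the scores are all hypothetical. The only point it illustrates is that each pattern is compiled once at startup and then reused against a body held in a String, rather than re-reading a file for every rule. It keeps one Pattern per rule instead of one combined regex so that each hit can still be attributed to a rule name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical rule holder; not SpamAssassin's actual data structure.
final class BodyRule {
    final String name;
    final double score;
    final Pattern pattern; // compiled once, reused for every message

    BodyRule(String name, double score, String regex) {
        this.name = name;
        this.score = score;
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }
}

// Applies every precompiled rule to a message body held in memory.
final class InMemoryScanner {
    private final List<BodyRule> rules;

    InMemoryScanner(List<BodyRule> rules) {
        this.rules = new ArrayList<>(rules);
    }

    // Returns the names of all rules whose pattern occurs in the body.
    List<String> hits(String body) {
        List<String> names = new ArrayList<>();
        for (BodyRule rule : rules) {
            if (rule.pattern.matcher(body).find()) {
                names.add(rule.name);
            }
        }
        return names;
    }

    // Sums the scores of all rules that hit.
    double score(String body) {
        double total = 0.0;
        for (BodyRule rule : rules) {
            if (rule.pattern.matcher(body).find()) {
                total += rule.score;
            }
        }
        return total;
    }
}

final class InMemoryScannerDemo {
    public static void main(String[] args) {
        // Made-up demo rules, not taken from the SpamAssassin rule set.
        InMemoryScanner scanner = new InMemoryScanner(Arrays.asList(
                new BodyRule("DEMO_FREE_MONEY", 2.5, "free\\s+money"),
                new BodyRule("DEMO_CLICK_HERE", 1.5, "click\\s+here")));
        String body = "Get free money today. Click here!";
        System.out.println(scanner.hits(body) + " total=" + scanner.score(body));
    }
}
```

With thousands of rules the per-message cost is still one regex search per rule, but the compile cost and the file I/O are paid only once.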
Is there any documentation on the core workings of SpamAssassin? I also
want to know how SpamAssassin stores the rules (what type of data
structure).
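As one possible answer to the data-structure question on the Java side, the sketch below loads rule-file lines of the form "body NAME /regex/flags" and "score NAME value" into a map keyed by rule name. This is an assumption about what a port might do, not a description of SpamAssassin's internals. RuleEntry, RuleTable, and compilePerlStyle are hypothetical names, the 1.0 default score is an assumption of the sketch, and only the /.../ delimiter with an optional "i" flag is handled. Perl and Java regex dialects differ, so real rules would need more careful translation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical per-rule record: one entry per rule name.
final class RuleEntry {
    final String name;
    Pattern pattern;    // set by a "body NAME /regex/flags" line
    double score = 1.0; // assumed default when no "score" line is seen

    RuleEntry(String name) {
        this.name = name;
    }
}

// Hypothetical rule table keyed by rule name, in file order.
final class RuleTable {
    private final Map<String, RuleEntry> byName = new LinkedHashMap<>();

    // Parses one config line; lines this sketch does not understand are skipped.
    void loadLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return;
        }
        String[] parts = trimmed.split("\\s+", 3);
        if (parts.length < 3) {
            return;
        }
        if (parts[0].equals("body")) {
            entry(parts[1]).pattern = compilePerlStyle(parts[2]);
        } else if (parts[0].equals("score")) {
            // Only the first number is used if several are listed.
            entry(parts[1]).score = Double.parseDouble(parts[2].split("\\s+")[0]);
        }
    }

    RuleEntry get(String name) {
        return byName.get(name);
    }

    private RuleEntry entry(String name) {
        return byName.computeIfAbsent(name, RuleEntry::new);
    }

    // Converts "/regex/flags" to a Java Pattern; only the "i" flag is honored.
    static Pattern compilePerlStyle(String literal) {
        int last = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || last <= 0) {
            throw new IllegalArgumentException("expected /regex/flags: " + literal);
        }
        String regex = literal.substring(1, last);
        String flags = literal.substring(last + 1);
        int javaFlags = flags.contains("i") ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex, javaFlags);
    }
}

final class RuleTableDemo {
    public static void main(String[] args) {
        RuleTable table = new RuleTable();
        // DEMO_RULE is a made-up rule, not part of any shipped rule set.
        table.loadLine("body DEMO_RULE /free\\s+money/i");
        table.loadLine("score DEMO_RULE 2.5");
        RuleEntry rule = table.get("DEMO_RULE");
        System.out.println(rule.name + " " + rule.score + " " + rule.pattern);
    }
}
```

Keying by name lets "body" and "score" lines for the same rule arrive in any order and from different files.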
I will make the code available once I have completed the project.
Thanks for your help. Let me know if you are at all confused by what I am asking :)
jay
Re: spamassassin in java work with JAMES
Posted by Sidney Markowitz <si...@sidney.com>.
jaysheel bhavsar wrote:
> I am trying to write SpamAssassin in Java, using the already
> written SpamAssassin rules (which are written in Perl)
My immediate reaction is "Why?" I could see the motivation of the person
who once asked a similar question about writing a version in C,
expecting a big speed increase, but that doesn't apply to Java. My
second reaction is to wonder how one person can expect to duplicate the
work of the entire group of developers who are still working as hard as
they can on the task of completing SpamAssassin version 3.0 starting
from a working version 2.63. SpamAssassin is not just a set of regexps
that you can translate to a Java regexp parser. Will you want to use the
same class structure for objects in Java as makes sense in Perl? Are you
going to write or find your own versions of the CPAN modules that the SA
developers did not have to spend time writing themselves?
It might be a fun project to rewrite SpamAssassin in Java, but I can't
see any rationale for only allowing 100% Java programs on an enterprise
server unless you are a Sun executive. It seems it would be a whole lot
more useful to the JAMES project if you found an efficient way to pipe
its mail through a SpamAssassin daemon (spamd) running in Perl.
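For illustration, piping a message through the daemon from Java could look roughly like the sketch below. It assumes spamd's plain-text protocol as I understand it: a "CHECK SPAMC/1.2" request line, a Content-length header, a blank line, then the raw message, with a "Spam: True ; score / threshold" header in the reply. The class and method names are hypothetical, the host and port are supplied by the caller (spamd conventionally listens on 783), and error handling is minimal. Check the protocol notes shipped with your SpamAssassin version before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical minimal client for a running spamd; a sketch only.
final class SpamdClient {
    private static final Pattern SPAM_HEADER = Pattern.compile(
            "^Spam:\\s*(True|False)\\b",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

    // Builds a CHECK request: request line, Content-length, blank line, message.
    static byte[] buildCheckRequest(byte[] message) {
        String head = "CHECK SPAMC/1.2\r\n"
                + "Content-length: " + message.length + "\r\n\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[headBytes.length + message.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(message, 0, out, headBytes.length, message.length);
        return out;
    }

    // Reads the verdict from the "Spam:" header of a spamd reply.
    static boolean isSpam(String response) {
        Matcher m = SPAM_HEADER.matcher(response);
        if (!m.find()) {
            throw new IllegalArgumentException("no Spam: header in response");
        }
        return m.group(1).equalsIgnoreCase("True");
    }

    // Sends the message to spamd and returns the raw reply text.
    static String check(String host, int port, byte[] message) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildCheckRequest(message));
            out.flush();
            socket.shutdownOutput(); // signal that the request is complete

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }
}

final class SpamdClientDemo {
    public static void main(String[] args) {
        // Offline demo on a canned reply; no daemon is contacted here.
        String canned = "SPAMD/1.1 0 EX_OK\r\nSpam: True ; 15.0 / 5.0\r\n\r\n";
        System.out.println("spam? " + SpamdClient.isSpam(canned));
    }
}
```

A JAMES mailet could call SpamdClient.check(host, port, rawMessageBytes) for each message and act on isSpam(...), leaving the rules and scoring to the Perl daemon.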
> I will make the code available once I have completed the project.
That's admirable, but I can't imagine how it would be useful to us
unless you came up with basic improvements that apply across languages.
Of course, anyone who really does have a problem running Perl but no
problem running Java would appreciate the work.
> Thanks for your help. Let me know if you are at all confused by what I am asking :)
Perhaps I am confused. I don't mean to be negative, and of course you
are free to do what you want with your project, but I don't see anything
here that would induce me to take away from the little time I have to
volunteer helping get SpamAssassin written to try anything so massive as
educating someone in the nitty gritty details of SpamAssassin's code.
Please note that I am just one volunteer here and I do not speak for
everyone. However, if nobody else sees a benefit of this to the
SpamAssassin development project, then this will quickly become a topic
not suitable for the spamassassin-dev mailing list. I am not asking you
to go away. I'm just saying that if the thread goes off into a discussion of
how to port the code to Java with no benefits to the original project, I
don't think it will be relevant to this list.
-- sidney