Posted to users@spamassassin.apache.org by Mike Tonks <fl...@googlemail.com> on 2010/08/05 11:58:05 UTC

Calling SpamAssassin from a Perl Web Form

Hi folks,

I'm looking into hooking the Mail::SpamAssassin module into a perl
processor for a couple of web forms - contact us, comments form, and
publish an article form (open publishing).

The main barrier seems to be the need for a message format rather than
just a plain text body.

I tried two approaches:

1) I tried just passing the body text to SA but this triggers a load
of missing header rules.

2) I tried to 'spoof' the headers to get SA to process each post like
it's a normal mail, but found my spoof headers caused various issues
as well, not least long delays - I'm guessing while the spoof email
domains (always the same) are checked via the network.

Also, I probably need to disable a bunch of rules that aren't really
appropriate, e.g. looking up the sender email address since I'm just
using a dummy one anyway (except for the contact form).  Seems like
mainly the header rules would need to be disabled, and the body rules
given more weighting.  Is there an easy way to do this?

Alternatively, perhaps I should just identify particular rules that
are relevant and call them directly.  Is this possible?

Thanks for any help.

mike

Re: Calling SpamAssassin from a Perl Web Form

Posted by Mike Tonks <fl...@googlemail.com>.
> I've yet to hear of anyone implementing SA for forms in a sensible manner.

Thanks for the feedback.  If people have tried before it's unlikely
I'll do much better :)

> It would make much more sense to me to just add well-known form-spam
> checks to your code: the standard captcha, too-many-links, bad
> keywords, etc. There is lots of information about that around. I have a
> hard time believing SA default rules would catch anything serious.

The main attractions are the Bayes learning, which seems to be nicely
implemented (and non-trivial), the URI blacklist lookups, the bad word
lists, and the general scoring system, which is really cool, plus the
ability to tweak the settings and add rules via plugins, etc.

It seems that SA is fairly mature and has a good user base, so it
would seem a nice idea to hook into this rather than reinventing these
components.

Any suggestions on how to achieve this, either via SA or otherwise
(existing CPAN modules?), would be much appreciated, if anyone here is
knowledgeable in this area.

Yes, there will be some captcha stuff too but I see that as separate
to the actual 'identifying spam content' issue.

cheers for any help.

mike

On 5 August 2010 15:38, Henrik K <he...@hege.li> wrote:

>
>
> You can easily make SA-like network checks: just parse URIs from messages
> and check a few URIBLs, check the sender IP against applicable lists (not
> dial-up ones), etc. At its simplest it's some regexps and gethostbyname calls.
>
>

Re: Calling SpamAssassin from a Perl Web Form

Posted by Henrik K <he...@hege.li>.
On Thu, Aug 05, 2010 at 10:58:05AM +0100, Mike Tonks wrote:
> Hi folks,
> 
> I'm looking into hooking the Mail::SpamAssassin module into a perl
> processor for a couple of web forms - contact us, comments form, and
> publish an article form (open publishing).

I've yet to hear of anyone implementing SA for forms in a sensible manner.

It would make much more sense to me to just add well-known form-spam
checks to your code: the standard captcha, too-many-links, bad
keywords, etc. There is lots of information about that around. I have a
hard time believing SA default rules would catch anything serious.

You can easily make SA-like network checks: just parse URIs from messages
and check a few URIBLs, check the sender IP against applicable lists (not
dial-up ones), etc. At its simplest it's some regexps and gethostbyname calls.
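
A minimal Perl sketch of the "regexp and gethostbyname" idea above might look
like the following. The function names are hypothetical, the zone passed in
(for example multi.surbl.org) is an assumption that should be checked against
the list's own usage policy, and the "last two labels" domain reduction is
deliberately naive (it is wrong for suffixes such as co.uk).

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# Pull unique hostnames out of http/https URLs in free text.
sub extract_hosts {
    my ($text) = @_;
    my (%seen, @hosts);
    while ($text =~ m{\bhttps?://([A-Za-z0-9.-]+)}gi) {
        my $host = lc $1;
        $host =~ s/\.+$//;              # drop a trailing sentence dot
        push @hosts, $host unless $seen{$host}++;
    }
    return @hosts;
}

# Build the DNS name to query: "<domain>.<zone>".
# Naive assumption: the registered domain is the last two labels.
sub uribl_query_name {
    my ($host, $zone) = @_;
    my @labels = split /\./, $host;
    my $domain = @labels > 2 ? join('.', @labels[-2, -1]) : $host;
    return "$domain.$zone";
}

# True if the list answers with a 127.x.x.x address, the usual
# DNS blocklist convention; anything else is treated as "not listed".
sub is_listed {
    my ($host, $zone) = @_;
    my $packed = gethostbyname(uribl_query_name($host, $zone));
    return 0 unless defined $packed && length($packed) == 4;
    return inet_ntoa($packed) =~ /^127\./ ? 1 : 0;
}
```

A form handler could then count how many of extract_hosts($comment) are
listed and add that to whatever scoring it already does. Each lookup is a
blocking DNS query, so it is probably worth capping the number of hosts
checked per post.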


Re: Calling SpamAssassin from a Perl Web Form

Posted by Bowie Bailey <Bo...@BUC.com>.
On 8/5/2010 5:58 AM, Mike Tonks wrote:
> Hi folks,
>
> I'm looking into hooking the Mail::SpamAssassin module into a perl
> processor for a couple of web forms - contact us, comments form, and
> publish an article form (open publishing).
>
> The main barrier seems to be the need for a message format rather than
> just a plain text body.
>
> I tried two approaches:
>
> 1) I tried just passing the body text to SA but this triggers a load
> of missing header rules.

Yeah.  This won't work.  SA is designed to process mail messages, not a
text blob.

> 2) I tried to 'spoof' the headers to get SA to process each post like
> it's a normal mail, but found my spoof headers caused various issues
> as well, not least long delays - I'm guessing while the spoof email
> domains (always the same) are checked via the network.

This will work as long as you create the proper headers.  Send yourself
a message using the form (or some other method) and then grab everything
from the first Received header down to use as your template to create
the fake headers.  You can get rid of extra headers if you want -- all
you really need is From, To, Subject, Date, Message-ID, and at least one
Received header.  Just make sure the Date header is dynamic (and in the
right format) so that you don't start getting hits on INVALID_DATE or
DATE_IN_PAST.
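
As a rough illustration of the header list above, here is a sketch that wraps
a form post in those six headers with a freshly generated Date. The function
and field names are hypothetical, and the Received line is an invented shape
rather than a copy of a real one; a template taken from a real message, as
suggested above, is likely to behave better.

```perl
use strict;
use warnings;

my @DAYS = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
my $counter = 0;

# Date header value in UTC, built by hand so that day and month
# names do not depend on the server's locale.
sub rfc2822_date {
    my ($epoch) = @_;
    my @t = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d +0000',
        $DAYS[$t[6]], $t[3], $MONS[$t[4]], $t[5] + 1900, @t[2, 1, 0]);
}

# Wrap one form submission in the minimal header set.
# Fields: from, to, subject, body, host, client_ip, optional epoch.
sub build_message {
    my (%f) = @_;
    my $epoch = defined $f{epoch} ? $f{epoch} : time;
    my $date  = rfc2822_date($epoch);

    # Strip CR/LF from header values so user input cannot inject headers.
    my %clean;
    for my $k (qw(from to subject host client_ip)) {
        ($clean{$k} = defined $f{$k} ? $f{$k} : '') =~ s/[\r\n]+/ /g;
    }

    my @headers = (
        "Received: from webform ([$clean{client_ip}]) by $clean{host} with HTTP; $date",
        "From: $clean{from}",
        "To: $clean{to}",
        "Subject: $clean{subject}",
        "Date: $date",
        sprintf('Message-ID: <%d.%d.%d@%s>', $epoch, $$, ++$counter, $clean{host}),
    );
    my $body = defined $f{body} ? $f{body} : '';
    return join("\n", @headers) . "\n\n" . $body . "\n";
}
```

Leaving out epoch makes the Date follow the current time on every call, which
is the point of keeping it dynamic. The returned string is a complete message
that can be handed to SA in place of the bare body text.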

> Also, I probably need to disable a bunch of rules that aren't really
> appropriate, e.g. looking up the sender email address since I'm just
> using a dummy one anyway (except for the contact form).  Seems like
> mainly the header rules would need to be disabled, and the body rules
> given more weighting.  Is there an easy way to do this?
>
> Alternatively, perhaps I should just identify particular rules that
> are relevant and call the directly.  Is this possible?

You can run spamd (or spamassassin) and point it at a directory
containing only the rules you are interested in, but this would probably
be overkill unless your volume is high and scan times are becoming excessive.
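
From Perl rather than spamd, the same idea might be sketched with the
Mail::SpamAssassin module as below. The function name and the rules_dir,
site_dir and network options are hypothetical, and both directories are
assumed to hold only the .cf files you want. Turning on local_tests_only
skips the network checks, which is one way to avoid the lookup delays
mentioned earlier, at the cost of losing the URIBL tests.

```perl
use strict;
use warnings;

# Score one complete message (headers plus body) and return a hashref.
# Options: rules_dir, site_dir (paths), network (true enables DNS tests).
sub score_form_post {
    my ($raw_message, %opt) = @_;
    require Mail::SpamAssassin;    # loaded on first use

    my %args = (
        local_tests_only => $opt{network} ? 0 : 1,
        dont_copy_prefs  => 1,     # do not create a user_prefs file
    );
    $args{rules_filename}      = $opt{rules_dir} if defined $opt{rules_dir};
    $args{site_rules_filename} = $opt{site_dir}  if defined $opt{site_dir};

    my $sa     = Mail::SpamAssassin->new(\%args);
    my $mail   = $sa->parse($raw_message);
    my $status = $sa->check($mail);

    my %result = (
        is_spam  => $status->is_spam ? 1 : 0,
        score    => $status->get_score,
        required => $status->get_required_score,
        tests    => [ split /,/, $status->get_names_of_tests_hit ],
    );

    # Release the per-message objects explicitly.
    $status->finish;
    $mail->finish;
    return \%result;
}
```

Building the Mail::SpamAssassin object is the expensive part, so a
long-running form processor would probably create it once and reuse it
across posts instead of constructing it on every call as this sketch does.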

-- 
Bowie