Posted to users@spamassassin.apache.org by Olivier Nicole <on...@cs.ait.ac.th> on 2007/11/02 03:52:51 UTC

Check only the body with Mail::SpamAssassin

Hi,

I am wondering whether the Mail::SpamAssassin module has some hidden
method to check only the body of an email.

The problem is the following. I am looking at the PDFassassin plugin
http://blog.atmail.com/?p=61 that uses pdftotext and pdftoimages to
extract the text from a PDF file. This text is then formatted into a
dummy email message and a new instance of SA is launched to check that
dummy. The only interesting part of the dummy is the body (and
furthermore it should have no attachment), so it would be faster to run
a light SA (body/plain text only).
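
For reference, the dummy-message flow described above might look roughly
like the sketch below. It is only an illustration, not PDFassassin's actual
code: the helper names build_dummy_message and score_text and the
placeholder addresses are made up for this example. It assumes a normal
Mail::SpamAssassin install; local_tests_only skips the network tests, but
the header rules still run against the dummy headers.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

# Hypothetical helper: wrap extracted text in a minimal plain-text message.
# The addresses and subject are placeholders, not anything SA requires.
sub build_dummy_message {
    my ($text) = @_;
    return join("\n",
        'From: pdf-scan@localhost',
        'To: pdf-scan@localhost',
        'Subject: extracted PDF text',
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        '',
        $text,
    ) . "\n";
}

# Hypothetical helper: score a chunk of text with an existing SA object.
sub score_text {
    my ($sa, $text) = @_;
    my $mail   = $sa->parse(build_dummy_message($text));
    my $status = $sa->check($mail);
    my $score  = $status->get_score();
    # Release the per-message objects once the score has been read.
    $status->finish();
    $mail->finish();
    return $score;
}

# Usage (creates a full SA instance, so it is left commented out here):
#   my $sa = Mail::SpamAssassin->new({ local_tests_only => 1 });
#   print score_text($sa, "text extracted from the PDF"), "\n";
```

Creating the Mail::SpamAssassin object once and reusing it for every
dummy message avoids recompiling the rules on each scan.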

Best regards,

Olivier

Re: Check only the body with Mail::SpamAssassin

Posted by Theo Van Dinter <fe...@apache.org>.
On Sat, Nov 03, 2007 at 08:32:20AM +0700, Olivier Nicole wrote:
> Any pointer/example? I am quite new to this (and lame OO programmer).

Sure, search the list archive for PDF and post_message_parse, e.g.:

http://www.nabble.com/PDFText-Plugin-for-PDF-file-scoring---not-for-PDF-images-tf4077171.html

:)
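
For readers who cannot reach the archive, here is a minimal sketch of what
a plugin built on the post_message_parse hook could look like. It is not
the plugin from the linked thread: the package name PDFTextSketch and the
helper _pdf_to_text are hypothetical, and it assumes the pdftotext binary
is on the PATH and that PDF attachments arrive as application/pdf. The
idea is to store the extracted text as the part's rendered text so that
the normal body rules can see it, with no second SA instance needed.

```perl
package PDFTextSketch;    # hypothetical plugin name

use strict;
use warnings;
use File::Temp qw(tempfile);
use Mail::SpamAssassin::Plugin;

our @ISA = qw(Mail::SpamAssassin::Plugin);

# Standard plugin constructor boilerplate.
sub new {
    my ($class, $mailsa) = @_;
    $class = ref($class) || $class;
    my $self = $class->SUPER::new($mailsa);
    bless($self, $class);
    return $self;
}

# Hook called after SpamAssassin has parsed the message into MIME parts.
sub post_message_parse {
    my ($self, $opts) = @_;
    my $msg = $opts->{message};

    foreach my $part ($msg->find_parts(qr{^application/pdf$}i)) {
        my $text = $self->_pdf_to_text($part->decode());
        # Record the extracted text as this part's rendered form.
        $part->set_rendered($text) if defined $text && length $text;
    }
    return 1;
}

# Hypothetical helper: write the PDF to a temp file and run pdftotext on it.
# Returns the extracted text, or undef if anything fails.
sub _pdf_to_text {
    my ($self, $pdf_bytes) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.pdf', UNLINK => 1);
    binmode $fh;
    print {$fh} $pdf_bytes;
    close $fh or return;

    # "pdftotext FILE -" writes the text to stdout.
    open(my $out, '-|', 'pdftotext', $path, '-') or return;
    local $/;
    my $text = <$out>;
    close $out;
    return $text;
}

1;
```

Such a plugin would be enabled with a loadplugin line in a .pre or .cf
file, e.g. "loadplugin PDFTextSketch /path/to/PDFTextSketch.pm" (the path
is a placeholder).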

-- 
Randomly Selected Tagline:
I just got out of the hospital after a speed reading accident.
 I hit a bookmark.
 		-- Steven Wright

Re: Check only the body with Mail::SpamAssassin

Posted by Olivier Nicole <on...@cs.ait.ac.th>.
> I'd just convert the PDF into something usable by SA and scan it that way,
> instead of trying to kluge around the whole thing.  There are specific plugin
> hooks and such to support this type of thing.

Any pointer/example? I am quite new to this (and lame OO programmer).

TIA,

Olivier

Re: Check only the body with Mail::SpamAssassin

Posted by Theo Van Dinter <fe...@apache.org>.
On Fri, Nov 02, 2007 at 09:52:51AM +0700, Olivier Nicole wrote:
> I am wondering whether the Mail::SpamAssassin module has some hidden
> method to check only the body of an email.

No, short of you removing/disabling non-body rules.

> The problem is the following. I am looking at the PDFassassin plugin
> http://blog.atmail.com/?p=61 that uses pdftotext and pdftoimages to
> extract the text from a PDF file. This text is then formatted into a
> dummy email message and a new instance of SA is launched to check that
> dummy. The only interesting part of the dummy is the body (and
> furthermore it should have no attachment), so it would be faster to run
> a light SA (body/plain text only).

I'd just convert the PDF into something usable by SA and scan it that way,
instead of trying to kluge around the whole thing.  There are specific plugin
hooks and such to support this type of thing.

-- 
Randomly Selected Tagline:
Sarchasm: The gulf between the author of sarcastic wit, and the recipient
 who doesn't get it.
         - Washington Post