Posted to users@spamassassin.apache.org by Yves Goergen <no...@unclassified.de> on 2007/07/12 18:11:55 UTC

"body" configuration option without "Subject" header?

Hi,

the Mail::SpamAssassin::Conf manpage says that the "body" rule also
contains the "Subject" header as first part of the body content. I'm
trying to create a rule that catches empty messages but if the Subject
is always prepended, I can't do that. Is there a way to get the text or
html parts of the message alone, without any headers that I can also
check otherwise?

Using SA 3.1.8 on Linux.

-- 
Yves Goergen "LonelyPixel" <no...@unclassified.de>
Visit my web laboratory at http://beta.unclassified.de

Re: "body" configuration option without "Subject" header?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jul 12, 2007 at 05:32:17PM -0400, Theo Van Dinter wrote:
> > <html>
> > <head>
> > <some garbage here>
> > </head>
> > <body><p>&nbsp;</p></body>
> > </html>
> > 
> > I consider this empty, but ^$ does not. Any suggestions?
> 
> ^\s*$ ?

fwiw, I was thinking of the rendered part when I wrote that; it obviously
doesn't do anything for rawbody/HTML.

-- 
Randomly Selected Tagline:
"I highly recommend the movie People Vs. Larry Flynt.  You'd be amazed
 how authentic Courtney Love seems at playing a heroin addict."
 - Dan Kohn in <B1...@exchange2003.ad.skymv.com>

Re: "body" configuration option without "Subject" header?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jul 12, 2007 at 11:01:45PM +0200, Yves Goergen wrote:
> Okay, and what about an HTML part like this:
> 
> <html>
> <head>
> <some garbage here>
> </head>
> <body><p>&nbsp;</p></body>
> </html>
> 
> I consider this empty, but ^$ does not. Any suggestions?

^\s*$ ?

You're going down the slippery slope, btw.  Originally, there was nothing
in the body, so rawbody is fine.  If there is now html w/ just a space,
the next step will be random crap plus the pdf, so I wouldn't waste my
time on this and just target other information in the mail.

ymmv.
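
A minimal sketch of what "targeting other information in the mail" could
look like for the "Invoice #0000" plus PDF case described in this thread.
All rule names, patterns, and the score are hypothetical, and the
mimeheader line assumes the MIMEHeader plugin is loaded. Note that "#"
starts a comment in SpamAssassin config files, so it is escaped inside
the pattern:

```
# Hypothetical sub-rules; names starting with "__" get no score of their own.
# Subject header begins with "Invoice #" followed by digits.
header      __LOCAL_SUBJ_INVOICE   Subject =~ /^Invoice \#\d+/i
# Some MIME part declares a PDF content type (needs the MIMEHeader plugin).
mimeheader  __LOCAL_PDF_PART       Content-Type =~ /^application\/pdf\b/i

# Fire only when both sub-rules match the same message.
meta        LOCAL_INVOICE_PDF      (__LOCAL_SUBJ_INVOICE && __LOCAL_PDF_PART)
describe    LOCAL_INVOICE_PDF      Invoice-style subject with a PDF attachment
score       LOCAL_INVOICE_PDF      1.0
```

This does not test whether the body is empty at all; it only combines two
signals that are cheap to check. Running "spamassassin --lint" after
editing is a quick way to catch syntax errors, and the score would need
tuning against real mail.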

-- 
Randomly Selected Tagline:
Passwords are implemented as a result of insecurity.

Re: "body" configuration option without "Subject" header?

Posted by Yves Goergen <no...@unclassified.de>.
On 12.07.2007 20:02 CE(S)T, Loren Wilton wrote:
>> Hm, according to the manpage, this doesn't remove HTML tags and it's
>> matched line by line, so checking for a visually empty message isn't an
>> easy job here. I guess for such advanced tests like whether a message is
>> empty or not, I'll need to learn Perl and create my own function (or
>> plugin) for it. :(
>>
>> Or has anybody a quick measure against empty mails with "Invoice #0000"
>> in the subject and only a PDF attached?
> 
> You want empty body other than the subject line?  I think there is an
> EMPTY_BODY rule already, or something very similar to that.  Also, I believe
> there are rules that fire on various amounts of body text.

Okay, and what about an HTML part like this:

<html>
<head>
<some garbage here>
</head>
<body><p>&nbsp;</p></body>
</html>

I consider this empty, but ^$ does not. Any suggestions?

-- 
Yves Goergen "LonelyPixel" <no...@unclassified.de>
Visit my web laboratory at http://beta.unclassified.de

Re: "body" configuration option without "Subject" header?

Posted by Loren Wilton <lw...@earthlink.net>.
> Hm, according to the manpage, this doesn't remove HTML tags and it's
> matched line by line, so checking for a visually empty message isn't an
> easy job here. I guess for such advanced tests like whether a message is
> empty or not, I'll need to learn Perl and create my own function (or
> plugin) for it. :(
>
> Or has anybody a quick measure against empty mails with "Invoice #0000"
> in the subject and only a PDF attached?

You want empty body other than the subject line?  I think there is an
EMPTY_BODY rule already, or something very similar to that.  Also, I believe
there are rules that fire on various amounts of body text.

However, you could probably do something like the following.  This is
UNTESTED, and I may have the /m and /s options wrong to make it really work
correctly.  But possibly something along these lines:

body    EMPTYNESS    /^Subject[^\n]{0,150}[\n\s]{1,500}$/im

        Loren



Re: "body" configuration option without "Subject" header?

Posted by Yves Goergen <no...@unclassified.de>.
On 12.07.2007 18:47 CE(S)T, Theo Van Dinter wrote:
> On Thu, Jul 12, 2007 at 06:11:55PM +0200, Yves Goergen wrote:
>> is always prepended, I can't do that. Is there a way to get the text or
>> html parts of the message alone, without any headers that I can also
>> check otherwise?
> 
> rawbody.

Hm, according to the manpage, this doesn't remove HTML tags and it's
matched line by line, so checking for a visually empty message isn't an
easy job here. I guess for such advanced tests like whether a message is
empty or not, I'll need to learn Perl and create my own function (or
plugin) for it. :(

Or has anybody a quick measure against empty mails with "Invoice #0000"
in the subject and only a PDF attached?

-- 
Yves Goergen "LonelyPixel" <no...@unclassified.de>
Visit my web laboratory at http://beta.unclassified.de

Re: "body" configuration option without "Subject" header?

Posted by Theo Van Dinter <fe...@apache.org>.
On Thu, Jul 12, 2007 at 06:11:55PM +0200, Yves Goergen wrote:
> is always prepended, I can't do that. Is there a way to get the text or
> html parts of the message alone, without any headers that I can also
> check otherwise?

rawbody.
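
A minimal sketch of how a rawbody rule is written, for readers who have
not used one. The rule name, pattern, and score are hypothetical. Unlike
"body", a rawbody pattern sees the text parts with HTML tags and line
breaks still present and without the Subject prepended, so this example
only catches an empty body element that sits on a single line:

```
# Hypothetical rule: <body> directly followed by </body>, with nothing
# but optional whitespace in between, on one line of an HTML part.
rawbody   LOCAL_EMPTY_HTML_BODY   /<body[^>]*>\s*<\/body>/i
describe  LOCAL_EMPTY_HTML_BODY   HTML part with an empty body element
score     LOCAL_EMPTY_HTML_BODY   0.5
```

A message with no text part at all, or with markup spread over several
lines, would not be matched by this pattern.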

-- 
Randomly Selected Tagline:
"The Motorola 6800 had an undocumented assembly opcode that earned the
 mnemonic 'Halt and Catch Fire'.  It was used by the factory to test the
 address bus.  It's harmless when the chip is hooked up to a test stand or
 normal RAM, but hook it up to core memory and it really would fry."
                      - Unknown