Posted to users@spamassassin.apache.org by Mark London <mr...@psfc.mit.edu> on 2005/01/31 05:05:14 UTC

Howto skip empty lines in a body test?

I use the "body" command to test for phrases.  This was working great, until a
spammer started to use double spacing in his email, and the phrases were split
up by empty lines.  Is there any way around this?  I've tried everything,
including using full and rawbody, but I still can't find a way to specify a
phrase, and allow for the fact that the words in the phrase might be broken up
by empty lines.  Something tells me that it's impossible.  Is that true?  Thanks.

Re: Howto skip empty lines in a body test?

Posted by Matt Kettler <mk...@evi-inc.com>.

At 01:04 PM 1/31/2005, John Hardin wrote:
>That simplifies it greatly:
>
>         \s+

Yep, which goes back to being a lot like my earlier suggestion, \s{1,10};
it just lacks the upper bound of 10 I was using.

I generally don't like to use + or * unless I really want SA to be able to 
bridge a very long gap, but that's really just a personal preference.
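
To make the trade-off between the bounded and unbounded forms concrete, here is a minimal sketch in Python rather than in an SA rule, so it can be run on its own. The phrase and the sample strings are made up for illustration. Python's `re` handles `\s`, `+` and `{1,10}` the same way Perl does for this purpose.

```python
import re

# Hypothetical phrase; only the gap between the two words matters here.
BOUNDED = re.compile(r"dear\s{1,10}friend")
UNBOUNDED = re.compile(r"dear\s+friend")

short_gap = "dear" + " " * 3 + "friend"    # 3 whitespace characters
long_gap = "dear" + "\n" * 25 + "friend"   # 25 whitespace characters

for name, text in (("short", short_gap), ("long", long_gap)):
    # Prints: short True True, then long False True
    print(name, bool(BOUNDED.search(text)), bool(UNBOUNDED.search(text)))
```

The bounded form gives up once the gap exceeds 10 whitespace characters, while `\s+` bridges a gap of any length.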

Re: Howto skip empty lines in a body test?

Posted by John Hardin <jo...@aproposretail.com>.

On Mon, 2005-01-31 at 09:48, Matt Kettler wrote:
> At 12:26 PM 1/31/2005, jdow wrote:
> > >     (?:\s|<\/?(?:P|BR)>)+
> >
> >Gesundheit, John.

Danke.

> >Er, would you care to translate that sneeze, please.
> 
> I think he's trying to catch spaces or HTML line-end type tags.
> 
> Two problems
> 
>          1) it will look for </P> and </BR>, but BR doesn't have a /BR.

Eh, so what? That simplifies the regex a bit.

>          2) body rules never see HTML tags anyway; they are already 
> stripped, so that part's pointless.

Agh. I keep forgetting that.

That simplifies it greatly:

	\s+

:)
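
A quick way to check that simplification, sketched in Python with made-up sample strings (Python's `re` accepts both patterns unchanged): on text that contains no tags the two patterns match exactly the same spans, and they only differ when tags are still present.

```python
import re

WITH_TAGS = re.compile(r"(?:\s|<\/?(?:P|BR)>)+")
PLAIN = re.compile(r"\s+")

# Made-up text with no HTML tags in it, as a body rule would see it.
stripped = "send  your\n\nbank \t details"
same = [m.span() for m in WITH_TAGS.finditer(stripped)] == \
       [m.span() for m in PLAIN.finditer(stripped)]
print(same)  # True

# Made-up text with the tags still present.
html = "send<BR>\n<P>your"
print(repr(WITH_TAGS.search(html).group()))  # '<BR>\n<P>'
print(repr(PLAIN.search(html).group()))      # '\n'
```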

--
John Hardin
Development and Technology group (Seattle)
CRS Retail Systems, Inc.
3400 188th Street SW, Suite 185
Lynnwood, WA 98037
voice: (425) 672-1304
  fax: (425) 672-0192
email: jhardin@crsretail.com
  web: http://www.crsretail.com
-----------------------------------------------------------------------
 If you smash a computer to bits with a mallet, that appears to count
 as encryption in the state of Nevada.
                                               - CRYPTO-GRAM 12/2001
-----------------------------------------------------------------------

Re: Howto skip empty lines in a body test?

Posted by Matt Kettler <mk...@evi-inc.com>.

At 12:26 PM 1/31/2005, jdow wrote:
> >     (?:\s|<\/?(?:P|BR)>)+
>
>Gesundheit, John.
>
>Er, would you care to translate that sneeze, please.

I think he's trying to catch spaces or HTML line-end type tags.

Two problems

         1) it will look for </P> and </BR>, but BR doesn't have a /BR.

         2) body rules never see HTML tags anyway; they are already 
stripped, so that part's pointless.

Re: Howto skip empty lines in a body test?

Posted by jdow <jd...@earthlink.net>.

From: "John Hardin" <jo...@aproposretail.com>

> On Mon, 2005-01-31 at 06:36, Matt Kettler wrote:
> 
> > Perhaps you just need to modify your rule to tolerate more spaces, and 
> > perhaps tabs, between words by using \s{1,10} instead of a space.
> 
> Maybe better yet:
> 
>     (?:\s|<\/?(?:P|BR)>)+

Gesundheit, John.

Er, would you care to translate that sneeze, please.
{O.O}

Re: Howto skip empty lines in a body test?

Posted by John Hardin <jo...@aproposretail.com>.

On Mon, 2005-01-31 at 06:36, Matt Kettler wrote:

> Perhaps you just need to modify your rule to tolerate more spaces, and 
> perhaps tabs, between words by using \s{1,10} instead of a space.

Maybe better yet:

    (?:\s|<\/?(?:P|BR)>)+

--
John Hardin

Re: Howto skip empty lines in a body test?

Posted by Matt Kettler <mk...@comcast.net>.

At 11:05 PM 1/30/2005, Mark London wrote:
>I use the "body" command to test for phrases.  This was working great, until a
>spammer started to use double spacing in his email, and the phrases were split
>up by empty lines.  Is there any way around this?

The body command works on a copy of the message body that has had all 
newlines stripped out, so the extra CRs aren't your problem. However, body 
rules do not inherently deal with multiple space characters between words.

Perhaps you just need to modify your rule to tolerate more spaces, and 
perhaps tabs, between words by using \s{1,10} instead of a space.
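
A minimal sketch of that suggestion, written in Python so it can be run directly. The phrase and the sample text are invented for illustration. Python's `re` treats `\s{1,10}` the same way Perl does here.

```python
import re

# Hypothetical phrase; a real rule would use whatever phrase you test for.
LITERAL = re.compile(r"urgent business proposal", re.IGNORECASE)
TOLERANT = re.compile(r"urgent\s{1,10}business\s{1,10}proposal", re.IGNORECASE)

# Made-up sample: the phrase is split by an empty line and padded with spaces.
sample = "I have an urgent\n\nbusiness   proposal for you."

print(bool(LITERAL.search(sample)))   # False
print(bool(TOLERANT.search(sample)))  # True
```

In a rule, the tolerant pattern would go between the slashes of a body line, for example `body LOCAL_PHRASE /urgent\s{1,10}business\s{1,10}proposal/i`, where LOCAL_PHRASE is a placeholder name.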

Look closely at the message, in a hex editor if you have one.

>   I've tried everything,
>including using full and rawbody, but I still can't find a way to specify a
>phrase, and allow for the fact that the words in the phrase might be broken up
>by empty lines.

rawbody actually makes your problem real. Rawbody is a copy of the body 
WITH all the CRs and all the HTML tags, so a rawbody rule will suffer 
from CR insertion.

>   Something tells me that it's impossible.  Is that true?  Thanks.

No, it's perfectly possible.

Re: Howto skip empty lines in a body test?

Posted by Keith Ivey <kc...@cpcug.org>.

Loren Wilton wrote:

> Try the rule with /s on the end of the re.  That will tend to turn newlines
> into spaces.

People often seem to be confused by the /s modifier for regexes.
All it does is allow '.' to match any character.  Without the
/s, '.' matches any character other than newline.  So /s does
nothing at all if you have no '.' in your regex.
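
A small sketch of that behaviour in Python, where the `re.DOTALL` flag plays the role of Perl's /s. The sample text is made up.

```python
import re

text = "wire\nthe funds"  # made-up sample with a newline between two words

# '.' refuses to match a newline unless DOTALL (Perl's /s) is set.
dot_plain = bool(re.search(r"wire.the", text))
dot_flag = bool(re.search(r"wire.the", text, re.DOTALL))
print(dot_plain, dot_flag)        # False True

# With no '.' in the pattern, the flag changes nothing.
space_plain = bool(re.search(r"wire the", text))
space_flag = bool(re.search(r"wire the", text, re.DOTALL))
print(space_plain, space_flag)    # False False

# \s matches the newline with or without the flag.
ws_plain = bool(re.search(r"wire\sthe", text))
ws_flag = bool(re.search(r"wire\sthe", text, re.DOTALL))
print(ws_plain, ws_flag)          # True True
```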

-- 
Keith C. Ivey <kc...@cpcug.org>
Washington, DC

Re: Howto skip empty lines in a body test?

Posted by Mark London <mr...@psfc.mit.edu>.

Loren Wilton <lwilton <at> earthlink.net> writes:
> It might be impossible on full, if the message is encoded, since full will
> see the encoded text.
> It may or may not be impossible on body, depending on the version you are
> running and a handful of other things.
> 
> Sometimes body gets broken up into multiple pieces, making it nearly
> impossible to do anything that spans multiple lines.  In HTML, a <br> will
> do this, as occasionally will a <p>.  In plain text, sometimes it will break
> the body on multiple lines, sometimes not.
> 
> Try the rule with /s on the end of the re.  That will tend to turn newlines
> into spaces.

Thanks for the info!  Looks like I can get full to work.  

These messages are simply plain text, Nigerian-type spam.

Maybe instead, what I should do, is create a test for double spacing in email. :)

Re: Howto skip empty lines in a body test?

Posted by Matt Kettler <mk...@evi-inc.com>.

At 11:14 PM 1/30/2005, Loren Wilton wrote:
>Try the rule with /s on the end of the re.  That will tend to turn newlines
>into spaces.

Loren, that should be redundant in any "body" or "uri" rule. SA already 
does that conversion to the whole body to save doing it repeatedly for 
every body rule in the ruleset (generally most body rules would want this).

The only time you should want to use the /s or /m modifier to a regex rule 
in SA is in a header, full, or rawbody rule.

Re: Howto skip empty lines in a body test?

Posted by Loren Wilton <lw...@earthlink.net>.

It might or might not be impossible.

It *is* impossible on rawbody, since the rules only see one line at a time.
It might be impossible on full, if the message is encoded, since full will
see the encoded text.
It may or may not be impossible on body, depending on the version you are
running and a handful of other things.

Sometimes body gets broken up into multiple pieces, making it nearly
impossible to do anything that spans multiple lines.  In HTML, a <br> will
do this, as occasionally will a <p>.  In plain text, sometimes it will break
the body on multiple lines, sometimes not.
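
As a toy model only, and not SA's actual code: if a rule really is handed one line at a time, a phrase split across lines cannot match, whatever the pattern allows between the words. The sketch below is in Python with a made-up message. Splitting on newlines is an assumption used to stand in for that line-at-a-time behaviour.

```python
import re

PHRASE = re.compile(r"bank\s+details")  # hypothetical phrase
message = "please send your bank\n\ndetails today"

# Matching against the whole text: \s+ bridges the empty line.
whole = bool(PHRASE.search(message))

# Matching one line at a time: no single line contains both words.
per_line = any(PHRASE.search(line) for line in message.split("\n"))

print(whole, per_line)  # True False
```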

Try the rule with /s on the end of the re.  That will tend to turn newlines
into spaces.

        Loren