Posted to users@spamassassin.apache.org by Ben Wylie <sa...@benwylie.co.uk> on 2007/02/19 22:51:02 UTC

Using ^ and $ in SA Rules

I have tried to write a rule which would hit a line which only contains 
four capital letters, each separated by a space.

So I wrote a body rule:
/^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/

Unfortunately, it doesn't hit when I expect it to, for example on this line:
C T C X

If I take out the ^ and $, it does hit, but I would like it to hit only
if that is the only thing on a particular line.

Have I made a mistake here? How might I get a rule like this to work?

Thanks
Ben



Re: Using ^ and $ in SA Rules

Posted by Ben Wylie <sa...@benwylie.co.uk>.
Matt Kettler wrote:
> Ben Wylie wrote:
>> I have tried to write a rule which would hit a line which only
>> contains four capital letters, each separated by a space.
>>
>> so i wrote a body rule:
>> /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/
>>
>> unfortunately it doesn't hit when I expect it to:
>> C T C X
>>
>> If I take the ^ and $ parts, it does hit, but i would like it to only
>> hit if that is the only thing on a particular line.
>>
>> Have I made a mistake here? How might I get a rule like this to work?
> Use rawbody for this. Body rules have CR/LF stripped out.

That would explain it.
Thanks
Ben



Re: Using ^ and $ in SA Rules

Posted by Matt Kettler <mk...@verizon.net>.
Ben Wylie wrote:
> I have tried to write a rule which would hit a line which only
> contains four capital letters, each separated by a space.
>
> so i wrote a body rule:
> /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/
>
> unfortunately it doesn't hit when I expect it to:
> C T C X
>
> If I take the ^ and $ parts, it does hit, but i would like it to only
> hit if that is the only thing on a particular line.
>
> Have I made a mistake here? How might I get a rule like this to work?
Use rawbody for this. Body rules have CR/LF stripped out.
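A rough Python model of the effect described here, assuming body preprocessing simply turns line breaks into spaces (SpamAssassin's real rendering does more than this). The sample text and the helper name flatten are hypothetical. For this pattern, Python's ^, $ and re.MULTILINE behave like Perl's ^, $ and /m.

```python
import re

# Ben's rule, anchored to the start and end of the string being tested.
RULE = re.compile(r"^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$")
# The same rule with ^ and $ removed.
UNANCHORED = re.compile(r"[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]")
# With re.MULTILINE, ^ and $ also match next to each "\n" (like Perl's /m).
RULE_M = re.compile(r"^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$", re.MULTILINE)

# Hypothetical message text; the real example email is not reproduced here.
RAW = "Hot stock alert\nC T C X\nAct now\n"


def flatten(text: str) -> str:
    """Crude stand-in for body-rule preprocessing: line breaks become spaces."""
    return " ".join(text.split("\n")).strip()


flat = flatten(RAW)  # "Hot stock alert C T C X Act now"
print(bool(RULE.search(flat)))        # False: the line is now mid-string
print(bool(UNANCHORED.search(flat)))  # True: matches once anchors are dropped
print(bool(RULE_M.search(RAW)))       # True: line breaks kept, /m-style anchors
```

This reproduces what Ben saw: the anchored rule misses on the flattened text while the unanchored one hits.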



Re: Using ^ and $ in SA Rules

Posted by Loren Wilton <lw...@earthlink.net>.
> An example email which doesn't hit can be found here:
> http://www.arkbb.co.uk/ExampleEmail.txt

I just looked again at that spam.  I'm somewhat amused by the current and 
projected prices:

Currently priced at: .80
Expected: .00


        Loren



Re: Using ^ and $ in SA Rules

Posted by Loren Wilton <lw...@earthlink.net>.
Oh, you are writing a body rule, and the source is HTML.  Whether the body
(which is broken into sections) starts just before the term you want is
questionable.  Ah, there is also a plain-text section.  That makes it a
little easier.

Try this:

    body FOO_SYMBOL    /\n\s{0,15}[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]\s{0,15}\n/

There are a lot of things Mr. Spammer can do to get around that, but I think 
it will catch this one case.
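A Python transcription of Loren's pattern, to show what it requires. It can only help if the text the rule sees still contains line breaks, which is an assumption here given the CR/LF stripping mentioned elsewhere in this thread. The sample strings and the function name hits are hypothetical.

```python
import re

# Loren's rule body: a newline, up to 15 whitespace chars, four capitals
# separated by single whitespace chars, up to 15 whitespace chars, a newline.
FOO_SYMBOL = re.compile(r"\n\s{0,15}[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]\s{0,15}\n")


def hits(text: str) -> bool:
    """Return True if the pattern matches anywhere in text."""
    return FOO_SYMBOL.search(text) is not None


print(hits("Symbol:\n    C T C X  \nBuy now\n"))  # True: own line, padded
print(hits("Symbol: C T C X buy now\n"))          # False: not on its own line
print(hits("C T C X\nBuy now\n"))                 # False: no "\n" before it
```

Because \s also matches "\n", this pattern would also fire on four single capital letters placed on consecutive lines; [ \t] in place of \s between the letters would rule that out.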

If you don't have the SARE stock rules, you should get those also.  I'd 
expect at least a few points from them on this thing.

        Loren



Re: Using ^ and $ in SA Rules

Posted by Matt Kettler <mk...@verizon.net>.
Mark Martinec wrote:
> Theo Van Dinter writes:
>   
>> body rules aren't run on lines, they're run on paragraphs,
>> so that text is in the middle of a string.
>>     
>
> Matt Kettler writes:
>   
>> Use rawbody for this. Body rules have CR/LF stripped out.
>>     
>
> Giving whole paragraphs to regexp is fine, but why are newlines
> stripped out in 'body' rules? 
In order to normalize whitespace. This way rules don't have to care
about whitespace, they can just be written normally.

Otherwise
/Hello I'm a spammer/i

Would fail to match:
Hello I'm
a spammer.

SA also reduces excess spaces in normal body rules, that way spammers
can't obfuscate text by simply inserting piles of spaces.

It would be a real pain to have to rewrite the above rule as:

/Hello\s*I'm\s*a\s*spammer/i

And it would also be much slower if you had to do that for a few hundred rules.
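A sketch of the trade-off described above, assuming normalization means collapsing every whitespace run to a single space (an approximation of SA's body preprocessing, not its actual code). The sample text and the normalize helper are hypothetical.

```python
import re

SIMPLE = re.compile(r"Hello I'm a spammer", re.IGNORECASE)
SPACED = re.compile(r"Hello\s*I'm\s*a\s*spammer", re.IGNORECASE)


def normalize(text: str) -> str:
    """Collapse each run of whitespace, newlines included, to one space."""
    return re.sub(r"\s+", " ", text).strip()


raw = "Hello   I'm\na spammer."
print(bool(SIMPLE.search(raw)))             # False: extra spaces and a line break
print(bool(SIMPLE.search(normalize(raw))))  # True: whitespace normalized first
print(bool(SPACED.search(raw)))             # True, but every rule needs \s* padding
```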


>  Perl regexp modifiers m (and s)
> would be handy:
>
> body L_TEST  /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/m
>
> but as it stands now the m modifier is of no use in 'body' rules
> (unlike in 'rawbody').
True. If you care about whitespace formatting and EOLs, use rawbody.

If you want to match text in a straightforward way, use body and let
SA's pre-processing of the text deal with simplifying whitespace.




Re: Using ^ and $ in SA Rules

Posted by Mark Martinec <Ma...@ijs.si>.
Theo Van Dinter writes:
> body rules aren't run on lines, they're run on paragraphs,
> so that text is in the middle of a string.

Matt Kettler writes:
> Use rawbody for this. Body rules have CR/LF stripped out.

Giving whole paragraphs to regexp is fine, but why are newlines
stripped out in 'body' rules?  Perl regexp modifiers m (and s)
would be handy:

body L_TEST  /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/m

but as it stands now the m modifier is of no use in 'body' rules
(unlike in 'rawbody').
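A small Python illustration of the point about the m modifier, with re.MULTILINE standing in for Perl's /m. It assumes the text still contains its line breaks, as rawbody text does according to this thread; the sample string is made up.

```python
import re

PATTERN = r"^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$"
text = "Ticker below\nC T C X\nThanks\n"  # hypothetical sample

# Without the modifier, ^ and $ only match at the ends of the whole string
# ($ also just before a final newline).
print(re.search(PATTERN, text))                        # None
# With it, ^ and $ also match at the start and end of each line.
print(re.search(PATTERN, text, re.MULTILINE).group())  # C T C X
```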

  Mark

Re: Using ^ and $ in SA Rules

Posted by Theo Van Dinter <fe...@apache.org>.
On Tue, Feb 20, 2007 at 12:26:17AM +0000, Ben Wylie wrote:
> >>so i wrote a body rule:
> >>/^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/
> >>
> >>Have I made a mistake here? How might I get a rule like this to
> >>work?

body rules aren't run on lines, they're run on paragraphs, so that text is in
the middle of a string.
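A simplified model of this point, assuming body text is split into paragraphs at blank lines and each paragraph's lines are joined with single spaces. The paragraphs and body_hits helpers are hypothetical, not SpamAssassin's code.

```python
import re

RULE = re.compile(r"^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$")


def paragraphs(text: str) -> list[str]:
    """Split on blank lines, then collapse each paragraph's whitespace."""
    return [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]


def body_hits(text: str) -> bool:
    """True if the anchored rule matches any whole paragraph."""
    return any(RULE.search(p) for p in paragraphs(text))


inline = "Stock pick\nC T C X\nBuy today\n"
alone = "Stock pick\n\nC T C X\n\nBuy today\n"
print(paragraphs(inline))  # ['Stock pick C T C X Buy today']
print(body_hits(inline))   # False: the line sits in the middle of a paragraph
print(body_hits(alone))    # True: here the line is a paragraph by itself
```

Under this model, an anchored body rule can only match text that forms a whole paragraph on its own.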


Re: Using ^ and $ in SA Rules

Posted by Ben Wylie <sa...@benwylie.co.uk>.
John D. Hardin wrote:
> On Mon, 19 Feb 2007, Ben Wylie wrote:
> 
>> so i wrote a body rule:
>> /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/
>>
>> unfortunately it doesn't hit when I expect it to:
>> C T C X
>>
>> If I take the ^ and $ parts, it does hit, but i would like it to
>> only hit if that is the only thing on a particular line.
>>
>> Have I made a mistake here? How might I get a rule like this to
>> work?
> 
> There might be leading and/or trailing space. Try:
> 
> /^\s?[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]\s?$/

Thanks for the suggestion.
This still doesn't work for me.
An example email which doesn't hit can be found here:
http://www.arkbb.co.uk/ExampleEmail.txt
Thanks
Ben




Re: Using ^ and $ in SA Rules

Posted by "John D. Hardin" <jh...@impsec.org>.
On Mon, 19 Feb 2007, Ben Wylie wrote:

> so i wrote a body rule:
> /^[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]$/
> 
> unfortunately it doesn't hit when I expect it to:
> C T C X
> 
> If I take the ^ and $ parts, it does hit, but i would like it to
> only hit if that is the only thing on a particular line.
> 
> Have I made a mistake here? How might I get a rule like this to
> work?

There might be leading and/or trailing space. Try:

/^\s?[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]\s?$/
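A Python check of what the optional \s? buys, assuming the rule is applied line by line (re.MULTILINE standing in for Perl's /m on text that keeps its line breaks). The sample strings are hypothetical.

```python
import re

# John's pattern: at most one whitespace character allowed at each end.
JOHN = re.compile(r"^\s?[A-Z]\s[A-Z]\s[A-Z]\s[A-Z]\s?$", re.MULTILINE)

print(bool(JOHN.search("Ticker:\n C T C X \nBye\n")))  # True: one space each side
print(bool(JOHN.search("Ticker:\n  C T C X\nBye\n")))  # False: two leading spaces
print(bool(JOHN.search("Ticker: C T C X Bye")))        # False: not on its own line
```

Per Ben's reply, this still missed on his sample, which fits the explanation elsewhere in the thread that body rules never see the line breaks.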
