Posted to dev@spamassassin.apache.org by Justin Mason <jm...@jmason.org> on 2007/07/27 16:08:55 UTC

add a new rule type: single-line body?

Just wondering: would it be handy to have a new "body" type, the same as
"body" but matched as a single string, with all newlines converted to " "?
In other words, this text:

    I noticed we're now seeing a lot of folks using such an old perl 5.6.1
    that maybe we should update our SpamAssassin requirement to use 5.8.0
    as a bare minimum.
    
    I know that DBI requires 5.8.0 and states that while it may or may not
    build with a lesser version, you can't complain about it if you're
    using something that is over 5 yeas old!

    Heck I've still got a RH 6.2 sever online and have perl 5.8.x built
    and installed on it. 8*)


would be converted to this:

    "I noticed we're now seeing a lot of folks using such an old perl 5.6.1 that maybe we should update our SpamAssassin requirement to use 5.8.0 as a bare minimum. I know that DBI requires 5.8.0 and states that while it may or may not build with a lesser version, you can't complain about it if you're using something that is over 5 yeas old! Heck I've still got a RH 6.2 sever online and have perl 5.8.x built and installed on it. 8*)"

i.e., no newlines, all whitespace converted to " ".  This would be optimal
for matching with phrase rules.  (To avoid exponential-runtime .*
problems, it'd chop the text after the first 8000 characters or so.)
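
A minimal sketch of that conversion, in Python rather than SpamAssassin's
own Perl. The function name flatten_body and the constant MAX_CHARS are
hypothetical; the 8000 figure is just the "or so" number above, and
collapsing each whitespace run to one space is an assumption based on the
converted example:

```python
import re

MAX_CHARS = 8000  # assumed cap, taken from "the first 8000 characters or so"


def flatten_body(text: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse every whitespace run (newlines included) to one space, then truncate."""
    flat = re.sub(r"\s+", " ", text).strip()
    return flat[:max_chars]


sample = (
    "using such an old perl 5.6.1\n"
    "that maybe we should update\n"
    "\n"
    "I know that DBI requires 5.8.0"
)
flat = flatten_body(sample)
print(flat)
```

Truncating after the collapse keeps the cap predictable no matter how much
blank space the original message contained.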

This is based on what I've been doing with the "seek-phrases" script;
it appears it may allow us to catch some spam patterns we might otherwise
miss, from spammers exploiting our inability to use a "body" rule across
a paragraph boundary.  (Are they still doing that?)
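
To illustrate the paragraph-boundary problem, here is a rough Python
sketch. The message text and the phrase are made up, and splitting on blank
lines is only an assumed stand-in for how "body" text gets divided, not the
actual SpamAssassin code:

```python
import re

# Hypothetical message whose phrase straddles a paragraph break.
message = "Buy cheap pills\n\ntoday only, no prescription"
phrase = re.compile(r"pills today only")

# Matching paragraph by paragraph: the phrase never appears whole.
paragraphs = re.split(r"\n\s*\n", message)
per_paragraph_hit = any(phrase.search(p) for p in paragraphs)

# Matching the flattened single-line form: the phrase is contiguous.
flattened = re.sub(r"\s+", " ", message)
flattened_hit = bool(phrase.search(flattened))

print(per_paragraph_hit, flattened_hit)
```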

--j.

Re: add a new rule type: single-line body?

Posted by Duncan Findlay <du...@rogers.com>.
On Jul 27, 2007, at 7:08 AM, Justin Mason wrote:

> Just wondering.  would it be handy to have a new "body" type, the  
> same as
> "body" but matched as a single string, with all newlines converted  
> to " "?
> in other words, this text:

[...]

> ie, no newlines, all whitespace converted to " ".  this would be  
> optimal
> for matching with phrase rules.  (To avoid exponential-runtime .*
> problems, it'd chop the text after the first 8000 characters or so.)

How is this different than rawbody /s rules?



Re: add a new rule type: single-line body?

Posted by Loren Wilton <lw...@earthlink.net>.
> Just wondering.  would it be handy to have a new "body" type, the same as
> "body" but matched as a single string, with all newlines converted to " "?
> in other words, this text:

It might be beneficial to convert the newlines to spaces, but it might also 
be beneficial to leave them there so that they can be explicitly checked for 
as a ratware pattern.  Isn't there a regex /something that treats newlines 
as spaces?  If so that might be the better way to do it, since the rule 
writer can have it either way.
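
A rough sketch of the "either way" idea, using Python's re module as a
stand-in for Perl regexes (in both, \s matches a newline as well as a
space). The sample text and patterns are hypothetical:

```python
import re

# Newlines are left in place in the text being matched.
text = "limited time\noffer inside"

# \s+ accepts a space or a newline, so the break is treated like a space.
any_whitespace = re.compile(r"limited time\s+offer")

# A literal \n lets a rule check explicitly for the line break.
explicit_newline = re.compile(r"time\noffer")

# A literal space does not match across the newline.
literal_space = re.compile(r"limited time offer")

matches_any_whitespace = any_whitespace.search(text) is not None
matches_explicit_newline = explicit_newline.search(text) is not None
matches_literal_space = literal_space.search(text) is not None
```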

What would DEFINITELY be useful would be to stop breaking body on paragraph 
boundaries and to put it all into one string.  Or as you mention, "all" up 
to some convenient arbitrary limit.  I don't know how many full-body hacks 
I've had to write to get around the problems with body breaking on paragraph 
boundaries.

        Loren