Posted to users@spamassassin.apache.org by David Benigni <db...@lutron.com> on 2005/08/30 19:50:44 UTC

SA rawbody rule problem

Hello,

I'm having some problems with a rule.  I'm filtering based on
particular words (yeah, it's not good to do that) and it's catching things
that I don't think it should.  I can't seem to find the problem.  Here
is the rule:

rawbody  BADWORD_RULE_1 /\b(?:xxx|porn)\b/i
describe BADWORD_RULE_1 Unacceptable word or phrase
score    BADWORD_RULE_1 0.1

The problem is that if I have an email with an attachment, it's possible
the xxx part crops up from an encoded file.  If I run the email through
a Perl program with the same regex, it doesn't pick it out, but SA seems
to.

Does anyone have any ideas?  I appreciate any help.

Thanks,
Dave

Re: SA rawbody rule problem

Posted by Matt Kettler <mk...@evi-inc.com>.
David Benigni wrote:
> Hello,
> 
> I'm having some problems with a rule.  I'm filtering based on
> particular words (yeah, its not good to do that) and its catching things
> that I don't think it should.  I can't seem to find the problem.  Here
> is the rule:
> 
> rawbody  BADWORD_RULE_1 /\b(?:xxx|porn)\b/i
> describe BADWORD_RULE_1 Unacceptable word or phrase
> score    BADWORD_RULE_1 0.1
> 
> The problem is that if I have an email with an attachment, its possible
> the xxx part crops up from an encoded file.  If I run the email through
> perl program with the same regex it doesn't pick it out, but SA seems
> to.
What SA version are you using?

AFAIK, you'd see this behavior for rawbody rules with 2.6x, but not with 3.x.

Workaround: use body instead of rawbody. A body rule won't match HTML tags,
though, so add a second uri rule to catch those.
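
A minimal sketch of that workaround, assuming the same word list as the
original rule. The rule names BADWORD_BODY_1 and BADWORD_URI_1 are made up
for illustration, and the scores are simply copied from the original:

```
# Matches the words in the decoded, visible text of the message.
body     BADWORD_BODY_1 /\b(?:xxx|porn)\b/i
describe BADWORD_BODY_1 Unacceptable word or phrase in message text
score    BADWORD_BODY_1 0.1

# Matches the same words inside URIs found in the message (e.g. links).
uri      BADWORD_URI_1 /\b(?:xxx|porn)\b/i
describe BADWORD_URI_1 Unacceptable word or phrase in a URI
score    BADWORD_URI_1 0.1
```

The \b anchors are kept from the original regex; whether they are too strict
for URIs (where words often run together) depends on what you want to catch.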



Re: SA rawbody rule problem

Posted by Loren Wilton <lw...@earthlink.net>.
> The problem is that if I have an email with an attachment, its possible
> the xxx part crops up from an encoded file.  If I run the email through
> perl program with the same regex it doesn't pick it out, but SA seems
> to.

[source email] (full) ->
    decoding -> [decoded email] (rawbody) ->
        html decoding -> [text mail] (body)

Rawbody is the middle of the three steps.  You are probably hitting on some
HTML tag or the like.

If you are looking for words that show up in the visible text of the message,
including the subject, then use 'body' as your rule type.  'rawbody' is best
used when looking for HTML tags.
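
As a rough illustration of the kind of thing 'rawbody' suits, here is a rule
that looks for a tag rather than a word.  The rule name LOCAL_HTML_IFRAME, the
choice of tag, and the score are all invented for the example:

```
# Matches against the decoded message before HTML is stripped,
# so the markup itself is still visible to the regex.
rawbody  LOCAL_HTML_IFRAME /<iframe\b/i
describe LOCAL_HTML_IFRAME Message contains an iframe tag
score    LOCAL_HTML_IFRAME 0.1
```

A 'body' rule with the same regex would not be expected to hit, since the
tags are gone by that stage.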

        Loren