Posted to users@spamassassin.apache.org by dhanushka ranasinghe <pa...@gmail.com> on 2012/04/17 14:03:50 UTC

spamassassin rule set issue

Hi.. guys

I have the following rule in place in SpamAssassin:

rawbody  BLOCK_RULE2 /(\W|^)Orange(\W|^)/i
score BLOCK_RULE2 50
describe BLOCK_RULE2 Bad Word

But one of my mails got blocked even though it doesn't contain the word
"Orange". When I check the mail, SpamAssassin reports that it contains
the word Orange by displaying the following. The mail does contain
words like "Orangicat".

 pts rule name              description
 ---- ---------------------- --------------------------------------------------
 -0.7 RCVD_IN_DNSWL_LOW      RBL: Sender listed at http://www.dnswl.org/, low
                             trust
                             [209.85.217.180 listed in list.dnswl.org]
 -0.0 SPF_PASS               SPF: sender matches SPF record
  0.0 WEIRD_PORT             URI: Uses non-standard port number for HTTP
  0.0 HTML_MESSAGE           BODY: HTML included in message
 -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                             [score: 0.0000]
   50 BLOCK_RULE2            RAW: Bad Word
X-Spam-Flag: YES




Any idea why this is happening?

Thank You
Dhanushka

Re: spamassassin rule set issue

Posted by Martin Gregorie <ma...@gregorie.org>.

On Tue, 2012-04-17 at 15:18 +0200, Tom Kinghorn wrote:

> > /\borange\b/i is what I'd use.
> >
>
I should have added that the latest versions of grep understand Perl
regex syntax, which can be useful for rapidly checking regexes before
writing an SA rule. The main differences are that the regex should be
enclosed in single quotes rather than forward slashes, and that grep
understands neither the 'm' prefix Perl uses to change the regex
delimiters nor the /../i suffix. For example, I was able to very rapidly
run through the suggestions for this case by using something like

	grep -iP '\bOrange\b' <words.txt

where the -P option says that the regex is in Perl syntax, the -i option
sets case insensitivity and words.txt contains:

a line
Orange
an Orange
a drink of Orangeade now
a final line

Beware that the grep man page says "This is highly experimental and
grep -P may warn of unimplemented features." IOW using grep is only the
first step in developing a rule. You should still check the completed SA
rule against ham, spam and (preferably) edge cases to make sure it
does no more and no less than you want it to do.
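
For anyone without a grep that supports -P, here is a minimal sketch of the same check in Python. It assumes that Python's re module treats \b and case-insensitive matching the same way Perl does for a pattern this simple; the sample lines are the words.txt contents listed above.

```python
import re

# Sample lines, copied from the words.txt contents listed above.
lines = [
    "a line",
    "Orange",
    "an Orange",
    "a drink of Orangeade now",
    "a final line",
]

# Rough equivalent of: grep -iP '\bOrange\b'
pattern = re.compile(r"\bOrange\b", re.IGNORECASE)

# Keep only the lines where "Orange" appears as a whole word.
matches = [line for line in lines if pattern.search(line)]

for line in matches:
    print(line)
```

As with grep, this only checks the regex itself, not how SA applies it to a real message.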


Martin

Re: spamassassin rule set issue

Posted by Tom Kinghorn <th...@gmail.com>.

On 17/04/2012 15:15, Martin Gregorie wrote:
> On Tue, 2012-04-17 at 14:39 +0200, Tom Kinghorn wrote:
> Indeed, and /^Orange$/i will only match Orange if it is the entire line.
> In fact, as SA converts each paragraph into one long line in body rules,
> it will only match a paragraph containing just the word 'Orange'.
>
> /\borange\b/i is what I'd use.
>
>
> Martin
>
>
Noted.

Thanks Martin.

Re: spamassassin rule set issue

Posted by Martin Gregorie <ma...@gregorie.org>.

On Tue, 2012-04-17 at 14:39 +0200, Tom Kinghorn wrote:
> On 17/04/2012 14:18, dhanushka ranasinghe wrote:
> > Hi.. guys..
> >
> > I don't think regex is the issue , i tested the /(\W|^)Orange(\W|^)/i
> > its correctly doing the exact word match
> >
> >
> > Thank You
> > Dhanushka
> >
> 
> Firstly, please do not "top post"
> 
> Secondly, I disagree with you completely.....
> 
> The ^ (caret) indicates "start of the word", so why have it at the end???
>
Indeed, and /^Orange$/i will only match Orange if it is the entire line.
In fact, as SA converts each paragraph into one long line in body rules,
it will only match a paragraph containing just the word 'Orange'.

/\borange\b/i is what I'd use.
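
A rough sketch of the difference, using Python's re module as a stand-in for Perl (an assumption; the two agree on ^, $ and \b for patterns like these). The sample paragraphs are made up for illustration, each one already joined into a single line.

```python
import re

# Anchored at both ends: the whole string must be "Orange".
anchored = re.compile(r"^Orange$", re.IGNORECASE)
# Word boundaries: "orange" may appear anywhere, but only as a whole word.
bounded = re.compile(r"\borange\b", re.IGNORECASE)

# Hypothetical paragraphs, each already joined into one line.
paragraphs = [
    "Orange",
    "I bought an Orange today",
    "Orangeade is a drink",
]

# Map each paragraph to (anchored matched?, word-boundary matched?).
results = {
    p: (bool(anchored.search(p)), bool(bounded.search(p)))
    for p in paragraphs
}

for p, (a, b) in results.items():
    print(f"{p!r}: anchored={a} bounded={b}")
```

Only the paragraph that consists of nothing but the word matches the anchored form, while the \b form also catches the word inside a longer sentence and still skips Orangeade.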


Martin

Re: spamassassin rule set issue

Posted by RW <rw...@googlemail.com>.

On Tue, 17 Apr 2012 14:39:41 +0200
Tom Kinghorn wrote:

> On 17/04/2012 14:18, dhanushka ranasinghe wrote:
> > Hi.. guys..
> >
> > I don't think regex is the issue , i tested
> > the /(\W|^)Orange(\W|^)/i its correctly doing the exact word match
> >
> >
> > Thank You
> > Dhanushka
> >
> 
> Firstly, please do not "top post"
> 
> Secondly, I disagree with you completely.....
> 
> The ^ (caret) indicates "start of the word", so why have it at the
> end???

It is pretty irrelevant. Clearly it should have
been /(\W|^)Orange(\W|$)/i, or simply /\bOrange\b/i, but that would only
make it fail to fire in a minority of cases, and the problem here is
that it's an FP.

I wonder if perhaps spamd (or whatever daemon is being used) still has an
older version of the rule loaded and needs to be restarted.
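
To illustrate why the regex is unlikely to be the cause, here is a small sketch comparing the three patterns. It uses Python's re module in place of Perl, on the assumption that they behave the same for these constructs; the sample strings are invented for the comparison.

```python
import re

patterns = {
    # The rule as originally posted, with ^ in the trailing group.
    "original": re.compile(r"(\W|^)Orange(\W|^)", re.IGNORECASE),
    # The same rule with $ in the trailing group.
    "corrected": re.compile(r"(\W|^)Orange(\W|$)", re.IGNORECASE),
    # The simpler word-boundary form.
    "boundary": re.compile(r"\bOrange\b", re.IGNORECASE),
}

# Hypothetical test strings.
samples = ["Orangicat", "Orange juice", "an Orange"]

# For each sample, record which patterns match it.
hits = {
    s: {name: bool(p.search(s)) for name, p in patterns.items()}
    for s in samples
}

for s, row in hits.items():
    print(f"{s!r}: {row}")
```

None of the three matches "Orangicat", which does not even contain the letters "Orange" in sequence. The original pattern only differs in missing "Orange" at the very end of the text, so it can cause a missed hit but not a false positive.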

Re: spamassassin rule set issue

Posted by Tom Kinghorn <th...@gmail.com>.

On 17/04/2012 14:18, dhanushka ranasinghe wrote:
> Hi.. guys..
>
> I don't think regex is the issue , i tested the /(\W|^)Orange(\W|^)/i
> its correctly doing the exact word match
>
>
> Thank You
> Dhanushka
>

Firstly, please do not "top post"

Secondly, I disagree with you completely.....

The ^ (caret) indicates "start of the word", so why have it at the end???

Re: spamassassin rule set issue

Posted by Swati R <sw...@gmail.com>.

Try testing the rules below if you are trying to flag only mails that
contain the exact word 'orange' and not other words such as orangecat.

The rest will depend on your requirements.

Thanks,
Swati

On Tue, Apr 17, 2012 at 5:48 PM, dhanushka ranasinghe <
parakrama1282@gmail.com> wrote:

> Hi.. guys..
>
> I don't think regex is the issue , i tested the /(\W|^)Orange(\W|^)/i
> its correctly doing the exact word match
>
>
> Thank You
> Dhanushka
>
> On 17 April 2012 17:44, Swati R <sw...@gmail.com> wrote:
> >
> >
> > On Tue, Apr 17, 2012 at 5:39 PM, Tom Kinghorn <thomas.kinghorn@gmail.com
> >
> > wrote:
> >>
> >> On 17/04/2012 14:03, dhanushka ranasinghe wrote:
> >>>
> >>>
> >>> Any idea why this is happening  ?
> >>>
> >>> Thank You
> >>> Dhanushka
> >>>
> >>
> >> Try
> >>
> >> /^Orange$/i
> >>
> >> The $ specifies end of the word.
> >>
> >> Regards
> >> Tom
> >
> >
> > I think, this should work :
> >
> > /\bOrange\b/i
> >
> > Regards,
> > Swati
>

Re: spamassassin rule set issue

Posted by dhanushka ranasinghe <pa...@gmail.com>.

Hi.. guys..

I don't think the regex is the issue. I tested /(\W|^)Orange(\W|^)/i
and it is correctly doing the exact word match.


Thank You
Dhanushka

On 17 April 2012 17:44, Swati R <sw...@gmail.com> wrote:
>
>
> On Tue, Apr 17, 2012 at 5:39 PM, Tom Kinghorn <th...@gmail.com>
> wrote:
>>
>> On 17/04/2012 14:03, dhanushka ranasinghe wrote:
>>>
>>>
>>> Any idea why this is happening  ?
>>>
>>> Thank You
>>> Dhanushka
>>>
>>
>> Try
>>
>> /^Orange$/i
>>
>> The $ specifies end of the word.
>>
>> Regards
>> Tom
>
>
> I think, this should work :
>
> /\bOrange\b/i
>
> Regards,
> Swati

Re: spamassassin rule set issue

Posted by Swati R <sw...@gmail.com>.

On Tue, Apr 17, 2012 at 5:39 PM, Tom Kinghorn <th...@gmail.com> wrote:

> On 17/04/2012 14:03, dhanushka ranasinghe wrote:
>
>>
>> Any idea why this is happening  ?
>>
>> Thank You
>> Dhanushka
>>
>>
> Try
>
> /^Orange$/i
>
> The $ specifies end of the word.
>
> Regards
> Tom
>

I think, this should work :

/\bOrange\b/i

Regards,
Swati

Re: spamassassin rule set issue

Posted by Tom Kinghorn <th...@gmail.com>.

On 17/04/2012 14:03, dhanushka ranasinghe wrote:
>
> Any idea why this is happening  ?
>
> Thank You
> Dhanushka
>

Try

/^Orange$/i

The $ specifies end of the word.

Regards
Tom

Re: spamassassin rule set issue

Posted by Joseph Brennan <br...@columbia.edu>.


>> rawbody  BLOCK_RULE2 /(\W|^)Orange(\W|^)/i

> Some good suggestions here already.  While your original regexp should
> have worked in most cases, the optimal regexp for this situation is:
>
> /\borange\b/i
>


And probably a body rule, not rawbody.  A rawbody rule won't match if the
spammer obfuscates words with HTML tags, e.g. ora<tag>nge.
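
A rough sketch of the effect, in Python rather than SA itself. The tag-stripping regex here is only a crude stand-in for the HTML rendering SA does before running body rules, and the sample text is made up.

```python
import re

word = re.compile(r"\borange\b", re.IGNORECASE)

# Hypothetical raw HTML text with the word split by an empty tag pair.
raw = "Buy ora<b></b>nge now"

# Crude tag removal, standing in for SA's HTML rendering (an assumption,
# not how SA actually processes the message).
rendered = re.sub(r"<[^>]+>", "", raw)

raw_hit = bool(word.search(raw))            # what a rawbody-style check sees
rendered_hit = bool(word.search(rendered))  # what a body-style check sees

print(f"raw: {raw_hit}, rendered: {rendered_hit}")
```

The word only becomes visible to the regex once the tags are gone, which is the case for preferring body here.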

Joseph Brennan
Columbia University Information Technology

Re: spamassassin rule set issue

Posted by Bowie Bailey <Bo...@BUC.com>.

On 4/17/2012 8:03 AM, dhanushka ranasinghe wrote:
> Hi.. guys
>
> i have following rule in place in spamassassin,
>
> rawbody  BLOCK_RULE2 /(\W|^)Orange(\W|^)/i
> score BLOCK_RULE2 50
> describe BLOCK_RULE2 Bad Word
>
> but one of my mails got blocked even-though its doesn't have  word
> "Orange" , but when search via the mail spamassassin show mail has
> word Orange by displaying following.but that mail have words like
> "Orangicat"

Some good suggestions here already.  While your original regexp should
have worked in most cases, the optimal regexp for this situation is:

/\borange\b/i

(as has been noted previously)

If you are still having problems with false positives, please post the
exact rule you are using and put one or two samples of emails that
generate the false positive in pastebin so that we can see exactly what
is happening.  Since we are talking about a body rule here, it is ok to
munge the headers if you are worried about privacy.  If you make any
changes to the subject or body, please run it through SA afterwards to
make sure it still generates the false positive.

-- 
Bowie