Posted to users@spamassassin.apache.org by Dan <a...@patnode.net> on 2006/05/17 23:59:16 UTC

Negative lookaround?

Sick of obfuscation, I'm going to town on spacing and letter
variations, with one problem:

body __OBSFU_FRE1a /\bFREE\b/i
body __OBSFU_FRE1b /\bF(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E\b/i
meta __OBSFU_FRE1 (!__OBSFU_FRE1a && __OBSFU_FRE1b)


"Every variation" includes the whole word: FREE.  To exclude the
whole word, I created a meta exception but, as you might guess, this
also finds the whole word elsewhere in the same message.  While it's
odd to have one word mangled and another not, spammers do it.  I'm
told a negative lookaround will solve this problem, but I can't
figure out how to do it.  Everything I've read relates to neighboring
text, not the same text.
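
A minimal sketch of the problem described above, using Python's re module as a stand-in for the Perl regexes that SpamAssassin runs. The function name meta_rule and the sample strings are made up for illustration. Because the meta rule is evaluated per message, an intact "free" anywhere suppresses the hit on an obfuscated "F R E E" elsewhere:

```python
import re

# Hypothetical Python equivalents of __OBSFU_FRE1a and __OBSFU_FRE1b.
WHOLE = re.compile(r"\bFREE\b", re.IGNORECASE)
SEP = r"(?:\s|\s\s|\s\S|\s\S\s|\S\s|\S)?"
VARIANT = re.compile(r"\bF" + SEP + "R" + SEP + "E" + SEP + r"E\b", re.IGNORECASE)


def meta_rule(body):
    """Mimic: meta __OBSFU_FRE1 (!__OBSFU_FRE1a && __OBSFU_FRE1b)."""
    return (not WHOLE.search(body)) and bool(VARIANT.search(body))


if __name__ == "__main__":
    # Obfuscated word only: the meta fires.
    print(meta_rule("Get it F R E E today"))
    # Obfuscated word plus an intact one elsewhere: the meta is suppressed.
    print(meta_rule("Get it F R E E today, totally free"))
```

The second call is the case the meta approach cannot handle, which is why the exclusion has to live inside the regex itself.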

How do I write a single regex that includes every variation except a  
single specific one?

Thanks,
Dan

Re: Negative lookaround?

Posted by Matt Kettler <mk...@evi-inc.com>.
Dan wrote:

>>
>> I'd also suggest that the last-term of bare \S is not such a good idea.
>> /F\S?R\S?E\S?E\S?/i will match Frisbee.
Self-correction: that won't match Frisbee.

 However, it will match enforcement. (enFoRcEmEnt)

And  forever  (FoREvEr)

Admittedly the \b's of your original rule would keep those matches out, but you
get the idea.. There's a lot of words that have "free" embedded in them, but
there's a lot more that have F\S?R\S?E\S?E embedded in them.

Some may not get excluded by your \b requirement, such as the German domain
"frede.de" and the surname "Foree":
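
The examples above can be checked with a short sketch. It uses Python's re module as a stand-in for Perl regexes, and the names NO_BOUNDARY and BOUNDED are invented for illustration. BOUNDED is the \S? pattern with \b anchors added at both ends (and the trailing \S? dropped), not Dan's full rule:

```python
import re

# The unanchored pattern quoted in the thread.
NO_BOUNDARY = re.compile(r"F\S?R\S?E\S?E\S?", re.IGNORECASE)
# The same idea with word boundaries on both ends.
BOUNDED = re.compile(r"\bF\S?R\S?E\S?E\b", re.IGNORECASE)


def matched_text(pattern, text):
    """Return the matched substring, or None when the pattern does not hit."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for word in ["Frisbee", "enforcement", "forever", "Foree", "frede.de"]:
        print(word, matched_text(NO_BOUNDARY, word), matched_text(BOUNDED, word))
```

With the boundaries in place, "enforcement" and "forever" drop out, while "Foree" and the "frede" part of "frede.de" still hit.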

http://en.wikipedia.org/wiki/Ken_Foree






Re: Negative lookaround?

Posted by Dan <a...@patnode.net>.
> It's looking like you want to use the ReplaceTags plugin.  Check  
> out the default rules.

Probably less work, thank you


> Do you mean negative lookahead?
>
> body __OBSFU_FRE1 /(?!FREE)\bF(\s|\s\s|\s\S...

That's what I'm talking about, thank you


> It's not "negative lookaround", it's "negative lookahead".
>
> You use (?! ) to group off a negative lookahead.
>
> body __OBSFU_FRE1
> /(?!FREE)\bF(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E\b/i
>
> Also, might I suggest that (\s|\s\s|\s\S|\s\S\s|\S\s|\S)? is not optimal
> here. At minimum, turn off capture for the group by using (?: at the start.
>
> I'd also suggest that the last term of bare \S is not such a good idea.
> /F\S?R\S?E\S?E\S?/i will match Frisbee.
>
> I would also consider replacing the whole (\s|\s\s|\s\S|\s\S\s|\S\s|\S)? group
> with just: \s+\S?\s?  This does force at least one whitespace character in the
> match, but that fixes the Frisbee problem.
>
>
> So you might wish to try:
>
> body __OBSFU_FRE1 /(?!FREE)\bF\s+\S?\s?R\s+\S?\s?E\s+\S?\s?E\b/i

You guys rock!    :)

Dan

Re: Negative lookaround?

Posted by Theo Van Dinter <fe...@apache.org>.
On Wed, May 17, 2006 at 02:59:16PM -0700, Dan wrote:
> How do I write a single regex that includes every variation except a  
> single specific one?

It's looking like you want to use the ReplaceTags plugin.  Check out the
default rules.

-- 
Randomly Generated Tagline:
"She's got a mortgage on my body and a lease on my soul."

Re: Negative lookaround?

Posted by Matt Kettler <mk...@evi-inc.com>.
Dan wrote:
> Sick of obfuscation, I'm going to town on spacing and letter variations,
> with one problem:
> 
> body __OBSFU_FRE1a /\bFREE\b/i
> body __OBSFU_FRE1b
> /\bF(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E\b/i
> meta __OBSFU_FRE1 (!__OBSFU_FRE1a && __OBSFU_FRE1b)
> 
> 
> "Every variation" includes the whole word: FREE.  To exclude the whole
> word, I created a meta exception but as you might guess, this also finds
> the whole word elsewhere in the same message.  While it's odd to have one
> word mangled and another not, spammers do it.  I'm told a negative
> lookaround will solve this problem, but I can't figure out how to do
> it.  Everything I've read relates to neighboring text, not the same text.  
> 
> How do I write a single regex that includes every variation except a
> single specific one?
It's not "negative lookaround", it's "negative lookahead".

You use (?! ) to group off a negative lookahead.

body __OBSFU_FRE1
/(?!FREE)\bF(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E\b/i

Also, might I suggest that (\s|\s\s|\s\S|\s\S\s|\S\s|\S)? is not optimal
here. At minimum, turn off capture for the group by using (?: at the start.

I'd also suggest that the last term of bare \S is not such a good idea.
/F\S?R\S?E\S?E\S?/i will match Frisbee.

I would also consider replacing the whole  (\s|\s\s|\s\S|\s\S\s|\S\s|\S)? group
with just: \s+\S?\s?  This does force at least one whitespace character in the
match, but that fixes the Frisbee problem.


So you might wish to try:

body __OBSFU_FRE1 /(?!FREE)\bF\s+\S?\s?R\s+\S?\s?E\s+\S?\s?E\b/i
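
A quick sketch of how the simplified rule above behaves, again using Python's re module as a stand-in for Perl regexes. The name SIMPLE and the test strings are illustrative assumptions. Note the trade-off already mentioned: every gap must contain at least one whitespace character, so punctuation-only obfuscation such as "F.R.E.E" is not caught by this form.

```python
import re

# Python rendering of the suggested rule body; the name SIMPLE is made up.
SIMPLE = re.compile(
    r"(?!FREE)\bF\s+\S?\s?R\s+\S?\s?E\s+\S?\s?E\b",
    re.IGNORECASE,
)


def fires(text):
    """True when the simplified pattern hits anywhere in the text."""
    return SIMPLE.search(text) is not None


if __name__ == "__main__":
    for sample in ["F R E E", "F *R *E *E", "FREE", "F.R.E.E", "Frisbee"]:
        print(repr(sample), fires(sample))
```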



Re[2]: Negative lookaround?

Posted by Robert Menschel <Ro...@Menschel.net>.
Hello Matt,

Wednesday, May 17, 2006, 4:04:39 PM, you wrote:

MK> Some of the shorter results are:

MK> body      SARE_OBFU_BACK_NUM       m'(?!BACK)\bb\d?a\d?c\d?k\b'i
MK> body      SARE_OBFU_SAVE_NUM       m'(?!save)\bs\d?a\d?v\d?e\b'i
MK> body      SARE_OBFU_SAVINGS_NUM    m'(?!savings)\bs\d?a\d?v\d?i\d?n\d?g\d?s\b'i
MK> body      SARE_OBFU_NUM_YOUR       m'(?!YOUR)\bY\d?O\d?U\d?R\b'i

MK> (why the author used m' instead of / is beyond me, as it serves no purpose in
MK> these rules..  but a lot of SARE rules have really weird style so I'll chalk it
MK> up to weird style.)

Many obfu rules need to test for letter substitutions, such as \/ for
V.  Those rules need a lot of quoting unless you use a construct like
m'regex' to eliminate that need.  After several passes of "gosh,
that's another one that needs quoting," I got into the habit of always
using m'regex' for any/all obfu rules.

Bob Menschel




Re: Negative lookaround?

Posted by Matt Kettler <mk...@evi-inc.com>.
David B Funk wrote:
> On Wed, 17 May 2006, Stuart Johnston wrote:
> 
>>> "Every variation" includes the whole word: FREE.  To exclude the whole
>>> word, I created a meta exception but as you might guess, this also finds
>>> the whole word elsewhere in the same message.  While it's odd to have one
>>> word mangled and another not, spammers do it.  I'm told a negative
>>> lookaround will solve this problem, but I can't figure out how to do
>>> it.  Everything I've read relates to neighboring text, not the same text.
>>>
>>> How do I write a single regex that includes every variation except a
>>> single specific one?
>> Do you mean negative lookahead?
>>
>> body __OBSFU_FRE1 /(?!FREE)\bF(\s|\s\s|\s\S...
> 
> Almost, you -really- want that '\b' pattern enclosing the negative
> lookahead qualifier, otherwise it won't give you the expected results.

Are you sure? I find this works just fine.

FUZZY_MILF from the standard ruleset does this, as do 10 rules in
70_sare_obfu0.cf and 1 in 70_sare_specific.cf

Try grep -P '\(\?\!\w+\)\\b' 70_sare_obfu0.cf

Some of the shorter results are:

body      SARE_OBFU_BACK_NUM       m'(?!BACK)\bb\d?a\d?c\d?k\b'i
body      SARE_OBFU_SAVE_NUM       m'(?!save)\bs\d?a\d?v\d?e\b'i
body      SARE_OBFU_SAVINGS_NUM    m'(?!savings)\bs\d?a\d?v\d?i\d?n\d?g\d?s\b'i
body      SARE_OBFU_NUM_YOUR       m'(?!YOUR)\bY\d?O\d?U\d?R\b'i


(why the author used m' instead of / is beyond me, as it serves no purpose in
these rules..  but a lot of SARE rules have really weird style so I'll chalk it
up to weird style.)


Re: Negative lookaround?

Posted by Dan <a...@patnode.net>.
> Almost, you -really- want that '\b' pattern enclosing the negative
> lookahead qualifier, otherwise it won't give you the expected results.

> So try:
>
> body OBSFU_FRE1 /\b(?!FREE)F(\s|\s\s|\s\S...

Sweet.  Is a \b also needed inside the lookahead, as in (?!FREE\b), or does
the main one at the end handle it?

body __OBSFU_FRE1 /\b(?!FREE)F(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E\b/i
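
The thread does not answer this question directly, so here is a hedged sketch of the difference, using Python's re module as a stand-in for Perl regexes. The names LOOSE and STRICT are invented. The two forms only differ on text that begins with the letters "FREE" followed by another word character, such as "FREEE"; whether that string should be flagged is a judgment call for the rule author.

```python
import re

SEP = r"(?:\s|\s\s|\s\S|\s\S\s|\S\s|\S)?"
BODY = "F" + SEP + "R" + SEP + "E" + SEP + r"E\b"

# Lookahead without a trailing \b: rejects anything that starts with "FREE".
LOOSE = re.compile(r"\b(?!FREE)" + BODY, re.IGNORECASE)
# Lookahead with a trailing \b: rejects only the exact whole word "FREE".
STRICT = re.compile(r"\b(?!FREE\b)" + BODY, re.IGNORECASE)


def fires(pattern, text):
    """True when the pattern hits anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["FREE", "F R E E", "FREEE"]:
        print(repr(sample), fires(LOOSE, sample), fires(STRICT, sample))
```

Both forms skip the plain word and both catch "F R E E"; only the form with \b inside the lookahead also catches "FREEE".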


> In that pattern you have "F(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R"
> Note that '(\s|\S)' says fire if it's either a space or a non-space
> so that is functionally equivalent to '.' (i.e., the wildcard character).
> Is that what you wanted? It may match on some real words.

Good point.  A dot would be more compact but by expressing every  
separator, I can more easily kill specific variations (just one or  
just the other) that FP.  Some also only trigger when multiples are  
found or when found alone only in the subject.

Dan

Re: Negative lookaround?

Posted by David B Funk <db...@engineering.uiowa.edu>.
On Wed, 17 May 2006, Stuart Johnston wrote:

> > "Every variation" includes the whole word: FREE.  To exclude the whole
> > word, I created a meta exception but as you might guess, this also finds
> > the whole word elsewhere in the same message.  While it's odd to have one
> > word mangled and another not, spammers do it.  I'm told a negative
> > lookaround will solve this problem, but I can't figure out how to do
> > it.  Everything I've read relates to neighboring text, not the same text.
> >
> > How do I write a single regex that includes every variation except a
> > single specific one?
>
> Do you mean negative lookahead?
>
> body __OBSFU_FRE1 /(?!FREE)\bF(\s|\s\s|\s\S...

Almost, you -really- want that '\b' pattern enclosing the negative
lookahead qualifier, otherwise it won't give you the expected results.
Since the negative lookahead removed the need for the meta rule, you
want this to be a real standalone rule so remove the leading '__' too.

So try:

body OBSFU_FRE1 /\b(?!FREE)F(\s|\s\s|\s\S...

In that pattern you have "F(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R"
Note that '(\s|\S)' says fire if it's either a space or a non-space
so that is functionally equivalent to '.' (i.e., the wildcard character).
Is that what you wanted? It may match on some real words.
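
A small sketch of both points, using Python's re module as a stand-in for Perl regexes; the names ANY_CHAR, DOT, VARIANT and hits, and the sample phrases, are made up for illustration. One nuance: without a single-line/dot-all flag, '.' does not match a newline, whereas (\s|\S) does. The second half shows ordinary phrases that the full separator group lets through:

```python
import re

ANY_CHAR = re.compile(r"(?:\s|\S)")   # space or non-space: any one character
DOT = re.compile(r".")                # any one character except newline

SEP = r"(?:\s|\s\s|\s\S|\s\S\s|\S\s|\S)?"
VARIANT = re.compile(r"\bF" + SEP + "R" + SEP + "E" + SEP + r"E\b", re.IGNORECASE)


def hits(text):
    """Return the substring the obfuscation pattern matched, or None."""
    m = VARIANT.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(ANY_CHAR.fullmatch("\n") is not None, DOT.fullmatch("\n") is not None)
    for phrase in ["please fire me", "a gift for Eve"]:
        print(repr(phrase), hits(phrase))
```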

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Negative lookaround?

Posted by Stuart Johnston <st...@ebby.com>.
Dan wrote:
> Sick of obfuscation, I'm going to town on spacing and letter variations,
> with one problem:
> 
> body __OBSFU_FRE1a /\bFREE\b/i
> body __OBSFU_FRE1b 
> /\bF(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?R(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E(\s|\s\s|\s\S|\s\S\s|\S\s|\S)?E\b/i
> meta __OBSFU_FRE1 (!__OBSFU_FRE1a && __OBSFU_FRE1b)
> 
> 
> "Every variation" includes the whole word: FREE.  To exclude the whole
> word, I created a meta exception but as you might guess, this also finds
> the whole word elsewhere in the same message.  While it's odd to have one
> word mangled and another not, spammers do it.  I'm told a negative
> lookaround will solve this problem, but I can't figure out how to do 
> it.  Everything I've read relates to neighboring text, not the same text.  
> 
> How do I write a single regex that includes every variation except a 
> single specific one?

Do you mean negative lookahead?

body __OBSFU_FRE1 /(?!FREE)\bF(\s|\s\s|\s\S...