Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2014/06/06 03:44:33 UTC

Rule writing: new text obfuscation mechanism

All:

I've run across a new text obfuscation method in active use by spammers. 
It appears to be an attempt to bypass RE-based text matching of words. 
Rules you write will need to account for it so that they cannot be evaded.

Unfortunately the RE engine treats the underscore as a "word" character 
(it is part of \w), so \b does not match between a letter and an 
underscore. A rule like /\bthis advertisement\b/ can therefore be defeated 
by replacing the spaces in the sentence with underscores: the literal 
space no longer matches, and neither does the word-boundary check. The 
text is still readable to a human.

Recommendation: instead of a bare \b, use (?:\b|_) and instead of embedded 
spaces use [-_\s]

Examples:

Manage_advertising_preferences_here

To_remove_yourself_from_this_admail,_please_do_so_here

Be_removed_from_this_important_offer
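
As a rough illustration of the recommendation above, here is a minimal 
sketch in Python rather than as a SpamAssassin rule. It assumes that 
Python's re module treats the underscore as a word character for \b in the 
same way as the Perl RE engine. The sample strings and variable names are 
made up for the demonstration.

```python
import re

# Naive pattern from the message: literal space plus \b anchors.
naive = re.compile(r"\bthis advertisement\b")

# Hardened pattern: (?:\b|_) replaces each bare \b, [-_\s] replaces the space.
hardened = re.compile(r"(?:\b|_)this[-_\s]advertisement(?:\b|_)")

# Hypothetical sample lines in the style of the examples above.
samples = [
    "Unsubscribe from this advertisement here",
    "Unsubscribe_from_this_advertisement_here",
    "Unsubscribe-from-this-advertisement-here",
]

# One (naive_hit, hardened_hit) pair per sample.
results = [(bool(naive.search(s)), bool(hardened.search(s))) for s in samples]

for s, (n, h) in zip(samples, results):
    print(f"naive={n} hardened={h} {s}")
```

Only the hardened pattern should hit the underscore and hyphen variants; 
the naive one should hit the plain-space line alone.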

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   *Your* lack of self-control does not give you the authority to
   dictate limitations on *my* freedom.
-----------------------------------------------------------------------
  Tomorrow: the 70th anniversary of D-Day

Re: Rule writing: new text obfuscation mechanism

Posted by Joe Quinn <jq...@pccc.com>.
The way we handle it in 
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf is to use a 
regex like /this.advertisement/ unanchored by \b.

When matching against phrases like yours, we find the word boundary does 
not add any specificity to the rule, because the odds of matching a 
different word or phrase are practically nil, and we catch almost every 
obfuscation of word boundaries.
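
A quick sketch of that unanchored style, again in Python as a stand-in for 
the Perl RE engine. The sample strings are invented for illustration and 
are not taken from KAM.cf.

```python
import re

# Unanchored pattern: "." accepts any single separator character
# and there is no \b on either side.
loose = re.compile(r"this.advertisement")

# Hypothetical sample lines, not real rule test data.
samples = [
    "remove yourself from this advertisement",
    "remove_yourself_from_this_advertisement",
    "remove-yourself-from-this-advertisement",
    "remove.yourself.from.this.advertisement",
    "thisadvertisement",  # no separator at all
]

loose_hits = [bool(loose.search(s)) for s in samples]

for s, hit in zip(samples, loose_hits):
    print(f"hit={hit} {s}")
```

The "." requires exactly one character between the two words, so the last 
sample, with the separator dropped entirely, is not matched by this 
pattern.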

Good catch, though: we do have some rules in KAM.cf that can be evaded 
this way, and off the top of my head I can think of several stock SA 
rules that are vulnerable too.
