Posted to dev@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2014/06/06 03:44:33 UTC
Rule writing: new text obfuscation mechanism
All:
I've run across a new text obfuscation method in active use by spammers.
It appears to be an attempt to bypass RE-based text matching of words.
Rules you write will need to be modified so they are not evaded by this.
Unfortunately the RE engine considers the underscore to be a "word"
character, so a rule like /\bthis advertisement\b/ can be defeated by
replacing the spaces in the sentence with underscores. The text is still
readable to a human, but the literal space no longer matches and \b does
not match between an underscore and a letter, which foils the
word-boundary check.
Recommendation: instead of a bare \b, use (?:\b|_) and instead of embedded
spaces use [-_\s]
Examples:
Manage_advertising_preferences_here
To_remove_yourself_from_this_admail,_please_do_so_here
Be_removed_from_this_important_offer
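As a rough illustration of the recommendation above, here is a minimal sketch in Python, whose "re" module also treats the underscore as a word character. SpamAssassin rules are Perl regexes, so this only demonstrates the matching behaviour; the names NAIVE, HARDENED and matches() are made up for the example and are not part of any rule set.

```python
import re

# Naive rule: literal space plus bare word boundaries.
NAIVE = re.compile(r"\badvertising preferences\b", re.IGNORECASE)

# Hardened rule: (?:\b|_) in place of \b, [-_\s] in place of the space.
HARDENED = re.compile(
    r"(?:\b|_)advertising[-_\s]preferences(?:\b|_)", re.IGNORECASE
)

def matches(pattern, text):
    """Return True if the compiled pattern is found anywhere in text."""
    return pattern.search(text) is not None

plain = "Manage advertising preferences here"
obfuscated = "Manage_advertising_preferences_here"

for label, pattern in (("naive", NAIVE), ("hardened", HARDENED)):
    print(label, matches(pattern, plain), matches(pattern, obfuscated))

# The underscore counts as a word character, so there is no \b
# between "_" and a letter and this search finds nothing:
print(re.search(r"\bthis\b", "removed_from_this_important_offer"))
```

The naive pattern only hits the plain sentence, while the hardened one hits both the plain and the underscore-joined form.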
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
*Your* lack of self-control does not give you the authority to
dictate limitations on *my* freedom.
-----------------------------------------------------------------------
Tomorrow: the 70th anniversary of D-Day
Re: Rule writing: new text obfuscation mechanism
Posted by Joe Quinn <jq...@pccc.com>.
The way we handle it in
http://www.pccc.com/downloads/SpamAssassin/contrib/KAM.cf is to use a
regex like /this.advertisement/ unanchored by \b.
When matching against phrases like yours, we find the word boundary does
not add any specificity to the rule, because the odds of matching
a different word or phrase are nil, and we catch almost every obfuscation
of word boundaries.
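To make the trade-off concrete, here is a small sketch of the unanchored approach, again in Python purely for illustration (the real rules are Perl regexes, and LOOSE and hit() are hypothetical names). The "." stands in for any single separator character, and without \b the pattern also matches inside longer strings.

```python
import re

# Unanchored pattern: "." accepts any one character as the separator.
LOOSE = re.compile(r"this.advertisement", re.IGNORECASE)

def hit(text):
    """Return True if the loose pattern is found anywhere in text."""
    return LOOSE.search(text) is not None

samples = [
    "see this advertisement now",   # plain space
    "see_this_advertisement_now",   # underscore obfuscation
    "see-this-advertisement-now",   # hyphen obfuscation
    "xthis*advertisementx",         # embedded in a longer string
    "thisadvertisement",            # no separator character at all
]
for text in samples:
    print(hit(text), text)
```

In this sketch every sample matches except the last one, because "." requires exactly one character between the two words.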
Good catch, though: we do have some rules in KAM.cf that can be evaded
this way, and off the top of my head I can think of several stock SA
rules that are vulnerable too.