Posted to users@spamassassin.apache.org by John Hardin <jh...@impsec.org> on 2015/01/01 03:54:46 UTC

Re: New type of obfuscation?

On Wed, 31 Dec 2014, Martin Gregorie wrote:

> During last night I received a phishing message with a new (to me
> anyway) form of obfuscation which can only be used inside HTML body text
> using us-ascii encoding. The obfuscation was apparently aimed at SA and
> similar scanners because it's not obvious to anybody reading the message:
> every 'o' (0x6f) in the text is replaced by &#959;
>
> My Perl-fu isn't good enough to encode this in a regex - can anybody
> help?

Take a look at 25_replace.cf (esp. tags C and E) and the various FUZZY_*
rules. Doing this broadly isn't feasible, but rules can focus on specific
commonly obfuscated words and short phrases, which could help Bayes learn
to recognize such messages as spammy more quickly.

I've been extending 25_replace.cf as I see more different types of 
obfuscation like this, but it's a bit hard to keep up. Given a list of 
Unicode code points that look like specific Latin letters, it should not 
be hard to automatically generate the tag subrules for obfuscation for 
all the encodings.

Is there such a list anywhere already that could be leveraged? I know we
were discussing Unicode normalization of body text at one point; is there
anything there we could use?
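
As a rough illustration of the generation step described above, here is a
minimal Python sketch. It assumes the encodings of interest are raw UTF-8
bytes, decimal HTML entities and hex HTML entities, and that the result is
emitted as a replace_tag line for the ReplaceTags plugin. The helper names
and the tag name OGLYPH are hypothetical and not taken from 25_replace.cf.

```python
import re


def homoglyph_alternation(letter, code_points):
    """Build a regex alternation for `letter` and its look-alike code points.

    Each code point is matched as raw UTF-8 bytes, as a decimal HTML entity,
    or as a hex HTML entity. '#' is written as '\\#' because an unescaped '#'
    starts a comment in SpamAssassin config files.
    """
    parts = [re.escape(letter)]
    for cp in code_points:
        # Raw UTF-8 bytes, written as \xNN escapes.
        parts.append("".join(f"\\x{b:02x}" for b in chr(cp).encode("utf-8")))
        # Decimal entity, tolerating leading zeros (e.g. &#0959;).
        parts.append(f"&\\#0*{cp};")
        # Hex entity, case-insensitive (e.g. &#x3BF; or &#X3bf;).
        parts.append(f"&\\#(?i:x0*{cp:x});")
    return "(?:" + "|".join(parts) + ")"


def replace_tag_line(tag_name, letter, code_points):
    """Format the alternation as a 'replace_tag NAME EXPR' config line."""
    return f"replace_tag {tag_name} {homoglyph_alternation(letter, code_points)}"


# U+03BF GREEK SMALL LETTER OMICRON is the look-alike reported in this thread.
print(replace_tag_line("OGLYPH", "o", [0x3BF]))
```

The entity alternatives can only match where the entity text is still
present, so whether a tag like this is useful depends on the rule type it
ends up in.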

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   It is not the business of government to make men virtuous or
   religious, or to preserve the fool from the consequences of his own
   folly.                                              -- Henry George
-----------------------------------------------------------------------
  944 days since the first successful private support mission to ISS (SpaceX)

Re: New type of obfuscation?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2015-01-02 at 06:18 -0600, Dave Pooser wrote:

> Wouldn't that have to be a rawbody rule?
>
Thanks, Dave. I thought I was probably missing something obvious and
that was it.


Martin




Re: New type of obfuscation?

Posted by Dave Pooser <da...@pooserville.com>.
On 1/2/15 6:08 AM, "Martin Gregorie" <ma...@gregorie.org> wrote:

>The resulting
>regexes pass SA lint tests and match example spam when run as, for
>instance 
>
>    grep -P '\&\#959;' <saved_spam.txt
>
>but don't generate hits when used in an SA body rule as:
>
>    body MG_OBFUSCATION  /\&\#959;/

Wouldn't that have to be a rawbody rule?
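
A rough Python model of why the rule type matters: body rules see text
after HTML rendering, where the entity has presumably already been decoded
to the character it names, while rawbody rules still see the markup. This
is only a sketch; rough_render is a hypothetical stand-in and not
SpamAssassin's actual rendering code, and the sample HTML is made up.

```python
import html
import re

# Hypothetical sample in the style of the reported spam.
raw_html = "<p>Acc&#959;unt n&#959;tice</p>"


def rough_render(fragment):
    """Crude stand-in for HTML rendering: strip tags, then decode entities."""
    text = re.sub(r"<[^>]+>", "", fragment)
    return html.unescape(text)


# Same regex as in the rule above; '\#' is just a literal '#'.
pattern = re.compile(r"&\#959;")
rendered = rough_render(raw_html)

print(bool(pattern.search(raw_html)))  # True: entity text is still present
print(bool(pattern.search(rendered)))  # False: entity was decoded away
print("\u03bf" in rendered)            # True: U+03BF is what remains
```

Under that model, the rawbody form of the rule would be
"rawbody MG_OBFUSCATION /\&\#959;/", while a body rule would have to
match the decoded character instead.
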
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com



Re: New type of obfuscation?

Posted by Martin Gregorie <ma...@gregorie.org>.
On Fri, 2015-01-02 at 09:15 +0100, Joolee wrote:
> You can start with http://homoglyphs.net/?unicodepos=1 and the search term
> homoglyphs might get you even more extensive lists.
> 
I had already realised that this was spam containing homoglyphs: the
message was displayed in an abnormal size and font and, since I have my
reader set up to display plain text rather than HTML, I knew that there
would be only HTML in the body.

What I was asking about was how to write a regex that would match the
obfuscation encoding. I've had several attempts at it now. The resulting
regexes pass SA lint tests and match example spam when run as, for
instance:

    grep -P '\&\#959;' <saved_spam.txt

but don't generate hits when used in an SA body rule as:

    body MG_OBFUSCATION  /\&\#959;/


Martin




Re: New type of obfuscation?

Posted by Joolee <sp...@joolee.nl>.
You can start with http://homoglyphs.net/?unicodepos=1; searching for the
term "homoglyphs" might turn up even more extensive lists.

Re: New type of obfuscation?

Posted by Paul Stead <pa...@zeninternet.co.uk>.

On 01/01/15 02:54, John Hardin wrote:
>
> Is there such a list anywhere already that could be leveraged? I know we
> were discussing unicode normalization of body text at one point, is
> there anything there we could use?
>
I found

http://unicode.org/cldr/utility/confusables.jsp#data
http://www.irongeek.com/homoglyph-attack-generator.php
http://en.wikipedia.org/wiki/User:SoxBot/UAA/Homoglyphs
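
A list like these could be turned into a per-letter table with a few lines
of Python. The sketch below assumes a semicolon-separated layout in the
style of the Unicode confusables data file (source code point, target code
point, type, then a '#' comment). The sample lines are illustrative rather
than copied from any of the pages above, and latin_lookalikes is a
hypothetical helper name.

```python
from collections import defaultdict

# Illustrative sample lines; a real data file would be read from disk.
SAMPLE = """\
# source ; target ; type # comment
03BF ; 006F ; MA # GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O

0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
"""


def latin_lookalikes(lines):
    """Map each ASCII letter to the code points listed as resembling it."""
    table = defaultdict(list)
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2:
            continue
        source, target = fields[0].split(), fields[1].split()
        # Keep only single code point -> single code point mappings.
        if len(source) != 1 or len(target) != 1:
            continue
        letter = chr(int(target[0], 16))
        if letter.isascii() and letter.isalpha():
            table[letter].append(int(source[0], 16))
    return dict(table)


table = latin_lookalikes(SAMPLE.splitlines())
for letter, cps in sorted(table.items()):
    print(letter, [f"U+{cp:04X}" for cp in cps])
```

Each letter's list of code points is then the kind of input John describes
for generating the tag subrules automatically.
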
--
Paul Stead
Systems Engineer
Zen Internet