Posted to users@spamassassin.apache.org by Pedro David Marco <pe...@yahoo.com> on 2017/02/01 11:17:35 UTC

fake base64 encoding

Hi!
I have noticed that when an email contains these (wrong) headers:

Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

as SMTP headers, not MIME headers, and the email body is not base64 encoded, email clients such as Thunderbird show the content correctly but SpamAssassin body rules are blind.

Example:

Suppose a rule like

body      TEST_TEXT_DETECTED  /TEST TEXT/
score     TEST_TEXT_DETECTED  1
describe  TEST_TEXT_DETECTED  Test text detected

With this .eml (minimized):

# cat test.eml
From: LinkedIn Email Confirmation <em...@linkedin.naver.com>
To: "jcornago@" <sia.es jcornago@sia.es>
Date: Tue, 8 Mar 2016 04:18:08 +0000
Subject: Please confirm your email address
TEST TEXT
#

The rule triggers OK! Now I add headers:

# cat test.eml
From: LinkedIn Email Confirmation <em...@linkedin.naver.com>
To: "jcornago@" <sia.es jcornago@sia.es>
Subject: Please confirm your email address
Date: Tue, 8 Mar 2016 04:18:08 +0000
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64
TEST TEXT
#

And the rule never triggers!

It makes sense, since SA tries to decode the body before applying rules, but Thunderbird shows the email correctly in both cases (the email is human readable). Can anyone please try it as well, to rule out that it is only me? Just add those 2 headers at the end of the SMTP headers section.

Thanks!
--------Pedro
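
For anyone who wants to see the effect without a SpamAssassin install, here is a minimal sketch in Python. It uses Python's standard `email` package purely as a stand-in for a scanner that trusts the Content-Transfer-Encoding header; SpamAssassin is written in Perl and its decoder may behave differently in detail. The example.com addresses, the blank line separating headers from body, and the helper name `body_seen_by_scanner` are assumptions made for the illustration.

```python
import email
import re

HEADERS = (
    "From: sender@example.com\n"
    "To: rcpt@example.com\n"
    "Subject: Please confirm your email address\n"
)
FAKE_B64_HEADERS = (
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
)
BODY = "\nTEST TEXT\n"  # the leading blank line ends the header block

# Same pattern as the TEST_TEXT_DETECTED body rule.
RULE = re.compile(rb"TEST TEXT")


def body_seen_by_scanner(raw: str) -> bytes:
    """Return the body after honouring the declared transfer encoding."""
    msg = email.message_from_string(raw)
    # decode=True applies Content-Transfer-Encoding. Python's lenient
    # base64 decoder skips characters outside the base64 alphabet, so
    # plain text labelled as base64 comes back as unrelated bytes.
    return msg.get_payload(decode=True)


plain = body_seen_by_scanner(HEADERS + BODY)
fake = body_seen_by_scanner(HEADERS + FAKE_B64_HEADERS + BODY)

print("plain message matches:", bool(RULE.search(plain)))
print("fake-base64 message matches:", bool(RULE.search(fake)))
```

A mail client that ignores the bogus encoding label would still display "TEST TEXT", which is the mismatch described above.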

Re: fake base64 encoding

Posted by Pedro David Marco <pe...@yahoo.com>.

Correction: 
Some Outlook versions do show the email just as Thunderbird does, so most users can see the email but SA cannot.


Re: fake base64 encoding

Posted by John Wilcock <jo...@tradoc.fr>.

On 02/02/2017 15:50, RW wrote:
> On Thu, 2 Feb 2017 05:43:24 -0500
> Kevin A. McGrail wrote:
...
>> I will score much higher since it is in the wild.  Can you throw a
>> spample up on pastebin?
> Perhaps text/html makes a big difference, but base64 encoded utf-8
> text is not uncommon these days - particularly outside North America.
>
> To score it higher you might want to include a "full" rule that checks
> for base64 encoding in the headers followed by illegal whitespace near
> the beginning of what should be the base64 text.

Indeed. In my (very small) corpus, I see lots of base64-encoded utf-8 
text/html parts of multipart messages, but very few non-multipart examples.

All of the latter really are base64-encoded, rather than plain text 
labelled as base64, but that may simply be due to the small size of my 
corpus. As it happens they are all spam, but I'm not convinced that 
hitting on any utf-8 text/html message that purports to be 
base64-encoded, regardless of whether it is actually base64 or not, is a 
good idea.

FWIW,
John
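
A rough sketch of how such a corpus check could be scripted, in Python with only the standard `email` package. The function names `is_b64_utf8_html` and `survey` and the three inline sample messages are hypothetical; a real run would feed it messages from your own corpus.

```python
import email
from collections import Counter


def is_b64_utf8_html(part) -> bool:
    """True for a text/html part declared as utf-8 and base64."""
    cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
    charset = (part.get_content_charset() or "").lower()
    return (
        part.get_content_type() == "text/html"
        and charset == "utf-8"
        and cte == "base64"
    )


def survey(raw_messages) -> Counter:
    """Count messages containing such a part, split by multipart or not."""
    counts = Counter()
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        # walk() yields the message itself, then any nested MIME parts.
        if any(is_b64_utf8_html(p) for p in msg.walk()):
            counts["multipart" if msg.is_multipart() else "single-part"] += 1
    return counts


# Hypothetical sample messages, for illustration only.
SINGLE = (
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+aGk8L2I+\n"
)
MULTI = (
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGI+aGk8L2I+\n"
    "--b--\n"
)
PLAIN = "Subject: hello\n\nplain text body\n"

counts = survey([SINGLE, MULTI, PLAIN])
print(dict(counts))
```

This only counts what the headers claim; it says nothing about whether the body really is base64.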

Re: fake base64 encoding

Posted by RW <rw...@googlemail.com>.

On Thu, 2 Feb 2017 05:43:24 -0500
Kevin A. McGrail wrote:

> On 2/1/2017 11:30 PM, Pedro David Marco wrote:
> > I did a similar rule to detect it but with higher score (3) since
> > we are seeing a huge LinkedIn Phishing campaign using this
> > technique, that on purpose or by mistake is evading most SA
> > rules...  
> I will score much higher since it is in the wild.  Can you throw a 
> spample up on pastebin?

Perhaps text/html makes a big difference, but base64 encoded utf-8
text is not uncommon these days - particularly outside North America. 

To score it higher you might want to include a "full" rule that checks
for base64 encoding in the headers followed by illegal whitespace near
the beginning of what should be the base64 text.
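
The check suggested above could look roughly like this. It is a Python sketch of the idea, not a SpamAssassin "full" rule; the function name, the 200-character window, and the test messages are assumptions for illustration.

```python
import base64
import email
import re

# A space or tab between two non-whitespace characters. Genuine base64
# text is broken only by line endings, so this should not occur in it.
INLINE_WS = re.compile(r"\S[ \t]+\S")


def claims_base64_but_has_inline_whitespace(raw: str, window: int = 200) -> bool:
    """Heuristic: base64 declared in the headers, plain text in the body."""
    msg = email.message_from_string(raw)
    cte = str(msg.get("Content-Transfer-Encoding", "")).strip().lower()
    if msg.is_multipart() or cte != "base64":
        return False
    body = msg.get_payload()  # undecoded text, as it appears in the message
    return INLINE_WS.search(body[:window]) is not None


B64_HEADERS = (
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
)
fake = B64_HEADERS + "TEST TEXT\n"
genuine = B64_HEADERS + base64.b64encode(b"TEST TEXT").decode("ascii") + "\n"

print("fake:", claims_base64_but_has_inline_whitespace(fake))
print("genuine:", claims_base64_but_has_inline_whitespace(genuine))
```

Looking only at the start of the body keeps the check cheap, and it leaves genuinely encoded mail alone.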

Re: fake base64 encoding

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/2/2017 5:43 AM, Kevin A. McGrail wrote:
> On 2/1/2017 11:30 PM, Pedro David Marco wrote:
>> I did a similar rule to detect it but with higher score (3) since we 
>> are seeing a huge LinkedIn Phishing campaign using this technique, 
>> that on purpose or by mistake is evading most SA rules...
> I will score much higher since it is in the wild.  Can you throw a 
> spample up on pastebin?
I've also created a bug for this:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7388

Regards,
KAM

Re: fake base64 encoding

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/1/2017 11:30 PM, Pedro David Marco wrote:
> I did a similar rule to detect it but with higher score (3) since we 
> are seeing a huge LinkedIn Phishing campaign using this technique, 
> that on purpose or by mistake is evading most SA rules...
I will score much higher since it is in the wild.  Can you throw a 
spample up on pastebin?

Regards,
KAM

Re: fake base64 encoding

Posted by Pedro David Marco <pe...@yahoo.com>.

Thanks Kevin,

I wrote a similar rule to detect it, but with a higher score (3), since we are seeing a huge LinkedIn phishing campaign using this technique which, on purpose or by mistake, is evading most SA rules.
I agree that Thunderbird may be doing it wrong. Outlook seems to do it right.

> I would say Thunderbird is not parsing it correctly.  Looking to see if this is a spam indicator.
>
> I ran some test cases with this rule:
>
> #Bad UTF-8 content type and transfer encoding
> header   __KAM_BAD_UTF8_1   Content-Type =~ /text\/html; charset=\"utf-8\"/i
> header   __KAM_BAD_UTF8_2   Content-Transfer-Encoding =~ /base64/i
> meta     KAM_BAD_UTF8       (__KAM_BAD_UTF8_1 + __KAM_BAD_UTF8_2 >= 2)
> score    KAM_BAD_UTF8       1.0
> describe KAM_BAD_UTF8       Bad Content Type and Transfer Encoding that attempts to evade SA scanning
>
> So far not seeing any sign it's in the wild.  Have you?
-----
Pedro

Re: fake base64 encoding

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/1/2017 9:35 PM, Kevin A. McGrail wrote:
> I agree.  The test does not trigger
>
> The second test will trigger utf8_mode on
>
> Feb  1 21:29:32.246 [26958] dbg: message: HTML::Parser utf8_mode on (assumed UTF-8 octets)
> Content-Type: text/html; charset="utf-8"
>> It makes sense since SA tries to decode the body before applying 
>> rules but Thunderbird shows the email correctly in
>> both cases (the email is human readable).
>> Can anyone please try it as well to discard it is only me...  just 
>> add those 2 headers at the end of  smtp headers section..
>>
>
> I would say Thunderbird is not parsing it correctly.  Looking to see 
> if this is a spam indicator.
I ran some test cases with this rule:

#Bad UTF-8 content type and transfer encoding
header   __KAM_BAD_UTF8_1   Content-Type =~ /text\/html; charset=\"utf-8\"/i
header   __KAM_BAD_UTF8_2   Content-Transfer-Encoding =~ /base64/i

meta     KAM_BAD_UTF8       (__KAM_BAD_UTF8_1 + __KAM_BAD_UTF8_2 >= 2)
score    KAM_BAD_UTF8       1.0
describe KAM_BAD_UTF8       Bad Content Type and Transfer Encoding that attempts to evade SA scanning


So far not seeing any sign it's in the wild.  Have you?

Regards,
KAM
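
To make the logic of the rule above easier to reason about, here is a rough Python rendering of it. This is only an approximation for illustration: the function name `kam_bad_utf8` and the sample messages are made up, and SpamAssassin's own header matching may differ in detail.

```python
import base64
import email
import re

# Python equivalents of the two header sub-rules (both case-insensitive).
RE_TYPE = re.compile(r'text/html; charset="utf-8"', re.IGNORECASE)
RE_CTE = re.compile(r"base64", re.IGNORECASE)


def kam_bad_utf8(raw: str) -> bool:
    msg = email.message_from_string(raw)
    hit_1 = bool(RE_TYPE.search(str(msg.get("Content-Type", ""))))
    hit_2 = bool(RE_CTE.search(str(msg.get("Content-Transfer-Encoding", ""))))
    # Each sub-rule counts as 1 when it hits, so "sum >= 2" is a logical AND.
    return hit_1 + hit_2 >= 2


B64_HEADERS = (
    'Content-Type: text/html; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
)
fake = B64_HEADERS + "TEST TEXT\n"
genuine = B64_HEADERS + base64.b64encode(b"TEST TEXT").decode("ascii") + "\n"

print("fake:", kam_bad_utf8(fake))
print("genuine:", kam_bad_utf8(genuine))
```

Because only the headers are inspected, the sketch hits on a message whose body really is base64 as well as on the fake one.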

Re: fake base64 encoding

Posted by "Kevin A. McGrail" <KM...@PCCC.com>.

On 2/1/2017 6:17 AM, Pedro David Marco wrote:
> Hi!
>
> i have noticed that when an email contains this (wrong) headers:
>
> Content-Type: text/html; charset="utf-8"
> Content-Transfer-Encoding: base64
>
> as SMTP headers, not MIME headers, and the email body is not base64 
> encoded, email clients such as Thunderbird show the content correctly but
> SpamAssassin body rules are blind.
>
> Example:
>
> suppose a rule like
> body      TEST_TEXT_DETECTED  /TEST TEXT/
> score     TEST_TEXT_DETECTED  1
> describe  TEST_TEXT_DETECTED  Test text detected
>
> With this  .eml:  (minimized)
>
> # cat test.eml
> From: LinkedIn Email Confirmation <em...@linkedin.naver.com>
> To: "jcornago@" <sia.es jcornago@sia.es>
> Date: Tue, 8 Mar 2016 04:18:08 +0000
> Subject: Please confirm your email address
> TEST TEXT
> #
>
> The rule triggers ok!  Now i add headers:
>
> # cat test.eml
> From: LinkedIn Email Confirmation <em...@linkedin.naver.com>
> To: "jcornago@" <sia.es jcornago@sia.es>
> Subject: Please confirm your email address
> Date: Tue, 8 Mar 2016 04:18:08 +0000
> Content-Type: text/html; charset="utf-8"
> Content-Transfer-Encoding: base64
> TEST TEXT
> #
>
> And the rule never triggers!
I agree.  The test does not trigger.

The second test does turn utf8_mode on:

Feb  1 21:29:32.246 [26958] dbg: message: HTML::Parser utf8_mode on (assumed UTF-8 octets)
Content-Type: text/html; charset="utf-8"
> It makes sense since SA tries to decode the body before applying rules 
> but Thunderbird shows the email correctly in
> both cases (the email is human readable).
> Can anyone please try it as well to discard it is only me...  just add 
> those 2 headers at the end of  smtp headers section..
>

I would say Thunderbird is not parsing it correctly.  Looking to see if 
this is a spam indicator.