Posted to users@spamassassin.apache.org by "Chip M." <sa...@IowaHoneypot.com> on 2016/08/31 18:36:04 UTC

spample of "data" URL in well-crafted Phish

Freshly caught Spample:
	http://puffin.net/software/spam/samples/0042_data_embedded_phish.txt
The only munging was inserting ".EXAMPLE" between "wellsfargo"
and ".com".

Four years ago, I read this fascinating article:
	http://isc.sans.edu/diary/%22Data%22+URLs+used+for+in-URL+phishing/13996
and promptly added a simple word test to score these.

At the time, I had no idea whether these occurred in Ham
(seemed unlikely, but some Hammers are stunningly "thick"
(*cough, iframe, un-cough*)).

Since then, I've seen a steady, very low volume of spam hits,
with zero Ham hits (volume of about a quarter million emails
per month).
Yes, "ZERO" Ham. :)


Most of them have followed the same pattern that's in this
spample:
The MIME (base64) encoded "data" URL decodes to a classic Phish page.
Inside that, there's usually a small encoded bit of javascript,
typically starting with:
	document.write(unescape('
In this case, it decodes to (target URL munged/replaced):
	<form action="http://EXAMPLE.COM/wp-content/uploads/vrr.php" class="button" method="post" name="submit" id="submit">'
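
As an illustration of the two decoding layers described above, here is a
minimal Python sketch. It is not the code used for this analysis. The
function names and both regular expressions are assumptions, and
urllib.parse.unquote only approximates JavaScript's unescape() (it does
not handle the %uXXXX form, and it treats %XX sequences as UTF-8 rather
than Latin-1).

```python
import base64
import re
from urllib.parse import unquote

# Hypothetical patterns for this sketch; real messages may wrap or obfuscate more.
DATA_URL_RE = re.compile(r"data:text/html;base64,([A-Za-z0-9+/=\s]+)", re.IGNORECASE)
UNESCAPE_RE = re.compile(r"document\.write\(unescape\('([^']*)'\)\)")


def decode_data_url(text):
    """Return the HTML carried by the first base64 data:text/html URL, or None."""
    match = DATA_URL_RE.search(text)
    if match is None:
        return None
    payload = re.sub(r"\s+", "", match.group(1))   # drop line wrapping
    payload += "=" * (-len(payload) % 4)           # restore stripped padding
    return base64.b64decode(payload).decode("utf-8", errors="replace")


def decode_unescape_calls(html):
    """Percent-decode every document.write(unescape('...')) argument found."""
    return [unquote(arg) for arg in UNESCAPE_RE.findall(html)]
```

Calling decode_data_url() on the raw message body and then
decode_unescape_calls() on the result would expose the form action URL
without rendering anything.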

I just did a raw HTTP GET on the actual final URL, and it
returned a 302 with a Location of (genuine) Wellsfargo, with a
parameter starting with:
	/login?ERROR_CODE=
followed by a 36-character-long code.

I did another raw GET of that Location, and it returned a 302
with a Location of WF's plain URL (no parameters), and the
document body was a terse, semi-"offshore"-speak:
	This document you requested has moved temporarily.

It appears someone reported it to WF, who successfully did a
takedown, but instead of providing a pedagogic page that
explained that the victim would have been toast, they chose just
to passively track accesses. :(


** Mitigation:
The easiest way to catch these is with a simple body word match.
Here are the exact matches I am currently using (some of them are
recent additions; listed in order of addition):
	href="data:
	href='data:
	http://data:
	data:text/html;base64
	<img src="data:
	hta:application
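
As an illustration (not the filter actually in use here), a body word
match over the strings listed above could be sketched in Python as
follows. The names are hypothetical, and matching case-insensitively is
an assumption.

```python
# The six substrings come from the list above, lowercased for comparison.
SUSPECT_SUBSTRINGS = (
    'href="data:',
    "href='data:",
    "http://data:",
    "data:text/html;base64",
    '<img src="data:',
    "hta:application",
)


def matching_patterns(raw_html):
    """Return the listed substrings that occur in the raw HTML body."""
    haystack = raw_html.lower()
    return [pattern for pattern in SUSPECT_SUBSTRINGS if pattern in haystack]
```

Returning the matched substrings rather than a boolean makes it easy to
score each one separately.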

*** Do any of you HTML gurus have additional suggestions? :)

I also recommend at least medium scoring:
	http-equiv="refresh"
which typically occurs in these, and many unrelated campaigns.

I have been thoroughly tempted to do the Klingon Coding Thing and
de-MIME these in real-time, then further decode the javascript
to get the URL... but the volume is so low, my anti-Phish system
is also nailing these, and sometimes a blaster really is a more
suitable weapon than a lightsaber. ;)
	- "Chip"



Re: spample of "data" URL in well-crafted Phish

Posted by Axb <ax...@gmail.com>.
On 08/31/2016 08:56 PM, John Hardin wrote:
> On Wed, 31 Aug 2016, Chip M. wrote:
>
>> ** Mitigation:
>> The easiest way to catch these is with a simple body word match.
>> Here's the exact matches I am currently using (some of them are
>> recent additions, listed in date of addition order):
>>     href="data:
>>     href='data:
>>     http://data:
>>     data:text/html;base64
>>     <IMG src="data:
>>     hta:application
>
> I'll see about getting those into the sandbox.

IMG src="data  can FP a lot.
>
>> *** Do any of you HTML gurus have additional suggestions? :)
>
> ... a poison-pill rule for < script > tags in email HTML?  (only
> slightly tongue-in-cheek)

could hit a lot of cheapo CMS sourced "legit" bulk content.
and possibly my favourite headache: airline ticket confirmations.







Re: spample of "data" URL in well-crafted Phish

Posted by John Hardin <jh...@impsec.org>.
On Wed, 31 Aug 2016, Chip M. wrote:

> ** Mitigation:
> The easiest way to catch these is with a simple body word match.
> Here's the exact matches I am currently using (some of them are
> recent additions, listed in date of addition order):
> 	href="data:
> 	href='data:
> 	http://data:
> 	data:text/html;base64
> 	<IMG src="data:
> 	hta:application

I'll see about getting those into the sandbox.

> *** Do any of you HTML gurus have additional suggestions? :)

... a poison-pill rule for < script > tags in email HTML?  (only slightly
tongue-in-cheek)


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   I'm seriously considering getting one of those bright-orange prison
   overalls and stencilling PASSENGER on the back. Along with the paper
   slippers, I ought to be able to walk right through security.
                                              -- Brian Kantor in a.s.r
-----------------------------------------------------------------------
  253 days since the first successful real return to launch site (SpaceX)

Re: spample of "data" URL in well-crafted Phish

Posted by "Chip M." <sa...@IowaHoneypot.com>.
On Fri, 16 Sep 2016, John Hardin wrote:
>Chip, could you send me some spamples of non-image data: messages 
>offlist? The only ones I have anywhere are images.

Sent last week - thanks for your ongoing work on this John! :)

After that request, I decided to add (in my post SA filter)
a minimally scoring test for "data:", to make it easy to find new
variants (in my log viewer, I can flag any test as a "warning"
which flashes an ugly yellow UI thingamajig, so I don't have to
remember to Look for Stuff).

My first hits have all been a new Snowshoe image variant that may
be of (mild) general interest:
	http://puffin.net/software/spam/samples/0044_data_embedded_snow.txt

Nowhere near as dangerous as Phish, but more "pee-in-the-pool"
evidence that sane senders should avoid this technique. :\
	- "Chip"


Re: spample of "data" URL in well-crafted Phish

Posted by John Hardin <jh...@impsec.org>.
On Thu, 15 Sep 2016, John Hardin wrote:

> On Wed, 15 Sep 2016, Chip M. wrote:
>
>>  Sadly, I have more FP data for you. :(
>>
>>  Here's one specific example (just a single very long line from
>>  one corpse):
>>   background-image: url("data:image/svg+xml;charset=utf8,%3Csvg
>>   width='104px' height='82px' viewBox='0 0 104 82' version='1.1'
>>   xmlns='http://www.w3.org/2000/svg' 
>
> Ok, I excluded image data from URI_DATA. This should reduce FPs without 
> hurting spam/phish detection (I hope).

...and now __URI_DATA isn't hitting *anything*.

I suspect that the only data: URLs in the masscheck corpora are for 
embedded images. This makes sense if they're being used primarily for 
spearphishing.

Chip, could you send me some spamples of non-image data: messages 
offlist? The only ones I have anywhere are images.

Thanks!

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Politicians never accuse you of "greed" for wanting other people's
   money, only for wanting to keep your own money.    -- Joseph Sobran
-----------------------------------------------------------------------
  Tomorrow: the 229th anniversary of the signing of the U.S. Constitution

Re: spample of "data" URL in well-crafted Phish

Posted by John Hardin <jh...@impsec.org>.
On Wed, 15 Sep 2016, Chip M. wrote:

> Sadly, I have more FP data for you. :(
>
> Here's one specific example (just a single very long line from
> one corpse):
>  background-image: url("data:image/svg+xml;charset=utf8,%3Csvg width='104px' height='82px' viewBox='0 0 104 82' version='1.1' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'%3E%3C!-- Generator: Sketch 3.6.1 (26313) - http://www.bohemiancoding.com/sketch --%3E%3Ctitle%3Ediamond%3C/title%3E%3Cdesc%3ECreated with Sketch.%3C/desc%3E%3Cdefs%3E%3C/defs%3E%3Cg id='Current' stroke='none' stroke-width='1' fill='none' fill-rule='evenodd'%3E%3Cg id='Settings-Not-Supported-Grammarly-2' transform='translate(-241.000000, -183.000000)'%3E%3Cg id='4-copy-4' transform='translate(45.000000, 41.000000)'%3E%3Cg id='The-Settings' transform='translate(75.000000, 63.000000)'%3E%3Cg id='Not-Suported' transform='translate(1.000000, 56.000000)'%3E%3Cg id='Google-Docs' transform='translate(34.000000, 0.000000)'%3E%3Cg id='diamond' transform='translate(75.000000, 0.000000)'%3E%3Cimage id='Image-1' x='0' y='0.0800019' width='127.919997' height='127.919997' xlink:href='data:image/pn

Ok, I excluded image data from URI_DATA. This should reduce FPs without 
hurting spam/phish detection (I hope).

This is an exploitable attack surface. SVG unfortunately does appear to 
support javascript, and binary image processing libraries have had 
exploitable bugs before.

But I doubt SA is the proper place to detect either of those. At the 
least, detecting javascript (much less hostile javascript) within a 
data:image/svg+xml block probably would be really inefficient.
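
To make concrete what such detection would involve, here is a
standalone Python sketch (a post-filter style check, not an SA rule).
The function name and the "active content" pattern are assumptions, and
the heuristic is deliberately crude: it decodes one already-extracted
data:image/svg+xml URL and looks for script tags, inline event handlers,
or javascript: URLs.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Hypothetical heuristic: <script>, on*= event handlers, or javascript: URLs.
ACTIVE_CONTENT_RE = re.compile(r"<\s*script\b|\son[a-z]+\s*=|javascript:", re.IGNORECASE)


def svg_data_url_has_script(data_url):
    """Return True if a data:image/svg+xml URL appears to carry active content."""
    header, sep, payload = data_url.partition(",")
    header = header.lower()
    if not sep or not header.startswith("data:image/svg+xml"):
        return False
    if header.endswith(";base64"):
        compact = "".join(payload.split())      # drop line wrapping
        compact += "=" * (-len(compact) % 4)    # restore stripped padding
        try:
            svg = base64.b64decode(compact).decode("utf-8", errors="replace")
        except (binascii.Error, ValueError):
            return False                        # undecodable: nothing to inspect
    else:
        svg = unquote(payload)                  # percent-encoded SVG text
    return ACTIVE_CONTENT_RE.search(svg) is not None
```

Every embedded SVG has to be fully decoded before it can be inspected,
which hints at the cost concern raised above.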

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  What nuts do with guns is terrible, certainly. But what
  evil or crazy people do with *anything* is not a valid argument
  for banning that item.           -- John C. Randolph <jc...@idiom.com>
-----------------------------------------------------------------------
  2 days until the 229th anniversary of the signing of the U.S. Constitution

Re: spample of "data" URL in well-crafted Phish

Posted by "Chip M." <sa...@IowaHoneypot.com>.
On Thu, 8 Sep 2016, John Hardin wrote:
>Yes. Given that ID on the first line the corpus owner can find the message 
>in question, review it, potentially fix misclassifications (that has 
>happened before), etc.

Shiny - that sounds perfect! :)

>There's one more exclusion I can add that will take out the last
>of the FPs in masscheck.

Thanks John!

Sadly, I have more FP data for you. :(
This week, two semi-well-known companies decided to join the
Embedded Data Hall of Shame:

"Overstock.com" (overstock.com)
831,744 bytes
X-Spam-Status: No, score=-3.2 required=5.1 tests=DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HTML_IMAGE_RATIO_02, HTML_MESSAGE, MIME_HTML_ONLY, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, RCVD_IN_RP_CERTIFIED, RCVD_IN_RP_SAFE

"Dave & Buster's" (daveandbusters.com)
806,962 bytes
X-Spam-Status: No, score=1.8 required=5.1 tests=DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HTML_IMAGE_RATIO_02, HTML_MESSAGE, MIME_HTML_ONLY, RCVD_IN_DNSWL_NONE

Sadly, both hit on my test for:
	href='data:
So the score for that should definitely be capped.
Fortunately, all these are otherwise scoring quite low.

Here's one specific example (just a single very long line from
one corpse):
  background-image: url("data:image/svg+xml;charset=utf8,%3Csvg width='104px' height='82px' viewBox='0 0 104 82' version='1.1' xmlns='http://www.w3.org/2000/svg' xmlns:xlink='http://www.w3.org/1999/xlink'%3E%3C!-- Generator: Sketch 3.6.1 (26313) - http://www.bohemiancoding.com/sketch --%3E%3Ctitle%3Ediamond%3C/title%3E%3Cdesc%3ECreated with Sketch.%3C/desc%3E%3Cdefs%3E%3C/defs%3E%3Cg id='Current' stroke='none' stroke-width='1' fill='none' fill-rule='evenodd'%3E%3Cg id='Settings-Not-Supported-Grammarly-2' transform='translate(-241.000000, -183.000000)'%3E%3Cg id='4-copy-4' transform='translate(45.000000, 41.000000)'%3E%3Cg id='The-Settings' transform='translate(75.000000, 63.000000)'%3E%3Cg id='Not-Suported' transform='translate(1.000000, 56.000000)'%3E%3Cg id='Google-Docs' transform='translate(34.000000, 0.000000)'%3E%3Cg id='diamond' transform='translate(75.000000, 0.000000)'%3E%3Cimage id='Image-1' x='0' y='0.0800019' width='127.919997' height='127.919997' xlink:href='
Which was in a huge (700+ KB) set of Style blocks. :(
I'll send you both corpses, tonight, if you're interested.

I took a closer look at the "ClubNorton" monstrosity and it also
hit that test, and has many other similarities, so this appears
to be a new email "authoring" app. :(

For completeness, "ClubNorton" was:
812,383 bytes
X-Spam-Status: No, score=1.0 required=5.1 tests=DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, HTML_MESSAGE, MIME_HTML_ONLY, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL

That puts the inventors of the Hamster Cannon in the lead, in
terms of size and pandering to safe-listing "services". :(

I asked the recipient/survivor of the new duo to forward them to
his own account and tell me how they render in Outlook, and he
kindly sent me a screenshot, mostly to show an alert that Outlook
added:
"If there are problems with how this message is displayed, click here to view it in a web browser."

Purely IM(subjective)O, that sounds like even Outlook was a bit
disgruntled with it.
	- "Chip"


Re: spample of "data" URL in well-crafted Phish

Posted by John Hardin <jh...@impsec.org>.
On Thu, 8 Sep 2016, Chip M. wrote:

> On Sat, 3 Sep 2016, John Hardin wrote:
>> I've tweaked the FP avoidance a bit, maybe that will be enough
>> to get the S/O up high enough to publish it.
>
> John, do you have any detailed info about the Ham hits?

It's possible to look up what rules hit those messages, but to see the 
content and judge what might need to be changed I'd have to get in touch 
with the corpus owner and ask them about the messages - whether they were 
correctly classified as ham or spam, and whether they'd be willing to 
share them. That may not be possible as ham corpora are often private and 
sensitive.

To view the rule hits in masscheck, assuming that's of interest:
1. go to the detail page for the rule you're interested in, e.g.:
http://ruleqa.spamassassin.org/20160907-r1759562-n/URI_DATA/detail

2. in the "set 0, broken down by contributor" section, click on any link
in the HAM% column.

You'll see something like:
.  1 
/data/archive/ham-misc//1433183357.M606569P40031.fumail03.cleanmail.ch,S=39348,W=40036%3A2,S 
HTML_MESSAGE,T_DKIM_INVALID,T_FSL_RCVD_EX_3,T_FSL_RCVD_TR_2,T_FSL_RCVD_UT_3,T_KAM_HTML_FONT_INVALID,T_NOT_A_PERSON,T_REMOTE_IMAGE,URI_DATA,URI_TRUNCATED,__ANY_TEXT_ATTACH,__ANY_TEXT_ATTACH_DOC,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BODY_TEXT_LINE,__BUGGED_IMG,__CT,__CTYPE_CHARSET_QUOTED,__CTYPE_HAS_BOUNDARY,__CTYPE_MULTIPART_ALT,__CTYPE_MULTIPART_ANY,__DKIM_EXISTS,__DOS_HAS_ANY_URI,__DOS_HAS_LIST_UNSUB,__DOS_RCVD_MON,__DOS_RCVD_SUN,__DOS_RELAYED_EXT,__FROM_ENCODED_QP,__FROM_FULL_NAME,__FROM_NEEDS_MIME,__FSL_COUNT_EXTERN,__FSL_COUNT_EXTERN,__FSL_COUNT_EXTERN,__FSL_COUNT_TRUST,__FSL_COUNT_TRUST,__FSL_COUNT_UNTRUST,__FSL_COUNT_UNTRUST,__FSL_COUNT_UNTRUST,__FSL_HAS_LIST_UNSUB,__HAS_ANY_EMAIL,__HAS_ANY_URI,__HAS_CAMPAIGN,__HAS_DATE,__HAS_DKIM_SIGHD,__HAS_DOMAINKEY_SIG,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_RCVD,__HAS_REPLY_TO,__HAS_SUBJECT,__HAS_TO,__HAS_URI,__HAVE_BOUNCE_RELAYS,__HTML_LINK_IMAGE,__JM_REACTOR_DATE,__LAST_EXTERNAL_RELAY_NO_AUTH,__LAST_UNTRUSTED_RELAY_NO_AUTH,__LIST_PARTIAL,__LOCAL_PP_NONPPURL,__MIME_HTML,__MIME_VERSION,__MISSING_REF,__MISSING_REPLY,__MISSING_THREAD,__MSGID_OK_HOST,__NAKED_TO,__NONEMPTY_BODY,__NOT_A_PERSON,__RATWARE_0_TZ_DATE,__RCD_RDNS_MX_MESSY,__REMOTE_IMAGE,__REPLYTO_EXISTS,__SANE_MSGID,__SINGLE_WORD_LINE,__SINGLE_WORD_LINE,__SUBJ_2UPPER,__SUBJ_4LOWER,__SUBJ_HAS_WORDS,__SUBJ_NOT_SHORT,__TAG_EXISTS_BODY,__TAG_EXISTS_HEAD,__TAG_EXISTS_HTML,__TAG_EXISTS_META,__TOCC_EXISTS,__TO_NO_ARROWS_R,__TVD_BODY,__TVD_MIME_ATT_TP,__URI_DATA,__URI_DBL_DOM,__URI_MAILTO
time=1433136576,scantime=0,format=f,reuse=no,set=0

...which identifies the message in the contributor's corpus and lists
all the rules that hit.
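
For anyone who wants to pick such an entry apart programmatically, here
is a minimal Python sketch. It assumes the entry is a single line of
five whitespace-separated fields, as in the sample above (which is
wrapped for display). The field names "flag" and "score" are guesses at
the meaning of the first two columns, and the class and function names
are hypothetical.

```python
from typing import NamedTuple


class MasscheckHit(NamedTuple):
    flag: str          # first column ("." in the sample above); meaning assumed
    score: str         # second column ("1" in the sample above); meaning assumed
    message_id: str    # path identifying the message in the contributor's corpus
    rules: list        # every rule name that hit
    meta: dict         # trailing key=value pairs (time, scantime, ...)


def parse_masscheck_line(line):
    """Split one unwrapped masscheck entry into its five fields."""
    flag, score, message_id, rules, meta = line.split()
    pairs = dict(item.split("=", 1) for item in meta.split(","))
    return MasscheckHit(flag, score, message_id, rules.split(","), pairs)
```

A contributor could then search their corpus for message_id and check
whether the message was classified correctly.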

> I just datamined my three best corpora, from the beginning of
> 2014 thru this weekend, and found zero FPs, except for two hits
> on that "img" test.  My data does NOT prove it's impossible for
> anybody else, but it does seem odd, so I'm wondering if the
> SA MassCheck mechanism has some means for the contributor to
> pull out the corpses of specific hits.

Yes. Given that ID on the first line the corpus owner can find the message 
in question, review it, potentially fix misclassifications (that has 
happened before), etc.

There's one more exclusion I can add that will take out the last of the 
FPs in masscheck.

> If it doesn't, that would be a cool feature to add. :)

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The Constitution is a written instrument. As such its meaning does
   not alter. That which it meant when adopted, it means now.
                     -- U.S. Supreme Court
                        SOUTH CAROLINA v. US, 199 U.S. 437, 448 (1905)
-----------------------------------------------------------------------
  9 days until the 229th anniversary of the signing of the U.S. Constitution

Re: spample of "data" URL in well-crafted Phish

Posted by "Chip M." <sa...@IowaHoneypot.com>.
On Sat, 3 Sep 2016, John Hardin wrote:
>I've tweaked the FP avoidance a bit, maybe that will be enough
>to get the S/O up high enough to publish it.

John, do you have any detailed info about the Ham hits?

I just datamined my three best corpora, from the beginning of
2014 thru this weekend, and found zero FPs, except for two hits
on that "img" test.  My data does NOT prove it's impossible for
anybody else, but it does seem odd, so I'm wondering if the
SA MassCheck mechanism has some means for the contributor to
pull out the corpses of specific hits.
If it doesn't, that would be a cool feature to add. :)


On Wed, 31 Aug 2016, Axb wrote:
>IMG src="data  can FP a lot.

AXB,
You are correct.
A few months ago, I had moved that rule in with my other "data"
rules, apparently because they had the token "data" in common.

I dug thru my notes, and the image rule was originally added to
combat a semi-subtle snowshoe campaign sent via Linode (as hosts,
they're much better than the other big-cheap-VPSs, so I've been
resisting scoring their IP blocks, which means that snow sent
thru them is sometimes harder to catch).

When I checked all data for 2014 to now in my three best corpora
(about 840 K-spam), I found that all the image spam hits were in
snow, and were NOT overtly dangerous, whereas all the non-image
"data" stuff has been in well-crafted Phish (UBER-dangerous).

There were exactly two Ham hits, and both were :grind-teeth:
ostensibly legitimate, albeit non-urgent.

Perhaps ironically or merely sadly, one was an 800 KB monstrosity
of HTML badness (yes, all in one single Part), with several
images and :cringe: fonts inlined via "data" statements.  When I
tried to view it as an HTML page in my raw corpse viewer (using
an old-ish open source HTML rendering engine), it ground away
for a while then died. :(
Who was the Sender?
Norton.
Yes, THAT Norton.
... and the Subject header was:
"ClubNorton Newsletter: Avoiding Social Engineering Tricks on Social Networks"

I've been scoring my data img rule at about 2.3 so it's well
below Poison Pill, and would not have caused either of those two
Hams to die.  Though I would not have lost sleep over a
Mercy Killing of the "ClubNorton" monstrosity. ;)

Bottom-line:
I strongly recommend a high scoring non-img "data" rule, and
gently recommend a modest scoring img "data" rule.
Everyone's mileage will vary, as always. :)
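
One way to express that split, sketched in Python purely as an
illustration: the 2.3 image score is the value mentioned above, while
the non-image score of 5.0, the regular expression, and the function
name are placeholders assumed for this example. Data URLs that omit the
media type are not handled.

```python
import re

# Captures the media type that follows "data:" (e.g. text/html, image/png).
DATA_URI_RE = re.compile(r"\bdata:([a-z]+/[a-z0-9.+-]+)", re.IGNORECASE)

IMAGE_SCORE = 2.3       # modest: image "data" shows up in some (bloated) Ham
NON_IMAGE_SCORE = 5.0   # placeholder for "high scoring"; pick your own value


def data_uri_score(raw_html):
    """Score a raw HTML body by the kinds of data: URLs it embeds."""
    types = {m.group(1).lower() for m in DATA_URI_RE.finditer(raw_html)}
    if any(not t.startswith("image/") for t in types):
        return NON_IMAGE_SCORE      # e.g. data:text/html, as seen in the Phish
    if types:
        return IMAGE_SCORE          # only embedded images
    return 0.0
```

A message carrying both kinds gets the non-image score only, so the two
never stack.
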
	- "Chip"

P.S. Javascript... I agree 100% with John, while respecting AXB's
right to disagree and choose his own poison. ;)
I'll describe what I'm doing later, in a separate thread.
It's flexible enough to provide good protection, while letting
in all but the self-injurious Ham (e.g. someone at Amazon drank
some of the ClubNorton Kool-Aid).



Re: spample of "data" URL in well-crafted Phish

Posted by John Hardin <jh...@impsec.org>.
On Wed, 31 Aug 2016, Chip M. wrote:

> Freshly caught Spample:
> 	http://puffin.net/software/spam/samples/0042_data_embedded_phish.txt
> The only munging was inserting ".EXAMPLE" between "wellsfargo"
> and ".com".
>
> ** Mitigation:
> The easiest way to catch these is with a simple body word match.
> Here's the exact matches I am currently using (some of them are
> recent additions, listed in date of addition order):
> 	href="data:
> 	href='data:
> 	http://data:
> 	data:text/html;base64
> 	<DEFANGED_IMG src="data:
> 	hta:application

That was added to the sandbox in 2012 (I read SANS too...):

   https://svn.apache.org/viewvc?view=revision&revision=1378630

but it isn't performing well enough to be published:

   http://ruleqa.spamassassin.org/20160902-r1758905-n/T_URI_DATA/detail

I've tweaked the FP avoidance a bit, maybe that will be enough to get the 
S/O up high enough to publish it.


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...a great many people are not fit for Liberty, it scares the crap
   out of them and they'd much rather be ruled. As Loki said in the
   Avengers movie, kneeling is their natural state.    -- Mark D @ TSM
-----------------------------------------------------------------------
  14 days until the 229th anniversary of the signing of the U.S. Constitution