Posted to users@spamassassin.apache.org by Christian Grunfeld <ch...@gmail.com> on 2011/10/12 20:01:07 UTC

antiphishing

Hi,

I have an idea that I want to discuss with users and developers.

Many phishing mails exploit the fact that ordinary users do not know the
difference between a link's real URL and its anchor text. They show
attractive link text that points to hidden, unrecognized and evil URLs,
e.g. exe files hidden behind photo names.

My idea is to have a rewrite engine in SpamAssassin that writes the
real URL in place of the link anchor text, or at least writes it next to
the anchor text without removing it. That way people can check whether
both agree and whether the URL is one they know. It would be another step
before the "inevitable click" :p

The link functionality is not broken in either case (good or evil link),
so genuine links can still be followed and evil links get a warning!

In summary: replace the text between <a> and </a> with the href, or add the
href next to the text with an ASCII arrow (-->) or something like
that.

Cheers !
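
A minimal sketch of the proposed rewrite, in Python rather than as a
SpamAssassin plugin (SA has no such rewrite engine; the class and function
names here are hypothetical). It re-emits the HTML as-is and appends the
href after the anchor text, with the arrow HTML-escaped:

```python
from html import escape
from html.parser import HTMLParser


class HrefAnnotator(HTMLParser):
    """Re-emit HTML unchanged, appending ' --> href' before each </a>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.out = []
        self.href = None  # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            # The link target itself is untouched; only visible text is added.
            self.out.append(" --&gt; " + escape(self.href))
            self.href = None
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.out.append("<?%s>" % data)


def annotate_links(html_text):
    parser = HrefAnnotator()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(annotate_links('<a href="http://evil.example/x.exe">holiday photo</a>'))
```

Anchors without an href are left alone, and image-only links simply get
the URL appended after the image.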

Re: antiphishing

Posted by da...@chaosreigns.com.

On 10/12, Christian Grunfeld wrote:
> the point is that I don't think it would be a good idea to let SA give
> a high score based on an "apparent" mismatch between text and URL.

SpamAssassin rule QA and optimized score generation infrastructure means
we can find out if it's useful before deploying it, and then calculate
a score for the rule that has the optimal impact on spam filtration
accuracy.

And according to the ruleqa results, you're right, it wouldn't be good to
give a high score on mismatched href and value.  Now I want to know why,
and how it can be improved, because it seems likely to be useful.

-- 
"Where are you going and what do you wish?"
- The Old Moon, to Winkin' Blinkin' and Nod
http://www.ChaosReigns.com

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

> Rather than tampering with the original mail, surely the solution is to
> clearly detect the mail as spam in the first place so it hopefully never
> reaches the user.

The point is that I don't think it would be a good idea to let SA give
a high score based on an "apparent" mismatch between text and URL.

> History has taught me that if there's a link, someone *will* click on it
> regardless of how obvious it might be to you or I that the link is
> malicious.

I think the same as you! That's why I said "another" step before the
click... but that step may be useful.

Re: antiphishing

Posted by Ned Slider <ne...@unixmail.co.uk>.

On 10/12/2011 07:01 PM, Christian Grunfeld wrote:
> Hi,
>
> I have an idea that I want to discuss with users and developers.
>
> Many phishing mails exploit the fact that ordinary users do not know the
> difference between a link's real URL and its anchor text. They show
> attractive link text that points to hidden, unrecognized and evil URLs,
> e.g. exe files hidden behind photo names.
>
> My idea is to have a rewrite engine in SpamAssassin that writes the
> real URL in place of the link anchor text, or at least writes it next to
> the anchor text without removing it. That way people can check whether
> both agree and whether the URL is one they know. It would be another step
> before the "inevitable click" :p
>
> The link functionality is not broken in either case (good or evil link),
> so genuine links can still be followed and evil links get a warning!
>
> In summary: replace the text between <a> and </a> with the href, or add the
> href next to the text with an ASCII arrow (-->) or something like
> that.
>
> Cheers !
>

Rather than tampering with the original mail, surely the solution is to 
clearly detect the mail as spam in the first place so it hopefully never 
reaches the user.

History has taught me that if there's a link, someone *will* click on it 
regardless of how obvious it might be to you or I that the link is 
malicious.

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

> Large numbers of spammers use DKIM. We've been under attack for weeks
> now by some outfit who is buying up old, "clean" IP subnets and using it
> to spew their non-pharma, really "clean looking" spam onto us - no
> RBL/SURBL hits for 3-5 *days*, getting scores from 0.5-3.0 - really
> tough - nothing to write content rules for.
>
> All of it DKIM signed and SPF'ed. I ended up building my own RBL just
> so we could catch it :-(
>
> Well, that's the case for the above-mentioned spam too. All the spam has
> links to websites that are part of the same domain as the email -
> running on webservers in the same subnets. :-(

Really a pathological scenario!
Yes, for a particular case like that you end up writing rules, but I think
your case is not the general one.

Re: antiphishing

Posted by Jason Haar <Ja...@trimble.com>.

On 13/10/11 14:05, Christian Grunfeld wrote:
>
> I was not specifically talking about DKIM-signed mails. It is clear
> that body rewriting messes up sigs. It is also clear that phishers don't
> use DKIM!
>

Large numbers of spammers use DKIM. We've been under attack for weeks
now by some outfit who is buying up old, "clean" IP subnets and using it
to spew their non-pharma, really "clean looking" spam onto us - no
RBL/SURBL hits for 3-5 *days*, getting scores from 0.5-3.0 - really
tough - nothing to write content rules for.

All of it DKIM signed and SPF'ed. I ended up building my own RBL just 
so we could catch it :-(

> and if they do you have the certainty that the originating
> domain has nothing to do with what the content claims to be !...unless
> the phishing comes from the same domain ! (really bizarre) ! :D
>

Well, that's the case for the above-mentioned spam too. All the spam has
links to websites that are part of the same domain as the email -
running on webservers in the same subnets. :-(

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +1 408 481 8171
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1

Re: antiphishing

Posted by David B Funk <db...@engineering.uiowa.edu>.

On Wed, 12 Oct 2011, Christian Grunfeld wrote:

> > Modifying headers -might- mess up DKIM, gpg, etc sigs (depending upon
> > how they were done). Modifying bodies -will- mess up sigs.
>
> I was not specifically talking about DKIM-signed mails. It is clear
> that body rewriting messes up sigs. It is also clear that phishers don't
> use DKIM! And if they do, you have the certainty that the originating
> domain has nothing to do with what the content claims to be !...unless
> the phishing comes from the same domain ! (really bizarre) ! :D

Phishers -might- not DKIM-sign messages, but other legitimate messages
(such as airline reservation confirmations) which do sign their
messages -and- obfuscate URLs will get trashed.

The problem is that if you rewrite the body of all messages which have
obfuscated URLs, then you will trash legitimate messages.

If you have some magic bullet that reliably detects phishes so you're
sure you won't FP on obfuscated URLs, then you don't need that message
rewrite, just hit it with a spam score.
But so far I haven't seen a successful antiphish magic bullet, and I've
seen lots of phishes.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

> Modifying headers -might- mess up DKIM, gpg, etc sigs (depending upon
> how they were done). Modifying bodies -will- mess up sigs.

I was not specifically talking about DKIM-signed mails. It is clear
that body rewriting messes up sigs. It is also clear that phishers don't
use DKIM! And if they do, you have the certainty that the originating
domain has nothing to do with what the content claims to be !...unless
the phishing comes from the same domain ! (really bizarre) ! :D

Re: antiphishing

Posted by David B Funk <db...@engineering.uiowa.edu>.

On Wed, 12 Oct 2011, Christian Grunfeld wrote:

> > SA is a scoring filter, not a modification filter. Changing SA to rewrite
> > message bodies is, I think most if not all will agree, beyond the scope of what
> > SA is intended to do, and beyond the scope of what it _should_ do.
>
> It does modify headers and subjects... why not bodies?

Modifying headers -might- mess up DKIM, gpg, etc sigs (depending upon
how they were done). Modifying bodies -will- mess up sigs.

Mucking up a header might render it useless but will leave the message
mostly readable, messing up the body may well render the message
useless.
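
To illustrate why a body rewrite always invalidates a signature: a DKIM
signature carries a hash of the message body, so appending even a few
characters to an anchor changes it. This is a simplified sketch, assuming
SHA-256 and skipping DKIM's real body canonicalization (only line endings
are normalized); the function name and sample body are made up.

```python
import base64
import hashlib


def body_hash(body):
    """Base64 SHA-256 digest of a body, in the style of a DKIM bh= value.

    Simplification: real DKIM canonicalizes the body first; this sketch
    only normalizes line endings to CRLF.
    """
    normalized = body.replace("\r\n", "\n").replace("\n", "\r\n")
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


original = '<a href="http://tracker.example/t?id=1">Our newsletter</a>\n'
# The kind of rewrite proposed in this thread: href written after the text.
rewritten = original.replace("</a>", " --&gt; http://tracker.example/t?id=1</a>")

print(body_hash(original))
print(body_hash(rewritten))
print(body_hash(original) == body_hash(rewritten))  # False
```

A verifier that recomputes the hash over the rewritten body gets a
different value from the one the sender signed, so verification fails.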


> > Certainly SA should detect and score such obfuscation, if the FP rate can be
> > kept low. But controlling what the end user sees in the body of the mail is
> > properly the MUA's job.
>
> No, MUAs interpret and show HTML like browsers do and do not
> modify it. Detecting such obfuscation can be as difficult as getting SA to
> decode a captcha! Humans can do that task better!

Umm, you've never seen Thunderbird warnings such as:

 "To protect your privacy Thunderbird has blocked remote content in this message"



-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: antiphishing

Posted by John Hardin <jh...@impsec.org>.

On Wed, 12 Oct 2011, Christian Grunfeld wrote:

>> Certainly SA should detect and score such obfuscation, if the FP rate 
>> can be kept low. But controlling what the end user sees in the body of 
>> the mail is properly the MUA's job.
>
> No, MUAs interpret and show HTML like browsers do and do not
> modify it. Detecting such obfuscation can be as difficult as getting SA to
> decode a captcha! Humans can do that task better!

My MUA does exactly that. If the link text differs from the link URI it 
displays the hostname/IP part of the URI next to the link text. If it 
detects what looks like obfuscation (i.e. the link text points at one 
domain and the link itself points at a different domain) it displays a 
warning that the links in the message are suspicious.
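
A rough sketch of that MUA behaviour, for illustration only (this is not
the code of any particular MUA; the function name and the URL-detection
regex are assumptions). It labels a link with the host of its href when
text and URI differ, and flags it when the text itself names a different
host:

```python
import re
from urllib.parse import urlsplit

# Link text that starts like a URL or a dotted host name.
URLISH = re.compile(r"^(?:https?://)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)", re.I)


def describe_link(href, text):
    """Return (label, suspicious) for one link."""
    href_host = (urlsplit(href).hostname or "").lower()
    text = text.strip()
    if text == href:
        return text, False
    # Text differs from the URI: show the real host next to it.
    label = "%s [%s]" % (text, href_host)
    m = URLISH.match(text)
    # Suspicious only if the text claims one host and the href uses another.
    suspicious = bool(m) and m.group(1).lower() != href_host
    return label, suspicious


print(describe_link("http://login.phish.example/", "www.bank.example"))
print(describe_link("http://news.example/t?r=45", "Read more"))
```

Plain-word anchor text ("Read more") only gets the host label; text such
as "v1.2 release" would be misread as a host name by this crude regex.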

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Vista: because the audio experience is *far* more important than
   network throughput.
-----------------------------------------------------------------------
  307 days since the first successful private orbital launch (SpaceX)

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

> SA is a scoring filter, not a modification filter. Changing SA to rewrite
> message bodies is, I think most if not all will agree, beyond the scope of what
> SA is intended to do, and beyond the scope of what it _should_ do.

It does modify headers and subjects... why not bodies?

> Certainly SA should detect and score such obfuscation, if the FP rate can be
> kept low. But controlling what the end user sees in the body of the mail is
> properly the MUA's job.

No, MUAs interpret and show HTML like browsers do and do not
modify it. Detecting such obfuscation can be as difficult as getting SA to
decode a captcha! Humans can do that task better!

Re: antiphishing

Posted by John Hardin <jh...@impsec.org>.

On Wed, 12 Oct 2011, Christian Grunfeld wrote:

>>> It certainly seems like it would be very useful.  I see there's a
>>> __SPOOFED_URL rule, but it's hard to read and doesn't have a description.
>>
>> This is an issue that comes up on this list occasionally.  It sounds
>> like a good idea at first, but when you start looking into it, you find
>> that there is WAY too much legitimate email that does this for the rule
>> to be useful.
>
> But I didn't talk about a rule that adds a score! I am talking about writing
> the real URL in the body next to the anchor text and letting the user see if
> both "agree" or not, or if the URL looks familiar to him.

SA is a scoring filter, not a modification filter. Changing SA to rewrite 
message bodies is, I think most if not all will agree, beyond the scope of 
what SA is intended to do, and beyond the scope of what it _should_ do.

Certainly SA should detect and score such obfuscation, if the FP rate can 
be kept low. But controlling what the end user sees in the body of the 
mail is properly the MUA's job.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Maxim XXIX: The enemy of my enemy is my enemy's enemy. No more.
   No less.
-----------------------------------------------------------------------
  307 days since the first successful private orbital launch (SpaceX)

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

2011/10/12 Bowie Bailey <Bo...@buc.com>:
> Please keep list traffic on the list.

Sorry, but you replied only to me first! Check it!

> On 10/12/2011 3:25 PM, Christian Grunfeld wrote:
>> I see all genuine (non-spam) mails for subscriptions, checking and
>> activating accounts showing the long and crappy URL!
>> And when the URL is hidden and text is shown you have a 99% phishing chance.
>> It is true that other good mails like PayPal ones send you a button
>> and it would be a bad idea to show the URL inline.
>>
>>
>> 2011/10/12 Bowie Bailey <Bo...@buc.com>:
>>>
>>> Right.  I wasn't referring to your idea, I was replying to someone else
>>> who mentioned the __SPOOFED_URL rule.
>>>
>>> Writing in the real url is certainly an option and maybe not even a bad
>>> idea in certain cases.  However, just keep in mind that this will be
>>> UGLY.  In most cases (of non-spam) the real url is some sort of long,
>>> obnoxious tracking url.
>>>
>>> Do you really want to stick something like this:
>>>
>>> http://engage.advancedpublishing.com/t?r=45&c=17003&l=1046&ctl=50580:22813295B3FE26F750565933A5FBF73C4E8B5F87901A15B8&
>>>
>>> in the middle of one of your boss's nicely formatted HTML email
>>> newsletters?  (Just a random link pulled out of an email
>>> newsletter...and I've seen much worse)
>>>
>>> I think it's better to train people to pay attention to what they
>>> click.  The people who can't be trained to do this are the same people
>>> who will click the link even if you show them the real url.
>
>
> The example I gave was taken from a newsletter where the url was
> hidden.  Almost all email newsletters that I have seen do the same
> thing.  Currently, most of the spam I'm seeing does not attempt to hide
> the url at all.

Certainly we are seeing different spam!

Re: antiphishing

Posted by Martin Gregorie <ma...@gregorie.org>.

On Wed, 2011-10-12 at 15:46 -0400, Bowie Bailey wrote:

> Currently, most of the spam I'm seeing does not attempt to hide
> the url at all.
> 
+1

Re: antiphishing

Posted by John Hardin <jh...@impsec.org>.

On Wed, 12 Oct 2011, David B Funk wrote:

> On Wed, 12 Oct 2011, Bowie Bailey wrote:
>
>> The example I gave was taken from a newsletter where the url was
>> hidden.  Almost all email newsletters that I have seen do the same
>> thing.  Currently, most of the spam I'm seeing does not attempt to hide
>> the url at all.
>
> Not too many spam do that but almost all phish that I've seen do.
>
> The point being that the number of legitimate messages that obfuscate
> the URL renders this potential antiphish technique too FP prone to
> be trustworthy. (sigh).

Possibly as one factor in a set of phishy signs...

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   Vista: because the audio experience is *far* more important than
   network throughput.
-----------------------------------------------------------------------
  307 days since the first successful private orbital launch (SpaceX)

Re: antiphishing

Posted by David B Funk <db...@engineering.uiowa.edu>.

On Wed, 12 Oct 2011, Bowie Bailey wrote:

> The example I gave was taken from a newsletter where the url was
> hidden.  Almost all email newsletters that I have seen do the same
> thing.  Currently, most of the spam I'm seeing does not attempt to hide
> the url at all.

Not too many spam do that but almost all phish that I've seen do.

The point being that the number of legitimate messages that obfuscate
the URL renders this potential antiphish technique too FP prone to
be trustworthy. (sigh).

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: antiphishing

Posted by Bowie Bailey <Bo...@BUC.com>.

Please keep list traffic on the list.

On 10/12/2011 3:25 PM, Christian Grunfeld wrote:
> I see all genuine (non-spam) mails for subscriptions, checking and
> activating accounts showing the long and crappy URL!
> And when the URL is hidden and text is shown you have a 99% phishing chance.
> It is true that other good mails like PayPal ones send you a button
> and it would be a bad idea to show the URL inline.
>
>
> 2011/10/12 Bowie Bailey <Bo...@buc.com>:
>>
>> Right.  I wasn't referring to your idea, I was replying to someone else
>> who mentioned the __SPOOFED_URL rule.
>>
>> Writing in the real url is certainly an option and maybe not even a bad
>> idea in certain cases.  However, just keep in mind that this will be
>> UGLY.  In most cases (of non-spam) the real url is some sort of long,
>> obnoxious tracking url.
>>
>> Do you really want to stick something like this:
>>
>> http://engage.advancedpublishing.com/t?r=45&c=17003&l=1046&ctl=50580:22813295B3FE26F750565933A5FBF73C4E8B5F87901A15B8&
>>
>> in the middle of one of your boss's nicely formatted HTML email
>> newsletters?  (Just a random link pulled out of an email
>> newsletter...and I've seen much worse)
>>
>> I think it's better to train people to pay attention to what they
>> click.  The people who can't be trained to do this are the same people
>> who will click the link even if you show them the real url.


The example I gave was taken from a newsletter where the url was
hidden.  Almost all email newsletters that I have seen do the same
thing.  Currently, most of the spam I'm seeing does not attempt to hide
the url at all.

-- 
Bowie

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

>> It certainly seems like it would be very useful.  I see there's a
>> __SPOOFED_URL rule, but it's hard to read and doesn't have a description.
>
> This is an issue that comes up on this list occasionally.  It sounds
> like a good idea at first, but when you start looking into it, you find
> that there is WAY too much legitimate email that does this for the rule
> to be useful.

But I didn't talk about a rule that adds a score! I am talking about writing
the real URL in the body next to the anchor text and letting the user see if
both "agree" or not, or if the URL looks familiar to him.

Re: antiphishing

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>On Sat, Oct 15, 2011 at 12:38 AM, <da...@chaosreigns.com> wrote:
>> And I need to remind you that it hits almost as much ham as spam:
>> http://ruleqa.spamassassin.org/20111008-r1180336-n/T_SPOOFED_URL/detail
>>
>> I agree it seems like we should be able to improve it.  Maybe make
>> exceptions for known marketing trackers, as Adam Katz mentioned it has
>> problems with.

On 31.10.11 19:15, Mahmoud Khonji wrote:
>just to add a few more suggestions:
>* checking whether the anchor's actual URL (href URL) has the modal
>domain (a domain that is most frequently linked in the same email),
>and if it is not the modal domain then the email is spam.

That's what I meant in my last mail to this thread. It would 
apparently require an SA plugin (not just a simple regexp rule),
but we'd be able to allow different domains, e.g. when bank example.com 
bought bank example.net, etc.


>* checking the age of the href URL's domain via a Whois lookup (not
>all domains have the registration time stamp though), and if the age
>falls below certain thresholds then it's spam.

A simple meta combining the rule above and DOB would catch this 
perfectly.

>* checking the domain rank via a search engine, and if the rank falls
>below certain thresholds then it's spam.

Domain ranking would be just a very different rule that could be combined 
with those above.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Micro$oft random number generator: 0, 0, 0, 4.33e+67, 0, 0, 0...

Re: antiphishing

Posted by Mahmoud Khonji <m...@khonji.org>.

On Sat, Oct 15, 2011 at 12:38 AM, <da...@chaosreigns.com> wrote:
> And I need to remind you that it hits almost as much ham as spam:
> http://ruleqa.spamassassin.org/20111008-r1180336-n/T_SPOOFED_URL/detail
>
> I agree it seems like we should be able to improve it.  Maybe make
> exceptions for known marketing trackers, as Adam Katz mentioned it has
> problems with.

just to add a few more suggestions:
* checking whether the anchor's actual URL (href URL) has the modal
domain (a domain that is most frequently linked in the same email),
and if it is not the modal domain then the email is spam.
* checking the age of the href URL's domain via a Whois lookup (not
all domains have the registration time stamp though), and if the age
falls below certain thresholds then it's spam.
* checking the domain rank via a search engine, and if the rank falls
below certain thresholds then it's spam.
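
The first suggestion (modal domain) could be sketched roughly like this.
It is an illustration under assumptions, not one of the proposed
classifiers: the names are hypothetical, and it compares full host names
rather than registered domains.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the host name of every <a href=...> in a message body."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            host = urlsplit(dict(attrs).get("href") or "").hostname
            if host:
                self.hosts.append(host.lower())


def off_modal_hosts(html_text):
    """Return (modal_host, sorted linked hosts that differ from it)."""
    collector = HrefCollector()
    collector.feed(html_text)
    if not collector.hosts:
        return None, []
    modal_host, _count = Counter(collector.hosts).most_common(1)[0]
    others = sorted(set(collector.hosts) - {modal_host})
    return modal_host, others


body = (
    '<a href="http://www.bank.example/a">Home</a>'
    '<a href="http://www.bank.example/b">Help</a>'
    '<a href="http://login.phish.example/">Log in</a>'
)
print(off_modal_hosts(body))
```

Any host in the second element is a link that leaves the message's most
frequently linked host; whether that alone should count as spam is the
open question in this thread.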

google already uses page ranks to reduce false positives in
misclassifying phishing websites (the result is then distributed via a
blacklist to FF/Chrome via google safe browsing API). Whois and modal
domain tests are also used in some proposed classifiers (but no idea
if they are used in production yet).

this can be helpful as phishing URLs/domains are often short-lived.
IIRC the average uptime for a phishing page/domain is ~2 hours (from
top of my head, didn't verify but should be close enough).

my concern is that this URL mismatch test might have too little added
value (0.599 S/O) to spend any expensive optimizations on it. it might
be more productive to invest time in other more promising tests and
make them better.

--
Regards,
Mahmoud Khonji
PGP Key: 0x92584ECA

Re: SPOOFED_URL Re: antiphishing

Posted by da...@chaosreigns.com.

On 10/18, Matus UHLAR - fantomas wrote:
> Very nice, however due to these and other circumstances mentioned I
> think that a plugin would be better, since it could define where to

Thanks.  It didn't work out, the results were worse than the older rule:

http://ruleqa.spamassassin.org/?daterev=20111018-r1185533-n&rule=%2Fspoofed_url

  MSECS    SPAM%     HAM%     S/O    RANK   SCORE  NAME   WHO/AGE
      0   1.6825   1.0301   0.620    0.55    0.01  T_SPOOFED_URL  
      0   1.2441   0.9989   0.555    0.53    0.01  T_SPOOFED_URL_HOST  
      0   2.1419   7.9151   0.213    0.42   (n/a)  __SPOOFED_URL  
      0   1.6915   7.7045   0.180    0.41   (n/a)  __SPOOFED_URL_HOST  
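
For readers unfamiliar with the S/O column: the figures above are
consistent with S/O = SPAM% / (SPAM% + HAM%). That definition is inferred
from the numbers in this table, not stated in the thread. A quick check:

```python
def spam_over_overall(spam_pct, ham_pct):
    """S/O computed as spam% / (spam% + ham%)."""
    return spam_pct / (spam_pct + ham_pct)


# (SPAM%, HAM%) pairs copied from the ruleqa table above.
rows = {
    "T_SPOOFED_URL": (1.6825, 1.0301),
    "T_SPOOFED_URL_HOST": (1.2441, 0.9989),
    "__SPOOFED_URL": (2.1419, 7.9151),
    "__SPOOFED_URL_HOST": (1.6915, 7.7045),
}

for name, (spam_pct, ham_pct) in rows.items():
    print("%-20s %.3f" % (name, spam_over_overall(spam_pct, ham_pct)))
```

A value near 0.5 means the rule hits ham about as often as spam, which is
why neither variant is useful on its own.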

And yes, a plugin would be a good way to use
Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain() and compare the
domain instead of the host.  But I doubt that's the biggest problem.
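
What "domain instead of host" means, as a toy Python analogue (this is
not the SpamAssassin function; the suffix set below is a tiny hypothetical
stand-in for a real registrar-boundary list):

```python
# Hypothetical, deliberately incomplete list of two-label public suffixes.
TWO_LEVEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def registrable_domain(host):
    """Reduce a host name to its registered domain (toy version)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Keep one label more when the last two labels form a known suffix.
    keep = 3 if ".".join(labels[-2:]) in TWO_LEVEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def same_domain(host_a, host_b):
    return registrable_domain(host_a) == registrable_domain(host_b)


print(registrable_domain("www.jr.com"))
print(same_domain("www.example.com", "tracking.example.com"))
print(same_domain("www.example.com", "www.example.net"))
```

Comparing at this level would stop a rule from firing when the href and
the anchor text merely use different hosts of the same domain.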

And I need to find out why my corpora aren't being included in the nightly
non-net ruleqa runs.

-- 
"Blades don't need reloading." - The Zombie Survival Guide by Max Brooks
http://www.ChaosReigns.com

Re: SPOOFED_URL Re: antiphishing

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

On 14.10.11 18:07, darxus@chaosreigns.com wrote:
>Existing rule:
>
>rawbody  __SPOOFED_URL	m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>
>
>How about this, to only check for a changed domain part instead?
>
>rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>
>It matches this:
>
>  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
>
>But does not match this (example from actual non-spam):
>
>  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
>
>
>A very simplified form of this new one:
>
>rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
>
>That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
>set of parentheses".  In the perlre man page: "A zero-width negative
>look-ahead assertion."

Very nice, however due to these and other circumstances mentioned I 
think that a plugin would be better, since it could define where to 
skip host name (and up to which level) and e.g. it could define whitelists
- who can spoof who, e.g. which mail company may "spoof" which bank.

However until then, this should still be worth trying.

-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
You have the right to remain silent. Anything you say will be misquoted,
then used against you.

Re: SPOOFED_URL Re: antiphishing

Posted by da...@chaosreigns.com.

Not relevant to the subject.  We're talking about where somebody is
maliciously making you think you're clicking on "www.youtube.com" when in
fact you're clicking on "www.ILikeSpam.com".

Somebody linking to one domain with an image hosted on another domain has
plenty of possibility to be legit.

You could do it.  You're welcome to try.  Maybe it'll even hit a usefully
larger percentage of spam than ham.  But it's not what we've been talking
about.

On 10/14, Christian Grunfeld wrote:
> you should be able to check against img src content, right?
> 
> 
> 2011/10/14 Christian Grunfeld <ch...@gmail.com>:
> > and what about when there is no anchor text in the link ? eg. paypal
> > image button
> >
> >
> > 2011/10/14  <da...@chaosreigns.com>:
> >> Existing rule:
> >>
> >> rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >>
> >>
> >> How about this, to only check for a changed domain part instead?
> >>
> >> rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >>
> >> It matches this:
> >>
> >>  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
> >>
> >> But does not match this (example from actual non-spam):
> >>
> >>  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
> >>
> >>
> >> A very simplified form of this new one:
> >>
> >> rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
> >>
> > >> That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
> > >> set of parentheses".  In the perlre man page: "A zero-width negative
> > >> look-ahead assertion."
> >>
> >> --
> >> "Every normal man must be tempted at times to spit upon his hands,
> >> hoist the black flag, and begin slitting throats."
> >>  - Henry Louis Mencken (1880-1956)
> >> http://www.ChaosReigns.com
> >>
> >
> 

-- 
"Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats."
 - Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com

Re: SPOOFED_URL Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

you should be able to check against img src content, right?


2011/10/14 Christian Grunfeld <ch...@gmail.com>:
> and what about when there is no anchor text in the link ? eg. paypal
> image button
>
>
> 2011/10/14  <da...@chaosreigns.com>:
>> Existing rule:
>>
>> rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>>
>>
>> How about this, to only check for a changed domain part instead?
>>
>> rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>>
>> It matches this:
>>
>>  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
>>
>> But does not match this (example from actual non-spam):
>>
>>  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
>>
>>
>> A very simplified form of this new one:
>>
>> rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
>>
>> That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
>> set of parentheses".  In the perlre man page: "A zero-width negative
>> look-ahead assertion."
>>
>> --
>> "Every normal man must be tempted at times to spit upon his hands,
>> hoist the black flag, and begin slitting throats."
>>  - Henry Louis Mencken (1880-1956)
>> http://www.ChaosReigns.com
>>
>

Re: SPOOFED_URL Re: antiphishing

Posted by da...@chaosreigns.com.

None of these rules will hit that.  That's what the second "http" is for.
"Hit the host name part of the href value of an anchor tag, then do *not*
match the same host name in the value part of the anchor, then hit 'http'".

I should've called it SPOOFED_URL_HOST, because this one is matching the
full host name, not just the domain.  I don't even know if we can get the
TLD logic for domain matching into a regex without a modification to the
Perl interpreter.
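
The simplified rule from this thread can be tried outside SA. Below it is
translated from Perl /.../i syntax to Python's re module, which also
supports a backreference inside a negative look-ahead. The first two
samples are the ones quoted in this thread; the image-button sample is
made up.

```python
import re

# Capture scheme + host from the href, then require that the anchor text
# starts with "http" but does NOT start with the captured scheme + host.
SPOOFED_URL_HOST = re.compile(
    r'<a href="(https?://[^/">]+)[^>]*>(?!\1)http', re.IGNORECASE
)

samples = {
    "mismatch": '<a href="http://www.chaosreigns.com/">http://www.example.com</a>',
    "same host": '<a href="http://www.jr.com/tracking?ord_q_num=105725494'
                 '&ord_q_zip=03076">http://www.jr.com/tracking</a>',
    "image button": '<a href="http://www.example.com/pay">'
                    '<img src="http://img.example.net/button.png"></a>',
}

for label, html_text in samples.items():
    print(label, bool(SPOOFED_URL_HOST.search(html_text)))
```

Only the first sample matches. The image-only link is skipped because the
text right after the opening tag does not begin with "http".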

On 10/14, Christian Grunfeld wrote:
> and what about when there is no anchor text in the link ? eg. paypal
> image button
> 
> 
> 2011/10/14  <da...@chaosreigns.com>:
> > Existing rule:
> >
> > rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >
> >
> > How about this, to only check for a changed domain part instead?
> >
> > rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
> >
> > It matches this:
> >
> >  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
> >
> > But does not match this (example from actual non-spam):
> >
> >  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
> >
> >
> > A very simplified form of this new one:
> >
> > rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
> >
> > That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
> > set of parentheses".  In the perlre man page: "A zero-width negative
> > look-ahead assertion."
> >
> > --
> > "Every normal man must be tempted at times to spit upon his hands,
> > hoist the black flag, and begin slitting throats."
> >  - Henry Louis Mencken (1880-1956)
> > http://www.ChaosReigns.com
> >
> 

-- 
"I finally figured out the only reason to be alive is to enjoy it."
- Rita Mae Brown
http://www.ChaosReigns.com

Re: SPOOFED_URL Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

and what about when there is no anchor text in the link ? eg. paypal
image button


2011/10/14  <da...@chaosreigns.com>:
> Existing rule:
>
> rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>
>
> How about this, to only check for a changed domain part instead?
>
> rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>
> It matches this:
>
>  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
>
> But does not match this (example from actual non-spam):
>
>  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
>
>
> A very simplified form of this new one:
>
> rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
>
> That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
> set of parentheses".  In the perlre man page: "A zero-width negative
> look-ahead assertion."
>
> --
> "Every normal man must be tempted at times to spit upon his hands,
> hoist the black flag, and begin slitting throats."
>  - Henry Louis Mencken (1880-1956)
> http://www.ChaosReigns.com
>

Re: SPOOFED_URL Re: antiphishing

Posted by da...@chaosreigns.com.

Existing rule:

rawbody  __SPOOFED_URL	m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i


How about this, to only check for a changed domain part instead?

rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i

It matches this:

  <a href="http://www.chaosreigns.com/">http://www.example.com</a>

But does not match this (example from actual non-spam):

  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>


A very simplified form of this new one:

rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i

That "(?!\1)" bit is nice and fancy.  It means "not what was in the first
set of parentheses".  In the perlre man page: "A zero-width negative
look-ahead assertion."
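
For anyone who wants to poke at the "(?!\1)" behaviour outside
SpamAssassin, here is a minimal sketch in Python rather than Perl.  It is
only the simplified pattern, not the full rule, and the names SIMPLIFIED
and is_spoofed are made up for illustration.

```python
import re

# Python port of the simplified pattern; illustrative only, not the full rule.
SIMPLIFIED = re.compile(
    r'<a href="(https?://[^/">]+)[^>]*>(?!\1)http',
    re.IGNORECASE,
)


def is_spoofed(markup: str) -> bool:
    """True when the link text starts with http but not with the href's scheme and host."""
    return SIMPLIFIED.search(markup) is not None


mismatch = '<a href="http://www.chaosreigns.com/">http://www.example.com</a>'
tracking = (
    '<a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">'
    'http://www.jr.com/tracking</a>'
)

print(is_spoofed(mismatch))  # True
print(is_spoofed(tracking))  # False
```

The negative look-ahead consumes nothing: it only checks that the text
right after the ">" does not begin with what the first group captured, and
then the trailing "http" has to match at that same position.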

-- 
"Every normal man must be tempted at times to spit upon his hands,
hoist the black flag, and begin slitting throats."
 - Henry Louis Mencken (1880-1956)
http://www.ChaosReigns.com

SPOOFED_URL Re: antiphishing

Posted by da...@chaosreigns.com.

On 10/14, darxus@chaosreigns.com wrote:
> rawbody  __SPOOFED_URL	m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i

> I agree it seems like we should be able to improve it.  Maybe make
> exceptions for known marketing trackers, as Adam Katz mentioned it has
> problems with.  

I dug some of the hits out of my own corpora.  In the 9 emails I looked at,
*all* cases where it looked like this rule could have hit matched at the
host name level.  So I think there is definite room for improvement there -
just check for a matching host name, and ignore all the extra gunk after it.
Although I'm not certain it doesn't already try to do that; maybe I should
take more time to try to read it.  Okay, it's starting to sink in, and it
looks like it's trying to match the whole url.  

Several examples were cases where somebody with a gmail account replied to
an email of mine and gmail converted the url in my plain text signature
to html:

throats.&quot;<br>=A0- Henry Louis Mencken (1880-1956)<br><a href=3D"http:/=
/www.chaosreigns.com/" target=3D"_blank">http://www.ChaosReigns.com</a><br>

And I did get to see lots of gross html.  Particularly from yahoo groups.
So maybe it would help to do some more html parsing (un-escaping) before
this rule.  I don't know how much work that would take.
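
As a rough idea of what that un-escaping could look like, here is a Python
sketch that decodes the quoted-printable sample above and then compares
host names.  This is an assumption about one possible preprocessing step,
not how SpamAssassin handles rawbody; decode_part, host_pairs and ANCHOR
are hypothetical names.

```python
import html
import quopri
import re
from urllib.parse import urlsplit

# The gmail sample from this message, with its quoted-printable soft line break.
RAW = (
    'throats.&quot;<br>=A0- Henry Louis Mencken (1880-1956)<br><a href=3D"http:/=\n'
    '/www.chaosreigns.com/" target=3D"_blank">http://www.ChaosReigns.com</a><br>\n'
)

# Simple anchor matcher for already-decoded markup; illustrative only.
ANCHOR = re.compile(r'<a\s[^>]*\bhref="([^"]+)"[^>]*>([^<]*)</a>', re.IGNORECASE)


def decode_part(raw: str) -> str:
    """Undo quoted-printable encoding, then HTML entity escaping."""
    decoded = quopri.decodestring(raw.encode("ascii")).decode("latin-1")
    return html.unescape(decoded)


def host_pairs(markup: str) -> list:
    """Return (href host, link text host) for each anchor, lower-cased by urlsplit."""
    return [
        (urlsplit(href).hostname, urlsplit(text.strip()).hostname)
        for href, text in ANCHOR.findall(markup)
    ]


print(host_pairs(decode_part(RAW)))
```

After decoding, both sides of the signature link come out as the same host
name, so a host-level comparison would not flag it.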

But I didn't find any of the marketing trackers Adam mentioned.  

-- 
"Think, or I will set you on fire."
http://www.ChaosReigns.com

Re: antiphishing

Posted by da...@chaosreigns.com.

On 10/14, Matus UHLAR - fantomas wrote:
> While I have no doubt there is a lot of wanted mail with a URL and text
> mismatch, I would still like to have such a rule.

It exists, you're welcome to copy it out of the rules sandbox and use it,
false positives and all.  I already linked to it:
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/khopesh/20_khop_experimental.cf?view=markup

rawbody  __SPOOFED_URL	m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
# even with scrubbing, probably can't handle 'legit' tracking redirectors
meta	 SPOOFED_URL	__SPOOFED_URL && !(__VIA_ML || __SENDER_BOT || __YAHOO_BULK || __UNSUB_LINK || __THREADED || URL_SHORTENER)
describe SPOOFED_URL	Has a link whose text is a different URL

And I need to remind you that it hits almost as much ham as spam:
http://ruleqa.spamassassin.org/20111008-r1180336-n/T_SPOOFED_URL/detail

I agree it seems like we should be able to improve it.  Maybe make
exceptions for known marketing trackers, as Adam Katz mentioned it has
problems with.  

-- 
"Speed is a metaphor for freedom."
http://www.ChaosReigns.com

Re: antiphishing

Posted by Matus UHLAR - fantomas <uh...@fantomas.sk>.

>> On 10/12, Christian Grunfeld wrote:
>>> Many phishing mails exploit the bad knowledge of the difference
>>> between real url and link anchor text by simple users. So they show

>On 10/12/2011 2:25 PM, darxus@chaosreigns.com wrote:
>> Does spamassassin really not have a rule to detect this?  I just dug
>> up a perfect example - trying to look like an email from youtube, with
>> something like
>> '<a href="http://phishingjunk.com">http://www.youtube.com/stuff</a>',
>> and it didn't hit any rule that seemed relevant to that bit of deception.
>>
>> It certainly seems like it would be very useful.  I see there's a
>> __SPOOFED_URL rule, but it's hard to read and doesn't have a description.

On 12.10.11 14:49, Bowie Bailey wrote:
>This is an issue that comes up on this list occasionally.  It sounds
>like a good idea at first, but when you start looking into it, you find
>that there is WAY too much legitimate email that does this for the rule
>to be useful.

Many of those could be detected and/or whitelisted
(or at least blacklisted, to prevent phishing from reaching someone's 
users).

While I have no doubt there is a lot of wanted mail with a URL and text 
mismatch, I would still like to have such a rule.

That could be used at least in meta rules on many systems.
-- 
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Save the whales. Collect the whole set.

Re: antiphishing

Posted by Noel <no...@gmail.com>.

On 10/12/2011 1:57 PM, Kelson Vibber wrote:
> Yeah. There's an awful lot of newsletter, opt-in advertisement,
> and even transactional mail traffic that uses URL redirectors for
> click-tracking purposes, and far too often they'll put the
> destination URL (or a simplified form of it) in as the link text.

Yes.  And banks, paypal, facebook, and other phishing targets are
frequent offenders of this.  Modifying the link is not the answer
since some of these legit sites are finally starting to DKIM sign mail.


> It's a horrible practice, IMO, since it essentially trains people to ignore what should be a major phishing indicator, but it's also very common.

+1


  -- Noel Jones

RE: antiphishing

Posted by Kelson Vibber <KV...@tollfreeforwarding.com>.

> -----Original Message-----
> From: Bowie Bailey [mailto:Bowie_Bailey@BUC.com]
>
> This is an issue that comes up on this list occasionally.  It sounds like a good
> idea at first, but when you start looking into it, you find that there is WAY too
> much legitimate email that does this for the rule to be useful.

Yeah. There's an awful lot of newsletter, opt-in advertisement, and even transactional mail traffic that uses URL redirectors for click-tracking purposes, and far too often they'll put the destination URL (or a simplified form of it) in as the link text.

It's a horrible practice, IMO, since it essentially trains people to ignore what should be a major phishing indicator, but it's also very common.

--Kelson Vibber

Re: antiphishing

Posted by Bowie Bailey <Bo...@BUC.com>.

On 10/12/2011 2:25 PM, darxus@chaosreigns.com wrote:
> On 10/12, Christian Grunfeld wrote:
>> Many phishing mails exploit the bad knowledge of the difference
>> between real url and link anchor text by simple users. So they show
> Does spamassassin really not have a rule to detect this?  I just dug
> up a perfect example - trying to look like an email from youtube, with
> something like
> '<a href="http://phishingjunk.com">http://www.youtube.com/stuff</a>', 
> and it didn't hit any rule that seemed relevant to that bit of deception.
>
> It certainly seems like it would be very useful.  I see there's a
> __SPOOFED_URL rule, but it's hard to read and doesn't have a description.

This is an issue that comes up on this list occasionally.  It sounds
like a good idea at first, but when you start looking into it, you find
that there is WAY too much legitimate email that does this for the rule
to be useful.

-- 
Bowie

Re: antiphishing

Posted by Adam Katz <an...@khopis.com>.

On 10/12/2011 11:48 AM, darxus@chaosreigns.com wrote:
> Which uses it as part of SPOOFED_URL (the "__" in the other rule is
> important), which is described as:
> "Has a link whose text is a different URL".  But that one hasn't made it
> into the default rule set yet.  Ah, it hits 1.1% of spam but also 0.7% of
> non-spam, shame:
> http://ruleqa.spamassassin.org/?daterev=20111008-r1180336-n&rule=%2Fspoofed
> (it got a T_ prepended to it due to being in testing)
> 
> Wonder what it's hitting in non-spam.  And if it could be improved by just
> checking for domain mismatch instead of complete url match, if it's not
> doing that already.

As noted in the comment right next to the rule, most of those hits are
marketing trackers.  Another abutting comment notes that LeadLander has
a truncation habit that used to cause it to mis-fire.  There are also
abbreviations, parsing errors (not necessarily from SA), and probably
also link shorteners and gags.

I was a little out of sync with subversion.  This is now fixed.

While the new version is a bit better, it's still nowhere near good
enough to become a stand-alone rule, even with all the help I tried to
give it.

Re: antiphishing

Posted by da...@chaosreigns.com.

On 10/12, Christian Grunfeld wrote:
> > It certainly seems like it would be very useful.  I see there's a
> > __SPOOFED_URL rule, but it's hard to read and doesn't have a description.
> 
> where did you find that rule ?

On my server in the file
/var/lib/spamassassin/3.004000/updates_spamassassin_org/72_active.cf

Looks like it comes from:
http://svn.apache.org/viewvc/spamassassin/trunk/rulesrc/sandbox/khopesh/20_khop_experimental.cf?view=markup
Which uses it as part of SPOOFED_URL (the "__" in the other rule is
important), which is described as:
"Has a link whose text is a different URL".  But that one hasn't made it
into the default rule set yet.  Ah, it hits 1.1% of spam but also 0.7% of
non-spam, shame:
http://ruleqa.spamassassin.org/?daterev=20111008-r1180336-n&rule=%2Fspoofed
(it got a T_ prepended to it due to being in testing)

Wonder what it's hitting in non-spam.  And if it could be improved by just
checking for domain mismatch instead of complete url match, if it's not
doing that already.

-- 
"Of course there's strength in numbers. But there's strength in sharp
weaponry too. Ironically, this lead to what we call 'civilization'."
- spore
http://www.ChaosReigns.com

Re: antiphishing

Posted by Christian Grunfeld <ch...@gmail.com>.

> It certainly seems like it would be very useful.  I see there's a
> __SPOOFED_URL rule, but it's hard to read and doesn't have a description.

where did you find that rule ?

Re: antiphishing

Posted by da...@chaosreigns.com.

On 10/12, Christian Grunfeld wrote:
> Many phishing mails exploit the bad knowledge of the difference
> between real url and link anchor text by simple users. So they show

Does spamassassin really not have a rule to detect this?  I just dug
up a perfect example - trying to look like an email from youtube, with
something like
'<a href="http://phishingjunk.com">http://www.youtube.com/stuff</a>', 
and it didn't hit any rule that seemed relevant to that bit of deception.

It certainly seems like it would be very useful.  I see there's a
__SPOOFED_URL rule, but it's hard to read and doesn't have a description.

-- 
"I would believe only in a God that knows how to Dance." - Nietzsche
http://www.ChaosReigns.com

Re: antiphishing

Posted by Martin Hepworth <ma...@gmail.com>.

Like mailscanner does then :-)

On Wednesday, 12 October 2011, Christian Grunfeld <
christian.grunfeld@gmail.com> wrote:
> Hi,
>
> I have an idea that I want to discuss with users and developers.
>
> Many phishing mails exploit the bad knowledge of the difference
> between real url and link anchor text by simple users. So they show
> attractive link text that points to hidden, unrecognized and evil urls.
> eg: exe files hidden by photo names, etc.
>
> My idea is to have a rewrite engine in spamassassin that can rewrite
> real url in place of the link anchor text or at least to write it near
> the anchor text without removing it. In that way people can check if
> both agree or if the url is known or unknown. It would be another step
> before the "inevitable click" :p
>
> The link functionality is not broken in any case (good or evil link)
> so genuine links can be followed and evil links can be warned !
>
> In summary...replace text between <a> and </a> by the href or add the
> href next to the text with an ascii arrow (-->) or something like
> that.
>
> Cheers !
>

-- 
-- 
Martin Hepworth
Oxford, UK