Posted to users@spamassassin.apache.org by francis picabia <fp...@gmail.com> on 2014/10/27 19:01:50 UTC

Re: spamassassin rule to combat phishing

On Fri, Sep 19, 2014 at 2:59 PM, John Hardin <jh...@impsec.org> wrote:

> On Fri, 19 Sep 2014, francis picabia wrote:
>
>  On Tue, Sep 16, 2014 at 5:27 PM, John Hardin <jh...@impsec.org> wrote:
>>
>>  On Tue, 16 Sep 2014, francis picabia wrote:
>>>
>>>  Hello,
>>>
>>>>
>>>> We just received the most authentic looking phishing I've seen. It was
>>>> professionally written, included a nice signature in the style used by
>>>> people at my workplace, and the target link was an exact replica of an
>>>> ezproxy website we run.
>>>>
>>>> The URL domain was only different by a few letters.  I'm thinking we
>>>> will see more of these.  So here is a question perhaps someone can solve
>>>> and many of us can benefit from...
>>>>
>>>> How can I make a uri rule which matches
>>>>
>>>> example.com.junk/
>>>> but does not match
>>>> example.com/
>>>>
>>>>
>>>   uri  URI_EXAMPLE_EXTRA  m;^https?://(?:www\.)?example\.com[^/?];i
>>>
>>
>>
>> That's a great one liner. I'm glad I asked.  Thank you for this.
>>
>
> Warning: I did not actually test it. Please test it before putting it into
> production.
>
>
Yes, understood.  I did test and it seemed to work OK.

However another spoofed message was received today and the rule
did not capture it.

If I want to detect something in the form of:
random_server.example.com.junk
I need to wildcard the first bit.  Would that be:

uri  URI_EXAMPLE_EXTRA  m;^https?://(?:.*\.)?example\.com[^/?];i

I don't understand what the question mark and colon do inside the ( ).
I thought a question mark followed an optional char or expression.  Should it be
like this?

uri  URI_EXAMPLE_EXTRA  m;^https?://(.*\.)?example\.com[^/?];i
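
One way to compare the candidate patterns is a small Python sketch. Python's `re` module treats these particular patterns the same way Perl does, with `re.IGNORECASE` standing in for the trailing `i`. The helper name `hits` and the sample URLs are made up for illustration, and this only exercises the regex itself, not how SpamAssassin extracts URIs from a message.

```python
import re

# The www-only rule and the ".*" variant, minus the Perl m;...;i wrapper.
WWW_ONLY = re.compile(r"^https?://(?:www\.)?example\.com[^/?]", re.IGNORECASE)
ANY_PREFIX = re.compile(r"^https?://(?:.*\.)?example\.com[^/?]", re.IGNORECASE)

def hits(pattern, url):
    """Return True if the pattern matches the URL."""
    return pattern.search(url) is not None

# Hypothetical sample URLs, not taken from real mail.
samples = [
    "http://example.com/",                           # legitimate
    "http://example.com.junk/",                      # spoof
    "http://random_server.example.com.junk/",        # spoof with an extra host label
    "http://other.test/img/x.example.com-logo.png",  # name only appears in the path
]

for url in samples:
    print(f"{url:48} www_only={hits(WWW_ONLY, url)!s:5} any_prefix={hits(ANY_PREFIX, url)}")
```

In this sketch the `.*` variant catches the extra host label, but it also matches the last sample, because `.*` can run past the host name into the path. A prefix limited to characters that are neither `.` nor `/` would avoid that.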

Re: spamassassin rule to combat phishing

Posted by francis picabia <fp...@gmail.com>.
On Wed, Oct 29, 2014 at 10:27 AM, francis picabia <fp...@gmail.com>
wrote:

> I've tested the rule:
>
> uri     URI_MYDOMAIN_PHISH   m;^https?://(?:[^./]+\.)*example\.com[^/?];i
>
>
> It is catching this sample newsletter link:
>
> Oct 29 09:38:50.368 [24608] dbg: rules: ran uri rule
> URI_MYDOMAIN_PHISH ======> got hit: "http://example.com&"
>
> Complete email body content in test of newsletter link:
>
> <a target="_blank"
> href="http://www.environmental-expert.com/redirectnewsletter_login.asp?URL=http://www.environmental-expert.com&loginemail=user@example.com&logincode=123456&utm_source=Articles_Waste_Recycling_01112014&utm_medium=email&utm_campaign=newsletters&utm_content=logoclick"><img
> src="http://www.environmental-expert.com/newsletter/images/logo_dark_small.gif"
> width="200" height="83" border="0"></a>
>
>
> I wonder how the RE can be tweaked to not match this case?
> I still don't understand the ?: part.
>

I don't know if it is the best solution, but adding & to the negated
character class has fixed the false positive, and the rule still catches
the phishing example:

uri     URI_MYDOMAIN_PHISH   m;^https?://(?:[^./]+\.)*example\.com[^/?&];i
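
The effect of adding `&` can be checked outside SpamAssassin with a short Python sketch. It assumes Python's `re` behaves like Perl for this pattern, which holds for the constructs used here. The first candidate is the hit string from the debug output; the other two are hypothetical spoofed hosts.

```python
import re

# Rule before and after the change; re.IGNORECASE mirrors the Perl "i" flag.
BEFORE = re.compile(r"^https?://(?:[^./]+\.)*example\.com[^/?]", re.IGNORECASE)
AFTER = re.compile(r"^https?://(?:[^./]+\.)*example\.com[^/?&]", re.IGNORECASE)

candidates = [
    "http://example.com&",                     # hit reported by "spamassassin -D"
    "http://example.com.junk/",                # hypothetical spoof
    "http://random_server.example.com.junk/",  # hypothetical spoof
]

# Map each URL to (matched by old rule, matched by new rule).
results = {url: (bool(BEFORE.search(url)), bool(AFTER.search(url))) for url in candidates}

for url, (before, after) in results.items():
    print(f"{url:42} before={before!s:5} after={after}")
```

Only the `&` case changes. Other characters that can legitimately follow the host, such as the `:` in `http://example.com:8080/`, are still outside the negated class and would still produce a hit.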

Re: spamassassin rule to combat phishing

Posted by francis picabia <fp...@gmail.com>.
I've tested the rule:

uri     URI_MYDOMAIN_PHISH   m;^https?://(?:[^./]+\.)*example\.com[^/?];i


It is catching this sample newsletter link:

Oct 29 09:38:50.368 [24608] dbg: rules: ran uri rule
URI_MYDOMAIN_PHISH ======> got hit: "http://example.com&"

Complete email body content in test of newsletter link:

<a target="_blank"
href="http://www.environmental-expert.com/redirectnewsletter_login.asp?URL=http://www.environmental-expert.com&loginemail=user@example.com&logincode=123456&utm_source=Articles_Waste_Recycling_01112014&utm_medium=email&utm_campaign=newsletters&utm_content=logoclick"><img
src="http://www.environmental-expert.com/newsletter/images/logo_dark_small.gif"
width="200" height="83" border="0"></a>


I wonder how the RE can be tweaked to not match this case?
I still don't understand the ?: part.

Re: spamassassin rule to combat phishing

Posted by francis picabia <fp...@gmail.com>.
On Tue, Oct 28, 2014 at 11:47 AM, francis picabia <fp...@gmail.com>
wrote:

>
>
> On Mon, Oct 27, 2014 at 4:55 PM, John Hardin <jh...@impsec.org> wrote:
>
>> On Mon, 27 Oct 2014, francis picabia wrote:
>>
>>    uri  URI_EXAMPLE_EXTRA  m;^https?://(?:www\.)?example\.com[^/?];i
>>>>>>
>>>>>
>>> However another spoofed message was received today and the rule
>>> did not capture it.
>>>
>>> If I want to detect something in the form of:
>>> random_server.example.com.junk
>>> I need to wildcard the first bit.  Would that be:
>>>
>>> uri  URI_EXAMPLE_EXTRA  m;^https?://(?:.*\.)?example\.com[^/?];i
>>>
>>> I don't understand what the question mark and colon does inside the ( )
>>> I thought it followed an optional char or expression.  Should it be
>>> like this?
>>>
>>> uri  URI_EXAMPLE_EXTRA  m;^https?://(.*\.)?example\.com[^/?];i
>>>
>>
>> (?:) means "group, don't remember the match". () remembers what's matched
>> for future use in the RE (e.g. to check for repeated strings like
>> "abcabcabcabc").
>>
>> Try this:
>>
>>   uri  URI_EXAMPLE_EXTRA  m;^https?://(?:[^./]+\.)*example\.com[^/?];i
>>
>>
> Once again, thanks for the RE coding.
>
> I found a false positive that my own attempt at this captured:
>
>  <a href="
> http://www.newslettersite.com/redirectnewsletter_login.asp?URL=http://www.secondsite.com/PYB/contact_us.asp&loginemail=user@example.com&logincode=123456&utm_source=Articles_Air_01112014&utm_medium=email&utm_campaign=newsletter&utm_content=contactus
> "
>
> I've tested your rule with that and it does not tag for the above.
> Great.  Hopefully useful to others facing domain spoofs in phishing.
>

I thought this was a representative test case, but apparently
something is triggering a false positive when the
email is a newsletter which embeds a user's email address within URLs.

In the sample I've seen, there are 34 such possible links which may have
triggered the issue, but I don't know which.

I ran the quarantined sample through spamassassin -D and it shows:

Oct 28 16:24:01.391 [28945] dbg: rules: ran uri rule URI_MYDOMAIN_PHISH
======> got hit: "http://example.com&"

On prior lines in the trace I see other uri rules getting hits, but it
seems to be about different URLs.  The entire body of the email is base64
encoded.  Extracting that part and running base64 -d I am not finding
the hit described by SA trace.

This is my method:

zcat spam-jUVZBDml0wS5.gz | grep 'http://example.com'

So the URL is not in the non-base64 part.

zcat spam-jUVZBDml0wS5.gz > /tmp/spamfull
cp /tmp/spamfull /tmp/spam64
vi /tmp/spam64  (to remove headers)
base64  -d /tmp/spam64  | grep 'http://example.com'

(no matches)

Double checked with:

spamassassin -D -lint < /tmp/spamfull 2>&1 | grep http://example.com

Nothing is output except the line above with URI_MYDOMAIN_PHISH.

Is there any suggestion on how to nail down where the match is happening?
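
One thing that might help is to decode every text part of the message and search for the bare domain rather than the full "http://example.com" string, on the assumption that the string in the debug line may not appear literally in the body. The follow-up in this thread traces the hit to a newsletter link carrying user@example.com in its query string, which fits that assumption. Below is a minimal Python sketch using the standard `email` package; the function name `find_domain` and the tiny demo message are made up for illustration.

```python
import email
import re
from email import policy

def find_domain(raw_bytes, needle="example.com", context=40):
    """Return (content_type, snippet) for each occurrence of needle in decoded text parts."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # get_content() undoes base64 / quoted-printable and decodes the charset.
        text = part.get_content()
        for m in re.finditer(re.escape(needle), text, re.IGNORECASE):
            start = max(m.start() - context, 0)
            found.append((part.get_content_type(), text[start:m.end() + context]))
    return found

# Tiny made-up message standing in for the quarantined sample.
demo = (
    b'Content-Type: text/html; charset="utf-8"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b'<a href=3D"http://news.test/r.asp?loginemail=3Duser@example.com&loginc=\n'
    b'ode=3D1">x</a>\n'
)
for ctype, snippet in find_domain(demo):
    print(ctype, repr(snippet))
```

For the real sample, the compressed file could be read with `gzip.open("spam-jUVZBDml0wS5.gz", "rb").read()` and passed to `find_domain` in place of `demo`. A part with an unknown charset would raise an error in this sketch; it does not try to handle that.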

Re: spamassassin rule to combat phishing

Posted by francis picabia <fp...@gmail.com>.
On Mon, Oct 27, 2014 at 4:55 PM, John Hardin <jh...@impsec.org> wrote:

> On Mon, 27 Oct 2014, francis picabia wrote:
>
>    uri  URI_EXAMPLE_EXTRA  m;^https?://(?:www\.)?example\.com[^/?];i
>>>>>
>>>>
>> However another spoofed message was received today and the rule
>> did not capture it.
>>
>> If I want to detect something in the form of:
>> random_server.example.com.junk
>> I need to wildcard the first bit.  Would that be:
>>
>> uri  URI_EXAMPLE_EXTRA  m;^https?://(?:.*\.)?example\.com[^/?];i
>>
>> I don't understand what the question mark and colon does inside the ( )
>> I thought it followed an optional char or expression.  Should it be
>> like this?
>>
>> uri  URI_EXAMPLE_EXTRA  m;^https?://(.*\.)?example\.com[^/?];i
>>
>
> (?:) means "group, don't remember the match". () remembers what's matched
> for future use in the RE (e.g. to check for repeated strings like
> "abcabcabcabc").
>
> Try this:
>
>   uri  URI_EXAMPLE_EXTRA  m;^https?://(?:[^./]+\.)*example\.com[^/?];i
>
>
Once again, thanks for the RE coding.

I found a false positive that my own attempt at this captured:

 <a href="
http://www.newslettersite.com/redirectnewsletter_login.asp?URL=http://www.secondsite.com/PYB/contact_us.asp&loginemail=user@example.com&logincode=123456&utm_source=Articles_Air_01112014&utm_medium=email&utm_campaign=newsletter&utm_content=contactus
"

I've tested your rule with that and it does not tag for the above.
Great.  Hopefully useful to others facing domain spoofs in phishing.

Re: spamassassin rule to combat phishing

Posted by John Hardin <jh...@impsec.org>.
On Mon, 27 Oct 2014, francis picabia wrote:

>>>>   uri  URI_EXAMPLE_EXTRA  m;^https?://(?:www\.)?example\.com[^/?];i
>
> However another spoofed message was received today and the rule
> did not capture it.
>
> If I want to detect something in the form of:
> random_server.example.com.junk
> I need to wildcard the first bit.  Would that be:
>
> uri  URI_EXAMPLE_EXTRA  m;^https?://(?:.*\.)?example\.com[^/?];i
>
> I don't understand what the question mark and colon does inside the ( )
> I thought it followed an optional char or expression.  Should it be
> like this?
>
> uri  URI_EXAMPLE_EXTRA  m;^https?://(.*\.)?example\.com[^/?];i

(?:) means "group, don't remember the match". () remembers what's matched
for future use in the RE (e.g. to check for repeated strings like
"abcabcabcabc").

Try this:

   uri  URI_EXAMPLE_EXTRA  m;^https?://(?:[^./]+\.)*example\.com[^/?];i
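
The difference between the two kinds of group can be illustrated with a short Python sketch. Python's `re` uses the same `( )` and `(?: )` syntax as Perl for these cases. The test strings are hypothetical, and the sketch only shows regex behaviour, not SpamAssassin's URI handling.

```python
import re

# Capturing group: the text matched by ( ) is remembered and can be reused
# later in the same pattern through a backreference such as \1.
repeated = re.compile(r"(abc)\1+")
m = repeated.search("xxabcabcabcabcyy")
print(m.group(0))  # whole match
print(m.group(1))  # what the group remembered

# Non-capturing group: (?: ) only groups, so the * applies to the whole
# "label plus dot" unit and nothing is stored.
rule = re.compile(r"^https?://(?:[^./]+\.)*example\.com[^/?]", re.IGNORECASE)
spoof = rule.search("http://random_server.example.com.junk/")
print(spoof.group(0), spoof.groups())

# The same pattern with plain ( ) matches the same text but also records
# the last repetition of the group.
capturing = re.compile(r"^https?://([^./]+\.)*example\.com[^/?]", re.IGNORECASE)
print(capturing.search("http://random_server.example.com.junk/").groups())
```

Both forms of the rule match the same URLs. The non-capturing form simply avoids storing text that the rule never uses.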


-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   ...the Fates notice those who buy chainsaws...
                                               -- www.darwinawards.com
-----------------------------------------------------------------------
  4 days until Halloween