Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2011/10/17 22:01:17 UTC

[Bug 6676] New: Add SPOOFED_URL_HOST to Darxus' sandbox

https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6676

             Bug #: 6676
           Summary: Add SPOOFED_URL_HOST to Darxus' sandbox
           Product: Spamassassin
           Version: 3.4.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Rules
        AssignedTo: dev@spamassassin.apache.org
        ReportedBy: Darxus@ChaosReigns.com
    Classification: Unclassified


Created attachment 4980
  --> https://issues.apache.org/SpamAssassin/attachment.cgi?id=4980
SPOOFED_URL_HOST patch

Khopesh has a SPOOFED_URL rule in his sandbox which matches cases like:

<a href="http://www.spammerdomain.com">http://www.youtube.com</a>

But it also hits cases like:

<a
href="http://www.legitdomain.com/?variable=ILikeToTrackAllKindsOfRandomJunk">http://www.legitdomain.com</a>

SPOOFED_URL_HOST, in the attached patch, is a small modification of that rule
that only matches when the host in the link text differs from the host in the href.
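The patch's actual rule definition isn't reproduced in this message. As a rough illustration of the idea only, here is a minimal Python sketch. It is not the patch, and it is not how SpamAssassin evaluates rules. It collects each `<a>` element's href and visible text, and reports a hit when the text is itself a URL whose host differs from the href's host. The helper names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url):
    """Lower-cased host of an http(s) URL, or None if it has no host."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return None
    return parts.hostname  # .hostname is already lower-cased


def spoofed_url_hosts(markup):
    """Return (href host, shown host) for links whose text names another host."""
    parser = AnchorCollector()
    parser.feed(markup)
    parser.close()
    hits = []
    for href, text in parser.anchors:
        real, shown = host_of(href), host_of(text)
        if real and shown and real != shown:
            hits.append((real, shown))
    return hits
```

Applied to the two examples above, the first returns `[('www.spammerdomain.com', 'www.youtube.com')]`. The second returns an empty list, because only the query string differs there, not the host.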

I believe this doesn't require a vote, just a commit, which someone else will
need to do since I don't have commit access.


Of the 10,923 hams in my corpora, SPOOFED_URL hits 48, and
__SPOOFED_URL_HOST hits 10 of those, 4 of which are Google Calendar "legit"
marketing tracker cases.

We can't just ignore all cases that go through Google's redirector, because
spammers could then route all their links through it. But we could add a check
for DKIM_VALID_AU and "From: Google Calendar <ca...@google.com>".
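Here is a hypothetical sketch of that exemption, written in Python for illustration. `dkim_valid_au` stands in for the DKIM_VALID_AU result, which this code does not compute. The exact sender address is elided in this archive, so the check assumes only the display name "Google Calendar" and an address at google.com. In SpamAssassin itself, a combination like this would normally be written as a meta rule rather than as code.

```python
from email.utils import parseaddr


def google_calendar_exempt(dkim_valid_au, from_header):
    """Hypothetical exemption for mail that really comes from Google Calendar.

    dkim_valid_au stands in for SpamAssassin's DKIM_VALID_AU result and must
    be computed elsewhere. The sender address is elided in this thread, so
    only the display name and the google.com domain are checked (assumption).
    """
    name, addr = parseaddr(from_header)
    domain = addr.rpartition("@")[2].lower()
    return bool(dkim_valid_au) and name == "Google Calendar" and domain == "google.com"


def spoofed_url_host_hits(host_mismatch, dkim_valid_au, from_header):
    """A host mismatch still counts unless the DKIM-backed exemption applies."""
    return host_mismatch and not google_calendar_exempt(dkim_valid_au, from_header)
```

Requiring the DKIM result is what prevents a spammer from simply forging the From: header. A spammer who only routes links through the redirector gets no exemption.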

Of the 5,256 spams in my corpora, SPOOFED_URL hits 184, and __SPOOFED_URL_HOST
hits 139 of those.
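This small Python calculation puts those counts side by side. It computes each rule's hit rate and its S/O ratio, assuming the usual hit-frequencies definition S/O = spam% / (spam% + ham%). The corpus sizes and hit counts are the ones quoted above.

```python
# Corpus sizes and hit counts as reported above.
HAM_TOTAL, SPAM_TOTAL = 10_923, 5_256
hits = {
    "SPOOFED_URL": (48, 184),          # (ham hits, spam hits)
    "__SPOOFED_URL_HOST": (10, 139),
}


def rates(ham_hits, spam_hits):
    """Return (ham %, spam %, S/O), with S/O = spam% / (spam% + ham%)."""
    ham_pct = 100.0 * ham_hits / HAM_TOTAL
    spam_pct = 100.0 * spam_hits / SPAM_TOTAL
    return ham_pct, spam_pct, spam_pct / (spam_pct + ham_pct)


for rule, (ham_hits, spam_hits) in hits.items():
    ham_pct, spam_pct, so = rates(ham_hits, spam_hits)
    print(f"{rule:20} ham {ham_pct:.3f}%  spam {spam_pct:.2f}%  S/O {so:.3f}")
```

Here is what the output shows:
- **SPOOFED_URL:** 0.439% ham, 3.50% spam, S/O 0.888.
- **__SPOOFED_URL_HOST:** 0.092% ham, 2.64% spam, S/O 0.967.

In other words, the narrower rule keeps about 76% of the spam hits (139/184) and only about 21% of the ham hits (10/48).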


All the hams hit by __SPOOFED_URL_HOST in my corpus:

Google Calendar:

<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.templecon.org%2F&amp;usd=2&amp;usg=AFQjCNHGQtanthD0JfX4FmbFcyr2L_dqMw" target="_blank">http://www.templecon.org/</a>

<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.3rdcome.org&amp;usd=2&amp;usg=AFQjCNGLyz4TL3lsRogAtJheCpPEHswi7Q" target="_blank">http://www.3rdcome.org</a>

<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.somervilleopenstudios.org%2F&amp;usd=2&amp;usg=AFQjCNED0k2VJve6M8pLRNRFcUnekaSCKg" target="_blank">http://www.somervilleopenstudios.org/</a>

<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.wmos.org%2F&amp;usd=2&amp;usg=AFQjCNEicUtvxpcEJ8V5Nem2RTycomPYMQ" target="_blank">http://www.wmos.org/</a>
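These Google Calendar hits have a simple cause: the href host is www.google.com, while the visible text names the final destination. The minimal Python sketch below unwraps the redirect to show where each link actually goes. It assumes only the `google.com/url?q=...` shape visible in these examples, and the `redirect_target` helper is hypothetical. It is a diagnostic illustration only. Trusting unwrapped targets blindly would reopen the redirector hole described above.

```python
import html
from urllib.parse import parse_qs, urlsplit


def redirect_target(href):
    """Return the q= destination of a www.google.com/url redirect, else None."""
    parts = urlsplit(html.unescape(href))  # the raw attribute still has &amp;
    if parts.hostname == "www.google.com" and parts.path == "/url":
        values = parse_qs(parts.query).get("q")
        if values:
            return values[0]  # parse_qs has already percent-decoded the value
    return None


# The first Google Calendar href above, without quoted-printable encoding.
href = ("http://www.google.com/url?q=http%3A%2F%2Fwww.templecon.org%2F"
        "&amp;usd=2&amp;usg=AFQjCNHGQtanthD0JfX4FmbFcyr2L_dqMw")
print(urlsplit(href).hostname)                   # www.google.com
print(urlsplit(redirect_target(href)).hostname)  # www.templecon.org
```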



MobileMe Mail htmlification flaw (dropped a "."):
<a href="http://www.youtube.com/watch?v=ywBwUiq6v4o"
_mce_href="http://www.youtube.com/watch?v=ywBwUiq5v4o">http://www.youtubecom/watch?v=ywBwUiq5v4o</a>

Botched htmlization from jockeycomfort.com:
<a href="http://email.jockeycomfort.com/a/hBN2$afB8ardYB8bVlZAAAfkY4S/mobile?t_params=EMAIL%3D[redacted]%2540chaosreigns.com">http:///mobile</a>
(username replaced with "[redacted]")

I couldn't find the problem in an email from nhliberty.

I couldn't find the problem in a metalshapers Yahoo Groups email from CaptonZap@aol.com.

TurboTax:
<A HREF="http://info1.turbotax.com/[redacted]">http://privacy.intuit.com</A>
(Both domains, turbotax.com and intuit.com, are owned by the same organization.)

A private list:
<a href="http://www.nramedia.org/t/193719/4743427/6880/0/" target="_blank">http://www.nraila.org/Legislation/Read.aspx?ID=7061</a>
(Both domains are owned by the same organization.)

-- 
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox

Posted by bu...@bugzilla.spamassassin.org.
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6676

Adam Katz <an...@khopis.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |antispam@khopis.com

--- Comment #4 from Adam Katz <an...@khopis.com> 2011-10-17 21:51:30 UTC ---
Both rules are useful.  Mine is more FP-prone but will also hit more TPs.  It's
just a matter of figuring out which other rules to pair with these to best nail
phish without hitting ham or marketing.

[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox

--- Comment #5 from Darxus <Da...@ChaosReigns.com> 2011-10-18 18:32:37 UTC ---
SPOOFED_URL_HOST had worse results than SPOOFED_URL.

Ham that hit, in my corpus:
9 were google calendar marketing tracker urls.
6 were third party marketing trackers r20.rs6.net, links.mkt030.com,
*.delivery.net.
5 were different hosts in the same domain, possibly all from one company.
2 were different domains owned by the same company (intuit.com / turbotax.com).
2 were emails I shouldn't have had in my corpus, that were off list replies on
the subject of this rule.
1 was a bug in somebody's conversion from plain text to html in a reply.
1 I didn't manage to track down.

So there are definitely opportunities for improvement.
These three third-party marketing trackers look fairly safe to exempt, because
it's not immediately obvious how someone could use them to redirect to an
arbitrary URL.
The Google Calendar cases could be handled by verifying that the message really
comes from Google Calendar.
A plugin could be written that uses
Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain() to hit only on
changes in domain name, not on changes of host name within the same domain.
That would leave 2 hits with different domains owned by the same company, 2
conversion-to-HTML bugs, and 1 I couldn't figure out.
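The domain-level comparison and the tracker exemptions could look roughly like the Python sketch below. It is only an approximation, with two assumptions:
- `naive_trim_domain` is a hypothetical stand-in for trim_domain(). It knows only a few multi-label suffixes, whereas SpamAssassin's function consults its own much larger table of registrar boundaries.
- Exempting the three tracker services by their whole domain is also an assumption.

```python
# Hypothetical, deliberately tiny stand-in for a registrar-boundary list.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}

# Tracker services named above, exempted by whole domain (an assumption).
TRACKER_DOMAINS = {"rs6.net", "mkt030.com", "delivery.net"}


def naive_trim_domain(host):
    """Reduce a host name to its registrable domain (rough approximation)."""
    labels = host.lower().rstrip(".").split(".")
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


def domain_mismatch(href_host, shown_host):
    """Hit only on a change of domain, and never for the exempted trackers."""
    href_domain = naive_trim_domain(href_host)
    if href_domain in TRACKER_DOMAINS:
        return False
    return href_domain != naive_trim_domain(shown_host)


print(domain_mismatch("www.amazon.com", "amazon.com"))              # False
print(domain_mismatch("info1.turbotax.com", "privacy.intuit.com"))  # True
print(domain_mismatch("r20.rs6.net", "www.example.org"))            # False
```

This removes the same-domain hits. It still flags turbotax.com vs intuit.com, because a plain domain comparison cannot tell two domains of one company apart from a real spoof.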

All the ham hits:

www.amazon.com -> amazon.com (same domain, different host)

r20.rs6.net - marketing tracker

email.capitalone.com -> turbotax.intuit.com/affiliate/capitalone35 - from
capitalone about turbotax

google calendar
google calendar
google calendar
google calendar
google calendar
google calendar

mailman htmlification bug + different host in same domain

sprint.r.delivery.net - marketing tracker

google calendar
google calendar

links.mkt030.com - marketing tracker

Jockey broken htmlification

google calendar

Different host name in same domain

Unknown problem with metalshapers yahoo group

turbotax.com -> intuit.com (same company, 2 domains)

different host, same domain

different host, same domain

different domain, same company

different host, same domain

r20.rs6.net - marketing tracker

r20.rs6.net - marketing tracker

r20.rs6.net - marketing tracker

r20.rs6.net - marketing tracker

Oops - off list reply on the subject of this rule, removed from corpora
Oops - off list reply on the subject of this rule, removed from corpora

[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox

Kevin A. McGrail <km...@pccc.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kmcgrail@pccc.com

--- Comment #1 from Kevin A. McGrail <km...@pccc.com> 2011-10-17 20:28:37 UTC ---
Looks like a good tweak on the old rule. Is the goal to replace khopesh's rule
with this one if it looks good in the mass checks?

 svn commit -m 'Added Spoofed URL test rule per bug 6676 for Darxus'
Adding         kmcgrail/20_darxus_experimental.cf
Transmitting file data .
Committed revision 1185356.

[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox

Darxus <Da...@ChaosReigns.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |Darxus@ChaosReigns.com
         Resolution|                            |FIXED

--- Comment #2 from Darxus <Da...@ChaosReigns.com> 2011-10-17 20:33:19 UTC ---
Yes, thanks.

[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox

--- Comment #3 from Darxus <Da...@ChaosReigns.com> 2011-10-17 20:42:03 UTC ---
This came up on the users list here:
http://old.nabble.com/antiphishing-to32640290.html#a32640442
