Posted to dev@spamassassin.apache.org by bu...@bugzilla.spamassassin.org on 2011/10/17 22:01:17 UTC
[Bug 6676] New: Add SPOOFED_URL_HOST to Darxus' sandbox
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6676
Bug #: 6676
Summary: Add SPOOFED_URL_HOST to Darxus' sandbox
Product: Spamassassin
Version: 3.4.0
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Rules
AssignedTo: dev@spamassassin.apache.org
ReportedBy: Darxus@ChaosReigns.com
Classification: Unclassified
Created attachment 4980
--> https://issues.apache.org/SpamAssassin/attachment.cgi?id=4980
SPOOFED_URL_HOST patch
Khopesh has a SPOOFED_URL rule in his sandbox which matches cases like:
<a href="http://www.spammerdomain.com">http://www.youtube.com</a>
But it also hits cases like:
<a href="http://www.legitdomain.com/?variable=ILikeToTrackAllKindsOfRandomJunk">http://www.legitdomain.com</a>
SPOOFED_URL_HOST, in the attached patch, is a small modification of that rule
that only matches when the host part of the linked URL differs from the host in
the link text.
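A minimal Python sketch of the difference between the two rules, assuming the check runs on parsed `<a>` elements whose visible text is itself a URL. This is not the rule text from the attached patch; `AnchorCollector`, `spoofed_url()` and `spoofed_url_host()` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lowercases tag names, so <A HREF> works too
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def looks_like_url(text):
    return text.startswith(("http://", "https://"))


def spoofed_url(href, text):
    # SPOOFED_URL idea (approximate): link text is a URL that differs from the href at all.
    return looks_like_url(text) and text.rstrip("/") != href.rstrip("/")


def spoofed_url_host(href, text):
    # SPOOFED_URL_HOST idea (approximate): only the host parts are compared.
    if not looks_like_url(text):
        return False
    return (urlsplit(href).hostname or "") != (urlsplit(text).hostname or "")
```

Fed the two examples above, `spoofed_url()` flags both links, while `spoofed_url_host()` flags only the spammerdomain.com/youtube.com one, because the legitdomain.com tracking link keeps the same host.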
I believe this doesn't require a vote, just a commit by someone else, since I
don't have commit access.
Of the 10,923 messages in my ham corpora, SPOOFED_URL hits 48, and
__SPOOFED_URL_HOST hits 10 of those, 4 of which are Google Calendar "legit"
marketing tracker cases.
We can't just ignore all links that go through Google's redirector, because
spammers could then route all their links through it. But we could add a check
for DKIM_VALID_AU and "From: Google Calendar <ca...@google.com>".
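A sketch of that exemption, assuming the caller already knows whether DKIM_VALID_AU fired (passed in as a boolean) and has the raw From header. The full sender address is elided in this thread, so the sketch only checks the "Google Calendar" display name and a google.com address; `exempt_google_redirect()` is a hypothetical helper, not a SpamAssassin API.

```python
from email.utils import parseaddr
from urllib.parse import urlsplit

# Assumption: only these hosts serve the /url redirector seen in the ham samples.
GOOGLE_REDIRECTOR_HOSTS = {"www.google.com", "google.com"}


def is_google_redirect(href):
    parts = urlsplit(href)
    return parts.hostname in GOOGLE_REDIRECTOR_HOSTS and parts.path == "/url"


def exempt_google_redirect(href, dkim_valid_au, from_header):
    """Exempt a Google-redirected link only for author-signed Google Calendar mail."""
    if not is_google_redirect(href):
        return False
    display_name, address = parseaddr(from_header)
    from_calendar = (display_name == "Google Calendar"
                     and address.lower().endswith("@google.com"))
    # Both conditions are required: a forged From alone must not earn the exemption.
    return dkim_valid_au and from_calendar
```

Requiring the DKIM result as well as the From header is what keeps spammers from getting the exemption just by routing links through the redirector and forging the sender name.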
Of my 5,256 spams, SPOOFED_URL hits 184, and __SPOOFED_URL_HOST hits 139 of
those.
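To compare the two rules on the counts above, a small worked calculation of hit rates plus an S/O ratio, taken here as spam% / (spam% + ham%). That is how I understand SpamAssassin's mass-check hit-frequency reports to define it; treat the definition as an assumption.

```python
# Corpus sizes and hit counts quoted above.
HAM_TOTAL = 10_923
SPAM_TOTAL = 5_256
HITS = {  # rule name -> (spam hits, ham hits)
    "SPOOFED_URL": (184, 48),
    "__SPOOFED_URL_HOST": (139, 10),
}


def rates(spam_hits, ham_hits):
    """Return (spam %, ham %, S/O), where S/O = spam% / (spam% + ham%)."""
    spam_pct = 100 * spam_hits / SPAM_TOTAL
    ham_pct = 100 * ham_hits / HAM_TOTAL
    return spam_pct, ham_pct, spam_pct / (spam_pct + ham_pct)


for rule, (spam_hits, ham_hits) in HITS.items():
    spam_pct, ham_pct, s_o = rates(spam_hits, ham_hits)
    print(f"{rule}: spam {spam_pct:.2f}%  ham {ham_pct:.3f}%  S/O {s_o:.3f}")
```

That works out to roughly 3.50% of spam vs 0.44% of ham (S/O about 0.89) for SPOOFED_URL, and 2.64% vs 0.09% (S/O about 0.97) for __SPOOFED_URL_HOST, so on this corpus the host-only variant trades some spam hits for a higher S/O.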
All the hams hit by __SPOOFED_URL_HOST in my corpus:
Google Calendar:
<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.templecon.org%2F&usd=2&usg=AFQjCNHGQtanthD0JfX4FmbFcyr2L_dqMw" target="_blank">http://www.templecon.org/</a>
<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.3rdcome.org&usd=2&usg=AFQjCNGLyz4TL3lsRogAtJheCpPEHswi7Q" target="_blank">http://www.3rdcome.org</a>
<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.somervilleopenstudios.org%2F&usd=2&usg=AFQjCNED0k2VJve6M8pLRNRFcUnekaSCKg" target="_blank">http://www.somervilleopenstudios.org/</a>
<a href="http://www.google.com/url?q=http%3A%2F%2Fwww.wmos.org%2F&usd=2&usg=AFQjCNEicUtvxpcEJ8V5Nem2RTycomPYMQ" target="_blank">http://www.wmos.org/</a>
MobileMe Mail htmlification flaw (dropped a "."):
<a href="http://www.youtube.com/watch?v=ywBwUiq6v4o"
_mce_href="http://www.youtube.com/watch?v=ywBwUiq5v4o">http://www.youtubecom/watch?v=ywBwUiq5v4o</a>
Botched htmlization from jockeycomfort.com:
<a href="http://email.jockeycomfort.com/a/hBN2$afB8ardYB8bVlZAAAfkY4S/mobile?t_params=EMAIL%3D[redacted]%2540chaosreigns.com">http:///mobile</a>
(username replaced with "[redacted]")
Can't find the problem in an email from nhliberty.
Can't find the problem in a metalshapers Yahoo Groups email from CaptonZap@aol.com.
TurboTax:
<A HREF="http://info1.turbotax.com/[redacted]">http://privacy.intuit.com</A>
(Both domains, turbotax.com and intuit.com, owned by the same organization.)
A private list:
<a href="http://www.nramedia.org/t/193719/4743427/6880/0/" target="_blank">http://www.nraila.org/Legislation/Read.aspx?ID=7061</a>
(Both domains owned by the same organization.)
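The Google Calendar hits above all link through google.com/url with the real destination URL-encoded in the `q` parameter. One alternative to exempting them, which is not proposed in this thread, is to decode that parameter and compare its host with the link text instead; a redirect to a different site would then still mismatch. A minimal sketch, assuming only www.google.com/url and google.com/url need handling; `unwrap_google_redirect()` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_google_redirect(href):
    """Return the destination carried in a google.com/url?q=... link, else href unchanged."""
    parts = urlsplit(href)
    if parts.hostname in ("www.google.com", "google.com") and parts.path == "/url":
        # parse_qs percent-decodes values, so q=http%3A%2F%2F... becomes http://...
        target = parse_qs(parts.query).get("q")
        if target:
            return target[0]
    return href
```

For the templecon.org sample this returns `http://www.templecon.org/`, whose host matches the link text.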
--
Configure bugmail: https://issues.apache.org/SpamAssassin/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox
Adam Katz <an...@khopis.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |antispam@khopis.com
--- Comment #4 from Adam Katz <an...@khopis.com> 2011-10-17 21:51:30 UTC ---
Both rules are useful. Mine is more FP-prone but will also hit more TPs. It's
just a matter of figuring out which other rules to pair with these to best nail
phish without hitting ham or marketing.
[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox
--- Comment #5 from Darxus <Da...@ChaosReigns.com> 2011-10-18 18:32:37 UTC ---
SPOOFED_URL_HOST had worse results than SPOOFED_URL.
Ham that hit, in my corpus:
9 were google calendar marketing tracker urls.
6 were third-party marketing trackers: r20.rs6.net, links.mkt030.com, and
*.delivery.net.
5 were different hosts in the same domain, possibly all from one company.
2 were different domains owned by the same company (intuit.com / turbotax.com).
2 were emails I shouldn't have had in my corpus, that were off list replies on
the subject of this rule.
1 was a bug in somebody's conversion from plain text to html in a reply.
1 I didn't manage to track down.
So there are definitely opportunities for improvement.
These three third-party marketing trackers look fairly safe to grant exceptions
to, because it's not immediately obvious how someone could use them to redirect
to an arbitrary URL.
And the Google Calendar cases could be handled by verifying that the message
really comes from Google Calendar.
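A sketch of exceptions for the three trackers named above, assuming the exemption is keyed on the host of the href. The leading dot on `.delivery.net` stands in for the `*.delivery.net` wildcard; `TRACKER_EXCEPTIONS` and `is_exempt_tracker()` are hypothetical names.

```python
# Hypothetical exception list built from the trackers seen in this corpus.
TRACKER_EXCEPTIONS = ("r20.rs6.net", "links.mkt030.com", ".delivery.net")


def is_exempt_tracker(host):
    host = host.lower().rstrip(".")
    for entry in TRACKER_EXCEPTIONS:
        if entry.startswith("."):
            # Wildcard entry: any subdomain matches, but a look-alike such as
            # "evil-delivery.net" does not, because the dot is part of the suffix.
            if host.endswith(entry):
                return True
        elif host == entry:
            return True
    return False
```

Matching on a dot-prefixed suffix rather than a bare `endswith("delivery.net")` is the design choice that keeps registrable look-alike domains out of the exception.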
A plugin could be written to use
Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain() to only hit
changes in domain name, and not changes in host name within the same domain.
That would leave 2 hits with different domains owned by the same company, 2
conversion to html bugs, and 1 I couldn't figure out.
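The plugin idea above would use the Perl function Mail::SpamAssassin::Util::RegistrarBoundaries::trim_domain(). The Python stand-in below only approximates "same registered domain" with a three-entry set of multi-label suffixes, which is an illustrative assumption; real code should use the real function or a full public suffix list. `registrable_domain()` and `domain_changed()` are hypothetical names.

```python
# Illustrative stand-in for a real list of registry boundaries; far from complete.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au"}


def registrable_domain(host):
    """Approximate the registered domain: last two labels, or three under a known suffix."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def domain_changed(href_host, text_host):
    # Ignore host changes within one domain (www.amazon.com -> amazon.com).
    return registrable_domain(href_host) != registrable_domain(text_host)
```

With this check, www.amazon.com vs amazon.com no longer counts as a change, while info1.turbotax.com vs privacy.intuit.com still does, matching the remaining same-company hits described above.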
All the ham hits:
www.amazon.com -> amazon.com (same domain, different host)
r20.rs6.net - marketing tracker
email.capitalone.com -> turbotax.intuit.com/affiliate/capitalone35 - from
capitalone about turbotax
google calendar
google calendar
google calendar
google calendar
google calendar
google calendar
mailman htmlification bug + different host in same domain
sprint.r.delivery.net - marketing tracker
google calendar
google calendar
links.mkt030.com - marketing tracker
Jockey broken htmlification
google calendar
Different host name in same domain
Unknown problem with metalshapers yahoo group
turbotax.com -> intuit.com (same company, 2 domains)
different host, same domain
different host, same domain
different domain, same company
different host, same domain
r20.rs6.net - marketing tracker
r20.rs6.net - marketing tracker
r20.rs6.net - marketing tracker
r20.rs6.net - marketing tracker
Oops - off list reply on the subject of this rule, removed from corpora
Oops - off list reply on the subject of this rule, removed from corpora
[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox
Kevin A. McGrail <km...@pccc.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kmcgrail@pccc.com
--- Comment #1 from Kevin A. McGrail <km...@pccc.com> 2011-10-17 20:28:37 UTC ---
Looks like a good tweak on the old rule. The goal will be to replace khopesh's
rule with this one if it looks good on the mass checks?
svn commit -m 'Added Spoofed URL test rule per bug 6676 for Darxus'
Adding kmcgrail/20_darxus_experimental.cf
Transmitting file data .
Committed revision 1185356.
[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox
Darxus <Da...@ChaosReigns.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
CC| |Darxus@ChaosReigns.com
Resolution| |FIXED
--- Comment #2 from Darxus <Da...@ChaosReigns.com> 2011-10-17 20:33:19 UTC ---
Yes, thanks.
[Bug 6676] Add SPOOFED_URL_HOST to Darxus' sandbox
--- Comment #3 from Darxus <Da...@ChaosReigns.com> 2011-10-17 20:42:03 UTC ---
This came up on the users list here:
http://old.nabble.com/antiphishing-to32640290.html#a32640442