Posted to dev@spamassassin.apache.org by John Fawcett <jo...@michaweb.net> on 2004/04/17 12:22:08 UTC

[long] summary of currently unparsed url types

I'd just like to summarize the current position with regard to url types
which are not currently parsed correctly by sa and ask for some help with
tests using version 3.

Yahoo offers a public redirection service. You can enter a url like this:
http://rds.yahoo.com/*http://www.google.com
and you get sent to www.google.com. (By the way, I'm not sure what the point
of this is because, unlike tinyurl.com, the yahoo url is longer. However, it
sure comes in handy to spammers who are trying to get past sa URI rulesets.)

Spam which is not picked up correctly by sa uri filters often contains
redirection urls, even though the redirected domain is in sc.surbl.org. Jeff
Chan has opened a bug against URIDNSBL.pm to ask for support for parsing out
the spammer domain from redirected urls.
http://bugzilla.spamassassin.org/show_bug.cgi?id=3261

Things are getting more complicated, because the spam now coming through
contains features that stop it being picked up even by an altered parser
which strips off the http://rds.yahoo.com/* part.

I wanted to make a summary of current understanding of the url types which
break parsing. I've tested these with SpamCopURI and ver 2.63. If someone
offers to test (from case 2 onwards) with URIDNSBL and version 3, I'll post
suitable test cases.

1. http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa (bug 3261)
Workaround in PerMsgStatus.pm:
     $uri =~ s/^http:\/\/(?:drs|rds|rd)\.yahoo\.com\/[^\*]*\*(.*)$/$1/g;
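
To make the intent of case 1 easier to try out, here is a minimal sketch in
Python (an illustration only, not the Perl used in PerMsgStatus.pm). The name
strip_yahoo_redirect is made up, and the host list (rds, drs, rd) is an
assumption that combines the hosts in the workaround with the rds host seen in
the examples above. It accepts an empty path before the '*', as in the rds
example.

```python
import re

# Hypothetical helper, not part of SpamAssassin.
# Assumed redirector hosts: rds, drs and rd under yahoo.com.
_YAHOO_REDIRECT = re.compile(
    r"^http://(?:rds|drs|rd)\.yahoo\.com/[^*]*\*(.*)$",
    re.IGNORECASE,
)


def strip_yahoo_redirect(uri: str) -> str:
    """Return the part after the '*', or the URI unchanged if it does not match."""
    match = _YAHOO_REDIRECT.match(uri)
    return match.group(1) if match else uri


print(strip_yahoo_redirect(
    "http://rds.yahoo.com/*http://spammer.domain.tld/aaaaaaaaaa"
))
```

URIs that do not look like a Yahoo redirect are returned untouched, so the
function can be applied to every URI in a message.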

2. http://rds.yahoo.com/*%68ttp://spammer.domain.tld/aaaaaaaa (follow-up to
bug 3261, including test case)
Other possible variations on this, which I haven't seen as yet, can use %NN
in place of any or all of the 'http' characters in the redirected url, e.g.
http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa

Workaround in PerMsgStatus.pm:
         $uri =~ s/\%68/h/g;
         $uri =~ s/\%74/t/g;
         $uri =~ s/\%70/p/g;
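
The three substitutions above only cover 'h', 't' and 'p'. A slightly more
general approach, sketched here in Python as an illustration only, is to decode
every %NN escape that stands for an ASCII letter or digit and leave all other
escapes (such as %2F) alone. The function name decode_alnum_escapes is
hypothetical, and limiting the decoding to letters and digits is my assumption
about what is safe to rewrite.

```python
import re


def decode_alnum_escapes(uri: str) -> str:
    """Decode %NN escapes that encode ASCII letters or digits; keep the rest."""

    def _replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Only rewrite plain ASCII letters and digits.
        if char.isascii() and char.isalnum():
            return char
        return match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", _replace, uri)


print(decode_alnum_escapes(
    "http://rds.yahoo.com/*%68%74%74%70://spammer.domain.tld/aaaaaaaa"
))
```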

3. http://rd.yahoo.com/winery/college/banbury/*http:/len=
derserv.com?partid=3Darlenders

The redirect url is formally incorrect (there is a single slash
after http) but browsers have no problem with this. The parser
cannot handle it.

Workaround in PerMsgStatus.pm:
    $uri =~ s/http:\/([^\/])/http:\/\/$1/g;

By the way, this url contains quoted-printable sequences ('=' followed by a
newline, and '=3D'), which are not causing problems for the parser. Neither
is the absence of a trailing slash before the ?.
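
The single-slash repair in case 3 can be sketched in Python as follows. This
is an illustration only; fix_single_slash is a made-up name, and the sample
URI is the one from case 3 with the quoted-printable sequences already decoded
by hand.

```python
import re


def fix_single_slash(uri: str) -> str:
    """Rewrite 'http:/x' as 'http://x'; an existing 'http://' is left alone."""
    return re.sub(r"http:/([^/])", r"http://\1", uri)


print(fix_single_slash(
    "http://rd.yahoo.com/winery/college/banbury/*http:/lenderserv.com?partid=arlenders"
))
```

The leading http://rd.yahoo.com is not touched because the character after
'http:/' there is another slash.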

4. URLs without http: in front of them. The following, when viewed in a
browser, reads:
"Please copy and paste this link into your browser healthyexchange.biz "

<p>
P<advisory>l<aboveboard>e<compose>a<geochronology>s<moral>e<palfrey> <rada=
r>c<symptomatic>o<yankee>p<conduit>y<souffle> <intake>a<arise>n<eocene>d <=
thickish>paste <impact>this <broadloom>link <road>i<dichotomous>n<quinine>=
t<scoreboard>o y<eager>o<impact>ur b<archenemy>r<band>o<wallop>wser <b> he=
althyexchange.biz</b>

Probably not much that can be done with this.
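
For anyone who wants to see how the visible text comes out of markup like
this, here is a rough Python sketch. It is an illustration only and says
nothing about how SpamAssassin renders HTML: it decodes the quoted-printable
soft line breaks, drops anything that looks like a tag, and collapses
whitespace. The name visible_text is hypothetical, and the sample is a
shortened version of the markup above.

```python
import quopri
import re


def visible_text(raw_html: bytes) -> str:
    """Approximate what a browser shows: decode QP, strip tags, squeeze spaces."""
    decoded = quopri.decodestring(raw_html).decode("ascii", errors="replace")
    without_tags = re.sub(r"<[^>]*>", "", decoded)
    return " ".join(without_tags.split())


# Shortened sample; '=' followed by a newline is a quoted-printable soft break.
sample = b"P<advisory>l<aboveboard>ease <rada=\nr>copy <b> he=\nalthyexchange.biz</b>"
print(visible_text(sample))
```

This only recovers the text. Deciding that a bare word such as
healthyexchange.biz should be treated as a URL is a separate problem.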

5. http://http://www.eager-18.com/_7953f10b575a18d044cdec5a40bd4f22//?d=vision
Here the double http prevents this being parsed. (OK, it wasn't in
sc.surbl.org, but even if it had been it wouldn't have been picked up.)

Workaround in PerMsgStatus.pm:
    $uri =~ s/http:\/\/http:\/\//http:\/\//g;
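
The same idea as the case 5 workaround, sketched in Python for illustration
only. collapse_repeated_scheme is a made-up name. Collapsing runs of three or
more 'http://' as well as two is my own assumption; I have only seen the
doubled form.

```python
import re


def collapse_repeated_scheme(uri: str) -> str:
    """Replace any run of two or more 'http://' with a single 'http://'."""
    return re.sub(r"(?:http://){2,}", "http://", uri, flags=re.IGNORECASE)


print(collapse_repeated_scheme(
    "http://http://www.eager-18.com/_7953f10b575a18d044cdec5a40bd4f22//?d=vision"
))
```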

John

Re: [long] summary of currently unparsed url types

Posted by Loren Wilton <lw...@earthlink.net>.

Thanks for the rules fodder!

BTW, MSN also has an open redirector that is seeing much use:

uri   LWTEST_REDIRECT1 m'http://g.msn.com/0AD0000[A-Z]/\d{6}\.1[/\?]'i
describe LWTEST_REDIRECT1 Open MSN redirector found in URL
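
To try the pattern outside SpamAssassin, a quick Python check might look like
this. It is only a sketch: the regex is copied from the rule above, the 'i'
modifier becomes re.IGNORECASE, and hits_msn_redirect and both sample URLs are
made up for illustration.

```python
import re

# Pattern copied verbatim from the LWTEST_REDIRECT1 rule.
MSN_REDIRECT = re.compile(
    r"http://g.msn.com/0AD0000[A-Z]/\d{6}\.1[/\?]",
    re.IGNORECASE,
)


def hits_msn_redirect(uri: str) -> bool:
    """Return True if the URI contains the MSN redirector pattern."""
    return MSN_REDIRECT.search(uri) is not None


# Both URLs below are invented test inputs, not taken from real spam.
print(hits_msn_redirect("http://g.msn.com/0AD0000A/123456.1/?http://spammer.domain.tld/"))
print(hits_msn_redirect("http://www.msn.com/"))
```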

        Loren
