Posted to users@httpd.apache.org by Christian Lerrahn <li...@penpal4u.net> on 2007/12/08 02:18:44 UTC

[users@httpd] Redirecting paths with extra slashes

Hi,
I have a site that for some reason got indexed by Google with wrong
paths which contain too many slashes. E.g. instead of an entry for

/foo/bar

there might be one for

/foo///bar

Now this is no problem for Apache but it is for some of my pages
because they use relative paths. If I have an image
in /foo/bar/index.html which looks like

<img src=../../img/img1.jpg>

that will work fine for /foo/bar but not for /foo///bar which will then
look for /foo/img instead of /img.
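
To make the failure concrete, here is a minimal Python sketch of how a
relative reference gets resolved against each page URL. The helper name
resolve_reference is made up for this illustration, and the logic is a
simplified, hand-written form of RFC 3986 dot-segment removal (it assumes
a plain relative path with no query or fragment). It keeps empty path
segments, as a browser does; it is not code that Apache runs.

```python
def resolve_reference(base_path: str, reference: str) -> str:
    """Resolve a relative path reference against the path of the page containing it."""
    # Drop the last segment of the base (the document name) but keep empty
    # segments: "/foo///bar/index.html" -> ["", "foo", "", "", "bar"].
    segments = base_path.split("/")[:-1]
    for part in reference.split("/"):
        if part == "..":
            # Go up one level, but never above the root (the leading "").
            if len(segments) > 1:
                segments.pop()
        elif part != ".":
            segments.append(part)
    return "/".join(segments)


if __name__ == "__main__":
    for page in ("/foo/bar/index.html", "/foo///bar/index.html"):
        print(page, "->", resolve_reference(page, "../../img/img1.jpg"))
```

For /foo/bar/index.html this prints /img/img1.jpg, while for
/foo///bar/index.html it prints /foo//img/img1.jpg: each ".." only
consumes one of the empty segments, so the request ends up under /foo
instead of at the site root.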

What I would like to do is get Apache to redirect wrong addresses like
this to the correct one. That way I get the relative paths correct and
hopefully one day Google will have it right, too. I was thinking of
mod_rewrite but I couldn't get it to work: I can only come up with
rules that either don't seem to do anything or that get me into an
infinite loop of redirects. Can anybody give me a hint?

Cheers,
Christian

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Redirecting paths with extra slashes

Posted by Christian Lerrahn <li...@penpal4u.net>.
On Sun, 9 Dec 2007 11:06:31 -0500
"Eric Covener" <co...@gmail.com> wrote:

> On Dec 9, 2007 1:09 AM, Christian Lerrahn <li...@penpal4u.net> wrote:
> 
> > RewriteRule (.*/)/+(.*) $1$2 [R=permanent,L]
> >
> > fixes almost all of my problems. The only problem that remains is
> > that the pattern doesn't match at the beginning of the path.
> 
> 
> Works for me in perl and httpd.  Try RewriteLog and looking at the
> HTTP request/response directly.

When I log the rewriting, I see that the URL ///foo/bar gets logged
as /foo/bar already. However, this is only the case for the rewrite
log. In the access.log, I will still see the URL as I entered it. That
might also explain why the leading slashes disappear if there is a
redirect due to a match somewhere else in the path. Still the question
remains, why there is a 'wrong' path in the rewrite log.

@Eric: As you say that this works for you, I assume that you get
redirected. What version of Apache do you use? I'm on 2.2.6.

While there are no practical problems with leading slashes, I'd still
like to get rid of them, too.

Cheers,
Christian



Re: [users@httpd] Redirecting paths with extra slashes

Posted by Eric Covener <co...@gmail.com>.
On Dec 9, 2007 1:09 AM, Christian Lerrahn <li...@penpal4u.net> wrote:

> RewriteRule (.*/)/+(.*) $1$2 [R=permanent,L]
>
> fixes almost all of my problems. The only problem that remains is that
> the pattern doesn't match at the beginning of the path.


Works for me in perl and httpd.  Try RewriteLog and looking at the
HTTP request/response directly.



-- 
Eric Covener
covener@gmail.com



Re: [users@httpd] Redirecting paths with extra slashes

Posted by Christian Lerrahn <li...@penpal4u.net>.
On Sat, 8 Dec 2007 15:40:09 +0100
Torsten Foertsch <to...@gmx.net> wrote:

> On Sat 08 Dec 2007, Christian Lerrahn wrote:
> > > RewriteEngine On
> > > RewriteRule (.*)//+(.*) $1$2 [R=permanent,L]
> >
> > Thanks for that. I'm sorry to still bother. I'd like to get rid of
> > paths like //foo/bar, too, which do not match with this rule. To be
> > honest I don't quite understand the rule. That's probably the reason
> > why I can't modify it correctly to match //foo/bar as well. When
> > I saw the regexp, I thought that I would end up without any slashes
> > but obviously I'm not. Wouldn't matching /foo//bar/ match as
> > $1=/foo and $2=bar/ ? Why does it not match like that? Then also it
> > seems to me that (.*) should also match an empty string which would
> > mean that leading slashes would get stripped, too. Why does that
> > not happen?
> 
> You need to know that * in regexes is greedy. That means it eats up
> as many characters as it can while still letting the regexp match.
> So in /foo///bar $1 gets /foo/ and not only /foo.
> 
> What you need for $1 is a nongreedy one (*? instead of *), something
> like this:
> 
> RewriteRule (.*?)//+(.*) $1/$2 ...
> 
> You can try this in a little Perl one-liner:
> 
> perl -ne 'BEGIN {$|=1; print "> "} if(m!(.*?)//+(.*)!) {print
> "$1\t$2\n"} else {print "no match\n"} print "> "'
> 
> It offers you a "> " prompt to enter a string that is matched against
> that regexp. Then $1 and $2 are printed delimited by a tab-character.
> 
> You'll see that the new regexp matches even at the beginning of the
> line:
> 
> > /foo/bar
> no match
> > /foo//bar
> /foo    bar
> > /foo///bar
> /foo    bar
> > ///foo///bar
>         foo///bar
> > //foo//bar
>         foo//bar

I realised that the matching was greedy and assumed that the question
mark would serve the same purpose as in Perl. However, ///foo/bar
should still match even if the pattern is greedy. After all, there is
no // between foo and bar that could match instead. Still, the rule
does not match on the // at the beginning.
I actually was wrong in my last post. The rule

RewriteRule (.*/)/+(.*) $1$2 [R=permanent,L]

fixes almost all of my problems. The only problem that remains is that
the pattern doesn't match at the beginning of the path. The weird thing
is that a path like

//foo//bar

will get converted to /foo/bar in 2 redirections: first a match on the
leading // (//foo//bar -> /foo//bar) and then a match on the later
occurrence of // (/foo//bar -> /foo/bar). No, this does not make any
sense to me. :(
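
As a cross-check outside Apache, the pattern from this rule can be run
through Python's re module, whose greedy matching behaves the same way
for this pattern. The names apply_rule_once and passes are only for
illustration. This is a sketch of the pure regex behaviour and does not
model anything httpd may do to the path before mod_rewrite sees it.

```python
import re

# Pattern and substitution taken from the rule above:
#   RewriteRule (.*/)/+(.*) $1$2
RULE = re.compile(r"(.*/)/+(.*)")


def apply_rule_once(path: str) -> str:
    """Apply the substitution $1$2 once; return the path unchanged if nothing matches."""
    match = RULE.search(path)  # unanchored, like a RewriteRule pattern
    if match is None:
        return path
    return match.group(1) + match.group(2)


def passes(path: str) -> list:
    """Collect the path after each pass until the rule stops matching."""
    steps = [path]
    while True:
        rewritten = apply_rule_once(steps[-1])
        if rewritten == steps[-1]:
            return steps
        steps.append(rewritten)


if __name__ == "__main__":
    print(passes("/foo///bar"))
    print(passes("//foo//bar"))
```

In this model each pass removes exactly one slash, because the greedy
(.*/) swallows all but the last slash of the last run. /foo///bar needs
two passes, and //foo//bar goes //foo//bar -> //foo/bar -> /foo/bar, so
the later // is handled first. That is the opposite order to the one
seen in the redirects, which suggests the pattern alone does not explain
that order.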

> The last 2 of the examples above reveal another problem with the
> approach. The RewriteRule matches only the first occurrence and then
> sends a redirect to the browser. If your URL contains several runs
> of consecutive slashes you may hit the browser's redirect limit.
> 
> To overcome that you can try to loop in mod_rewrite (untested):
> 
> RewriteRule (.*?)//+(.*) $1/$2 [E=R:$1/$2,N]
> 
> RewriteCond %{ENV:R} .
> RewriteRule . %{ENV:R} [R=permanent,L]

This doesn't matter too much to me. URLs that have more than one place
with too many slashes are rather rare. Therefore I'm ok with that
resulting in more than one redirect.

Cheers,
Christian



Re: [users@httpd] Redirecting paths with extra slashes

Posted by Torsten Foertsch <to...@gmx.net>.
On Sat 08 Dec 2007, Christian Lerrahn wrote:
> > RewriteEngine On
> > RewriteRule (.*)//+(.*) $1$2 [R=permanent,L]
>
> Thanks for that. I'm sorry to still bother. I'd like to get rid of
> paths like //foo/bar, too, which do not match with this rule. To be
> honest I don't quite understand the rule. That's probably the reason
> why I can't modify it correctly to match //foo/bar as well. When I
> saw the regexp, I thought that I would end up without any slashes but
> obviously I'm not. Wouldn't matching /foo//bar/ match as $1=/foo and
> $2=bar/ ? Why does it not match like that? Then also it seems to me
> that (.*) should also match an empty string which would mean that
> leading slashes would get stripped, too. Why does that not happen?

You need to know that * in regexes is greedy. That means it eats up as many
characters as it can while still letting the regexp match. So in /foo///bar
$1 gets /foo/ and not only /foo.

What you need for $1 is a nongreedy one (*? instead of *), something like 
this:

RewriteRule (.*?)//+(.*) $1/$2 ...

You can try this in a little Perl one-liner:

perl -ne 'BEGIN {$|=1; print "> "} if(m!(.*?)//+(.*)!) {print "$1\t$2\n"} else 
{print "no match\n"} print "> "'

It offers you a "> " prompt to enter a string that is matched against that 
regexp. Then $1 and $2 are printed delimited by a tab-character.

You'll see that the new regexp matches even at the beginning of the line:

> /foo/bar
no match
> /foo//bar
/foo    bar
> /foo///bar
/foo    bar
> ///foo///bar
        foo///bar
> //foo//bar
        foo//bar
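
The same comparison can be sketched in Python, whose re module treats *
and *? the same way for these patterns. The names GREEDY, LAZY and
captures are made up for this illustration; the two patterns are the
ones from the rules in this thread, tested here as plain regexes and not
inside httpd.

```python
import re

GREEDY = re.compile(r"(.*)//+(.*)")   # pattern of the first rule in the thread
LAZY = re.compile(r"(.*?)//+(.*)")    # non-greedy variant suggested above


def captures(pattern, path):
    """Return ($1, $2) for an unanchored match, or None if nothing matches."""
    match = pattern.search(path)
    return match.groups() if match else None


if __name__ == "__main__":
    for path in ("/foo/bar", "/foo//bar", "/foo///bar", "///foo///bar", "//foo//bar"):
        print(path, "greedy:", captures(GREEDY, path), "lazy:", captures(LAZY, path))
```

For /foo///bar the greedy pattern captures ('/foo/', 'bar') and the lazy
one ('/foo', 'bar'), which is why the substitution changes from $1$2 to
$1/$2. For /foo//bar, with exactly two slashes, even the greedy pattern
captures ('/foo', 'bar'), so in this model $1$2 would glue the two parts
together as /foobar.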

The last 2 of the examples above reveal another problem with the approach. The
RewriteRule matches only the first occurrence and then sends a redirect to
the browser. If your URL contains several runs of consecutive slashes
you may hit the browser's redirect limit.

To overcome that you can try to loop in mod_rewrite (untested):

RewriteRule (.*?)//+(.*) $1/$2 [E=R:$1/$2,N]

RewriteCond %{ENV:R} .
RewriteRule . %{ENV:R} [R=permanent,L]
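
For comparison, the end result this loop is aiming for, every run of
slashes collapsed before a single redirect is sent, can be sketched in
Python. The function names collapse_slashes and redirect_target are
hypothetical, and this only models the intended outcome, not how
mod_rewrite evaluates the [N] flag.

```python
import re


def collapse_slashes(path: str) -> str:
    """Replace every run of two or more slashes with a single slash."""
    return re.sub(r"/{2,}", "/", path)


def redirect_target(path: str):
    """Return the cleaned path if a redirect is needed, else None."""
    cleaned = collapse_slashes(path)
    # Only redirect when something changed; redirecting an already clean
    # path to itself would produce an endless redirect loop.
    return cleaned if cleaned != path else None


if __name__ == "__main__":
    for path in ("/foo/bar", "//foo//bar", "///foo///bar"):
        print(path, "->", redirect_target(path))
```

Here //foo//bar and ///foo///bar both map to /foo/bar in one step, and
/foo/bar yields None, meaning no redirect is sent.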

Torsten

--
A: It reverses the normal flow of conversation.
Q: What's wrong with top-posting?
A: Top-posting.
Q: What's the biggest scourge on plain text email discussions?



Re: [users@httpd] Redirecting paths with extra slashes

Posted by Christian Lerrahn <li...@penpal4u.net>.
On Fri, 7 Dec 2007 22:38:33 -0500
"Joshua Slive" <jo...@slive.ca> wrote:

> On Dec 7, 2007 8:18 PM, Christian Lerrahn <li...@penpal4u.net> wrote:
> > Hi,
> > I have a site that for some reason got indexed by Google with wrong
> > paths which contain too many slashes. E.g. instead of an entry for
> >
> > /foo/bar
> >
> > there might be one for
> >
> > /foo///bar
> >
> > Now this is no problem for Apache but it is for some of my pages
> > because they use relative paths. If I have an image
> > in /foo/bar/index.html which looks like
> >
> > <img src=../../img/img1.jpg>
> >
> > that will work fine for /foo/bar but not for /foo///bar which will
> > then look for /foo/img instead of /img.
> >
> > What I would like to do is get Apache to redirect wrong addresses
> > like this to the correct one. That way I get the relative paths
> > correct and hopefully one day Google will have it right, too. I was
> > thinking of mod_rewrite but I couldn't get it to work: I can only
> > come up with rules that either don't seem to do anything or that
> > get me into an infinite loop of redirects. Can anybody give me
> > a hint?
> 
> RewriteEngine On
> RewriteRule (.*)//+(.*) $1$2 [R=permanent,L]

Thanks for that. I'm sorry to still bother. I'd like to get rid of
paths like //foo/bar, too, which do not match with this rule. To be
honest I don't quite understand the rule. That's probably the reason
why I can't modify it correctly to match //foo/bar as well. When I
saw the regexp, I thought that I would end up without any slashes but
obviously I'm not. Wouldn't matching /foo//bar/ match as $1=/foo and
$2=bar/ ? Why does it not match like that? Then also it seems to me
that (.*) should also match an empty string which would mean that
leading slashes would get stripped, too. Why does that not happen?

I'm really sorry. I'm usually not too bad with regexps (at least in
Perl) but this one is confusing me. :(

Cheers,
Christian



Re: [users@httpd] Redirecting paths with extra slashes

Posted by Joshua Slive <jo...@slive.ca>.
On Dec 7, 2007 8:18 PM, Christian Lerrahn <li...@penpal4u.net> wrote:
> Hi,
> I have a site that for some reason got indexed by Google with wrong
> paths which contain too many slashes. E.g. instead of an entry for
>
> /foo/bar
>
> there might be one for
>
> /foo///bar
>
> Now this is no problem for Apache but it is for some of my pages
> because they use relative paths. If I have an image
> in /foo/bar/index.html which looks like
>
> <img src=../../img/img1.jpg>
>
> that will work fine for /foo/bar but not for /foo///bar which will then
> look for /foo/img instead of /img.
>
> What I would like to do is get Apache to redirect wrong addresses like
> this to the correct one. That way I get the relative paths correct and
> hopefully one day Google will have it right, too. I was thinking of
> mod_rewrite but I couldn't get it to work: I can only come up with
> rules that either don't seem to do anything or that get me into an
> infinite loop of redirects. Can anybody give me a hint?

RewriteEngine On
RewriteRule (.*)//+(.*) $1$2 [R=permanent,L]

Joshua.
