You are viewing a plain text version of this content. The canonical link for it is here.
Posted to modules-dev@httpd.apache.org by Joshua Marantz <jm...@google.com> on 2011/03/13 02:15:12 UTC

Saving the original request URI ahead of a mod_rewrite

Hi,

A new bug has surfaced in mod_pagespeed that we understand, but would
welcome advice on the best way to fix.  The problem is tracked in
http://code.google.com/p/modpagespeed/issues/detail?id=234.

Briefly, mod_pagespeed<http://code.google.com/speed/page-speed/docs/using_mod.html>seeks
to improve the performance of web sites by rewriting them while being
served from Apache.   mod_pagespeed transforms the HTML text in an output
filter.  To do this correctly, mod_pagespeed needs to know what URL that
browser thinks it has when it is displaying a site.

The failure scenario is when a site has a RewriteRule in an .htaaccess file.
 The request->unparsed_uri gets rewritten by mod_rewrite, so by the time
mod_pagespeed runs it has the wrong idea of where the page is.  This makes
mod_pagespeed resolve relative URLs embedded in the HTML in a manner
inconsistent from the browser.  We thought we had a solution to this problem
by putting in an early-running hook that saves the original
request->unparsed_uri in request->notes.

That seems to work in some cases, but, we've found, not when the RewriteRule
is in an .htaccess file.  In that case, mod_rewrite triggers an "internal
redirect", which causes an entirely new 'request' to be allocated, which
does *not* have a copy of the original request->notes.  It does make a
hacked version of request->subprocess_env however, prepending "REDIRECT_" to
each key.  It also seems that the new request has a pointer to the original
request (which has the note) in request->prev.  But this new request,
without the notes, is the one that's passed to mod_pagespeed's output
filter, with request->unparsed_uri pointing to the rewritten URL, which is
not consistent with the browser's URL bar.

So I'm writing to this group to get suggestions on the most robust way to
fix this.  Here are some ideas:

1. Add an early 'create_request' hook and use that to copy the 'notes' that
we care about from request->prev.
2. Change from storing notes in request->notes to request->subprocess_env.
 When we go to do the lookup with our key, we can look up REDIRECT_key if a
note with our original key is not found.  This strikes me as a brittle hack.
3. Follow the request->prev chain when looking up notes.  This strikes me as
risky because I have no idea what happens to the ->prev chain through all
the modules in the Apache eco-system, or how far down the chain I might have
to go.

So I like #1 best.  Any other opinions or ideas?

-Josh

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Joshua Marantz <jm...@google.com>.
A more detailed analysis of what happens in Apache for this testcase is
here: http://code.google.com/p/modpagespeed/issues/detail?id=234#c16

<http://code.google.com/p/modpagespeed/issues/detail?id=234#c16>The code
says "internal_internal_redirect" but it makes a new request.  I haven't had
a chance to debug whether the pool would be a good place to attach our copy
of the original uri.  I'll check that out as well as r->main.

Thanks for the suggestions.

On Sat, Mar 12, 2011 at 10:02 PM, Eric Covener <co...@gmail.com> wrote:

> On Sat, Mar 12, 2011 at 9:45 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:
> > On Sun, Mar 13, 2011 at 02:48, Eric Covener <co...@gmail.com> wrote:
> >>> So I like #1 best.  Any other opinions or ideas?
> >>
> >> I solved a similar problem recently by using apr_pool_userdata_set on
> >> r->pool which you can still find after the internal redirects of
> >> rewrite in htaccess / with PT flag.
> >
> > What Joshua is describing sounds like a sub-request and those get
> > their own pool, without the user data of the parent pool. I would
> > probably follow r->main.
>
> OP specifically mentions "internal redirect" and rewrite-in-htaccess.
>

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Joshua Marantz <jm...@google.com>.
This is great data, Ben & Eric.

I'm a little worried about the "follow until" in terms of performance
(probably not really an issue most of the time) and vulnerability to other
modules corrupting state (for which mod_pagespeed would be blamed loudly on
twitter :)

So I'm still thinking of the using an early 'create_request' hook so that we
can propagate our 'notes' incrementally, but we should certainly be using
request->main in combination with request->prev to do the propagation if
that's the right thing to do.

On Sun, Mar 13, 2011 at 9:35 AM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> On Sun, Mar 13, 2011 at 13:15, Eric Covener <co...@gmail.com> wrote:
> > r->main doesn't change on an internal redirect AFAICT.
>
> You're right. And there is ap_internal_fast_redirect() that works
> different still. The only thing I can think of that should work for
> all three is to follow r->main until it's NULL, then follow r->prev
> until you're at the root request.
>

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Mar 13, 2011 at 13:15, Eric Covener <co...@gmail.com> wrote:
> r->main doesn't change on an internal redirect AFAICT.

You're right. And there is ap_internal_fast_redirect() that works
different still. The only thing I can think of that should work for
all three is to follow r->main until it's NULL, then follow r->prev
until you're at the root request.

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Eric Covener <co...@gmail.com>.
On Sat, Mar 12, 2011 at 10:26 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:
> On Sun, Mar 13, 2011 at 04:02, Eric Covener <co...@gmail.com> wrote:
>> OP specifically mentions "internal redirect" and rewrite-in-htaccess.
>
> Hah, the moment I fired off that email I thought "oh wait, mod_rewrite
> *does* do an internal redirect somewhere".
>
> Internal redirects share a pool so your suggestion would work but
> following r->main is still the better solution, I think, if only
> because it works for both redirects and sub-requests.

r->main doesn't change on an internal redirect AFAICT.

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Mar 13, 2011 at 04:02, Eric Covener <co...@gmail.com> wrote:
> OP specifically mentions "internal redirect" and rewrite-in-htaccess.

Hah, the moment I fired off that email I thought "oh wait, mod_rewrite
*does* do an internal redirect somewhere".

Internal redirects share a pool so your suggestion would work but
following r->main is still the better solution, I think, if only
because it works for both redirects and sub-requests.

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Eric Covener <co...@gmail.com>.
On Sat, Mar 12, 2011 at 9:45 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:
> On Sun, Mar 13, 2011 at 02:48, Eric Covener <co...@gmail.com> wrote:
>>> So I like #1 best.  Any other opinions or ideas?
>>
>> I solved a similar problem recently by using apr_pool_userdata_set on
>> r->pool which you can still find after the internal redirects of
>> rewrite in htaccess / with PT flag.
>
> What Joshua is describing sounds like a sub-request and those get
> their own pool, without the user data of the parent pool. I would
> probably follow r->main.

OP specifically mentions "internal redirect" and rewrite-in-htaccess.

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sun, Mar 13, 2011 at 02:48, Eric Covener <co...@gmail.com> wrote:
>> So I like #1 best.  Any other opinions or ideas?
>
> I solved a similar problem recently by using apr_pool_userdata_set on
> r->pool which you can still find after the internal redirects of
> rewrite in htaccess / with PT flag.

What Joshua is describing sounds like a sub-request and those get
their own pool, without the user data of the parent pool. I would
probably follow r->main.

Re: Saving the original request URI ahead of a mod_rewrite

Posted by Eric Covener <co...@gmail.com>.
> So I like #1 best.  Any other opinions or ideas?

I solved a similar problem recently by using apr_pool_userdata_set on
r->pool which you can still find after the internal redirects of
rewrite in htaccess / with PT flag.