Posted to modules-dev@httpd.apache.org by Joshua Marantz <jm...@google.com> on 2011/01/01 00:16:08 UTC

Re: Overriding mod_rewrite from another module

Thanks for the quick response and the promising idea for a hack.  Looking at
mod_rewrite.c this does indeed look a lot more surgical, if, perhaps,
fragile, as mod_rewrite.c doesn't expose that string-constant in any formal
interface (even as a #define in a .h).  Nevertheless the solution is
easy-to-implement and easy-to-test, so...thanks!

I'm also still wondering if there's a good source of official documentation
for the detailed semantics of interfaces like ap_hook_translate_name.
Neither a Google search, a stackoverflow.com search, nor the Apache Modules
book <http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8&qid=1293837117&sr=8-1>
offers much detail.  code.google.com fares a little better but just points
to 4 existing usages.

-Josh

On Fri, Dec 31, 2010 at 1:50 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> On Fri, Dec 31, 2010 at 18:17, Joshua Marantz <jm...@google.com> wrote:
> > Is there a better way to solve the original problem: preventing mod_rewrite
> > from corrupting mod_pagespeed's resources?
>
> From memory and from a quick peek at mod_rewrite.c: in your
> translate_name hook, set a "mod_rewrite_rewritten" note in r->notes
> with value "0" and return DECLINED. That'll trick mod_rewrite into
> thinking that it has already processed the request.
>

Re: Overriding mod_rewrite from another module

Posted by Joshua Marantz <jm...@google.com>.
On Mon, Jan 3, 2011 at 6:15 PM, Eric Covener <co...@gmail.com> wrote:

> >> The access checking on mod_pagespeed resources is
> >> redundant, because the resource will either be served from cache (in which
> >> case it had to be authenticated to get into the cache in the first place) or
> >> will be decoded and the original resource(s) fetched from the same server
> >> with full authentication.
>
> Re: suppressing mod_authz_host: This doesn't sound like it guards
> against a user that meets the AAA conditions causing the resource to
> be cached and served to users who would not have met the AAA
> restrictions.


This is a good point, but I think I'm covered.  mod_pagespeed will only
rewrite resources that are publicly cacheable.  What does AAA stand for?
 Authorization & Authentication in Apache or something?  In any case I've
abandoned, for the moment, the attempt to bypass mod_authz_host on a
per-request basis.


> Maybe you are missing a map_to_storage callback to tell
> the core that this thing will really, really not be served from the
> filesystem.
>

I was not aware of the concept of a "map_to_storage callback" at all.  I
will have to investigate.  This may be very helpful.  Thanks.


> Re: suppressing rewrite.  Your comments in the src imply that rewrite
> is doing some of what you're also suppressing in
> server/core.c:ap_core_translate_name().  Also, it's odd that your
> scheme for suppressing mod_rewrite wasn't a no-op for rewrite in
> htaccess context, since these use the RUN_ALL fixups hook to do its
> magic, but maybe you're catching a break there?
>

It's quite possible that the previous hack, where we set the note
"mod_rewrite_rewritten", would break if the functional part of
mod_rewrite.c:hook_uri2file could get called from mod_rewrite.c:hook_fixup,
but I haven't analyzed the module deeply enough to understand it at that level.

But I think the present hack, where we don't turn off mod_rewrite but just
ignore its output by reading our own entry in request->notes, will be more
robust.  At least I hope it will.

In my testing 2 weeks ago I had trouble invoking mod_rewrite from .htaccess.
 I'll have to try again.

-Josh

Re: Overriding mod_rewrite from another module

Posted by Eric Covener <co...@gmail.com>.
>> The access checking on mod_pagespeed resources is
>> redundant, because the resource will either be served from cache (in which
>> case it had to be authenticated to get into the cache in the first place) or
>> will be decoded and the original resource(s) fetched from the same server
>> with full authentication.

Re: suppressing mod_authz_host: This doesn't sound like it guards
against a user that meets the AAA conditions causing the resource to
be cached and served to users who would not have met the AAA
restrictions.  Maybe you are missing a map_to_storage callback to tell
the core that this thing will really, really not be served from the
filesystem.

Re: suppressing rewrite.  Your comments in the src imply that rewrite
is doing some of what you're also suppressing in
server/core.c:ap_core_translate_name().  Also, it's odd that your
scheme for suppressing mod_rewrite wasn't a no-op for rewrite in
htaccess context, since these use the RUN_ALL fixups hook to do its
magic, but maybe you're catching a break there?

Re: Overriding mod_rewrite from another module

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Mon, Jan 3, 2011 at 23:19, Joshua Marantz <jm...@google.com> wrote:
> My goal is not to remove authentication from the server; only from messing
> with my module's rewritten resource.  The above statement is just observing
> that, while it's possible to shunt off mod_rewrite by returning OK from an
> upstream handler, the same is not true of mod_authz_host because it's
> invoked with a different magic macro.

My bad, I parsed your post as 'mod_authz_host is a core module and
cannot be removed' which is obviously false but not what you meant.

Yes, all auth_checker hooks are run. You can't prevent it but you can
catch the 403 on the rebound and complain loudly in the logs.
Actually, that's a lie. You can prevent it and that might also answer
this next bit...

> There may exist some buffer in Apache that's 8k.  But I have traced through
> failing requests earlier that were more like 256 bytes.  This was reported
> as mod_pagespeed Issue 9 <http://code.google.com/p/modpagespeed/issues/detail?id=9>
> and
> resolved by limiting the number of css files that could be combined together
> so that we did not exceed the pathname limitations.  I'm pretty sure it was
> due to some built-in filter or core element in httpd trying to map the URL
> to a filename (which is not necessary as far as mod_pagespeed is concerned)
> and bumping into an OS path limitation (showing up as 403 Forbidden).

This might be the doing of core_map_to_storage(). Never run into it
myself (with URLs up to 4K, anyway) but there you go.

Okay, here is a dirty secret: if you hook map_to_storage and return
DONE, you bypass Apache's authentication stack - and nearly all other
hooks too. Probably an exceedingly bad idea.

You can however use it to prevent core_map_to_storage() from running.
Just return OK and you're set.

> I'm still interested in your opinion on my solution where I (inspired by
> your hack) save the original URL in request->notes and then use *that* in my
> resource handler in lieu of request->unparsed_uri.  This change is now
> committed to svn trunk (but not released in a formal patch) as
> http://code.google.com/p/modpagespeed/source/detail?r=348 .

Sounds fine, that's the kind of stuff request notes are for.

Re: Overriding mod_rewrite from another module

Posted by Joshua Marantz <jm...@google.com>.
On Mon, Jan 3, 2011 at 4:50 PM, Ben Noordhuis <in...@bnoordhuis.nl> wrote:

> > This means that returning OK from my handler does not prevent
> > mod_authz_host's handler from being called.
>
> You're mistaken, Joshua. The access_checker hook by default is empty.
> mod_authz_host is a module and it can be disabled (if you're on a
> Debian/Ubuntu system, run `a2dismod authz_host` and reload Apache).
>

My perspective is that my team has implemented an Apache module that was
launched on Nov 3 2010.  Since its launch, we've encountered a variety of
compatibility reports with other modules, notably mod_rewrite.

My goal is not to remove authentication from the server; only from messing
with my module's rewritten resource.  The above statement is just observing
that, while it's possible to shunt off mod_rewrite by returning OK from an
upstream handler, the same is not true of mod_authz_host because it's
invoked with a different magic macro.

> With respect to the URL length, I'm fairly sure it's nearly 8K (grep
> for HUGE_STRING_LEN in core_filters.c).
>

There may exist some buffer in Apache that's 8k.  But I have traced through
failing requests earlier that were more like 256 bytes.  This was reported
as mod_pagespeed Issue 9 <http://code.google.com/p/modpagespeed/issues/detail?id=9>
and
resolved by limiting the number of css files that could be combined together
so that we did not exceed the pathname limitations.  I'm pretty sure it was
due to some built-in filter or core element in httpd trying to map the URL
to a filename (which is not necessary as far as mod_pagespeed is concerned)
and bumping into an OS path limitation (showing up as 403 Forbidden).

> I confess I'm not entirely sure what you are trying to accomplish.
> You're serving up custom content and you're afraid mod_rewrite is
> going to munch the URL? Or is it more involved than that?
>

That's exactly right.  The simplest example is that mod_pagespeed can
extend the cache lifetime of a js file indefinitely, without compromising
the site owner's ability to propagate changes quickly, by putting an
md5-hash of the file's content into the URL.

old: <script src="scripts/hacks.js"></script>
new: <script src="scripts/hacks.js.pagespeed.ce.HASH.js"></script>

If some mod_rewrite rule munges "scripts/hacks.js.pagespeed.ce.HASH.js",
then mod_pagespeed will fail to serve it.
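
The naming scheme in the example above can be sketched in a few lines of C.
This is only an illustration of the URL shape shown in this message, not
mod_pagespeed's actual code: the function name cache_extended_name is
hypothetical, and the hash is passed in as a ready-made string rather than
computed here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: builds "<path>.pagespeed.ce.<hash>.<ext>" into out.
 * Returns 0 on success, -1 if the result does not fit in out_size bytes. */
static int cache_extended_name(const char *path, const char *hash,
                               const char *ext, char *out, size_t out_size)
{
    int n;

    assert(path != NULL && hash != NULL && ext != NULL && out != NULL);
    n = snprintf(out, out_size, "%s.pagespeed.ce.%s.%s", path, hash, ext);
    /* snprintf reports the length it wanted to write; treat truncation
     * as an error instead of serving a cut-off name. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Because the hash changes whenever the content changes, the old URL can be
cached for a very long time while a new URL picks up the change at once.
Any rewrite rule that alters the part after the original file name defeats
the scheme, which is the failure described above.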

The issue is most simply stated in a Stack Overflow article:
http://stackoverflow.com/questions/4099659/mod-rewrite-mod-pagespeed-rewritecond

In this case, the user had hand-entered a mod_rewrite rule that broke
mod_pagespeed so it made sense for him to fix it there.  However, we have
heard reports of other cases where a user installs some content-generation
software that generates mod_rewrite rules that break mod_pagespeed.  Such
users may not even know what mod_rewrite is, so they can't easily work
around the broken rules.  This issue is reported as mod_pagespeed
Issue 63 <http://code.google.com/p/modpagespeed/issues/detail?id=63>.

Hope this clears things up.

I'm still interested in your opinion on my solution where I (inspired by
your hack) save the original URL in request->notes and then use *that* in my
resource handler in lieu of request->unparsed_uri.  This change is now
committed to svn trunk (but not released in a formal patch) as
http://code.google.com/p/modpagespeed/source/detail?r=348 .

-Josh

Re: Overriding mod_rewrite from another module

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Mon, Jan 3, 2011 at 22:07, Joshua Marantz <jm...@google.com> wrote:
> I answered my own question by implementing it and failing.  You can't bypass
> mod_authz_host because it gets invoked via the magic macro:
>
>  AP_IMPLEMENT_HOOK_RUN_ALL(int,access_checker,
>                          (request_rec *r), (r), OK, DECLINED)
>
> This means that returning OK from my handler does not prevent
> mod_authz_host's handler from being called.

You're mistaken, Joshua. The access_checker hook by default is empty.
mod_authz_host is a module and it can be disabled (if you're on a
Debian/Ubuntu system, run `a2dismod authz_host` and reload Apache).

With respect to the URL length, I'm fairly sure it's nearly 8K (grep
for HUGE_STRING_LEN in core_filters.c).

> I still add a translate_name hook to run prior to mod_rewrite, but I don't
> try to prevent mod_rewrite from corrupting my URL. Instead I just squirrel
> away the uncorrupted URL in my own entry in request->notes so that I can use
> that rather than request->unparsed_uri downstream when processing the
> request.  This seems to work well.  The only drawback is if the site admin
> adds a mod_rewrite rule that mutates mod_pagespeed's resource name into
> something that does not pass authentication, then mod_authz_host will reject
> the request before I can process it.  This seems like a reasonable tradeoff
> as that configuration would likely be borked in other ways besides
> mod_pagespeed resources.

I confess I'm not entirely sure what you are trying to accomplish.
You're serving up custom content and you're afraid mod_rewrite is
going to munch the URL? Or is it more involved than that?

Re: Overriding mod_rewrite from another module

Posted by Joshua Marantz <jm...@google.com>.
I answered my own question by implementing it and failing.  You can't bypass
mod_authz_host because it gets invoked via the magic macro:

  AP_IMPLEMENT_HOOK_RUN_ALL(int,access_checker,
                          (request_rec *r), (r), OK, DECLINED)

This means that returning OK from my handler does not prevent
mod_authz_host's handler from being called.
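
The difference between the two dispatch styles can be shown with a small
self-contained model.  This is a sketch of the semantics only, not httpd's
macro expansion: hook_fn, run_first, run_all and demo are hypothetical
names, and the hooks take no request argument.  The status values match
httpd.h.  As far as I recall from server/request.c, translate_name and
map_to_storage are declared RUN_FIRST, while access_checker is RUN_ALL as
quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Status codes with the same values httpd uses. */
#define OK              0
#define DECLINED      (-1)
#define HTTP_FORBIDDEN 403

/* Stand-in for a hook function; real hooks take a request_rec *. */
typedef int (*hook_fn)(void);

/* RUN_FIRST style: stop at the first hook that does not decline. */
static int run_first(const hook_fn *hooks, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        int rv = hooks[i]();
        if (rv != DECLINED)
            return rv;
    }
    return DECLINED;
}

/* RUN_ALL style: OK and DECLINED both let the chain continue; only some
 * other status (for example a 403) ends it early. */
static int run_all(const hook_fn *hooks, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        int rv = hooks[i]();
        if (rv != OK && rv != DECLINED)
            return rv;
    }
    return OK;
}

static int later_hook_calls;  /* how many times the later hook ran */

static int my_hook_ok(void)     { return OK; }
static int later_hook_403(void) { ++later_hook_calls; return HTTP_FORBIDDEN; }

/* Runs the chain {my_hook_ok, later_hook_403} under both styles and
 * reports the result and whether the later hook was reached. */
static void demo(int *first_rv, int *first_calls, int *all_rv, int *all_calls)
{
    const hook_fn chain[] = { my_hook_ok, later_hook_403 };

    assert(first_rv && first_calls && all_rv && all_calls);

    later_hook_calls = 0;
    *first_rv = run_first(chain, 2);
    *first_calls = later_hook_calls;

    later_hook_calls = 0;
    *all_rv = run_all(chain, 2);
    *all_calls = later_hook_calls;
}
```

Under run_first the early OK ends the chain, so the later hook never runs.
Under run_all the early OK changes nothing and the later hook still gets to
reject the request.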

I came up with a simpler idea that does not require depending on
string-literals in mod_rewrite.c.

I still add a translate_name hook to run prior to mod_rewrite, but I don't
try to prevent mod_rewrite from corrupting my URL. Instead I just squirrel
away the uncorrupted URL in my own entry in request->notes so that I can use
that rather than request->unparsed_uri downstream when processing the
request.  This seems to work well.  The only drawback is if the site admin
adds a mod_rewrite rule that mutates mod_pagespeed's resource name into
something that does not pass authentication, then mod_authz_host will reject
the request before I can process it.  This seems like a reasonable tradeoff
as that configuration would likely be borked in other ways besides
mod_pagespeed resources.
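
The idea can be modelled without the httpd headers.  The sketch below is
an assumption-laden illustration, not the code committed to mod_pagespeed:
struct request and the notes_set/notes_get helpers are hypothetical
stand-ins for request_rec and its notes table, and the note key is made
up.  In a real module the equivalent calls would be apr_table_set() and
apr_table_get() on r->notes, with the saved value taken from
r->unparsed_uri.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_NOTES = 8 };

struct note { const char *key; const char *val; };

/* Hypothetical stand-in for request_rec. */
struct request {
    const char *uri;                 /* may be changed by a rewrite stage */
    struct note notes[MAX_NOTES];    /* stand-in for r->notes */
    size_t n_notes;
};

/* Set or overwrite a note; pointers are stored, not copied. */
static void notes_set(struct request *r, const char *key, const char *val)
{
    size_t i;
    for (i = 0; i < r->n_notes; ++i) {
        if (strcmp(r->notes[i].key, key) == 0) {
            r->notes[i].val = val;
            return;
        }
    }
    assert(r->n_notes < MAX_NOTES);
    r->notes[r->n_notes].key = key;
    r->notes[r->n_notes].val = val;
    r->n_notes++;
}

/* Look up a note; returns NULL if the key is absent. */
static const char *notes_get(const struct request *r, const char *key)
{
    size_t i;
    for (i = 0; i < r->n_notes; ++i) {
        if (strcmp(r->notes[i].key, key) == 0)
            return r->notes[i].val;
    }
    return NULL;
}

#define ORIGINAL_URL_NOTE "example.original_url"  /* hypothetical key */

/* Step 1: runs before any rewriting and remembers the URL as received. */
static void early_hook(struct request *r)
{
    notes_set(r, ORIGINAL_URL_NOTE, r->uri);
}

/* Step 2: stands in for a rewrite rule that mangles the URL. */
static void rewrite_stage(struct request *r)
{
    r->uri = "/index.php";
}

/* Step 3: the handler prefers the saved URL over the current one. */
static const char *handler_url(const struct request *r)
{
    const char *saved = notes_get(r, ORIGINAL_URL_NOTE);
    return saved != NULL ? saved : r->uri;
}
```

The early hook does not try to stop the rewrite; it only records what the
client asked for, so the handler is unaffected by whatever happens to the
URL in between.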

Commentary would be welcome.

-Josh


Re: Overriding mod_rewrite from another module

Posted by Joshua Marantz <jm...@google.com>.
I have implemented Ben's hack in mod_pagespeed in
http://code.google.com/p/modpagespeed/source/detail?r=345 .  It works great.
 But I am concerned that a subtle change to mod_rewrite.c will break this
hack silently.  We would catch it in our regression tests, but the large
number of Apache users that have downloaded mod_pagespeed do not generally
run our regression tests.

I have another idea for a solution that I'd like to see opinions on.
Looking at Nick Kew's book, it seems like I could set request->filename to
whatever I wanted, return OK, but then also shunt off access_checker for my
rewritten resources.  The access checking on mod_pagespeed resources is
redundant, because the resource will either be served from cache (in which
case it had to be authenticated to get into the cache in the first place) or
will be decoded and the original resource(s) fetched from the same server
with full authentication.

I'd appreciate any comments on this approach.

-Josh


Re: Overriding mod_rewrite from another module

Posted by Joshua Marantz <jm...@google.com>.
OK, I tried to find a more robust alternative but could not.  I was thinking
I could duplicate whatever mod_rewrite was doing to set the request filename,
but that appears to be complex and probably no less brittle.

I have another query on this.  In reality we do *not* want our rewritten
resources to be associated with a filename at all.  Apache should never look
for such things in the file system under ../htdocs -- they will not be
there.  We also do not need it to validate or authenticate on these static
resources.

In particular, we have found that there is some path through Apache that
imposes what looks like a file-system-based limitation on URL segments (e.g.
around 256 bytes).  This limitation is inconvenient and, as far as I can
tell, superfluous.  URL limits imposed by proxies and browsers are more like
2k bytes, which would allow us to encode more metadata in URLs (e.g.
sprites).  Is there some magic setting we could put into the request
structure to tell Apache not to interpret the request as being mapped from a
file, but just to pass it through to our handler?

Thanks!
-Josh


Re: Overriding mod_rewrite from another module

Posted by Ben Noordhuis <in...@bnoordhuis.nl>.
On Sat, Jan 1, 2011 at 00:16, Joshua Marantz <jm...@google.com> wrote:
> Thanks for the quick response and the promising idea for a hack.  Looking at
> mod_rewrite.c this does indeed look a lot more surgical, if, perhaps,
> fragile, as mod_rewrite.c doesn't expose that string-constant in any formal
> interface (even as a #define in a .h).  Nevertheless the solution is
> easy-to-implement and easy-to-test, so...thanks!

You're welcome, Joshua. :)

You could try persuading a core committer to add this as a
(semi-)official extension. Nick Kew reads this list, Paul Querna often
idles in #node.js at freenode.net.

> I'm also still wondering if there's a good source of official documentation
> for the detailed semantics of interfaces like ap_hook_translate_name.
>  Neither a Google Search, a  stackoverflow.com search, nor the Apache
> Modules<http://www.amazon.com/Apache-Modules-Book-Application-Development/dp/0132409674/ref=sr_1_1?ie=UTF8&qid=1293837117&sr=8-1>book
> offer much detail.
> code.google.com fares a little better but just points to 4 existing usages.

This question comes up often. In my experience the online
documentation is almost always outdated, incomplete or outright wrong.
I don't bother looking things up, I go straight to the source.

It's a kind of job security, I suppose. There are only a handful of
people that truly and deeply understand Apache. We can ask any hourly
rate we want!