Posted to modules-dev@httpd.apache.org by rm...@tuxteam.de on 2011/05/04 11:34:46 UTC

using mod_proxy for subrequests

Hello list,

as the subject line says, I'm trying to run a subrequest through
mod_proxy and need to post-process the subrequest's response data.
Looking at older posts on this list, it seems that the only way to
accomplish this is:

(1)  create a subrequest with ap_sub_req_lookup_uri(...)

(2) modify parts of the created subrequest (filename, handler, proxyreq
etc.)

(3) Install a filter that captures the response data

(4) run that subrequest
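
Step (3) essentially means accumulating the chunks of response data that pass through the filter. Below is a minimal sketch of such an accumulator in plain C. It is an illustration only: the names `capture_buf` and `capture_append` are made up for this example, and a real output filter would read the chunks from the buckets of the brigade it receives and would probably allocate from the request pool rather than with realloc().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer holding the captured response body. */
typedef struct {
    char  *data;   /* NUL-terminated captured bytes, or NULL if empty */
    size_t len;    /* number of captured bytes (without the NUL)      */
    size_t cap;    /* allocated size of data                          */
} capture_buf;

/* Append one chunk of n bytes. Returns 0 on success, -1 if out of memory. */
int capture_append(capture_buf *b, const char *chunk, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t new_cap = b->cap ? b->cap : 64;
        char *p;

        while (new_cap < b->len + n + 1) {
            new_cap *= 2;               /* grow geometrically */
        }
        p = realloc(b->data, new_cap);
        if (p == NULL) {
            return -1;                  /* old buffer is left untouched */
        }
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, chunk, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}
```

The filter would call capture_append() once per data chunk and hand the finished buffer to the post-processing code when it sees the end of the stream.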

Now, (1) seems inelegant since it needs a valid URI which has
nothing to do with the final proxy request. Hence the value of the
subrequest's status has no meaning -- but isn't this exactly the purpose
of subrequests? To quote Nick Kew: '....to run a fast partial request, to
gather information: what would happen if we ran this request?'
Is there really no way to create a subrequest aimed directly at
mod_proxy?
It would be really nice to be able to access a (proxied) subrequest's
metadata (content-type, etag etc.) before running the filter.

Any ideas? Maybe a nice API extension for Apache or mod_proxy?

TIA Ralf Mattes


Re: using mod_proxy for subrequests

Posted by rm...@tuxteam.de.
On Wed, May 04, 2011 at 02:00:33PM +0200, Sorin Manolache wrote:
> 
> I didn't mean that I'm really clueless. I trawled through the apache
> sources quite extensively and I decided to do it. And there's a
> commercial/financial stake in my case too.
> 
> If you look at mod_proxy's sources, there are 4 places in which r->main
> is checked: two in ap_proxy_http_request, one in
> ap_proxy_backend_broke and one in mod_proxy_ajp.c.
> 
> In the first of these places, If-Match, If-None-Match, If-Range,
> If-Modified-Since, If-Unmodified-Since are not passed through in the
> subrequest.
> 
> In the second of these places, for subrequests:
> 
> *) the connection is marked to be closed after the request
> *) Content-Length and Transfer-Encoding are removed
> *) the main request body, if any, is not forwarded to the subrequest's backend.
> 
> So if you set subreq->main to NULL you won't have the effects listed above.
> 
> In ap_proxy_backend_broke, if r is a subrequest and the backend broke,
> the main request response is marked as non-cacheable.
> 
> I didn't look into mod_proxy_ajp.c.

Yes, but what makes me feel quite uneasy is the fact that both your
solution and mine rely on "internal" knowledge and assumptions
built on that. From a programmer's point of view this is o.k. in an open
source implementation, but it creates administrative nightmares ...
What happens if the programmers of mod_proxy decide to change their
internal processing? After all, lines 426 ff. in mod_proxy.c aren't part
of a published API. So, maybe years after installing your fine module,
an innocent software update breaks it ... 8-/

I guess an exported mod_proxy function to fetch metadata would be a nice 
thing to have.

 Cheers, RalfD



Re: using mod_proxy for subrequests

Posted by Sorin Manolache <so...@gmail.com>.
On Wed, May 4, 2011 at 12:39,  <rm...@tuxteam.de> wrote:
> On Wed, May 04, 2011 at 11:36:35AM +0200, Sorin Manolache wrote:
>> On Wed, May 4, 2011 at 11:34,  <rm...@tuxteam.de> wrote:
>> > Hello list,
>> >
>> > as the subject line says, I'm trying to run a subrequest through
>> > mod_proxy and need to post-process the subrequest's response data.
>> > Looking at older posts on this list, it seems that the only way to
>> > accomplish this is:
>> >
>> > (1)  create a subrequest with ap_sub_req_lookup_uri(...)
>> >
>> > (2) modify parts of the created subrequest (filename, handler, proxyreq
>> > etc.)
>> >
>> > (3) Install a filter that captures the response data
>> >
>> > (4) run that subrequest
>>
>> Use it in conjunction with RewriteRules:
>>
>> RewriteCond     %{IS_SUBREQ}            true
>> RewriteRule     ^/some_name$            http://backend.host.net/path?query_string [P]
>
> Hmm, I don't seem to get what you do differently compared with my
> approach:
>
>
>> request_rec *subr = ap_sub_req_method_uri("GET", "/some_name", r, NULL);
>
> Same as my (1)
> Here, "/some_name" is still an arbitrary URI and _not_ the proxy URI I
> want to query. BTW, this does clutter the URL namespace, a big no-no in
> my use case ...
>
>> ap_add_output_filter(post_processing_filter_name, filter_context,
>> subr, subr->connection);
>
> Same as my (3)
>
>> int status = ap_run_sub_req(subr);
>> int http_status = subr->status;
>> // optional: subr->main = r;
>> if (ap_is_HTTP_ERROR(status) || ap_is_HTTP_ERROR(http_status)) {
>>    // some error handling
>> }
>
> And you still need to _run_ the subrequest to get at the response
> status etc.
>
>>
>> There are some subtleties here:
>>
>> 1. The rewrite rules are run in the translate_name hook. If you want
>> to use %{ENV:request_note_name} in your rewrite rule, you have to copy
>> the notes somehow (for example in another translate_name callback that is
>> run before the mod_rewrite callbacks) from the main request notes to
>> the subrequest notes.
>>
>> 2. Subrequests are not kept alive. In order to keep them alive, you
>> could try to hook APR_OPTIONAL_HOOK(proxy, fixups, &proxy_fixups,
>> NULL, NULL, APR_HOOK_MIDDLE). In the proxy_fixups callback, you can
>> set subr->main = NULL; Then, after ap_run_sub_req, you can re-set
>> subr->main = r (the "optional" line in the code example above).
>
> But that means losing all request context in the subrequest! One of
> the main reasons to use mod_proxy instead of
> some-arbitrary-webclient-lib is the fact that mod_proxy passes all
> incoming headers to the backend server. A must in my case.

The request_rec structure of the subrequest is already correctly set
up when I cut its link to the main request.

>> I'm
>> using this trick but I do not know all its consequences.
>
> Hmmm - bold. The costs of server downtime might easily exceed my
> monthly income in this case :-)

I didn't mean that I'm really clueless. I trawled through the apache
sources quite extensively and I decided to do it. And there's a
commercial/financial stake in my case too.

If you look at mod_proxy's sources, there are 4 places in which r->main
is checked: two in ap_proxy_http_request, one in
ap_proxy_backend_broke and one in mod_proxy_ajp.c.

In the first of these places, If-Match, If-None-Match, If-Range,
If-Modified-Since, If-Unmodified-Since are not passed through in the
subrequest.

In the second of these places, for subrequests:

*) the connection is marked to be closed after the request
*) Content-Length and Transfer-Encoding are removed
*) the main request body, if any, is not forwarded to the subrequest's backend.

So if you set subreq->main to NULL you won't have the effects listed above.
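
To make the header handling described above concrete, here is a small model of it in plain C. This is only an illustration of the described behaviour, not mod_proxy's actual code: the function name `forward_to_backend` is made up for this sketch, and the `is_subrequest` flag stands in for the check `r->main != NULL`.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Conditional request headers that, per the description above, are not
 * passed to the backend when the request is a subrequest. */
static const char *const conditional_headers[] = {
    "If-Match", "If-None-Match", "If-Range",
    "If-Modified-Since", "If-Unmodified-Since"
};

/* Hypothetical helper: returns 1 if the header would be forwarded,
 * 0 if it would be dropped. is_subrequest models (r->main != NULL). */
int forward_to_backend(const char *header_name, int is_subrequest)
{
    size_t i;
    size_t n = sizeof(conditional_headers) / sizeof(conditional_headers[0]);

    if (!is_subrequest) {
        return 1;               /* not a subrequest: the check does not apply */
    }
    for (i = 0; i < n; i++) {
        /* header names are case-insensitive */
        if (strcasecmp(header_name, conditional_headers[i]) == 0) {
            return 0;           /* conditional header: dropped */
        }
    }
    return 1;
}
```

With subr->main set to NULL the request looks like a main request to this check, which is why the conditional headers are passed through again.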

In ap_proxy_backend_broke, if r is a subrequest and the backend broke,
the main request response is marked as non-cacheable.

I didn't look into mod_proxy_ajp.c.

Sorin

Re: using mod_proxy for subrequests

Posted by rm...@tuxteam.de.
On Wed, May 04, 2011 at 11:36:35AM +0200, Sorin Manolache wrote:
> On Wed, May 4, 2011 at 11:34,  <rm...@tuxteam.de> wrote:
> > Hello list,
> >
> > as the subject line says, I'm trying to run a subrequest through
> > mod_proxy and need to post-process the subrequest's response data.
> > Looking at older posts on this list, it seems that the only way to
> > accomplish this is:
> >
> > (1)  create a subrequest with ap_sub_req_lookup_uri(...)
> >
> > (2) modify parts of the created subrequest (filename, handler, proxyreq
> > etc.)
> >
> > (3) Install a filter that captures the response data
> >
> > (4) run that subrequest
> 
> Use it in conjunction with RewriteRules:
> 
> RewriteCond     %{IS_SUBREQ}            true
> RewriteRule     ^/some_name$            http://backend.host.net/path?query_string [P]

Hmm, I don't seem to get what you do differently compared with my
approach:


> request_rec *subr = ap_sub_req_method_uri("GET", "/some_name", r, NULL);

Same as my (1)
Here, "/some_name" is still an arbitrary URI and _not_ the proxy URI I
want to query. BTW, this does clutter the URL namespace, a big no-no in
my use case ...

> ap_add_output_filter(post_processing_filter_name, filter_context,
> subr, subr->connection);

Same as my (3)

> int status = ap_run_sub_req(subr);
> int http_status = subr->status;
> // optional: subr->main = r;
> if (ap_is_HTTP_ERROR(status) || ap_is_HTTP_ERROR(http_status)) {
>    // some error handling
> }

And you still need to _run_ the subrequest to get at the response
status etc.

> 
> There are some subtleties here:
> 
> 1. The rewrite rules are run in the translate_name hook. If you want
> to use %{ENV:request_note_name} in your rewrite rule, you have to copy
> the notes somehow (for example in another translate_name callback that is
> run before the mod_rewrite callbacks) from the main request notes to
> the subrequest notes.
> 
> 2. Subrequests are not kept alive. In order to keep them alive, you
> could try to hook APR_OPTIONAL_HOOK(proxy, fixups, &proxy_fixups,
> NULL, NULL, APR_HOOK_MIDDLE). In the proxy_fixups callback, you can
> set subr->main = NULL; Then, after ap_run_sub_req, you can re-set
> subr->main = r (the "optional" line in the code example above).

But that means losing all request context in the subrequest! One of
the main reasons to use mod_proxy instead of
some-arbitrary-webclient-lib is the fact that mod_proxy passes all
incoming headers to the backend server. A must in my case.

> I'm
> using this trick but I do not know all its consequences.

Hmmm - bold. The costs of server downtime might easily exceed my
monthly income in this case :-)


cheers, RalfD



Re: using mod_proxy for subrequests

Posted by Sorin Manolache <so...@gmail.com>.
On Wed, May 4, 2011 at 11:34,  <rm...@tuxteam.de> wrote:
> Hello list,
>
> as the subject line says, I'm trying to run a subrequest through
> mod_proxy and need to post-process the subrequest's response data.
> Looking at older posts on this list, it seems that the only way to
> accomplish this is:
>
> (1)  create a subrequest with ap_sub_req_lookup_uri(...)
>
> (2) modify parts of the created subrequest (filename, handler, proxyreq
> etc.)
>
> (3) Install a filter that captures the response data
>
> (4) run that subrequest

Use it in conjunction with RewriteRules:

RewriteCond     %{IS_SUBREQ}            true
RewriteRule     ^/some_name$            http://backend.host.net/path?query_string [P]

request_rec *subr = ap_sub_req_method_uri("GET", "/some_name", r, NULL);
ap_add_output_filter(post_processing_filter_name, filter_context,
                     subr, subr->connection);
int status = ap_run_sub_req(subr);   /* return value of the handler */
int http_status = subr->status;      /* HTTP status of the response */
// optional: subr->main = r;
if (ap_is_HTTP_ERROR(status) || ap_is_HTTP_ERROR(http_status)) {
   // some error handling
}
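
The error check above tests two values because they can differ: the value returned from running the subrequest and the HTTP status recorded in subr->status. A self-contained sketch of that check follows. The macro mirrors the range test of httpd's ap_is_HTTP_ERROR (400 up to, but not including, 600); the names `IS_HTTP_ERROR` and `subrequest_failed` are made up for this example.

```c
/* Same range test as httpd's ap_is_HTTP_ERROR: 4xx and 5xx codes. */
#define IS_HTTP_ERROR(x) (((x) >= 400) && ((x) < 600))

/* Hypothetical helper: run_status models the value returned when the
 * subrequest is run, http_status models subr->status.
 * Returns 1 if either value indicates an error, 0 otherwise. */
int subrequest_failed(int run_status, int http_status)
{
    return IS_HTTP_ERROR(run_status) || IS_HTTP_ERROR(http_status);
}
```

A return value of 0 (OK) with a status of 200 or 304 counts as success here; a 404 or 502 in either position counts as failure.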

There are some subtleties here:

1. The rewrite rules are run in the translate_name hook. If you want
to use %{ENV:request_note_name} in your rewrite rule, you have to copy
the notes somehow (for example in another translate_name callback that is
run before the mod_rewrite callbacks) from the main request notes to
the subrequest notes.

2. Subrequests are not kept alive. In order to keep them alive, you
could try to hook APR_OPTIONAL_HOOK(proxy, fixups, &proxy_fixups,
NULL, NULL, APR_HOOK_MIDDLE). In the proxy_fixups callback, you can
set subr->main = NULL; Then, after ap_run_sub_req, you can re-set
subr->main = r (the "optional" line in the code example above). I'm
using this trick but I do not know all its consequences.
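
The idea behind point 2 (cut the link to the main request while the proxy code runs, then restore it) can be modelled without any httpd headers. The sketch below uses a stand-in struct with a single `main` field instead of the real request_rec, and the names `request`, `run_fn`, `run_detached` and `is_subrequest` are made up for this example. It is also simplified: here the link is cut just before the handler is called, whereas in the module described above it is cut inside the proxy fixups callback.

```c
#include <stddef.h>

/* Stand-in for the one request_rec field this sketch needs. */
typedef struct request {
    struct request *main;   /* NULL for a main request */
} request;

/* Hypothetical stand-in for the code that processes the request. */
typedef int (*run_fn)(request *r);

/* Cut the link to the main request, run, then restore the link. */
int run_detached(request *subr, run_fn run)
{
    request *saved_main = subr->main;
    int status;

    subr->main = NULL;          /* handler now sees a "main" request */
    status = run(subr);
    subr->main = saved_main;    /* restore before the caller continues */
    return status;
}

/* Example handler: reports whether it was treated as a subrequest. */
int is_subrequest(request *r)
{
    return r->main != NULL;
}
```

Called directly on a subrequest, is_subrequest() reports 1; called through run_detached() it reports 0, and afterwards the subrequest points at its main request again.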

Sorin

