Posted to apache-bugdb@apache.org by Richard Goerwitz <Ri...@Brown.EDU> on 1997/12/30 01:12:29 UTC

mod_proxy/1606: ProxyPass ain't useful; but it could be if a ProxyFilter directive were added

>Number:         1606
>Category:       mod_proxy
>Synopsis:       ProxyPass ain't useful; but it could be if a ProxyFilter directive were added
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    apache
>State:          open
>Class:          change-request
>Submitter-Id:   apache
>Arrival-Date:   Mon Dec 29 16:20:00 PST 1997
>Last-Modified:
>Originator:     Richard_Goerwitz@Brown.EDU
>Organization:
apache
>Release:        1.2+
>Environment:
all
>Description:
ProxyPass does not deal well with links like HREF="/directory/file.html", where
/directory does not exist on the proxy server.  This has been well documented,
both in the bug tracking system, and now in the Apache docs.

What is really needed is a ProxyFilter command that allows site administrators to
run pages through an external filter.  The filter should receive a full set of
environment variables on the one hand, and the data from the proxied server on
the other (via stdin).
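
As a rough illustration only (not the filter in use at Brown), a filter of the
kind described above might look like the sketch below. It reads the proxied
page from stdin, reads a setting from the environment, and writes the rewritten
page to stdout. The variable name FILTER_LINK_PREFIX and the default "/mirror"
are hypothetical, and the plain string replace only catches upper-case,
double-quoted HREF attributes.

```python
import os
import sys


def rewrite_root_links(html: str, prefix: str) -> str:
    """Prefix root-relative HREF targets so they resolve under `prefix`."""
    return html.replace('HREF="/', f'HREF="{prefix}/')


def run_filter(stdin, stdout, environ) -> None:
    """Read a page from `stdin`, rewrite its links, write it to `stdout`."""
    # FILTER_LINK_PREFIX is a made-up name for this sketch; it is not a
    # variable that Apache or mod_proxy is known to set.
    prefix = environ.get("FILTER_LINK_PREFIX", "/mirror")
    stdout.write(rewrite_root_links(stdin.read(), prefix))
```

Calling run_filter(sys.stdin, sys.stdout, os.environ) at the bottom of a script
would turn this into a standalone filter program.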

I have rewritten the proxy module locally to do this, and it works fine.  So far
I allow ProxyPass only in the virtual host configs.  I should have made it usable
in per-directory configs as well.

I also strip out Content-Length headers, so the filter can modify the data stream
in arbitrary ways (this only works for HTTP/1.0 proxies, of course, which Apache 1.2.4 is).

Basically I just treat the filter the way the CGI module treats CGI scripts, with
the exception that 1) I fork twice, and feed the data from the first child to the
second via stdin (the second child then execs the filter), and 2) the parent process
gets the first child's stdout fd, which it then uses to send filtered data back
to the client.  So ProxyFilter doesn't actually do any responding.

I fork twice to avoid deadlock (first child writes from proxied server to filter;
second child execs filter; parent reads filter's stdout - the filter must write
to stdout).
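
The process layout described above can be sketched roughly as follows. This is
a Python approximation for Unix systems, not the C code from the local patch:
the function name run_through_filter and the DEMO_FILTER command are invented
for the example, and both children are forked directly from the parent, which
may differ from how the patch arranges them. The point it shows is that a
separate writer process lets the parent drain the filter's stdout while input
is still being fed, so neither pipe can fill up and stall the other side.

```python
import os
import sys

# Hypothetical stand-in for an external filter: rewrites root-relative links.
DEMO_FILTER = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read()"
    ".replace('HREF=\"/', 'HREF=\"/mirror/'))",
]


def run_through_filter(data: bytes, filter_argv: list) -> bytes:
    """Pipe `data` through an external filter using two forked children."""
    to_filter_r, to_filter_w = os.pipe()      # writer child -> filter stdin
    from_filter_r, from_filter_w = os.pipe()  # filter stdout -> parent

    writer_pid = os.fork()
    if writer_pid == 0:
        # First child: feeds the (already fetched) data to the filter.
        os.close(to_filter_r)
        os.close(from_filter_r)
        os.close(from_filter_w)
        try:
            view = memoryview(data)
            while view:
                written = os.write(to_filter_w, view)
                view = view[written:]
        finally:
            os._exit(0)  # never fall back into the parent's code path

    filter_pid = os.fork()
    if filter_pid == 0:
        # Second child: wire the pipes to stdin/stdout, then exec the filter.
        os.dup2(to_filter_r, 0)
        os.dup2(from_filter_w, 1)
        for fd in (to_filter_r, to_filter_w, from_filter_r, from_filter_w):
            os.close(fd)
        try:
            os.execvp(filter_argv[0], filter_argv)
        finally:
            os._exit(127)  # reached only if exec failed

    # Parent: close unused pipe ends, then read the filter's output to EOF.
    os.close(to_filter_r)
    os.close(to_filter_w)
    os.close(from_filter_w)
    chunks = []
    while True:
        chunk = os.read(from_filter_r, 65536)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(from_filter_r)
    os.waitpid(writer_pid, 0)
    os.waitpid(filter_pid, 0)
    return b"".join(chunks)
```

For example, run_through_filter(page_bytes, DEMO_FILTER) returns the page with
its root-relative links rewritten. In a real proxy the parent would forward
each chunk to the client as it arrives rather than collecting them.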

What I really should have done (if 1.2.4 were an HTTP/1.1 proxy) was to hold off on
outputting headers until I could compute a new (filtered) Content-Length,
then output that new Content-Length along with the other headers, and finally
the rest of the data.
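
A minimal sketch of that buffering approach, assuming the filtered body has
already been collected in full and the proxied headers are available as
name/value pairs (the function name rebuild_response is hypothetical, and the
status line is left out):

```python
def rebuild_response(headers: list, filtered_body: bytes) -> bytes:
    """Return the header block plus body with a recomputed Content-Length."""
    # Drop whatever Content-Length the proxied server sent; it described
    # the unfiltered data and is no longer accurate.
    kept = [(name, value) for name, value in headers
            if name.lower() != "content-length"]
    kept.append(("Content-Length", str(len(filtered_body))))
    head = "".join(f"{name}: {value}\r\n" for name, value in kept)
    # A blank line separates the headers from the body.
    return head.encode("latin-1") + b"\r\n" + filtered_body
```

The cost of this design is that nothing reaches the client until the filter
has finished, since the length is only known once all output has been read.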

What a ProxyFilter directive does is allow people to rewrite incoming data
arbitrarily, overcoming whatever perceived shortcomings there are in ProxyPass.
It would also allow site administrators to filter data the way handlers filter
data on their own site.

We're finding ProxyFilter very useful here at Brown.  Does this seem like something
more broadly useful?
>How-To-Repeat:

>Fix:

>Audit-Trail:
>Unformatted: