Posted to users@httpd.apache.org by Boyle Owen <Ow...@swx.com> on 2002/06/03 15:26:48 UTC

RE: Reverse Proxy munges chars

>Hi All,
>I'm trying to use Apache as a reverse proxy, allowing users to view a
>site through the proxy:
>ProxyPass / http://remote_server.otherdomain.com:7777/servlet?config=true
>
>It almost works.  When apache requests the new URL however, it is
>replacing the ? with %3F, which is failing on
>remote_server.otherdomain.com .
>
>How do I make apache pass the "?" as-is, instead of converting it to
>"%3F" before requesting it from remote_server.otherdomain.com?

I'm not sure you can do this. Characters after the "?" are not part of the URL. Since there is nothing conditional on the LHS of the mapping, why not make "config=true" the default in your servlet? Then you can use the simpler rule:

	ProxyPass / http://remote_server.otherdomain.com:7777/servlet

Rgds,

Owen Boyle

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: Reverse Proxy munges chars

Posted by Josh Wyatt <Jo...@hcssystems.com>.
Boyle Owen wrote:
> 
> >Hi All,
> >I'm trying to use Apache as a reverse proxy, allowing users to view a
> >site through the proxy:
> >ProxyPass / http://remote_server.otherdomain.com:7777/servlet?config=true
> >
> >It almost works.  When apache requests the new URL however, it is
> >replacing the ? with %3F, which is failing on
> >remote_server.otherdomain.com .
> >
> >How do I make apache pass the "?" as-is, instead of converting it to
> >"%3F" before requesting it from remote_server.otherdomain.com?
> 
> I'm not sure you can do this. Characters after the "?" are not part of the URL. Since there is nothing conditional on the LHS of the mapping, why not make "config=true" the default in your servlet? Then you can use the simpler rule:
> 
>         ProxyPass / http://remote_server.otherdomain.com:7777/servlet
> 
> Rgds,
> 
> Owen Boyle

Hi Owen,

Thanks for the reply.  I don't believe your claim about the query string
not being part of the URL is true.

Check out RFC 1738, "Uniform Resource Locators" (
http://www.w3.org/Addressing/rfc1738.txt ).  Specifically, section 3.3
[1].

I don't understand why Apache is munging the ? into %3F.  This seems
mostly broken.  In fact, according to section 2.2 of the same RFC,
Apache is doing exactly the *wrong* thing[2] (note the last sentence).

Thanks,
Josh


[1]:
3.3. HTTP

   The HTTP URL scheme is used to designate Internet resources
   accessible using HTTP (HyperText Transfer Protocol).

   The HTTP protocol is specified elsewhere. This specification only
   describes the syntax of HTTP URLs.

   An HTTP URL takes the form:

      http://<host>:<port>/<path>?<searchpart>

   where <host> and <port> are as described in Section 3.1. If :<port>
   is omitted, the port defaults to 80.  No user name or password is
   allowed.  <path> is an HTTP selector, and <searchpart> is a query
   string. The <path> is optional, as is the <searchpart> and its
   preceding "?". If neither <path> nor <searchpart> is present, the "/"
   may also be omitted.

   Within the <path> and <searchpart> components, "/", ";", "?" are
   reserved.  The "/" character may be used within HTTP to designate a
   hierarchical structure.



[2]:
   Reserved: 

   Many URL schemes reserve certain characters for a special meaning:
   their appearance in the scheme-specific part of the URL has a
   designated semantics. If the character corresponding to an octet is
   reserved in a scheme, the octet must be encoded.  The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme. No other characters may
   be reserved within a scheme.

   Usually a URL has the same interpretation when an octet is
   represented by a character and when it encoded. However, this is not
   true for reserved characters: encoding a character reserved for a
   particular scheme may change the semantics of a URL.
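
To illustrate that last sentence, here is a minimal Python sketch (standard library only; nothing here is Apache code) comparing how a generic URL parser splits the literal form and the %3F form of the URL in question. It only shows that the two are different URLs to a parser; it makes no claim about what the servlet itself does with them.

```python
from urllib.parse import urlsplit

# Same host and port as in the original ProxyPass line.
literal = urlsplit("http://remote_server.otherdomain.com:7777/servlet?config=true")
encoded = urlsplit("http://remote_server.otherdomain.com:7777/servlet%3Fconfig=true")

# With a literal "?", the parser sees a path and a separate query string.
print(literal.path, "|", literal.query)

# With "%3F", the whole thing is one path and the query string is empty.
print(encoded.path, "|", encoded.query)
```

In the encoded form the backend is asked for a resource whose name contains "%3F", which would explain why the request fails there.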



Re: Reverse Proxy munges chars

Posted by Josh Wyatt <Jo...@hcssystems.com>.
Joshua Slive wrote:
> 
> On Thu, 6 Jun 2002, Josh Wyatt wrote:
> > With HTTP URI's, the "?" is a reserved character.  When dealing with
> > HTTP URI's, RFC 1738 says "don't encode special characters".
> 
> Actually, it is more like don't encode special characters when they are
> used in their special context.  When they are used outside their special
> context, they MUST be encoded.  ProxyPass does not allow you to pass a
> query string, so it assumes that the ? does not designate a query string
> and should be encoded.

Roger that.  I was not aware that ProxyPass didn't allow query strings.
Since they're definitively part of the URI (again, RFC 1738 section 3.3,
I believe), I assumed they were proxyable.  Interestingly, I did try
Squid, which does the Right Thing.

Unfortunately, Squid isn't very good at reverse-proxying, and is really
too much for this application.

> 
> > Actually, the only URI ever requested is the above URI.  The client
> > communicates with only that URI via GETs and POSTs.  The URI never
> > changes- thank the maker.
> 
> Well, that is a very special case, and is not what ProxyPass was designed
> for.  RewriteRule, on the other hand, can handle it.
> 
> > > ProxyPass is meant for relatively simple transformations.  If you need
> > > something complex, you'll need to use mod_rewrite.  To get you started:
> >
> > Yep, I've already used mod_rewrite, and it works - like a redirect.
> > Only problem is that I've got to have a proxy.  I could easily rewrite
> > the URL to redirect to the actual server, but that bypasses the proxy,
> > breaking all kinds of stuff I'm trying to do (like single-source traffic
> > accounting, for one thing).
> 
> Check the example I gave you again.  It uses the "P" flag to RewriteRule,
> which makes mod_rewrite use mod_proxy to grab the content just like
> ProxyPass.

You're right- I had not tried the P flag, and still haven't.  But I
trust that it will do what you say ;) .  Sorry for my initial oversight.

> Perhaps you have pointed out that the docs for ProxyPass are a little weak
> in this area.  They should point out that
> 
> 1. You can't pass a query string; and

That would certainly help.  They should also explain why.  Reading RFC
1738, it looks like you should be able to, since the query string is
officially part of the URI.

> 2. You can use RewriteRule for more complicated stuff.
> 
> Please submit a bug report against the documentation mentioning that if
> you have a chance.

Will do.  Thank you for all of your information and assistance.  If
you're ever in Raleigh, NC, look me up and we'll go have a few beers.

> 
> Joshua.

Thanks again,
Josh



Re: Reverse Proxy munges chars

Posted by Joshua Slive <jo...@slive.ca>.
On Thu, 6 Jun 2002, Josh Wyatt wrote:
> With HTTP URI's, the "?" is a reserved character.  When dealing with
> HTTP URI's, RFC 1738 says "don't encode special characters".

Actually, it is more like don't encode special characters when they are
used in their special context.  When they are used outside their special
context, they MUST be encoded.  ProxyPass does not allow you to pass a
query string, so it assumes that the ? does not designate a query string
and should be encoded.
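
A rough Python sketch of that behaviour, for illustration only: it is not mod_proxy's actual code, and the helper name `encode_as_path` is made up here. It assumes the target is treated purely as a path, so a "?" inside it is not a query delimiter and gets percent-encoded.

```python
from urllib.parse import quote

def encode_as_path(target_path):
    """Hypothetical helper: escape a string on the assumption it is only a path."""
    # "/" and "=" are declared safe and left alone; "?" is not in the safe
    # set, so it is escaped as %3F.
    return quote(target_path, safe="/=")

print(encode_as_path("/servlet?config=true"))  # prints /servlet%3Fconfig=true
```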

> Actually, the only URI ever requested is the above URI.  The client
> communicates with only that URI via GETs and POSTs.  The URI never
> changes- thank the maker.

Well, that is a very special case, and is not what ProxyPass was designed
for.  RewriteRule, on the other hand, can handle it.

> > ProxyPass is meant for relatively simple transformations.  If you need
> > something complex, you'll need to use mod_rewrite.  To get you started:
>
> Yep, I've already used mod_rewrite, and it works - like a redirect.
> Only problem is that I've got to have a proxy.  I could easily rewrite
> the URL to redirect to the actual server, but that bypasses the proxy,
> breaking all kinds of stuff I'm trying to do (like single-source traffic
> accounting, for one thing).

Check the example I gave you again.  It uses the "P" flag to RewriteRule,
which makes mod_rewrite use mod_proxy to grab the content just like
ProxyPass.

Perhaps you have pointed out that the docs for ProxyPass are a little weak
in this area.  They should point out that

1. You can't pass a query string; and

2. You can use RewriteRule for more complicated stuff.

Please submit a bug report against the documentation mentioning that if
you have a chance.

Joshua.




Re: Reverse Proxy munges chars

Posted by Josh Wyatt <Jo...@hcssystems.com>.
Firstly, Joshua, thanks for the reply!

Joshua Slive wrote:
<snip>
> > Specifically, section 2.2 of RFC 1738 says that special characters (such
> > as "?" in HTTP) should never be encoded- and it looks like mod_proxy
> > encodes ? to %3F during proxy.
> 
> I don't see that.  mod_proxy does not encode the query string; it is only
> this particular directive that does, and for very good reasons.

Something is encoding the ?; I suspect it is happening in
ap_proxy_canonenc in src/modules/proxy/proxy_util.c .

With HTTP URI's, the "?" is a reserved character.  When dealing with
HTTP URI's, RFC 1738 says "don't encode special characters".  mod_proxy
is doing it for some reason.  I played around with proxy_util.c a little
trying to get the desired effect, but alas, I'm weak on C, so my
experimentation is a little limited.

> > > >ProxyPass / http://remote_server.otherdomain.com:7777/servlet?config=true
> 
> I don't see how that is supposed to work.  What happens when someone
> requests "http://yourserver.example.com/foo?bar"?  Is apache supposed to
> deliver
> "http://remote_server.otherdomain.com:7777/servlet?config=true/foo?bar"?

Actually, the only URI ever requested is the above URI.  The client
communicates with only that URI via GETs and POSTs.  The URI never
changes- thank the maker.  

It looks like a good rewrite candidate on the actual application
server.  However, I don't get control over that machine, nor would I
want to institute a non-standard (i.e. not out of the box) modification
to many application servers.

> 
> ProxyPass is meant for relatively simple transformations.  If you need
> something complex, you'll need to use mod_rewrite.  To get you started:

Yep, I've already used mod_rewrite, and it works - like a redirect. 
Only problem is that I've got to have a proxy.  I could easily rewrite
the URL to redirect to the actual server, but that bypasses the proxy,
breaking all kinds of stuff I'm trying to do (like single-source traffic
accounting, for one thing).

<snip>

> Joshua.

Again, I appreciate the discussion and thought.  I just wish there were
a solution to this problem.  I'm convinced that mod_proxy is converting
? to %3F somewhere- if perhaps someone could direct me to the location
in the source code, that's all I'd need at this point.

BTW, not sure if I mentioned it earlier, but I'm running 1.3.24 here.

Again, thanks for the responses.

Thanks,
josh



Re: Reverse Proxy munges chars

Posted by Joshua Slive <jo...@slive.ca>.
On Thu, 6 Jun 2002, Josh Wyatt wrote:

> Hrm- several days and I haven't heard any response about this issue,
> other than from Owen...
>
> Any devel folks care to comment on this?  Especially about mod_proxy
> being broken with regards to RFC 1738?
>
> Specifically, section 2.2 of RFC 1738 says that special characters (such
> as "?" in HTTP) should never be encoded- and it looks like mod_proxy
> encodes ? to %3F during proxy.

I don't see that.  mod_proxy does not encode the query string; it is only
this particular directive that does, and for very good reasons.

>
> > >ProxyPass / http://remote_server.otherdomain.com:7777/servlet?config=true

I don't see how that is supposed to work.  What happens when someone
requests "http://yourserver.example.com/foo?bar"?  Is apache supposed to
deliver
"http://remote_server.otherdomain.com:7777/servlet?config=true/foo?bar"?

ProxyPass is meant for relatively simple transformations.  If you need
something complex, you'll need to use mod_rewrite.  To get you started:

RewriteEngine On
RewriteRule ^/(.*) http://remote_server.otherdomain.com:7777/servlet/$1?config=true [P,QSA]

Note that I just made a guess about where the original URL should be
tacked on.  You can choose to do it however you want.
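
To make the mapping concrete, here is a small Python simulation of what the rule above is expected to produce for a given request. It is only a sketch of the documented behaviour, not Apache code: the function name `rewrite` is made up, and it assumes that the pattern is matched against the URL path without the query string and that QSA appends the client's query string after the one in the substitution, joined with "&".

```python
import re

BACKEND = "http://remote_server.otherdomain.com:7777/servlet/"

def rewrite(path, query=""):
    """Mimic: RewriteRule ^/(.*) <BACKEND>$1?config=true [P,QSA] for one request."""
    match = re.match(r"^/(.*)", path)
    if match is None:
        return None  # the rule does not apply to this path
    target = BACKEND + match.group(1) + "?config=true"
    if query:
        # QSA: keep the client's query string, appended after ours.
        target += "&" + query
    return target

print(rewrite("/foo", "bar"))
print(rewrite("/"))
```

Under those assumptions, a request for /foo?bar would be fetched through the proxy as .../servlet/foo?config=true&bar, with the "?" left unencoded.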

Joshua.




Re: Reverse Proxy munges chars

Posted by Josh Wyatt <Jo...@hcssystems.com>.
Hrm- several days and I haven't heard any response about this issue,
other than from Owen...

Any devel folks care to comment on this?  Especially about mod_proxy
being broken with regards to RFC 1738?

Specifically, section 2.2 of RFC 1738 says that special characters (such
as "?" in HTTP) should never be encoded- and it looks like mod_proxy
encodes ? to %3F during proxy.

Any other thoughts?  I can't believe this is broken for anyone trying to
use Apache to reverse proxy from Oracle IAS 9i.

Thanks,
Josh
