You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Nick Kew <ni...@webthing.com> on 2007/10/08 00:50:59 UTC

Re: Broken URI-unescaping in mod_proxy

On Thu, 13 Sep 2007 08:47:13 -0700
"Roy T. Fielding" <fi...@gbiv.com> wrote:

>    Proxies are absolutely
> forbidden from making any change to the URI -- they must forward
> as is or return an error.

This is at the root of PR41798, and the others I've marked as
duplicates of it.

In fact, it seems to be simpler to fix than I realised.
Despite standard URL manipulation, the URL is correct at the
point where it's passed to proxy_http_canon (+clones like
proxy_balancer_canon).  It is specifically ap_proxy_canonenc
that corrupt URLs containing escaped characters.

The bug is fixed if we just remove ap_proxy_canonenc!

Looking more closely at ap_proxy_canonenc, it is indeed
just plain wrong at this point:

/*
 * Convert a URL-encoded string to canonical form.
 * It decodes characters which need not be encoded,
 * and encodes those which must be encoded, and does not touch
 * those which must not be touched.
 */

The first clause (decodes characters which need not be encoded)
is the culprit here, directly responsible for the bug.
Re-encoding characters that must be encoded is AFAICT superfluous:
if the URL contains disallowed bytes at this point due to a
bug in our earlier processing, we should reject it with 400
rather than change it.

Given my history of fluffing up late night patches, I'll leave
this for now.  But if noone shouts, I'll replace ap_proxy_canonenc
with a simple validity check in the morning.

Example:
GET http://redirector/redirect-to/http%3A%2F%2Fexample.com%2F
Proxy correctly preserves %2F, but rewrites %3A to a colon.
Likewise it'll incorrectly unescape ? and &.  This breaks
things like yahoo redirector.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: Broken URI-unescaping in mod_proxy

Posted by Nick Kew <ni...@webthing.com>.
On Mon, 08 Oct 2007 11:17:23 +0200
Ruediger Pluem <rp...@apache.org> wrote:


> Please check that your patch does not fall into the traps I mentioned
> in
> 
> http://mail-archives.apache.org/mod_mbox/httpd-dev/200709.mbox/%3c46E450D9.2020601@apache.org%3e

Yesterday's discovery that suddenly makes this look easy, is that
we're talking about a canonicalisation happening in fixups, long
after the security-sensitive parsing of incoming URLs.

I'm currently concentrating on the forward proxy.  The reverse
proxy is different, and the code path in question is already
slightly different for it.  Testcasing that is the main 
remaining TBD.

BTW, I should've added: a good forward proxy testcase is the URL
posted by the reporter in PR#42592.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: Broken URI-unescaping in mod_proxy

Posted by rahul <ra...@Sun.COM>.
[Graham Leggett:]
| > It would be nice to have different modules for reverse proxy and forward
| > proxy.. from an FTP perspective.
| >
| > There is a fairly large difference in FTP (and perhaps in other protocols
| > too) in terms of the optimizations that needs to be done for forward proxy
| > and reverse proxy.
| >
| > In forward proxy, we can not assume the kind of ftp servers the client
| > requests. So when there is an error of some sort we should try again
| > with a syntax that might be acceptable to the next possible type of
| > server.
| >
| > In the reverse proxy, this is wrong, and introduces unnecessary
| > overheads in network traffic (where it would be simpler to ask the user
| > to provide the type of server in the backend and error out if the ftp
| > server returns error.)
| 
| There is no need to have separate module to achieve this - providing a
| mechanism to override certain behaviour when the administrator wants it,
| but defaulting to RFC compliant behaviour will achieve the same thing.

True, my point is that these choices are distributed all over the code.
While it can certainly be run together, it would be much cleaner to have
different modules with emphasis on different things while using a common
ftp_util base for things that are similar.

Another problem is with the default behavior. What is nice for a forward
ftp proxy is not a correct default for a reverse ftp proxy (as explained
above).

Thirdly, the FTP rfc is silent (AFAIK - please correct me if I am wrong.)
in things like LIST format. so there is no RFC compliant behavior to
default to for this. 

                                    rahul
--
1. e4 _

Re: Broken URI-unescaping in mod_proxy

Posted by Graham Leggett <mi...@sharp.fm>.
On Thu, October 11, 2007 2:41 pm, rahul wrote:

> In the case of a forward proxy you cant expect to get any of these
> information from the configuration. So you start with the most general
> assumption, and if it fails, you try the next one by interrogating the
> server. The requirement of highest importance is to try to find a
> sequence of commands that matches the request and serve it.

As it does now, yes.

> For a reverse proxy, the scenario changes. You assume with a default
> assumption which may (most possibly) partially be supplied by the
> configuration, Thus if the ftp server returns an error on any thing, you
> error out and return with out trying any thing else. The requirement
> here is to avoid loading the FTP server you are front ending too much.
>
> The conflict between these two interests can be seen in all the places I
> mentioned.

As I said earlier, it is perfectly reasonable to add functionality that
allows the admin to specify concrete behaviour, so instead of "try A, and
if that fails try B" the admin can just say "try B, don't waste your time
trying A". This can be accomplished by a few lines of code in the existing
module.

> (OT):
> In case you do have spare time, and working on the mod_proxy_ftp, could
> you please review the patches (41676, 43231, 43275, 31490, 27835)? I had
> submitted these some time back but does not seem to have garnered any
> interest.

Let me make some time and take a look.

Regards,
Graham
--



Re: Broken URI-unescaping in mod_proxy

Posted by rahul <Ra...@Sun.COM>.
[Graham Leggett:]
| > True, my point is that these choices are distributed all over the code.
| > While it can certainly be run together, it would be much cleaner to have
| > different modules with emphasis on different things while using a common
| > ftp_util base for things that are similar.
 
| The ftp stuff should only be contained in mod_proxy_ftp, which is itself a
| submodule of mod_proxy, these changes should definitely not be all over
| the code.

Since the discussion was about merits of splitting mod_proxy into reverse
and forward proxy modules, I assumed that the corresponding for ftp too.

| If you split mod_proxy_ftp into two, you now have two almost identical
| modules where one half contains a feature and the other half doesn't,
| leading to situations where something is supported in the reverse proxy
| case, but not the forward proxy case. I don't see any value in this.

Things that are identical can be factored out (as I mentioned) into a
separate util file.

| > Another problem is with the default behavior. What is nice for a forward
| > ftp proxy is not a correct default for a reverse ftp proxy (as explained
| > above).
| >
| > Thirdly, the FTP rfc is silent (AFAIK - please correct me if I am wrong.)
| > in things like LIST format. so there is no RFC compliant behavior to
| > default to for this.
| 
| The ftp proxy handles this by following "safe" behaviour, such as "cd"ing
| individually to each directory component instead of assuming a potentially
| incorrect path separator. This makes the proxy work everywhere, but in
| cases where an admin knows what kind of proxy is being used, it's quite
| sensible to offer the admin the option to hard code a separator, or hard
| code a list format, so as to cut down on network traffic. None of this
| justifies a split in modules though.

----
This is one place. (LIST/NLST) where different systems vary widely in
their implementation.

Another is how you interpret the result of SIZE command (not in RFC,
but most implement it - so if it works it is a file if it does not it
_may_ be a directory.)

A different one is what kind of connection you use to transfer data,
ie PASSIVE/EPASV/PORT,..

One more is how you do CWD as you have mentioned above.

Another is that you may want to restrict the ftp server you front end to
some users with out going to the ftp server to verify.

----
In the case of a forward proxy you cant expect to get any of these
information from the configuration. So you start with the most general
assumption, and if it fails, you try the next one by interrogating the
server. The requirement of highest importance is to try to find a
sequence of commands that matches the request and serve it.

For a reverse proxy, the scenario changes. You assume with a default
assumption which may (most possibly) partially be supplied by the
configuration, Thus if the ftp server returns an error on any thing, you
error out and return with out trying any thing else. The requirement
here is to avoid loading the FTP server you are front ending too much.

The conflict between these two interests can be seen in all the places I
mentioned.
----

I think a better way to do this is to code it around two state-machines,
one each for RFP and FFP. The STM should be responsible for what step to
take after each response/error from the server, If we do that then it will
be possible to localize the above decisions to just those with the rest
of FTP code being shared.

| Please don't forget that mod_proxy already contains a non trivial
| structure: mod_proxy is a "parent" module, providing hooks to
| mod_proxy_http and mod_proxy_ftp, which are child modules. Splitting the
| modules yet again adds a documentation and support burden - now people
| must be educated on why they must choose module A instead of almost
| identical module B.

IMHO the people generally think of their setup as either of a proxy or
as a load balancer/reverse proxy.  It would be easy for them to make the
connection. We do have a precedence in different directives having a large
overlap as in ProxyPass and Rewrite rules.

(OT):
In case you do have spare time, and working on the mod_proxy_ftp, could
you please review the patches (41676, 43231, 43275, 31490, 27835)? I had
submitted these some time back but does not seem to have garnered any
interest.

                                    rahul
--
1. e4 _

Re: Broken URI-unescaping in mod_proxy

Posted by Graham Leggett <mi...@sharp.fm>.
On Thu, October 11, 2007 12:53 pm, rahul wrote:

> True, my point is that these choices are distributed all over the code.
> While it can certainly be run together, it would be much cleaner to have
> different modules with emphasis on different things while using a common
> ftp_util base for things that are similar.

The ftp stuff should only be contained in mod_proxy_ftp, which is itself a
submodule of mod_proxy, these changes should definitely not be all over
the code.

If you split mod_proxy_ftp into two, you now have two almost identical
modules where one half contains a feature and the other half doesn't,
leading to situations where something is supported in the reverse proxy
case, but not the forward proxy case. I don't see any value in this.

> Another problem is with the default behavior. What is nice for a forward
> ftp proxy is not a correct default for a reverse ftp proxy (as explained
> above).
>
> Thirdly, the FTP rfc is silent (AFAIK - please correct me if I am wrong.)
> in things like LIST format. so there is no RFC compliant behavior to
> default to for this.

The ftp proxy handles this by following "safe" behaviour, such as "cd"ing
individually to each directory component instead of assuming a potentially
incorrect path separator. This makes the proxy work everywhere, but in
cases where an admin knows what kind of proxy is being used, it's quite
sensible to offer the admin the option to hard code a separator, or hard
code a list format, so as to cut down on network traffic. None of this
justifies a split in modules though.

Please don't forget that mod_proxy already contains a non trivial
structure: mod_proxy is a "parent" module, providing hooks to
mod_proxy_http and mod_proxy_ftp, which are child modules. Splitting the
modules yet again adds a documentation and support burden - now people
must be educated on why they must choose module A instead of almost
identical module B.

Regards,
Graham
--



Re: Broken URI-unescaping in mod_proxy

Posted by rahul <Ra...@Sun.COM>.
[Graham Leggett:]
| > It would be nice to have different modules for reverse proxy and forward
| > proxy.. from an FTP perspective.
| >
| > There is a fairly large difference in FTP (and perhaps in other protocols
| > too) in terms of the optimizations that needs to be done for forward proxy
| > and reverse proxy.
| >
| > In forward proxy, we can not assume the kind of ftp servers the client
| > requests. So when there is an error of some sort we should try again
| > with a syntax that might be acceptable to the next possible type of
| > server.
| >
| > In the reverse proxy, this is wrong, and introduces unnecessary
| > overheads in network traffic (where it would be simpler to ask the user
| > to provide the type of server in the backend and error out if the ftp
| > server returns error.)
|
| There is no need to have separate module to achieve this - providing a
| mechanism to override certain behaviour when the administrator wants it,
| but defaulting to RFC compliant behaviour will achieve the same thing.

True, my point is that these choices are distributed all over the code.
While it can certainly be run together, it would be much cleaner to have
different modules with emphasis on different things while using a common
ftp_util base for things that are similar.

Another problem is with the default behavior. What is nice for a forward
ftp proxy is not a correct default for a reverse ftp proxy (as explained
above).

Thirdly, the FTP rfc is silent (AFAIK - please correct me if I am wrong.)
in things like LIST format. so there is no RFC compliant behavior to
default to for this.

                                    rahul
--
1. e4 _

Re: Broken URI-unescaping in mod_proxy

Posted by Graham Leggett <mi...@sharp.fm>.
On Wed, October 10, 2007 3:27 pm, rahul wrote:

> It would be nice to have different modules for reverse proxy and forward
> proxy.. from an FTP perspective.
>
> There is a fairly large difference in FTP (and perhaps in other protocols
> too) in terms of the optimizations that needs to be done for forward proxy
> and reverse proxy.
>
> In forward proxy, we can not assume the kind of ftp servers the client
> requests. So when there is an error of some sort we should try again
> with a syntax that might be acceptable to the next possible type of
> server.
>
> In the reverse proxy, this is wrong, and introduces unnecessary
> overheads in network traffic (where it would be simpler to ask the user
> to provide the type of server in the backend and error out if the ftp
> server returns error.)

There is no need to have separate module to achieve this - providing a
mechanism to override certain behaviour when the administrator wants it,
but defaulting to RFC compliant behaviour will achieve the same thing.

Regards,
Graham
--



Re: Broken URI-unescaping in mod_proxy

Posted by rahul <Ra...@Sun.COM>.
[Nick Kew:]
| On Wed, 10 Oct 2007 00:17:18 +0200
| Graham Leggett <mi...@sharp.fm> wrote:
| 
| > As I recall there is very little difference between the code for 
| > "forward proxy" and the code for "reverse proxy", the key differences 
| > being to send a Proxy-Auth instead of Auth where appropriate, and
| > other minor things.
| 
| There's quite a lot of difference in the path manipulation,
| though both have similar bugs.
| 
| > Separating the module into two will just mean two modules with
| > virtually identical code: a breeding ground for all sorts of problems.
| 
| Not necessarily.  A reworking can factor out more of the common
| code into proxy_util, leaving mod_proxy and mod_gateway as
| fairly slim modules.

It would be nice to have different modules for reverse proxy and forward
proxy.. from an FTP perspective.

There is a fairly large difference in FTP (and perhaps in other protocols
too) in terms of the optimizations that needs to be done for forward proxy
and reverse proxy.

In forward proxy, we can not assume the kind of ftp servers the client
requests. So when there is an error of some sort we should try again
with a syntax that might be acceptable to the next possible type of
server. 

In the reverse proxy, this is wrong, and introduces unnecessary
overheads in network traffic (where it would be simpler to ask the user
to provide the type of server in the backend and error out if the ftp
server returns error.)

(examples include negotiating PORT/PASSIVE/EPRT/EPSV connections, the
syntax of various commands like LIST etc..)


                                    rahul
--
1. e4 _



Re: Broken URI-unescaping in mod_proxy

Posted by Nick Kew <ni...@webthing.com>.
On Wed, 10 Oct 2007 00:17:18 +0200
Graham Leggett <mi...@sharp.fm> wrote:

> As I recall there is very little difference between the code for 
> "forward proxy" and the code for "reverse proxy", the key differences 
> being to send a Proxy-Auth instead of Auth where appropriate, and
> other minor things.

There's quite a lot of difference in the path manipulation,
though both have similar bugs.

> Separating the module into two will just mean two modules with
> virtually identical code: a breeding ground for all sorts of problems.

Not necessarily.  A reworking can factor out more of the common
code into proxy_util, leaving mod_proxy and mod_gateway as
fairly slim modules.

But pending a round tuit, the question remains academic.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/

Re: Broken URI-unescaping in mod_proxy

Posted by Graham Leggett <mi...@sharp.fm>.
Akins, Brian wrote:

>> For the millionth time, if that is a problem then separate the proxy
>> module from the gateway ("reverse proxy") module.  They do not belong
>> together.
> 
> +1.  This would sway me more to go back to the "stock" modules.  The
> "reverse proxy" could be much more aggressive with keep-alives, connection
> pools, etc.

As I recall there is very little difference between the code for 
"forward proxy" and the code for "reverse proxy", the key differences 
being to send a Proxy-Auth instead of Auth where appropriate, and other 
minor things.

Separating the module into two will just mean two modules with virtually 
identical code: a breeding ground for all sorts of problems.

Regards,
Graham
--

Re: Broken URI-unescaping in mod_proxy

Posted by "Akins, Brian" <Br...@turner.com>.
On 10/8/07 1:44 PM, "Roy T. Fielding" <ro...@gmail.com> wrote:


> For the millionth time, if that is a problem then separate the proxy
> module from the gateway ("reverse proxy") module.  They do not belong
> together.

+1.  This would sway me more to go back to the "stock" modules.  The
"reverse proxy" could be much more aggressive with keep-alives, connection
pools, etc.



-- 
Brian Akins
Chief Operations Engineer
Turner Digital Media Technologies

Re: Broken URI-unescaping in mod_proxy

Posted by "Roy T. Fielding" <ro...@gmail.com>.
On Oct 8, 2007, at 2:17 AM, Ruediger Pluem wrote:
> Please check that your patch does not fall into the traps I  
> mentioned in
>
> http://mail-archives.apache.org/mod_mbox/httpd-dev/200709.mbox/% 
> 3c46E450D9.2020601@apache.org%3e
>
> on this thread. Otherwise we create a security issue (at least for  
> reverse proxies and
> for reverse proxies Roy's statement is not valid as it is only  
> valid for *proxies*).

For the millionth time, if that is a problem then separate the proxy
module from the gateway ("reverse proxy") module.  They do not belong
together.

....Roy


Re: Broken URI-unescaping in mod_proxy

Posted by Ruediger Pluem <rp...@apache.org>.

On 10/08/2007 12:50 AM, Nick Kew wrote:
> On Thu, 13 Sep 2007 08:47:13 -0700
> "Roy T. Fielding" <fi...@gbiv.com> wrote:
> 
>>    Proxies are absolutely
>> forbidden from making any change to the URI -- they must forward
>> as is or return an error.
> 
> This is at the root of PR41798, and the others I've marked as
> duplicates of it.
> 
> In fact, it seems to be simpler to fix than I realised.
> Despite standard URL manipulation, the URL is correct at the
> point where it's passed to proxy_http_canon (+clones like
> proxy_balancer_canon).  It is specifically ap_proxy_canonenc
> that corrupt URLs containing escaped characters.
> 
> The bug is fixed if we just remove ap_proxy_canonenc!
> 
> Looking more closely at ap_proxy_canonenc, it is indeed
> just plain wrong at this point:
> 
> /*
>  * Convert a URL-encoded string to canonical form.
>  * It decodes characters which need not be encoded,
>  * and encodes those which must be encoded, and does not touch
>  * those which must not be touched.
>  */
> 
> The first clause (decodes characters which need not be encoded)
> is the culprit here, directly responsible for the bug.
> Re-encoding characters that must be encoded is AFAICT superfluous:
> if the URL contains disallowed bytes at this point due to a
> bug in our earlier processing, we should reject it with 400
> rather than change it.
> 
> Given my history of fluffing up late night patches, I'll leave
> this for now.  But if noone shouts, I'll replace ap_proxy_canonenc
> with a simple validity check in the morning.

Please check that your patch does not fall into the traps I mentioned in

http://mail-archives.apache.org/mod_mbox/httpd-dev/200709.mbox/%3c46E450D9.2020601@apache.org%3e

on this thread. Otherwise we create a security issue (at least for reverse proxies and
for reverse proxies Roy's statement is not valid as it is only valid for *proxies*).

Regards

RĂ¼diger