Posted to dev@couchdb.apache.org by Adam Kocoloski <ad...@gmail.com> on 2008/09/03 20:38:08 UTC

URL-decoding reverse proxy breaks remote replication

Hi, I installed CouchDB behind nginx the other day and noticed that  
remote replication didn't work.  The problem seems to be that

a) CouchDB stores the replication history in a local doc with an ID  
formed from the URL-encoded paths to the source and target DBs,

b) nginx decodes all %2Fs in the URLs it processes, and

c) couch_httpd chokes on a GET request for the replication history doc  
using the decoded URL delivered by nginx.

My workaround was to encode "/" as "|" in the ID of the replication  
history document.  It seemed simpler than doing extra special-casing  
in couch_httpd to handle decoded "/" characters in replication docIDs,  
and I didn't see any way to turn off URL decoding in nginx.  Best,

Adam

--- a/trunk/src/couchdb/couch_rep.erl
+++ b/trunk/src/couchdb/couch_rep.erl
@@ -28,6 +28,9 @@ url_encode([H|T]) ->
          [H|url_encode(T)];
      H == $_; H == $.; H == $-; H == $: ->
          [H|url_encode(T)];
+    % nginx will decode the %2F which makes couch_httpd blow up
+    H == $/ ->
+        [$||url_encode(T)];
      true ->
          case lists:flatten(io_lib:format("~.16.0B", [H])) of
          [X, Y] ->
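
To make the effect of the patch concrete, here is a rough Python sketch of the encoding rule it describes. This is not the CouchDB code: the function name `encode_id_part`, the `slash_as_pipe` flag, and the example URL are all made up for illustration, and the assumption that ASCII letters and digits pass through unchanged is a guess about the part of `url_encode` that the hunk does not show.

```python
from urllib.parse import unquote

# Characters the visible guard in the patch passes through unchanged.
SAFE = set("_.-:")


def encode_id_part(text, slash_as_pipe=True):
    """Percent-encode text for use inside a replication doc ID (illustrative only)."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Assumption: ASCII letters and digits are left as-is, like the safe set.
        if (byte < 128 and ch.isalnum()) or ch in SAFE:
            out.append(ch)
        elif ch == "/" and slash_as_pipe:
            # The workaround: never emit %2F, so a decoding proxy has nothing to rewrite.
            out.append("|")
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


source = "http://host:5984/db"  # hypothetical remote DB URL
before = encode_id_part(source, slash_as_pipe=False)
after = encode_id_part(source)

print(before)           # http:%2F%2Fhost:5984%2Fdb
print(unquote(before))  # http://host:5984/db  (extra path separators appear)
print(after)            # http:||host:5984|db
print(unquote(after))   # http:||host:5984|db  (unchanged by decoding)
```

With the pipe substitution, the ID contains no percent-encoded slashes, so a proxy that decodes %2F leaves the request path with the same number of segments.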




Re: URL-decoding reverse proxy breaks remote replication

Posted by Jeremy Wall <jw...@google.com>.
Unfortunately the server is only running 2.2.3 and I can't upgrade it so I
guess I'm stuck. :-(

On Wed, Sep 3, 2008 at 3:03 PM, Adam Kocoloski <ad...@gmail.com> wrote:

> Hi Jeremy, I think Apache added a "nocanon" keyword in 2.2.7+ that's
> supposed to pass raw URLs on to the backend.  Have you tried that?  Best,
>
> Adam

Re: URL-decoding reverse proxy breaks remote replication

Posted by Adam Kocoloski <ad...@gmail.com>.
Hi Jeremy, I think Apache added a "nocanon" keyword in 2.2.7+ that's
supposed to pass raw URLs on to the backend.  Have you tried that?  Best,

Adam

On Sep 3, 2008, at 3:26 PM, Jeremy Wall wrote:

> An Apache reverse proxy also breaks with URL encodings. So that's at least
> one other proxy that does it.


Re: URL-decoding reverse proxy breaks remote replication

Posted by Jeremy Wall <jw...@google.com>.
An Apache reverse proxy also breaks with URL encodings. So that's at least
one other proxy that does it.

On Wed, Sep 3, 2008 at 2:13 PM, Damien Katz <da...@apache.org> wrote:

> This is an issue I've been anticipating for a while, which is proxies
> messing around with the url encoding and causing problems.
>
> CouchDB URL elements are delimited by slashes, for example "GET
> db/doc/fileattachment". But any of the elements "db", "doc" or "attachment"
> could have slashes in them, if the slashes are URL-encoded (as %2F).  So
> using slashes as delimiters requires that proxies keep the encoding exactly
> intact, instead of normalizing encoded slashes to literal slashes.
>
> I discussed this a while ago and was advised that proxies shouldn't mess
> with URL encodings. So my default position is that this is a bug in
> nginx. However, I can be convinced otherwise if other proxies or
> tools tend to do the same thing.
>
> -Damien

Re: URL-decoding reverse proxy breaks remote replication

Posted by Damien Katz <da...@apache.org>.
This is an issue I've been anticipating for a while, which is proxies  
messing around with the url encoding and causing problems.

CouchDB URL elements are delimited by slashes, for example "GET
db/doc/fileattachment". But any of the elements "db", "doc" or "attachment"
could have slashes in them, if the slashes are URL-encoded (as %2F).  So
using slashes as delimiters requires that proxies keep the encoding exactly
intact, instead of normalizing encoded slashes to literal slashes.
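
As a rough illustration of why the encoding has to survive the proxy, here is a small Python sketch (not CouchDB code; the path and both helper names are invented for this example). It compares splitting the raw path on literal slashes and then decoding each element with decoding the whole path first, which is effectively what the backend sees behind a decoding proxy.

```python
from urllib.parse import unquote


def split_then_decode(raw_path):
    """Split on literal slashes first, then decode each element."""
    return [unquote(part) for part in raw_path.strip("/").split("/")]


def decode_then_split(raw_path):
    """Decode the whole path first, as a normalizing proxy effectively does."""
    return unquote(raw_path).strip("/").split("/")


raw = "/db/doc%2Fwith%2Fslashes/attachment"  # hypothetical request path

print(split_then_decode(raw))  # ['db', 'doc/with/slashes', 'attachment']
print(decode_then_split(raw))  # ['db', 'doc', 'with', 'slashes', 'attachment']
```

The first form recovers three elements with the slashes inside the doc name intact; the second yields five elements, and the server can no longer tell which slashes were delimiters.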

I discussed this a while ago and was advised that proxies shouldn't
mess with URL encodings. So my default position is that this is a bug
in nginx. However, I can be convinced otherwise if other proxies or
tools tend to do the same thing.

-Damien

On Sep 3, 2008, at 2:38 PM, Adam Kocoloski wrote:

> Hi, I installed CouchDB behind nginx the other day and noticed that  
> remote replication didn't work.  The problem seems to be that
>
> a) CouchDB stores the replication history in a local doc with an ID  
> formed from the URL-encoded paths to the source and target DBs,
>
> b) nginx decodes all %2Fs in the URLs it processes, and
>
> c) couch_httpd chokes on a GET request for the replication history  
> doc using the decoded URL delivered by nginx.
>
> My workaround was to encode "/" as "|" in the ID of the replication  
> history document.  It seemed simpler than doing extra special-casing  
> in couch_httpd to handle decoded "/" characters in replication  
> docIDs, and I didn't see any way to turn off URL decoding in nginx.   
> Best,
>
> Adam
>
> --- a/trunk/src/couchdb/couch_rep.erl
> +++ b/trunk/src/couchdb/couch_rep.erl
> @@ -28,6 +28,9 @@ url_encode([H|T]) ->
>         [H|url_encode(T)];
>     H == $_; H == $.; H == $-; H == $: ->
>         [H|url_encode(T)];
> +    % nginx will decode the %2F which makes couch_httpd blow up
> +    H == $/ ->
> +        [$||url_encode(T)];
>     true ->
>         case lists:flatten(io_lib:format("~.16.0B", [H])) of
>         [X, Y] ->