Posted to dev@couchdb.apache.org by Adam Kocoloski <ad...@gmail.com> on 2008/09/03 20:38:08 UTC
URL-decoding reverse proxy breaks remote replication
Hi, I installed CouchDB behind nginx the other day and noticed that
remote replication didn't work. The problem seems to be that

a) CouchDB stores the replication history in a local doc with an ID
formed from the URL-encoded paths to the source and target DBs,

b) nginx decodes all %2Fs in the URLs it processes, and

c) couch_httpd chokes on a GET request for the replication history doc
using the decoded URL delivered by nginx.

My workaround was to encode "/" as "|" in the ID of the replication
history document. It seemed simpler than doing extra special-casing
in couch_httpd to handle decoded "/" characters in replication docIDs,
and I didn't see any way to turn off URL decoding in nginx.

Best,

Adam
--- a/trunk/src/couchdb/couch_rep.erl
+++ b/trunk/src/couchdb/couch_rep.erl
@@ -28,6 +28,9 @@ url_encode([H|T]) ->
         [H|url_encode(T)];
     H == $_; H == $.; H == $-; H == $: ->
         [H|url_encode(T)];
+    % nginx will decode the %2F which makes couch_httpd blow up
+    H == $/ ->
+        [$||url_encode(T)];
     true ->
         case lists:flatten(io_lib:format("~.16.0B", [H])) of
         [X, Y] ->
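
For readers who don't read Erlang, here is a rough Python sketch of what the patched url_encode clause appears to do. It is an illustration, not a port of couch_rep.erl. The alphanumeric pass-through is an assumption, because the hunk above does not show those clauses. Working on UTF-8 bytes is also an assumption, and the slash_as_pipe flag is a hypothetical toggle added only to compare behaviour before and after the patch.

```python
# Illustrative sketch only; not the actual CouchDB implementation.
SAFE_PUNCTUATION = "_.-:"  # characters the visible clause passes through


def url_encode(text, slash_as_pipe=True):
    """Percent-encode text, optionally mapping "/" to "|" as in the patch."""
    encoded = []
    for byte in text.encode("utf-8"):  # assumption: encode byte by byte
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            # Assumption: letters and digits pass through unchanged.
            encoded.append(ch)
        elif ch in SAFE_PUNCTUATION:
            encoded.append(ch)
        elif ch == "/" and slash_as_pipe:
            # The patched clause: emit "|" instead of "%2F".
            encoded.append("|")
        else:
            # Two uppercase hex digits, like "~.16.0B" plus zero padding.
            encoded.append("%{:02X}".format(byte))
    return "".join(encoded)


patched = url_encode("http://example.com/db")
unpatched = url_encode("http://example.com/db", slash_as_pipe=False)
print(patched)    # http:||example.com|db
print(unpatched)  # http:%2F%2Fexample.com%2Fdb
```

In this sketch a literal "|" in the input is still percent-encoded (as %7C), so the substituted pipes cannot be confused with pipes that were already in the path.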
Re: URL-decoding reverse proxy breaks remote replication
Posted by Jeremy Wall <jw...@google.com>.
Unfortunately the server is only running 2.2.3 and I can't upgrade it so I
guess I'm stuck. :-(
On Wed, Sep 3, 2008 at 3:03 PM, Adam Kocoloski <ad...@gmail.com> wrote:
> Hi Jeremy, I think Apache added a "nocanon" keyword in 2.2.7+ that's
> supposed to pass raw URLs onto the backend. Have you tried that? Best,
>
> Adam
Re: URL-decoding reverse proxy breaks remote replication
Posted by Adam Kocoloski <ad...@gmail.com>.
Hi Jeremy, I think Apache added a "nocanon" keyword in 2.2.7+ that's
supposed to pass raw URLs on to the backend. Have you tried that?

Best,

Adam
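
A minimal sketch of what such a configuration might look like, assuming Apache httpd with mod_proxy loaded. The /couchdb/ prefix and the backend address are hypothetical placeholders; nocanon is a ProxyPass keyword and AllowEncodedSlashes is a core directive.

```apache
# Hypothetical reverse-proxy fragment; adjust the path and backend to your setup.
# Accept request URLs that contain %2F instead of rejecting them with 404.
AllowEncodedSlashes On
# "nocanon" asks mod_proxy to pass the raw URL path to the backend.
ProxyPass /couchdb/ http://localhost:5984/ nocanon
ProxyPassReverse /couchdb/ http://localhost:5984/
```

If the 2.2.7 version recalled above is correct, this would not help on an older 2.2.x server.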
On Sep 3, 2008, at 3:26 PM, Jeremy Wall wrote:
> An Apache reverse proxy also breaks with url encodings. So that's at
> least one other proxy that does it.
Re: URL-decoding reverse proxy breaks remote replication
Posted by Jeremy Wall <jw...@google.com>.
An Apache reverse proxy also breaks with url encodings. So that's at least
one other proxy that does it.
On Wed, Sep 3, 2008 at 2:13 PM, Damien Katz <da...@apache.org> wrote:
> This is an issue I've been anticipating for a while, which is proxies
> messing around with the url encoding and causing problems.
>
> CouchDB url elements are delimited by slashes, for example "GET
> db/doc/fileattachment". But any of the elements "db" "doc" or "attachment"
> could have slashes in them, if slashes are url encoded (%2F). So
> using the slashes requires that the proxies keep the encoding exactly
> intact, instead of normalizing encoded urls to slashes.
>
> I've discussed this a while ago and was advised that proxies shouldn't mess
> with the URL encodings. So my default position is that this is a
> bug in nginx. However, I can be convinced otherwise, if other proxies or
> tools tend to do the same thing.
>
> -Damien
Re: URL-decoding reverse proxy breaks remote replication
Posted by Damien Katz <da...@apache.org>.
This is an issue I've been anticipating for a while: proxies messing
around with the URL encoding and causing problems.

CouchDB URL elements are delimited by slashes, for example
"GET db/doc/fileattachment". But any of the elements "db", "doc", or
"attachment" could have slashes in them, if the slashes are URL-encoded
(%2F). So using slashes as delimiters requires that proxies keep the
encoding exactly intact, instead of normalizing encoded slashes to
literal slashes.

I discussed this a while ago and was advised that proxies shouldn't
mess with the URL encodings. So my default position is that this is a
bug in nginx. However, I can be convinced otherwise if other proxies or
tools tend to do the same thing.

-Damien
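
The point above about slash-delimited URL elements can be sketched in a few lines of Python. This is only an illustration of the failure mode, not CouchDB's or nginx's actual code. split_path and proxy_normalize are hypothetical helpers, and the example path is made up.

```python
# Illustrative sketch only; helper names and the path are hypothetical.
from urllib.parse import unquote


def split_path(raw_path):
    """Split on literal slashes first, then percent-decode each element."""
    return [unquote(segment) for segment in raw_path.strip("/").split("/")]


def proxy_normalize(raw_path):
    """Mimic a proxy that decodes %2F before forwarding the request."""
    return raw_path.replace("%2F", "/").replace("%2f", "/")


raw = "/mydb/a%2Fb/file.txt"  # the doc ID "a/b" contains an encoded slash

direct = split_path(raw)
via_proxy = split_path(proxy_normalize(raw))

print(direct)     # ['mydb', 'a/b', 'file.txt']
print(via_proxy)  # ['mydb', 'a', 'b', 'file.txt']
```

With the encoding intact, the server sees three elements. After the proxy rewrites %2F to "/", the same request looks like four elements, and the document ID can no longer be recovered.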
On Sep 3, 2008, at 2:38 PM, Adam Kocoloski wrote:
> Hi, I installed CouchDB behind nginx the other day and noticed that
> remote replication didn't work. The problem seems to be that
>
> a) CouchDB stores the replication history in a local doc with an ID
> formed from the URL-encoded paths to the source and target DBs,
>
> b) nginx decodes all %2Fs in the URLs it processes, and
>
> c) couch_httpd chokes on a GET request for the replication history
> doc using the decoded URL delivered by nginx.
>
> My workaround was to encode "/" as "|" in the ID of the replication
> history document. It seemed simpler than doing extra special-casing
> in couch_httpd to handle decoded "/" characters in replication
> docIDs, and I didn't see any way to turn off URL decoding in nginx.
> Best,
>
> Adam
>
> --- a/trunk/src/couchdb/couch_rep.erl
> +++ b/trunk/src/couchdb/couch_rep.erl
> @@ -28,6 +28,9 @@ url_encode([H|T]) ->
>          [H|url_encode(T)];
>      H == $_; H == $.; H == $-; H == $: ->
>          [H|url_encode(T)];
> +    % nginx will decode the %2F which makes couch_httpd blow up
> +    H == $/ ->
> +        [$||url_encode(T)];
>      true ->
>          case lists:flatten(io_lib:format("~.16.0B", [H])) of
>          [X, Y] ->