Posted to user@couchdb.apache.org by Sam Stainsby <sa...@sustainablesoftware.com.au> on 2012/10/29 08:05:59 UTC

literal '+' in URL when creating a database

Hi all,

Shouldn't this succeed (assuming appropriate permissions):

curl -X PUT 'http://localhost:5984/aaa+bbb'

Instead, I get the "Only lowercase characters (a-z), digits (0-9), and 
any of the characters _, $, (, ), +, -, and / are allowed ..." error.

I understand that '+' has special significance in the query part of a 
URL, but not the path part, so I think the above should work. I've found 
with the latest Dispatch library (0.9.3) that Dispatch doesn't encode the 
'+', which, from what I've read since, still yields a legal URL. On 
the other hand, couch seems to require it to be encoded, so the 
following *does* succeed:

curl -X PUT 'http://localhost:5984/aaa%2bbbb'

resulting in a database named 'aaa+bbb'.

I've checked (with Wireshark) that the first request does indeed send the 
literal '+' character: PUT /aaa+bbb ...
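To illustrate the two encoding rules being contrasted here, the following is a minimal sketch using Python's standard urllib.parse module. It is not Dispatch or CouchDB code, and the helper names path_encode and form_encode are hypothetical. Under path rules a space is written as %20. Under form/query rules a space becomes '+', so a real '+' has to be escaped as %2B.

```python
from urllib.parse import quote, quote_plus

def path_encode(name: str) -> str:
    # Path rules: a space becomes %20. quote() also escapes '+' as %2B,
    # which is allowed in a path but not required.
    return quote(name, safe="")

def form_encode(value: str) -> str:
    # Form/query rules: a space becomes '+', so a literal '+' must be %2B.
    return quote_plus(value)

for s in ["aaa bbb", "aaa+bbb"]:
    print(f"{s!r}: path -> {path_encode(s)}, form -> {form_encode(s)}")
# 'aaa bbb': path -> aaa%20bbb, form -> aaa+bbb
# 'aaa+bbb': path -> aaa%2Bbbb, form -> aaa%2Bbbb
```

Escaping a '+' is always permitted, even where it isn't required. That is why the %2b form in the second curl command is accepted under either rule.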

Cheers,
Sam Stainsby.


Re: literal '+' in URL when creating a database

Posted by Sam Stainsby <sa...@sustainablesoftware.com.au>.
On Mon, 29 Oct 2012 15:29:31 -0700, Mark Hahn wrote:

> I always encode the entire url.  That can't cause a problem, can it?

Best you have a look at this I think:

http://www.lunatech-research.com/archives/2009/02/03/what-every-web-developer-must-know-about-url-encoding

-- Sam.


Re: literal '+' in URL when creating a database

Posted by Mark Hahn <ma...@hahnca.com>.
I always encode the entire url.  That can't cause a problem, can it?

On Mon, Oct 29, 2012 at 3:27 PM, Sam Stainsby <
sam@sustainablesoftware.com.au> wrote:

> On Mon, 29 Oct 2012 15:18:16 -0700, Mark Hahn wrote:
>
> >> "URL encoding" is applied homogeneously over all parts of the URL.
> >
> > What if there was a slash or hash character?  I don't see how you can
> > avoid escaping the whole url.
>
> Sorry, what I mean is that different parts of the URL have subtly
> different encoding rules.
>
> -- Sam.
>
>

Re: literal '+' in URL when creating a database

Posted by Sam Stainsby <sa...@sustainablesoftware.com.au>.
On Mon, 29 Oct 2012 15:18:16 -0700, Mark Hahn wrote:

>> "URL encoding" is applied homogeneously over all parts of the URL.
> 
> What if there was a slash or hash character?  I don't see how you can
> avoid escaping the whole url.

Sorry, what I mean is that different parts of the URL have subtly 
different encoding rules.
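One way to reconcile "encode everything" with per-part rules is to encode each component separately before assembling the URL. Below is a minimal Python sketch of that approach, assuming the standard urllib.parse module. The helper build_url, the query parameter name q, and the sample segment "a/b#c" are hypothetical illustrations only, not CouchDB API or valid CouchDB database names. Escaping '/' and '#' inside a segment keeps them from being read as URL delimiters. Quoting the whole URL string, by contrast, also escapes the ':' after the scheme and before the port.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:5984"

def build_url(segment: str, **params: str) -> str:
    """Hypothetical helper: encode each URL component with its own rules."""
    # Path segment: escape everything outside the unreserved set,
    # including '/' and '#', so they cannot act as URL delimiters.
    url = f"{BASE}/{quote(segment, safe='')}"
    if params:
        # Query string: form encoding, where a space becomes '+'.
        url += "?" + urlencode(params)
    return url

print(build_url("aaa+bbb"))               # http://localhost:5984/aaa%2Bbbb
print(build_url("a/b#c", q="x y+z"))      # http://localhost:5984/a%2Fb%23c?q=x+y%2Bz

# Quoting the entire URL mangles the scheme and port separators:
print(quote("http://localhost:5984/aaa+bbb"))  # http%3A//localhost%3A5984/aaa%2Bbbb
```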

-- Sam.


Re: literal '+' in URL when creating a database

Posted by Mark Hahn <ma...@hahnca.com>.
> "URL encoding" is applied homogeneously over all parts of the URL.

What if there was a slash or hash character?  I don't see how you can avoid
escaping the whole url.

On Mon, Oct 29, 2012 at 3:13 PM, Sam Stainsby <
sam@sustainablesoftware.com.au> wrote:

> On Mon, 29 Oct 2012 17:07:37 +0000, Robert Newson wrote:
>
> > It's because we call mochiweb_util:unquote(Path) which replaces the +
> > with a space.
>
> What I've read is that there seems to be a widespread misconception that
> "URL encoding" is applied homogeneously over all parts of the URL. Even
> some major libraries get it wrong --- or have misleading names at least.
>
> I've reported this now:
> https://issues.apache.org/jira/browse/COUCHDB-1580
>
> -- Sam.
>
>

Re: literal '+' in URL when creating a database

Posted by Sam Stainsby <sa...@sustainablesoftware.com.au>.
On Mon, 29 Oct 2012 17:07:37 +0000, Robert Newson wrote:

> It's because we call mochiweb_util:unquote(Path) which replaces the +
> with a space.

What I've read is that there seems to be a widespread misconception that 
"URL encoding" is applied homogeneously over all parts of the URL. Even 
some major libraries get it wrong --- or have misleading names at least.

I've reported this now:
https://issues.apache.org/jira/browse/COUCHDB-1580

-- Sam.


Re: literal '+' in URL when creating a database

Posted by Robert Newson <rn...@apache.org>.
It's because we call mochiweb_util:unquote(Path) which replaces the +
with a space.
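The effect described here can be mimicked in a few lines of Python. This is not CouchDB's Erlang code. db_name_from_path, is_valid_db_name, and DB_NAME_RE are hypothetical, and the regular expression only approximates the character list quoted in the error message in the first post. Decoding the path with form rules (unquote_plus) turns the literal '+' into a space, which the name check rejects. Decoding with path rules (unquote) keeps the '+', and the %2b request yields "aaa+bbb" either way.

```python
import re
from urllib.parse import unquote, unquote_plus

# Hypothetical approximation of the allowed characters named in the error:
# lowercase a-z, digits 0-9, and _ $ ( ) + - /
DB_NAME_RE = re.compile(r"[a-z0-9_$()+\-/]+")

def db_name_from_path(path: str, form_style: bool) -> str:
    """Decode the first path segment as a database name."""
    segment = path.lstrip("/").split("/", 1)[0]
    # form_style=True mimics '+'-to-space decoding; False uses path rules.
    return unquote_plus(segment) if form_style else unquote(segment)

def is_valid_db_name(name: str) -> bool:
    return DB_NAME_RE.fullmatch(name) is not None

for path in ["/aaa+bbb", "/aaa%2bbbb"]:
    for form_style in (True, False):
        name = db_name_from_path(path, form_style)
        verdict = "valid" if is_valid_db_name(name) else "rejected"
        style = "form-style" if form_style else "path-style"
        print(path, style, repr(name), verdict)
```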

B.

On 29 October 2012 16:48, Jens Alfke <je...@couchbase.com> wrote:
>
> On Oct 29, 2012, at 1:26 AM, Sam Stainsby <sa...@sustainablesoftware.com.au> wrote:
>
> > How couch encodes that as a file name in an OS would be internal to
> > couch, so if couch is using query string encoding for the file name, that
> > may be a good choice for OS portability. However, my understanding is
> > that '+' representing a space in a URL is only valid for the *query* part
> > of a URL.
>
> Agreed — it should not be necessary to URL-encode “+” signs in the path portion of a URL. Your URL refers to the database named “aaa+bbb”, not “aaa bbb”, so the request should have succeeded. This sounds like a bug in CouchDB.
>
> —Jens

Re: literal '+' in URL when creating a database

Posted by Jens Alfke <je...@couchbase.com>.
On Oct 29, 2012, at 1:26 AM, Sam Stainsby <sa...@sustainablesoftware.com.au> wrote:

> How couch encodes that as a file name in an OS would be internal to
> couch, so if couch is using query string encoding for the file name, that
> may be a good choice for OS portability. However, my understanding is
> that '+' representing a space in a URL is only valid for the *query* part
> of a URL.

Agreed — it should not be necessary to URL-encode “+” signs in the path portion of a URL. Your URL refers to the database named “aaa+bbb”, not “aaa bbb”, so the request should have succeeded. This sounds like a bug in CouchDB.

—Jens

Re: literal '+' in URL when creating a database

Posted by Sam Stainsby <sa...@sustainablesoftware.com.au>.
On Mon, 29 Oct 2012 08:23:09 +0100, Benoit Chesneau wrote:

> On Mon, Oct 29, 2012 at 8:05 AM, Sam Stainsby

>> I understand that '+' has special significance in the query part of a
>> URL, but not the path part, so I think the above should work.

> '+' would mean a space on the file system, if I recall correctly, which
> could be problematic on some platforms.

Hi Benoit,

How couch encodes that as a file name in an OS would be internal to 
couch, so if couch is using query string encoding for the file name, that 
may be a good choice for OS portability. However, my understanding is 
that '+' representing a space in a URL is only valid for the *query* part 
of a URL.

"Within the query string, the plus sign is reserved as shorthand notation 
for a space. Therefore, real plus signs must be encoded. This method was 
used to make query URIs easier to pass in systems which did not allow 
spaces." (http://www.w3.org/Addressing/URL/4_URI_Recommentations.html)


"For HTTP URLs, a space in a path fragment part has to be encoded to 
"%20" (not, absolutely not "+"), while the "+" character in the path 
fragment part can be left unencoded."
http://www.lunatech-research.com/archives/2009/02/03/what-every-web-developer-must-know-about-url-encoding
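The per-component rule quoted above can be demonstrated with a short Python sketch using the standard urllib.parse module. decode_request is a hypothetical helper and the query parameters q and r are made up. The path is decoded with unquote, so '+' stays literal. The query string is decoded with parse_qs, which applies form rules: '+' means a space and %2B means a real '+'.

```python
from urllib.parse import urlsplit, unquote, parse_qs

def decode_request(url: str):
    """Hypothetical helper: decode path and query with their own rules."""
    parts = urlsplit(url)
    # Path: only %XX escapes are decoded; '+' is an ordinary character.
    # (A real server would split the path into segments before decoding,
    # so that %2F inside a segment is not confused with a '/' separator.)
    path = unquote(parts.path)
    # Query: form rules, so '+' is a space and %2B is a literal '+'.
    query = parse_qs(parts.query)
    return path, query

print(decode_request("http://localhost:5984/aaa+bbb?q=aaa+bbb&r=aaa%2Bbbb"))
# ('/aaa+bbb', {'q': ['aaa bbb'], 'r': ['aaa+bbb']})
```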

Cheers,
Sam.


Re: literal '+' in URL when creating a database

Posted by Benoit Chesneau <bc...@gmail.com>.

'+' would mean a space on the file system, if I recall correctly, which
could be problematic on some platforms.

- benoit