Posted to dev@couchdb.apache.org by "Taras Puchko (JIRA)" <ji...@apache.org> on 2010/09/08 16:05:32 UTC

[jira] Created: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Wrong document returned due to incorrect URL decoding
-----------------------------------------------------

                 Key: COUCHDB-883
                 URL: https://issues.apache.org/jira/browse/COUCHDB-883
             Project: CouchDB
          Issue Type: Bug
          Components: HTTP Interface
    Affects Versions: 1.0.1
         Environment: Kubuntu 10.4, Firefox 3.6.8
            Reporter: Taras Puchko


I have two documents in my database: "a b" and "a+b". The first can be retrieved via "/mydb/a%20b" and the second via "/mydb/a%2Bb".

When I enter "/mydb/a b" in the browser, it encodes the space automatically, so the correct document is returned. But when I enter "/mydb/a+b", the URL is sent unchanged, since "+" is a valid character in a path segment according to [1]. The problem is that "GET /mydb/a+b" makes CouchDB return the document with id "a b" instead of the intended one, which contradicts the URI spec.

For an informal description of URL encoding one may refer to [2].

[1]: http://www.ietf.org/rfc/rfc2396.txt
[2]: http://www.lunatech-research.com/archives/2009/02/03/what-every-web-developer-must-know-about-url-encoding

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Taras Puchko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907983#action_12907983 ] 

Taras Puchko commented on COUCHDB-883:
--------------------------------------

Sebastian, "reserved" does NOT mean that a character must be encoded in all parts of a URL.

2.2. Reserved Characters
Characters in the "reserved" set are not reserved in all contexts. The set of characters actually reserved within any given URI component is defined by that component. In general, a character is reserved if the semantics of the URI changes if the character is replaced with its escaped US-ASCII encoding.

3.3. Path Component
segment       = *pchar *( ";" param )
pchar         = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
The path may consist of a sequence of path segments separated by a single slash "/" character.
Within a path segment, the characters "/", ";", "=", and "?" are reserved. 
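The grammar quoted above can be checked mechanically. Below is a minimal Erlang sketch, assuming the RFC 2396 definitions of "unreserved" (alphanumerics plus - _ . ! ~ * ' ( )) and "escaped" ("%" followed by two hex digits). The module name rfc2396_path and its functions are hypothetical and are not part of CouchDB or mochiweb.

```erlang
%% Hypothetical module: checks strings against the RFC 2396 (section 3.3)
%% path segment grammar.
-module(rfc2396_path).
-export([is_pchar/1, valid_segment/1]).

%% unreserved = alphanum | mark
%% mark       = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
is_unreserved(C) when C >= $a, C =< $z; C >= $A, C =< $Z; C >= $0, C =< $9 ->
    true;
is_unreserved(C) ->
    lists:member(C, "-_.!~*'()").

%% pchar = unreserved | escaped | ":" | "@" | "&" | "=" | "+" | "$" | ","
%% This covers single characters; "escaped" is handled in valid_segment/1.
is_pchar(C) ->
    is_unreserved(C) orelse lists:member(C, ":@&=+$,").

%% segment = *pchar *( ";" param ), param = *pchar
%% Any mix of pchars and ";" matches this rule, so ";" is simply accepted.
valid_segment([]) ->
    true;
valid_segment([$%, H1, H2 | Rest]) ->
    is_hex(H1) andalso is_hex(H2) andalso valid_segment(Rest);
valid_segment([$; | Rest]) ->
    valid_segment(Rest);
valid_segment([C | Rest]) ->
    is_pchar(C) andalso valid_segment(Rest).

is_hex(C) ->
    (C >= $0 andalso C =< $9) orelse
    (C >= $a andalso C =< $f) orelse
    (C >= $A andalso C =< $F).
```

Under this grammar both "a+b" and "a%2Bb" are valid, unescaped segments, while "a b" is not. This is why a browser sends the plus sign as-is but must escape the space.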



[jira] Updated: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated COUCHDB-883:
--------------------------------------

    Skill Level: Committers Level (Medium to Hard)



[jira] Commented: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Sebastian Cohnen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907922#action_12907922 ] 

Sebastian Cohnen commented on COUCHDB-883:
------------------------------------------

RFC2396: G.2. Modifications from both RFC 1738 and RFC 1808

"The plus "+", dollar "$", and comma "," characters have been added to those in the "reserved" set, since they are treated as reserved within the query component."

Therefore you need to URI-encode the plus (+) character according to the RFC.



[jira] Reopened: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Taras Puchko (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Taras Puchko reopened COUCHDB-883:
----------------------------------


Robert, you are wrong. What "uri escaping rules" are you talking about? 

I've specifically pointed to the spec. Read "3.3. Path Component" and "2.2. Reserved Characters".

There is no rule that makes a plus sign be interpreted as a space. It's a compatibility behavior applicable ONLY to query parameter values.

Please read http://www.lunatech-research.com/archives/2009/02/03/what-every-web-developer-must-know-about-url-encoding
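The distinction drawn above can be made concrete with two decoders. decode(path, S) performs plain percent-decoding. decode(query, S) additionally applies the HTML form-encoding (application/x-www-form-urlencoded) rule that turns "+" into a space. This is a hedged Erlang sketch: the module component_decode is hypothetical, it assumes well-formed "%XX" escapes, and it returns the decoded bytes as a list without any UTF-8 decoding.

```erlang
%% Hypothetical module: decodes a raw URL component according to where it
%% appears in the URL.
-module(component_decode).
-export([decode/2]).

decode(Component, S) when Component =:= path; Component =:= query ->
    decode(Component, S, []).

decode(_, [], Acc) ->
    lists:reverse(Acc);
%% "%XX" is decoded the same way in both components.
decode(C, [$%, H1, H2 | Rest], Acc) ->
    decode(C, Rest, [list_to_integer([H1, H2], 16) | Acc]);
%% Only form-encoded query values treat "+" as a space.
decode(query, [$+ | Rest], Acc) ->
    decode(query, Rest, [$\s | Acc]);
%% Everything else, including "+" in a path, is kept literally.
decode(C, [Ch | Rest], Acc) ->
    decode(C, Rest, [Ch | Acc]).
```

With this split, the raw string "a+b" decodes to "a+b" as a path segment and to "a b" as a query value. "a%2Bb" decodes to "a+b" in either position.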



[jira] Closed: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Robert Newson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Newson closed COUCHDB-883.
---------------------------------

    Resolution: Not A Problem


This is expected behavior. The + is interpreted as a space according to the uri escaping rules. Use %2b if you want to keep the + symbol.
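The suggested workaround, sending the plus sign as %2B, can be applied on the client side by percent-encoding each path segment before building the URL. Below is a minimal Erlang sketch under these assumptions: the database name and document id are Unicode strings, and every byte of their UTF-8 form outside the RFC 2396 "unreserved" set is escaped. The module doc_url and its functions are hypothetical and are not part of CouchDB.

```erlang
%% Hypothetical helper, not part of CouchDB: builds "/Db/DocId" with each
%% segment percent-encoded so that "+" reaches the server as "%2B".
-module(doc_url).
-export([doc_path/2, encode_segment/1]).

doc_path(Db, DocId) ->
    "/" ++ encode_segment(Db) ++ "/" ++ encode_segment(DocId).

%% Percent-encode every byte of the UTF-8 form of Segment except the
%% RFC 2396 "unreserved" characters (alphanumerics and - _ . ! ~ * ' ( )).
encode_segment(Segment) ->
    Bytes = unicode:characters_to_binary(Segment),
    lists:flatten([encode_byte(B) || <<B>> <= Bytes]).

encode_byte(B) when B >= $a, B =< $z; B >= $A, B =< $Z; B >= $0, B =< $9 ->
    [B];
encode_byte(B) ->
    case lists:member(B, "-_.!~*'()") of
        true  -> [B];
        %% Two uppercase hex digits, zero-padded, e.g. 43 -> "%2B".
        false -> io_lib:format("%~2.16.0B", [B])
    end.
```

In an Erlang shell, after c(doc_url), doc_url:doc_path("mydb", "a+b") gives "/mydb/a%2Bb" and doc_url:doc_path("mydb", "a b") gives "/mydb/a%20b". These are the two URLs from the original report. Escaping more than strictly necessary is safe here, because a conforming server decodes "%2B" back to "+".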



[jira] Commented: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Muharem Hrnjadovic (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907868#action_12907868 ] 

Muharem Hrnjadovic commented on COUCHDB-883:
--------------------------------------------

FWIW, a URL like http://localhost/a+b is left alone by apache2: I see the following entry in /var/log/apache2/access.log:

127.0.0.1 - - [10/Sep/2010:05:13:30 +0200] "GET /a+b HTTP/1.1" 200 294 "-"

Also, a file with that name (a+b) is served correctly.



[jira] Updated: (COUCHDB-883) Wrong document returned due to incorrect URL decoding

Posted by "Muharem Hrnjadovic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/COUCHDB-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muharem Hrnjadovic updated COUCHDB-883:
---------------------------------------

    Attachment: logging.diff

I added some logging statements to find out where the a+b -> a b conversion takes place, and found that it happens in handle_request() (src/couchdb/couch_httpd.erl, line 237), where 'requested_path_parts' and 'path_parts' are passed through couch_httpd:unquote(), which in turn calls mochiweb_util:unquote().

A quick experiment confirms that:


$ erl -pz $HOME/src/couchdb/src/mochiweb
Erlang R14A (erts-5.8) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.8  (abort with ^G)
1> mochiweb_util:unquote("a+b")
1> .
"a b"
2> 

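One possible direction for a fix, sketched under assumptions: split the raw request path on "/" before decoding, then percent-decode each segment without the "+"-to-space rule that mochiweb_util:unquote() applies in the transcript above. This is not CouchDB's code. The names path_parts/1 and unquote_segment/1 are hypothetical. The sketch assumes the input is the path without its query string and contains only well-formed "%XX" escapes. Note that string:tokens/2 silently drops empty segments.

```erlang
%% Hypothetical sketch, not CouchDB's code.
-module(path_parts).
-export([path_parts/1]).

%% Split the raw (still percent-encoded) path on "/" first, then decode
%% each segment. An encoded slash ("%2F") therefore stays inside its
%% segment instead of creating a new one.
path_parts(RawPath) ->
    [unquote_segment(Seg) || Seg <- string:tokens(RawPath, "/")].

%% Percent-decoding only: "+" is kept as a literal plus sign.
unquote_segment([]) ->
    [];
unquote_segment([$%, H1, H2 | Rest]) ->
    [list_to_integer([H1, H2], 16) | unquote_segment(Rest)];
unquote_segment([C | Rest]) ->
    [C | unquote_segment(Rest)].
```

Query strings would still need the form-encoding rules, so only the path segments should go through a decoder like this one.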
