Posted to dev@couchdb.apache.org by "Noah Slater (JIRA)" <ji...@apache.org> on 2008/03/09 00:57:46 UTC
[jira] Created: (COUCHDB-18) Unicode document names
Unicode document names
----------------------
Key: COUCHDB-18
URL: https://issues.apache.org/jira/browse/COUCHDB-18
Project: CouchDB
Issue Type: Bug
Reporter: Noah Slater
Priority: Minor
The documentation at
http://www.couchdbwiki.com/index.php?title=HTTP_Doc_API#Valid_Document_Names
notes that valid document names (_id) are only [a-zA-Z0-9_] (for now). The
behaviour for non-ASCII names is not specified.
Currently, CouchDB decodes percent-encoded document names in a URL as if they
were ISO-8859-1. For example, this is a dump of the HTTP traffic:
PUT /test/%D0%B0%D0%B0%D0%B0%D0%B0 HTTP/1.1
Host: localhost:5984
Accept-Encoding: identity
Content-Length: 40
content-type: application/json
accept: application/json
user-agent: couchdb-python 0.2

{"message": "the medium is the message"}

HTTP/1.1 201 Created
Server: inets/develop
Date: Sat, 05 Jan 2008 17:58:58 GMT
Cache-Control: no-cache
Pragma: no-cache
Expires: Sat, 05 Jan 2008 17:58:58 GMT
Transfer-Encoding: chunked
Content-Type: application/json
Etag: 3412642223

57
{"ok":true,"id":"\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0","rev":"3412642223"}
0
The string %D0%B0%D0%B0%D0%B0%D0%B0 was converted to
"\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0", whereas the intended
behaviour was to decode it as UTF-8 and get a Cyrillic document name.
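To make the difference concrete, here is a minimal sketch in Python 3 (not CouchDB's own code, which is written in Erlang) that decodes the same percent-encoded path segment both ways. The variable names are illustrative only; json.dumps is used just to show the ids in the same escaped form as the response above.

```python
import json
from urllib.parse import unquote_to_bytes

# The document name exactly as it appears in the request line of the report.
path_segment = "%D0%B0%D0%B0%D0%B0%D0%B0"

# Percent-decoding yields raw bytes: the pair D0 B0 repeated four times.
raw = unquote_to_bytes(path_segment)

# Reading each byte as one ISO-8859-1 character gives eight code points,
# which matches the id the server returned.
as_latin1 = raw.decode("iso-8859-1")

# Reading the same bytes as UTF-8 gives four Cyrillic letters (U+0430),
# which is the id the reporter expected.
as_utf8 = raw.decode("utf-8")

print(json.dumps(as_latin1))  # "\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0\u00d0\u00b0"
print(json.dumps(as_utf8))    # "\u0430\u0430\u0430\u0430"
```

The bytes on the wire are identical in both cases; only the character set chosen when turning them into a string differs.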
Also, couchdb-python and the JavaScript library that ships with CouchDB both
assume that Unicode document names are encoded as UTF-8.
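As an illustration of that client-side convention, the following Python 3 sketch builds the request path from a Unicode document name by encoding it as UTF-8 and then percent-encoding the bytes. It is not taken from couchdb-python or the JavaScript library; the variable names are hypothetical, and the database name "test" comes from the PUT request in the dump.

```python
from urllib.parse import quote

# Four Cyrillic letters (U+0430), the document name used in the report.
doc_id = "\u0430\u0430\u0430\u0430"
db_name = "test"  # database name from the PUT request in the dump

# quote() encodes str input as UTF-8 by default before percent-encoding;
# safe="" makes sure a "/" inside a name would be escaped as well.
path = "/" + quote(db_name, safe="") + "/" + quote(doc_id, safe="")

print(path)  # /test/%D0%B0%D0%B0%D0%B0%D0%B0
```

A server that decodes the path with any other character set will store a different id than the one the client intended.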
Everything was tested with CouchDB 0.7.2.
--
Forgot to add: the expected output would be "id":"\u0430\u0430\u0430\u0430". For example:
$ python
>>> from urllib import unquote
>>> unquote('%D0%B0%D0%B0%D0%B0%D0%B0').decode('utf-8')
u'\u0430\u0430\u0430\u0430'
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Closed: (COUCHDB-18) Unicode document names
Posted by "Chris Anderson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Anderson closed COUCHDB-18.
---------------------------------
Resolution: Cannot Reproduce
This has been fixed for a while.