Posted to dev@couchdb.apache.org by "Fedor Indutny (JIRA)" <ji...@apache.org> on 2011/02/03 17:06:29 UTC
[jira] Created: (COUCHDB-1057) Wrong JSON parser behavior on
escaped unicode characters
Wrong JSON parser behavior on escaped unicode characters
--------------------------------------------------------
Key: COUCHDB-1057
URL: https://issues.apache.org/jira/browse/COUCHDB-1057
Project: CouchDB
Issue Type: Bug
Components: Database Core
Affects Versions: 1.0
Environment: Ubuntu 10.10
Doesn't matter
Reporter: Fedor Indutny
Put the following doc in a file named 1.json:
{ "_id" : "json-test", "test": "\u0080-\uffff"}
And then PUT it to the database:
curl -X PUT -d @1.json --basic --user admin:admin -H "Content-Type: application/json" http://couchdb:5984/tadagraph/json-test
You'll get an error:
{"error":"bad_request","reason":"invalid UTF-8 JSON"}
jsonlint ( http://www.jsonlint.com/ ) says that it's valid JSON.
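As a second, independent check, here is a small sketch that feeds the same document to Python's standard json module. Python is only a stand-in here and says nothing about how CouchDB's own parser works; the variable names are illustrative.

```python
import json

# The document from the report, exactly as it would appear in 1.json.
raw = r'{ "_id" : "json-test", "test": "\u0080-\uffff"}'

# A lenient parser accepts the escapes and decodes them to code points.
doc = json.loads(raw)
code_points = [f"U+{ord(ch):04X}" for ch in doc["test"]]
print(code_points)
```

The parse succeeds and the "test" value decodes to three code points, the last of which is U+FFFF.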
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (COUCHDB-1057) Wrong JSON parser behavior on
escaped unicode characters
Posted by "Fedor Indutny (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990177#comment-12990177 ]
Fedor Indutny commented on COUCHDB-1057:
----------------------------------------
http://www.ietf.org/rfc/rfc4627.txt
2.5. Strings
...
Any character may be escaped. If the character is in the Basic
Multilingual Plane (U+0000 through U+FFFF), then it may be
represented as a six-character sequence: a reverse solidus, followed
by the lowercase letter u, followed by four hexadecimal digits that
encode the character's code point. The hexadecimal letters A though
F can be upper or lowercase. So, for example, a string containing
only a single reverse solidus character may be represented as
"\u005C".
...
Looks like the RFC declares the whole range U+0000 through U+FFFF valid in JSON escapes.
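The escape rule quoted from the RFC can be tried out with a short sketch. It uses Python's standard json module purely as an illustration; the variable names are made up for this example.

```python
import json

# The RFC's own example: a single reverse solidus written as a \u escape.
# The hex letters may be upper or lower case, so both spellings decode alike.
upper = json.loads(r'"\u005C"')
lower = json.loads(r'"\u005c"')

print(upper == lower == "\\")
```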
[jira] Closed: (COUCHDB-1057) Wrong JSON parser behavior on escaped
unicode characters
Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Joseph Davis closed COUCHDB-1057.
--------------------------------------
Resolution: Won't Fix
But Wikipedia says it's not:
http://en.wikipedia.org/wiki/UTF-8
Specifically, \uFFFF is an invalid code point and I reject Crockford's crazy delusional world that says all strings in every language should be implemented as unsigned 16-bit integers.
[jira] Commented: (COUCHDB-1057) Wrong JSON parser behavior on
escaped unicode characters
Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990205#comment-12990205 ]
Paul Joseph Davis commented on COUCHDB-1057:
--------------------------------------------
Yeah, that's the part of the spec that uses a broken assumption of 16-bit integers for representing string data. Also of interest: we reject invalid surrogate pairs too, which the spec doesn't even mention.
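To make "invalid surrogate pairs" concrete, here is a minimal sketch of such a check. It is written in Python and is not CouchDB's actual code; the function name has_valid_surrogates and the regex are hypothetical, and the input is assumed to be the raw text between the quotes of a JSON string.

```python
import re

# Matches any backslash escape; group 1 holds the hex digits of a \uXXXX escape.
_ESCAPE = re.compile(r"\\(?:u([0-9a-fA-F]{4})|.)", re.DOTALL)

def has_valid_surrogates(body: str) -> bool:
    """Return True if every high-surrogate escape is immediately followed by
    a low-surrogate escape and no low-surrogate escape stands alone."""
    pending_end = None  # index just past an unmatched high surrogate
    for m in _ESCAPE.finditer(body):
        unit = int(m.group(1), 16) if m.group(1) else None
        is_high = unit is not None and 0xD800 <= unit <= 0xDBFF
        is_low = unit is not None and 0xDC00 <= unit <= 0xDFFF
        if pending_end is not None:
            # The escape right after a high surrogate must be an adjacent low one.
            if not (is_low and m.start() == pending_end):
                return False
            pending_end = None
        elif is_low:
            return False  # low surrogate with no preceding high surrogate
        elif is_high:
            pending_end = m.end()
    return pending_end is None

print(has_valid_surrogates(r"\ud83d\ude00"))  # well-formed pair
print(has_valid_surrogates(r"\ud800"))        # lone high surrogate
```

The check only looks at escapes, so it says nothing about whether a code point such as U+FFFF should be accepted; that is a separate policy decision.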
[jira] Commented: (COUCHDB-1057) Wrong JSON parser behavior on
escaped unicode characters
Posted by "Paul Joseph Davis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/COUCHDB-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990250#comment-12990250 ]
Paul Joseph Davis commented on COUCHDB-1057:
--------------------------------------------
Also, I realized I should probably give more background on this instead of just getting irritated with that spec again.
The underlying issue is that CouchDB stores all of its JSON strings as UTF-8, which means that every code point we recognize in the input is required to be representable as UTF-8. As you see in the JSON spec, there wasn't much foresight into what constitutes a valid Unicode code point. This means that the JSON spec allows, via Unicode escapes, things that aren't representable as UTF-8.
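A minimal sketch of that mismatch, using Python's standard json module as a stand-in for a lenient parser (it is not CouchDB's Erlang code, and the helper name utf8_or_none is hypothetical). A lone surrogate escape is used as the example because it parses but cannot be encoded as UTF-8:

```python
import json

def utf8_or_none(json_text: str):
    """Parse a JSON string literal and return its UTF-8 bytes,
    or None if the decoded value cannot be encoded as UTF-8."""
    value = json.loads(json_text)
    try:
        return value.encode("utf-8")
    except UnicodeEncodeError:
        return None

print(utf8_or_none(r'"\u00e9"'))  # an ordinary BMP character
print(utf8_or_none(r'"\ud800"'))  # a lone high surrogate escape
```

A store that keeps strings as UTF-8 has to either reject the second input at parse time or carry the escape through unencoded.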
When I asked about the issue on the es5-discuss list I was actually told that JSON requires strings to be stored as 16-bit integers (which is why I'm so fond of repeating that). Yeah, I was actually told that JSON supposedly requires a specific string implementation. Seeing as JSON is widely characterized as a ubiquitous exchange format, I promptly rejected that assertion and haven't been overly motivated to relax our enforcement of valid Unicode code points.
If someone wants to write a patch that carries invalid escapes through the system I'd probably be ok with that, though I think we tried once and it gummed up something somewhere else.