Posted to dev@couchdb.apache.org by "Jan Lehnardt (JIRA)" <ji...@apache.org> on 2009/03/05 13:07:56 UTC

[jira] Updated: (COUCHDB-254) Non-Unicode characters in an attachment name render a document unreadable.

     [ https://issues.apache.org/jira/browse/COUCHDB-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Lehnardt updated COUCHDB-254:
---------------------------------

    Attachment: COUCHDB-254.txt

This patch sends a 400 Bad Request response with the reason "Attachment name is not UTF-8 encoded" when trying to save a document with an attachment whose name contains non-UTF-8 characters. It includes test cases for inline attachments, standalone attachments, and bulk docs.
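
For readers without the attachment at hand, the check described above can be sketched roughly as follows. This is only an illustration in Python 3, not the code in COUCHDB-254.txt; the helper names is_valid_utf8 and check_attachment_name are hypothetical, and only the status code and reason string are taken from the description above.

```python
from urllib.parse import unquote_to_bytes

# Reason string as quoted in the patch description.
REASON = "Attachment name is not UTF-8 encoded"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


def check_attachment_name(path_segment: str):
    """Hypothetical helper: validate an attachment name taken from a URI.

    The name is percent-decoded to raw bytes first, because a URI can
    carry arbitrary byte values. Returns None if the name is acceptable,
    or a (status, reason) tuple mimicking the rejection described above.
    """
    raw = unquote_to_bytes(path_segment)
    if not is_valid_utf8(raw):
        return 400, REASON
    return None


if __name__ == "__main__":
    for name in ("foo.txt", "foo%C3%A9.txt", "foo%80txt"):
        print(name, "->", check_attachment_name(name))
```

The point of rejecting at save time is that a bad name never reaches storage, so the document stays readable afterwards.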

> Non-Unicode characters in an attachment name render a document unreadable.
> --------------------------------------------------------------------------
>
>                 Key: COUCHDB-254
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-254
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 0.9
>         Environment: Linux, erlang, 12b-5, couchdb r791265
>            Reporter: Maximillian Dornseif
>            Priority: Critical
>         Attachments: COUCHDB-254.txt
>
>
> Attachment names containing non-Unicode characters can be created easily because URIs are (nearly) 8-bit clean. But when a document is read, the names are encoded into UTF-8, which fails, so you are left with unreadable database entries.
> I was not able to generate invalid UTF-8 in JavaScript but a test case would look somewhat like this:
> --- couch_tests.js      2009-02-05 19:47:20.000000000 +0000
> +++ /usr/local/share/couchdb/www/script/couch_tests.js  2009-02-13 21:34:23.000000000 +0000
> @@ -1078,9 +1078,31 @@
>      var xhr = CouchDB.request("GET", "/test_suite_db/bin_doc4/attachment.txt");
>      T(xhr.status == 200);
>      T(xhr.responseText == "This is a string");
> -
>    },
>  
> +  attatchment_names : function(debug) {
> +    var db = new CouchDB("test_suite_db");
> +    db.deleteDb();
> +    db.createDb();
> +    if (debug) debugger;
> +
> +    var binAttDoc = {
> +      _id: "bin_doc",
> +      _attachments:{
> +        "foo\x80txt": {
> +          content_type:"text/plain",
> +          data: "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
> +        }
> +      }
> +    }
> +
> +    var save_response = db.save(binAttDoc);
> +    T(save_response.ok);
> +
> +    var xhr = CouchDB.request("GET", "/test_suite_db/bin_doc/foo\x80txt");
> +    T(xhr.responseText == "This is a base64 encoded text");
> +  },
> +
>    attachment_paths : function(debug) {
>      if (debug) debugger;
>      var dbNames = ["test_suite_db", "test_suite_db/with_slashes"];
> A Python script (fuzzer?) for triggering the bug looks like this:
> import sys
> import couchdb.client
> COUCHSERVER = "http://localhost:5984"
> COUCHDB_NAME = "md_test"
> def _setup_couchdb():
>     """Get a connection handler to the CouchDB Database, creating it when needed."""
>     server = couchdb.client.Server(COUCHSERVER)
>     print "using %s/%s" % (COUCHSERVER, COUCHDB_NAME)
>     if COUCHDB_NAME in server:
>         return server[COUCHDB_NAME]
>     else:
>         return server.create(COUCHDB_NAME)
>     
> def main():
>     db = _setup_couchdb()
>     doc_id = "doc_id"
>     
>     try:
>         doc = db[doc_id]
>     except couchdb.client.ResourceNotFound:
>         doc = {}
>     
>     db[doc_id] = doc
>     for i in range(256):
>         char = chr(i)
>         name = "___%s___" % (char)
>         print "checking %r (%d) " % (char, i),
>         sys.stdout.flush()
>         db.put_attachment(db[doc_id], "data", name)
>         db[doc_id]
>         print '\r',
>     print 
> main()
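
To narrow down which iterations of the script above can produce an invalid name, the following Python 3 sketch checks each of the 256 candidate names locally, without a server. It assumes the name reaches CouchDB as the raw bytes "___" + byte + "___"; what a given client library actually sends may differ, and the function name invalid_fuzzer_bytes is made up for this example.

```python
def invalid_fuzzer_bytes():
    """Return the byte values whose '___<byte>___' name is not valid UTF-8."""
    bad = []
    for i in range(256):
        # Same name pattern as the fuzzer, built as raw bytes.
        name = b"___" + bytes([i]) + b"___"
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(i)
    return bad


if __name__ == "__main__":
    bad = invalid_fuzzer_bytes()
    print("%d of 256 names are not valid UTF-8" % len(bad))
    print("first: 0x%02x, last: 0x%02x" % (bad[0], bad[-1]))
```

Under that assumption every value from 0x80 upward yields an invalid name, since a lone byte in that range is never a complete UTF-8 sequence.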
