Posted to dev@couchdb.apache.org by "Sam Rijs (JIRA)" <ji...@apache.org> on 2013/02/25 18:14:13 UTC

[jira] [Commented] (COUCHDB-1425) Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing

    [ https://issues.apache.org/jira/browse/COUCHDB-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586003#comment-13586003 ] 

Sam Rijs commented on COUCHDB-1425:
-----------------------------------

I have investigated this issue and found the source of the timeout.

Internally, the CouchDB View Server maps over a chunk of documents
at once and returns their emit-result to the Erlang Process collectively
in a JSON array.

The print function uses enc_string, which correctly returns an error
when it encounters an invalid Unicode string (as we have here).
However, when enc_string returns an error, the print function simply
exits and prints nothing at all.

This means that whenever the emitted data is invalid Unicode,
the Erlang process, having received nothing, assumes a timeout,
while the View Server is fully responsive but just doesn't transmit anything.
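
To make the failure case concrete: JavaScript strings are sequences of
UTF-16 code units, and a string such as "\uD800" holds a high surrogate
with no matching low surrogate, which has no valid UTF-8 encoding. Below is
a minimal sketch of how such strings could be detected before they are
handed to print. The helper name hasLoneSurrogate is hypothetical and not
part of CouchDB; this is an illustration, not the view server's actual code.

```javascript
// Hypothetical helper (not part of CouchDB): returns true if `str` contains
// a UTF-16 surrogate code unit that is not part of a valid high/low pair.
function hasLoneSurrogate(str) {
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: only valid if a low surrogate follows immediately.
      var next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the second half of the valid pair
        continue;
      }
      return true;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate that was not preceded by a high surrogate.
      return true;
    }
  }
  return false;
}
```

With this sketch, hasLoneSurrogate("\uD800") reports the key from the
original report as invalid, while a well-formed pair such as
"\uD83D\uDE00" passes.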

The question is now: where to fix this? Should Couch.toJSON throw an error when
it encounters an invalid JSON string? Should a view really fail when there
are malformed documents? Or should we just sanitize the emit-result in some way?
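
As one possible shape for the "sanitize" option, here is a minimal sketch
that replaces every unpaired surrogate with U+FFFD (the Unicode replacement
character) so that the resulting string can always be encoded. The function
name replaceLoneSurrogates is hypothetical, and whether silently altering
emitted keys is acceptable is exactly the open question above.

```javascript
// Hypothetical sanitizer (not part of CouchDB): returns a copy of `str` in
// which every unpaired UTF-16 surrogate is replaced by U+FFFD.
function replaceLoneSurrogates(str) {
  var out = "";
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    var isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        // Valid surrogate pair: copy both code units unchanged.
        out += str.charAt(i) + str.charAt(i + 1);
        i++;
        continue;
      }
    }
    // Any surrogate reaching this point is unpaired; substitute it.
    out += (isHigh || isLow) ? "\uFFFD" : str.charAt(i);
  }
  return out;
}
```

The alternative of throwing from Couch.toJSON would use the same check but
raise an error instead of substituting, which would surface the bad
document rather than hide it.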
                
> Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing
> -----------------------------------------------------------------------
>
>                 Key: COUCHDB-1425
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1425
>             Project: CouchDB
>          Issue Type: Bug
>          Components: JavaScript View Server
>    Affects Versions: 1.1.1
>         Environment: Mac OS 10.6.8, but not sure that matters.
>            Reporter: Jim Klo
>
> Was trying to determine UTF-8 char collation, using the following Gist: https://gist.github.com/1904807
> It turns out that once the view gets to the document that would emit "\uD800", the view server times out and stops indexing that design document.
> This seems like a bug, since I can 'store' a document with UTF-8 chars >= 0xD800, but one cannot emit a key with that char in the string.
