Posted to dev@couchdb.apache.org by "Jim Klo (Created) (JIRA)" <ji...@apache.org> on 2012/02/27 21:34:46 UTC

[jira] [Created] (COUCHDB-1425) Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing

Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing
-----------------------------------------------------------------------

                 Key: COUCHDB-1425
                 URL: https://issues.apache.org/jira/browse/COUCHDB-1425
             Project: CouchDB
          Issue Type: Bug
          Components: JavaScript View Server
    Affects Versions: 1.1.1
         Environment: Mac OS 10.6.8, but not sure that matters.
            Reporter: Jim Klo


I was trying to determine UTF-8 character collation, using the following Gist: https://gist.github.com/1904807

It turns out that once the view gets to the document that would emit "\uD800", the view server times out and stops indexing that design document.

This seems like a bug, since I can 'store' a document with UTF-8 chars >= 0xD800, but I cannot emit a key with that char in the string.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (COUCHDB-1425) Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing

Posted by "Jim Klo (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217919#comment-13217919 ] 

Jim Klo commented on COUCHDB-1425:
----------------------------------

I actually went back and set my timeout pretty high (90 sec). And yes, it just times out.

The Python script doesn't fail. That's what's odd. I don't know enough about surrogate pairs, but it would seem that even if they are being used wrong, they are wrong on insert too. Hence I'd expect an error or problem on insert AND on index/retrieval.

No actual error is logged other than the timeout. The real problem is that if the data is poorly formatted, emitting it as a key in JS causes the view server to stall, which prevents docs with a higher local sequence from being indexed.

It's certainly a great way to DoS an app, since CouchDB doesn't balk at inserting it.
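
Since CouchDB accepts such documents on insert, one possible client-side guard is to scan a document for unpaired surrogates before sending it. The following is only a minimal sketch in Python, not anything CouchDB does itself; the helper names `has_lone_surrogate` and `find_lone_surrogates` and the sample document are hypothetical. It relies on the fact that Python's `json` module combines a valid escaped pair into one code point but leaves an unpaired `\uD800`-style escape as a lone surrogate in the resulting string.

```python
import json


def has_lone_surrogate(text):
    """Return True if text contains a code point in the surrogate range."""
    # After json.loads, valid pairs are already merged into one code point,
    # so anything left in 0xD800..0xDFFF is an unpaired surrogate.
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def find_lone_surrogates(value, path="$"):
    """Yield the paths of all string values that contain a lone surrogate.

    Hypothetical helper: walks dicts and lists as produced by json.loads.
    Only values are checked here, not object keys.
    """
    if isinstance(value, str):
        if has_lone_surrogate(value):
            yield path
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from find_lone_surrogates(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            yield from find_lone_surrogates(item, f"{path}[{index}]")


# Illustrative document: two lone surrogates and one valid surrogate pair.
raw = '{"_id": "a", "title": "bad \\ud800", "tags": ["ok", "\\udfff"], "emoji": "\\ud83d\\ude00"}'
doc = json.loads(raw)
bad_paths = list(find_lone_surrogates(doc))
print(bad_paths)
```

A client could refuse to upload any document for which `bad_paths` is non-empty, or replace the offending characters first. This does not address the stall in the view server itself.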
                


        

[jira] [Commented] (COUCHDB-1425) Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing

Posted by "Alexander Shorin (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/COUCHDB-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217857#comment-13217857 ] 

Alexander Shorin commented on COUCHDB-1425:
-------------------------------------------

The JavaScript view server fails with an OS timeout error for me (25 sec), while I couldn't reproduce the problem with the Python one. I suppose the problem lies in JavaScript's internal handling of Unicode, or maybe deeper in SpiderMonkey; I haven't tested this case.

AFAIK, code points in the range 0xD800..0xDFFF are surrogates and cannot be encoded/decoded properly on their own. There is the term `surrogate pair`: a combination of two surrogates (one high, one low) that together represent a single character.
Wouldn't emitting a single surrogate character simply be invalid usage of the Unicode standard?
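
To illustrate the point about surrogates, here is a small Python sketch (Python 3 assumed; the variable names are only for illustration). It shows that a lone surrogate cannot be encoded as UTF-8, while a character outside the Basic Multilingual Plane (U+1F600 is used as an arbitrary example) is represented in UTF-16 by a high/low surrogate pair that recombines into a single code point. It says nothing about what SpiderMonkey or the view server does internally.

```python
# A lone high surrogate: a valid Python str, but not valid for UTF-8 encoding.
lone = "\ud800"
try:
    lone.encode("utf-8")
    lone_encodes = True
except UnicodeEncodeError:
    lone_encodes = False

# A character above U+FFFF is stored in UTF-16 as two 16-bit code units.
char = "\U0001F600"
utf16 = char.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")  # high (leading) surrogate
low = int.from_bytes(utf16[2:], "big")   # low (trailing) surrogate

# Recombine the pair into the original code point.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print(lone_encodes)                # False
print(hex(high), hex(low))         # 0xd83d 0xde00
print(code_point == ord(char))     # True
```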

                
