Posted to notifications@couchdb.apache.org by gi...@git.apache.org on 2017/07/26 10:01:49 UTC

[GitHub] lokedhs opened a new issue #715: Non-BMP text gets corrupted when inserted into a view

URL: https://github.com/apache/couchdb/issues/715
 
 
   When a document contains a text field with characters outside the BMP (Unicode code points greater than 0xFFFF), those characters get corrupted when emitted into a view.
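
To make "outside the BMP" concrete: JavaScript strings, such as the ones a view's map function handles, are sequences of UTF-16 code units, so a code point above 0xFFFF occupies two units (a surrogate pair). The sketch below assumes U+1F600 as a sample character, and the helper name `hasNonBmp` is made up for illustration.

```javascript
// Hypothetical helper: true if the string contains a code point above U+FFFF.
function hasNonBmp(text) {
  for (const ch of text) {            // for...of iterates by code point, not code unit
    if (ch.codePointAt(0) > 0xFFFF) {
      return true;
    }
  }
  return false;
}

// U+1F600 is an assumed sample character; any non-BMP code point would do.
const smiley = String.fromCodePoint(0x1F600);

console.log(smiley.length);                      // 2 UTF-16 code units
console.log(smiley.charCodeAt(0).toString(16));  // d83d (high surrogate)
console.log(smiley.charCodeAt(1).toString(16));  // de00 (low surrogate)
console.log(hasNonBmp('plain ASCII'));           // false
console.log(hasNonBmp('smiley: ' + smiley));     // true
```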
   
   This is a regression since 1.6.0.
   
   ## Expected Behavior
   
   I expect the same string to be visible in the view as in the original document.
   
   ## Current Behavior
   
   If a string contains a non-BMP character in the source document, the view gets the same character followed by a U+FFFD REPLACEMENT CHARACTER.
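
One way to spot this symptom programmatically is to scan a view value for a U+FFFD that directly follows a non-BMP code point. This is only a sketch: the helper name `findSuspectReplacements` is hypothetical, the sample strings are built by hand rather than taken from a real view response, and U+1F600 is an assumed sample character.

```javascript
const REPLACEMENT = '\uFFFD';

// Hypothetical helper: returns the indexes (in UTF-16 code units) at which a
// U+FFFD directly follows a non-BMP code point.
function findSuspectReplacements(viewValue) {
  const hits = [];
  let index = 0;
  let previousWasNonBmp = false;
  for (const ch of viewValue) {        // iterate by code point
    if (ch === REPLACEMENT && previousWasNonBmp) {
      hits.push(index);
    }
    previousWasNonBmp = ch.codePointAt(0) > 0xFFFF;
    index += ch.length;                // 1 for BMP characters, 2 for non-BMP
  }
  return hits;
}

// Hand-built samples; U+1F600 is an assumption, not taken from the report.
const smileyChar = String.fromCodePoint(0x1F600);
const original = 'this is a smiley face: ' + smileyChar + '.';
const corrupted = 'this is a smiley face: ' + smileyChar + REPLACEMENT + '.';

console.log(findSuspectReplacements(original).length);   // 0
console.log(findSuspectReplacements(corrupted).length);  // 1
```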
   
   ## Steps to Reproduce (for bugs)
   
   Create a view with the following definition:
   
   ```
   function (doc) {
     if (doc.type === 'encoding_test') {
       emit(doc._id, doc.text);
     }
   }
   ```
   
   Then insert the following document:
   
   ```
   {
       "type": "encoding_test",
       "text": "this is a smiley face: 😀."
   }
   ```
   
   Then query the view defined above and note that a U+FFFD REPLACEMENT CHARACTER has been appended after the smiley in the emitted value.
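
The steps above could be scripted roughly as follows. This is a sketch under several assumptions: the design document name `encoding_test`, the view name `by_id`, the sample character U+1F600, and the `reproduce` function are all made up for illustration. It also assumes a CouchDB instance that accepts unauthenticated requests and a runtime with a global `fetch` (for example Node 18+). The network part is defined but not invoked.

```javascript
// Assumed names, not from the report: adjust to your setup.
const DESIGN_DOC_ID = '_design/encoding_test';
const VIEW_NAME = 'by_id';

// The map function from the report, stored as a string inside a design document.
const designDoc = {
  _id: DESIGN_DOC_ID,
  views: {
    [VIEW_NAME]: {
      map: "function (doc) { if (doc.type === 'encoding_test') { emit(doc._id, doc.text); } }"
    }
  }
};

// Test document; U+1F600 is an assumed sample non-BMP character.
const testDoc = {
  type: 'encoding_test',
  text: 'this is a smiley face: ' + String.fromCodePoint(0x1F600) + '.'
};

// Not invoked here: needs a running CouchDB at baseUrl.
async function reproduce(baseUrl, dbName) {
  const json = { 'Content-Type': 'application/json' };
  const db = `${baseUrl}/${dbName}`;
  await fetch(db, { method: 'PUT' });                                   // create the database
  await fetch(`${db}/${DESIGN_DOC_ID}`, {                               // store the view
    method: 'PUT', headers: json, body: JSON.stringify(designDoc)
  });
  await fetch(db, {                                                     // insert the document
    method: 'POST', headers: json, body: JSON.stringify(testDoc)
  });
  const res = await fetch(`${db}/${DESIGN_DOC_ID}/_view/${VIEW_NAME}`); // query the view
  const { rows } = await res.json();
  // A row is intact when the emitted value equals the text that was inserted.
  return rows.map((row) => ({ value: row.value, intact: row.value === testDoc.text }));
}
```

Calling something like `reproduce('http://localhost:5984', 'encoding_test_db')` (both values hypothetical) would then return one entry per view row, with `intact: false` for any row whose value differs from the inserted text.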
   
   ## Context
   
   I'm using CouchDB as the backend for a web application, and with CouchDB 2.0.0 many characters are displayed incorrectly.
   
   ## Your Environment
   * CouchDB 2.0.0
   * Arch Linux 4.11.9-1-ARCH
   * https://github.com/cicakhq/potato
   
 