Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2016/03/31 00:26:25 UTC

[jira] [Commented] (DRILL-4558) When a query returns diacritics in a string, the string is cut

    [ https://issues.apache.org/jira/browse/DRILL-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218975#comment-15218975 ] 

Steven Phillips commented on DRILL-4558:
----------------------------------------

This looks like a problem in the BsonRecordReader:

{code}
  private void writeString(String readString, final MapOrListWriterImpl writer, String fieldName, boolean isList) {
    final int length = readString.length();
    final VarCharHolder vh = new VarCharHolder();
    ensure(length);
    try {
      workBuf.setBytes(0, readString.getBytes("UTF-8"));
    } catch (UnsupportedEncodingException e) {
      throw new DrillRuntimeException("Unable to read string value for field: " + fieldName, e);
    }
    vh.buffer = workBuf;
    vh.start = 0;
    vh.end = length;
    if (isList == false) {
      writer.varChar(fieldName).write(vh);
    } else {
      writer.list.varChar().write(vh);
    }
  }
{code}

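The length variable should be the length of the UTF-8 byte array, not the length of the String. String.length() counts UTF-16 chars, while vh.end is a byte offset into the buffer, so every character that needs more than one byte in UTF-8 (each 'é' takes two) cuts one byte off the end of the written value.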
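
To illustrate the mismatch outside of Drill, here is a minimal standalone sketch. The class name Utf8LengthDemo and the helper decodePrefix are made up for this example and are not part of BsonRecordReader; the helper only imitates what happens when the end offset is smaller than the number of encoded bytes. The string is written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8LengthDemo {

  // Hypothetical helper: encode the value as UTF-8, keep only the first
  // `end` bytes, and decode them again. This imitates a holder whose end
  // offset is smaller than the number of bytes that were actually written.
  static String decodePrefix(String value, int end) {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    byte[] prefix = Arrays.copyOf(bytes, Math.min(end, bytes.length));
    return new String(prefix, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "Végétaux" from the report, with each 'é' as U+00E9.
    String value = "V\u00e9g\u00e9taux";

    int charLength = value.length();                                // 8 UTF-16 chars
    int byteLength = value.getBytes(StandardCharsets.UTF_8).length; // 10 bytes

    System.out.println("chars: " + charLength + ", bytes: " + byteLength);

    // Using the char count as the byte length drops the last two bytes.
    System.out.println(decodePrefix(value, charLength));

    // Using the byte count keeps the whole value.
    System.out.println(decodePrefix(value, byteLength));
  }
}
```

Applied to writeString, this would presumably mean calling getBytes("UTF-8") once, keeping the resulting array, and using its length both for ensure() and for vh.end. I have not tested that change against the reader.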

A quick workaround would be to disable the BSON reader:

set store.mongo.bson.record.reader = false;

> When a query returns diacritics in a string, the string is cut
> --------------------------------------------------------------
>
>                 Key: DRILL-4558
>                 URL: https://issues.apache.org/jira/browse/DRILL-4558
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - MongoDB
>         Environment: Apache Drill 1.6
> MongoDB 3.2.1
>            Reporter: Vincent Uribe
>
> With the given document in a collection "Test" from a database testDb :
> {
>     "_id" : ObjectId("56e7f1bd0944228aab06d0e2"),
>     "ID_ATTRIBUT" : "3",
>     "VAL_ATTRIBUT" : "Végétaux",
>     "UPDATED" : ISODate("2016-01-09T23:00:00.000Z")
> }
> When querying select * from mongoStorage.testDb.Test I get 
> _id: [B@affb65
> ID_ATTRIBUT: 3
> VAL_ATTRIBUT: *Végéta*
> UPDATED: 2016-01-09T23:00:00.000Z
> As you can see, the two 'é' cut the string "végétaux" by 2 characters, giving végéta.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)