Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2016/03/31 00:26:25 UTC
[jira] [Commented] (DRILL-4558) When a query returns diacritics in
a string, the string is cut
[ https://issues.apache.org/jira/browse/DRILL-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218975#comment-15218975 ]
Steven Phillips commented on DRILL-4558:
----------------------------------------
This looks like a problem in the BsonRecordReader:
{code}
private void writeString(String readString, final MapOrListWriterImpl writer, String fieldName, boolean isList) {
  final int length = readString.length();
  final VarCharHolder vh = new VarCharHolder();
  ensure(length);
  try {
    workBuf.setBytes(0, readString.getBytes("UTF-8"));
  } catch (UnsupportedEncodingException e) {
    throw new DrillRuntimeException("Unable to read string value for field: " + fieldName, e);
  }
  vh.buffer = workBuf;
  vh.start = 0;
  vh.end = length;
  if (isList == false) {
    writer.varChar(fieldName).write(vh);
  } else {
    writer.list.varChar().write(vh);
  }
}
{code}
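The length variable should be the length of the encoded byte array, not the length of the String. String.length() counts UTF-16 chars, while each 'é' takes two bytes in UTF-8, so "Végétaux" is 8 chars but 10 bytes. With vh.end set to 8, the last two bytes are dropped.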
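To make the mismatch concrete, here is a minimal standalone sketch in plain Java. It does not use any Drill classes; the class name Utf8LengthDemo and its helper methods are hypothetical and exist only for this illustration. It compares the char count with the UTF-8 byte count and decodes only the first String.length() bytes, which is roughly what happens when vh.end is set from the char count.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Drill.
final class Utf8LengthDemo {

    // Number of UTF-16 code units; this is what String.length() returns.
    static int charLength(String s) {
        return s.length();
    }

    // Number of bytes in the UTF-8 encoding of the string.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Mimics the truncation: keep only the first `end` encoded bytes and decode them.
    static String decodePrefix(String s, int end) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String value = "V\u00e9g\u00e9taux"; // "Végétaux" from the report

        System.out.println(charLength(value));                       // 8
        System.out.println(utf8Length(value));                       // 10
        System.out.println(decodePrefix(value, charLength(value)));  // Végéta
    }
}
```

Under that reading, a fix in writeString would presumably encode the string once, then use the resulting array's length both for ensure(...) and for vh.end.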
A quick workaround would be to disable the BSON reader:
alter session set `store.mongo.bson.record.reader` = false;
> When a query returns diacritics in a string, the string is cut
> --------------------------------------------------------------
>
> Key: DRILL-4558
> URL: https://issues.apache.org/jira/browse/DRILL-4558
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - MongoDB
> Environment: Apache Drill 1.6
> MongoDB 3.2.1
> Reporter: Vincent Uribe
>
> With the given document in a collection "Test" from a database testDb :
> {
> "_id" : ObjectId("56e7f1bd0944228aab06d0e2"),
> "ID_ATTRIBUT" : "3",
> "VAL_ATTRIBUT" : "Végétaux",
> "UPDATED" : ISODate("2016-01-09T23:00:00.000Z")
> }
> When querying select * from mongoStorage.testDb.Test I get
> _id: [B@affb65
> ID_ATTRIBUT: 3
> VAL_ATTRIBUT: *Végéta*
> UPDATED: 2016-01-09T23:00:00.000Z
> As you can see, each 'é' shortens the returned string by one character, so "Végétaux" comes back as "Végéta".