Posted to dev@hive.apache.org by "Phabricator (JIRA)" <ji...@apache.org> on 2013/03/18 22:29:17 UTC
[jira] [Updated] (HIVE-4199) ORC writer doesn't handle non-UTF8 encoded Text properly
[ https://issues.apache.org/jira/browse/HIVE-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Phabricator updated HIVE-4199:
------------------------------
Attachment: HIVE-4199.HIVE-4199.HIVE-4199.D9501.1.patch
sxyuan requested code review of "HIVE-4199 [jira] ORC writer doesn't handle non-UTF8 encoded Text properly".
Reviewers: kevinwilfong
StringTreeWriter currently converts fields stored as Text objects into Strings. This can lose information, because bytes that are not valid UTF-8 are decoded to the replacement character (see http://en.wikipedia.org/wiki/Replacement_character#Replacement_character). The conversion is also unnecessary, since the dictionary stores Text objects.
Instead, we can check whether Text or String is preferred and simply use the preferred class, converting to String only for the index stats.
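The idea above can be illustrated with a minimal sketch that uses only the JDK. It is not the code from the patch: the class RawBytesDictionaryDemo, its fields, and its methods are hypothetical stand-ins, and ByteBuffer is used in place of Hadoop's Text so the example has no external dependencies. Dictionary keys stay as raw bytes, and a String is derived only to track min/max statistics.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawBytesDictionaryDemo {
    // Hypothetical stand-in for a string dictionary: distinct byte sequences -> ids.
    private final Map<ByteBuffer, Integer> ids = new LinkedHashMap<>();
    // Statistics are the only place where a String is derived from the bytes.
    private String min;
    private String max;

    // Adds a value by its raw bytes and returns its dictionary id.
    int add(byte[] raw) {
        String forStats = decode(raw);
        if (min == null || forStats.compareTo(min) < 0) min = forStats;
        if (max == null || forStats.compareTo(max) > 0) max = forStats;

        // ByteBuffer compares and hashes by content, so it works as a map key.
        ByteBuffer key = ByteBuffer.wrap(raw.clone());
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size();
            ids.put(key, id);
        }
        return id;
    }

    int size() {
        return ids.size();
    }

    // Lossy: malformed UTF-8 input is replaced with U+FFFD.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE9 and 0xE8 are 'é' and 'è' in ISO-8859-1; here neither forms valid UTF-8.
        byte[] a = {'n', (byte) 0xE9, 'e'};
        byte[] b = {'n', (byte) 0xE8, 'e'};

        // As Strings, the two values collapse into one.
        System.out.println(decode(a).equals(decode(b))); // true

        // As raw bytes, they remain two distinct dictionary entries.
        RawBytesDictionaryDemo dict = new RawBytesDictionaryDemo();
        dict.add(a);
        dict.add(b);
        System.out.println(dict.size()); // 2
    }
}
```

Keying on the bytes keeps values distinct even when they decode to the same String; the statistics may still be lossy for such values, which matches the description's choice to convert to String only there.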
TEST PLAN
Run unit tests, including the new query. Without the fix, the join in the test query produces no results.
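A rough sketch of why such a join can come back empty, assuming the join compares keys byte for byte. This is not Hive's join implementation or the test query itself: JoinMismatchDemo, viaString, and countMatches are hypothetical names, and the "tables" are plain lists of byte arrays.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class JoinMismatchDemo {
    // Simulates a writer that converts the value to a String before storing it.
    static byte[] viaString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Toy hash join on byte-wise key equality; returns the number of matching probe keys.
    static int countMatches(List<byte[]> buildSide, List<byte[]> probeSide) {
        Set<ByteBuffer> build = new HashSet<>();
        for (byte[] key : buildSide) {
            build.add(ByteBuffer.wrap(key));
        }
        int matches = 0;
        for (byte[] key : probeSide) {
            if (build.contains(ByteBuffer.wrap(key))) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // A value containing 0xE9, which is not valid UTF-8 in this position.
        byte[] raw = {'n', (byte) 0xE9, 'e'};
        List<byte[]> source = Collections.singletonList(raw);

        // Stored through a String: the key no longer matches the source bytes.
        List<byte[]> lossy = Collections.singletonList(viaString(raw));
        System.out.println(countMatches(source, lossy)); // 0

        // Stored as raw bytes: the key matches.
        List<byte[]> preserved = Collections.singletonList(raw.clone());
        System.out.println(countMatches(source, preserved)); // 1
    }
}
```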
REVISION DETAIL
https://reviews.facebook.net/D9501
AFFECTED FILES
data/files/nonutf8.txt
ql/src/test/results/clientpositive/orc_nonutf8.q.out
ql/src/test/queries/clientpositive/orc_nonutf8.q
ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
To: kevinwilfong, sxyuan
Cc: JIRA
> ORC writer doesn't handle non-UTF8 encoded Text properly
> --------------------------------------------------------
>
> Key: HIVE-4199
> URL: https://issues.apache.org/jira/browse/HIVE-4199
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Reporter: Samuel Yuan
> Assignee: Samuel Yuan
> Priority: Minor
> Attachments: HIVE-4199.HIVE-4199.HIVE-4199.D9501.1.patch
>
>
> StringTreeWriter currently converts fields stored as Text objects into Strings. This can lose information (see http://en.wikipedia.org/wiki/Replacement_character#Replacement_character), and is also unnecessary since the dictionary stores Text objects.