Posted to issues@drill.apache.org by "Steven Phillips (JIRA)" <ji...@apache.org> on 2015/01/13 22:51:34 UTC
[jira] [Commented] (DRILL-1997) Hive generated parquet files with maps containing strings return wrong value
[ https://issues.apache.org/jira/browse/DRILL-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276011#comment-14276011 ]
Steven Phillips commented on DRILL-1997:
----------------------------------------
I actually think this is correct. The schema of the file:
message hive_schema {
  optional int32 c1;
  optional boolean c2;
  optional double c3;
  optional binary c4;
  optional group c5 (LIST) {
    repeated group bag {
      optional int32 array_element;
    }
  }
  optional group c6 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required int32 key;
      optional binary value;
    }
  }
  optional group c7 (MAP) {
    repeated group map (MAP_KEY_VALUE) {
      required binary key;
      optional binary value;
    }
  }
  optional group c8 {
    optional binary r;
    optional int32 s;
    optional double t;
  }
  optional int32 c9;
  optional int32 c10;
  optional float c11;
  optional int64 c12;
  optional group c13 (LIST) {
    repeated group bag {
      optional group array_element (LIST) {
        repeated group bag {
          optional binary array_element;
        }
      }
    }
  }
  optional group c15 {
    optional int32 r;
    optional group s {
      optional int32 a;
      optional binary b;
    }
  }
  optional group c16 (LIST) {
    repeated group bag {
      optional group array_element {
        optional group m (MAP) {
          repeated group map (MAP_KEY_VALUE) {
            required binary key;
            optional binary value;
          }
        }
        optional int32 n;
      }
    }
  }
}
The string value in c6 is simply stored as binary, with no metadata indicating that it is a UTF-8 encoded string. I think this indicates that Hive currently does not support the UTF8 converted type. In sqlline, we display complex objects as JSON, and binary values are displayed as Base64 in JSON. So "eA==" and "eQ==" in the output are just the Base64 encodings of the bytes for "x" and "y".
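
To illustrate the point about Base64, here is a minimal Python sketch that decodes the values from the Drill output quoted in the report. The function name decode_map is hypothetical and used only for illustration; it assumes the binary values hold UTF-8 text, which is exactly the information the file's schema does not record.

```python
import base64
import json

# Third row of the Drill output quoted in the report, copied verbatim.
drill_row = '{"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]}'


def decode_map(row_json):
    """Decode the Base64 values of a Drill-rendered map into strings.

    Assumption: the underlying bytes are UTF-8 text. The Parquet schema
    carries no such annotation, so this is the caller's knowledge.
    """
    entries = json.loads(row_json)["map"]
    return {
        entry["key"]: base64.b64decode(entry["value"]).decode("utf-8")
        for entry in entries
    }


print(decode_map(drill_row))
```

The decoded keys and values match what Hive prints for the same row, which suggests the data is read correctly and only the display of unannotated binary differs.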
> Hive generated parquet files with maps containing strings return wrong value
> ----------------------------------------------------------------------------
>
> Key: DRILL-1997
> URL: https://issues.apache.org/jira/browse/DRILL-1997
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Ramana Inukonda Nagaraj
> Assignee: Parth Chandra
> Priority: Critical
> Attachments: hive_alltypes.parquet
>
>
> Created a Parquet file in Hive with the following DDL:
> hive> desc alltypesparquet;
> OK
> c1 int
> c2 boolean
> c3 double
> c4 string
> c5 array<int>
> c6 map<int,string>
> c7 map<string,string>
> c8 struct<r:string,s:int,t:double>
> c9 tinyint
> c10 smallint
> c11 float
> c12 bigint
> c13 array<array<string>>
> c15 struct<r:int,s:struct<a:int,b:string>>
> c16 array<struct<m:map<string,string>,n:int>>
> Time taken: 0.076 seconds, Fetched: 15 row(s)
> All the complex types containing strings return incorrect values in Drill. For example:
> hive> select c6 from alltypesparquet;
> NULL
> NULL
> {1:"x",2:"y"}
> 0: jdbc:drill:> select c6 from `/user/hive/warehouse/alltypesparquet`;
> +------------+
> | c6 |
> +------------+
> | {"map":[]} |
> | {"map":[]} |
> | {"map":[{"key":1,"value":"eA=="},{"key":2,"value":"eQ=="}]} |
> +------------+
> 3 rows selected (0.077 seconds)