Posted to dev@parquet.apache.org by Mohammad Islam <mi...@yahoo.com.INVALID> on 2015/10/16 20:10:47 UTC
List type to parquet schema
(sorry for being too specific!)
Hi,
What is a good, consistent way to convert a List type into a Parquet schema?
Let's take an example:
My type is: array<struct<latitude:double,longitude:double>>.
Parquet gives me this schema for the same type:
optional group locations (LIST) {
  repeated group element {
    required double latitude;
    required double longitude;
  }
}
But Hive converts the same type to:
optional group locations (LIST) {
  repeated group bag {
    optional group array_element {
      optional double latitude;
      optional double longitude;
    }
  }
}
It looks like Hive adds an extra layer called "bag" (and also marks the element and its fields optional rather than required). I want both to produce the same schema so that it is easier to compare them when tracking schema type evolution.
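Although the two layouts differ, a reader can map both to the same logical element using the LIST backward-compatibility rules in the Parquet format specification (LogicalTypes.md). Here is a minimal sketch of those rules, assuming a hypothetical SchemaNode class in place of parquet-mr's schema types; primitive types are omitted for brevity.

```java
import java.util.List;

// Hypothetical stand-in for a Parquet schema node; NOT parquet-mr's Type/GroupType API.
final class SchemaNode {
    final String repetition;       // "required", "optional" or "repeated"
    final String name;
    final List<SchemaNode> fields; // null for a primitive field

    SchemaNode(String repetition, String name, List<SchemaNode> fields) {
        this.repetition = repetition;
        this.name = name;
        this.fields = fields;
    }

    static SchemaNode group(String repetition, String name, SchemaNode... fields) {
        return new SchemaNode(repetition, name, List.of(fields));
    }

    static SchemaNode leaf(String repetition, String name) {
        return new SchemaNode(repetition, name, null);
    }

    boolean isGroup() {
        return fields != null;
    }
}

// Sketch of the LIST backward-compatibility rules from the Parquet format spec.
final class ListElements {
    // Given a LIST-annotated group, return the node that holds one list element.
    static SchemaNode elementOf(SchemaNode list) {
        SchemaNode repeated = list.fields.get(0); // a LIST group has one repeated child
        if (!repeated.isGroup()) {
            return repeated; // rule 1: a repeated primitive is the element
        }
        if (repeated.fields.size() > 1) {
            return repeated; // rule 2: a multi-field repeated group is the element
        }
        if (repeated.name.equals("array") || repeated.name.equals(list.name + "_tuple")) {
            return repeated; // rule 3: legacy names mark a 2-level list
        }
        return repeated.fields.get(0); // rule 4: 3-level list, the only child is the element
    }

    public static void main(String[] args) {
        SchemaNode parquetStyle = SchemaNode.group("optional", "locations",
            SchemaNode.group("repeated", "element",
                SchemaNode.leaf("required", "latitude"),
                SchemaNode.leaf("required", "longitude")));
        SchemaNode hiveStyle = SchemaNode.group("optional", "locations",
            SchemaNode.group("repeated", "bag",
                SchemaNode.group("optional", "array_element",
                    SchemaNode.leaf("optional", "latitude"),
                    SchemaNode.leaf("optional", "longitude"))));

        SchemaNode a = elementOf(parquetStyle);
        SchemaNode b = elementOf(hiveStyle);
        System.out.println(a.name + " -> " + a.fields.size() + " fields"); // element -> 2 fields
        System.out.println(b.name + " -> " + b.fields.size() + " fields"); // array_element -> 2 fields
    }
}
```

Under these rules the Parquet layout reads as a 2-level list (the repeated group "element" is itself the struct), while Hive's reads as a 3-level list ("bag" only wraps "array_element"). Both therefore resolve to a struct with latitude and longitude.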
My question is: should we modify one of them (preferably the Hive side)? If so, do you have any suggestions?
The corresponding code in Hive:
// An optional group containing a repeated anonymous group "bag", containing
// 1 anonymous element "array_element"
private static GroupType convertArrayType(final String name, final ListTypeInfo typeInfo) {
final TypeInfo subType = typeInfo.getListElementTypeInfo();
return listWrapper(name, OriginalType.LIST, new GroupType(Repetition.REPEATED,
ParquetHiveSerDe.ARRAY.toString(), convertType("array_element", subType)));
}
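To make the nesting in convertArrayType concrete, here is a plain-string sketch of the three levels it produces for the locations example: the optional LIST group, the repeated group named by ParquetHiveSerDe.ARRAY ("bag" in the output above), and the optional "array_element" group. It does not call Hive or parquet-mr; the class and method names are hypothetical, and each struct field is assumed to be a double.

```java
import java.util.List;

// Plain-string sketch of the layout Hive's convertArrayType produces.
// Nothing here calls Hive or parquet-mr; all names are hypothetical.
final class HiveListLayout {
    // Innermost level: the struct element, emitted as an optional group of
    // optional fields (assumed doubles, as in the locations example).
    static String elementGroup(List<String> doubleFields, String indent) {
        StringBuilder sb = new StringBuilder(indent + "optional group array_element {\n");
        for (String field : doubleFields) {
            sb.append(indent).append("  optional double ").append(field).append(";\n");
        }
        return sb.append(indent).append("}\n").toString();
    }

    // Outer levels: the optional LIST-annotated group wrapping the repeated "bag" group.
    static String hiveStyleList(String name, List<String> doubleFields) {
        return "optional group " + name + " (LIST) {\n"
            + "  repeated group bag {\n"
            + elementGroup(doubleFields, "    ")
            + "  }\n"
            + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(hiveStyleList("locations", List.of("latitude", "longitude")));
    }
}
```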
Regards,
Mohammad
Source file: org.apache.hadoop.hive.ql.io.parquet.convert.HiveSchemaConverter (https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java).