Posted to issues@hive.apache.org by "xsys (Jira)" <ji...@apache.org> on 2022/09/09 23:36:00 UTC
[jira] [Updated] (HIVE-26531) UnsupportedOperationException while creating table in Avro format if column schema contains MAP with INTEGER key
[ https://issues.apache.org/jira/browse/HIVE-26531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
xsys updated HIVE-26531:
------------------------
Description:
h3. Describe the bug
We are trying to create a table in the {{Avro}} data format through {{spark-sql}}. The table's schema contains a {{MAP}} column whose key type is {{INT}}: {{MAP<INT, STRING>}}. The {{CREATE TABLE}} query fails with the following exception:
{noformat}
22/08/29 12:03:38 ERROR Table: Unable to get field from serde: org.apache.hadoop.hive.serde2.avro.AvroSerDe
java.lang.UnsupportedOperationException: Key of Map can only be a String{noformat}
_Here is the full stack trace, for reference:_ [^Avro_Map_StackTrace.txt]
The exception is raised by the following [Hive code|https://github.com/apache/hive/blob/rel/release-3.1.2/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L216-L221]:
{noformat}
private Schema createAvroMap(TypeInfo typeInfo) {
  TypeInfo keyTypeInfo = ((MapTypeInfo) typeInfo).getMapKeyTypeInfo();
  if (((PrimitiveTypeInfo) keyTypeInfo).getPrimitiveCategory()
      != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
    throw new UnsupportedOperationException("Key of Map can only be a String");
  }
  TypeInfo valueTypeInfo = ((MapTypeInfo) typeInfo).getMapValueTypeInfo();
  Schema valueSchema = createAvroSchema(valueTypeInfo);
  return Schema.createMap(valueSchema);
}{noformat}
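For context, the Avro specification defines map keys as strings, so an Avro map schema carries only a value type; that is presumably why this guard exists. Below is a minimal, self-contained sketch of the guard. It does not use Hive or Avro classes: the {{KeyCategory}} enum and the {{describeAvroMap}} helper are hypothetical stand-ins for {{PrimitiveCategory}} and {{createAvroMap}}, introduced only for illustration.

```java
public class AvroMapKeyCheck {
    // Hypothetical stand-in for Hive's PrimitiveCategory (only two values modelled).
    public enum KeyCategory { INT, STRING }

    // Hypothetical stand-in for createAvroMap: rejects any non-STRING key,
    // otherwise returns the JSON form of an Avro map schema, which names
    // only the value type because Avro map keys are always strings.
    public static String describeAvroMap(KeyCategory keyCategory, String valueType) {
        if (keyCategory != KeyCategory.STRING) {
            throw new UnsupportedOperationException("Key of Map can only be a String");
        }
        return "{\"type\":\"map\",\"values\":\"" + valueType + "\"}";
    }

    public static void main(String[] args) {
        // MAP<STRING, STRING> passes the guard.
        System.out.println(describeAvroMap(KeyCategory.STRING, "string"));
        // MAP<INT, STRING> is rejected, as in the reported CREATE TABLE failure.
        try {
            describeAvroMap(KeyCategory.INT, "string");
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The sketch only models the key-type check; it says nothing about how Hive could map non-string keys onto Avro if the restriction were lifted.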
h3. To Reproduce
On Spark 3.2.1 (commit {{4f25b3f712}}), start {{spark-sql}} with the Avro package:
{noformat}
$SPARK_HOME/bin/spark-sql --packages org.apache.spark:spark-avro_2.12:3.2.1
{noformat}
Execute the following:
{noformat}
create table avro_map(c1 MAP<INT, STRING>)
ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.avro.AvroSerDe"
STORED AS
  INPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat"
  OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat";{noformat}
h3. Expected behavior
We expect {{CREATE TABLE}} to succeed in Avro format when the column schema contains a {{MAP}} with an {{INT}} key. The same schema is accepted by other formats such as Parquet and ORC, which is consistent with this expectation.
Here is a simplified example showing the expected behavior with the ORC and Parquet file formats:
{noformat}
spark-sql> create table orc_map(c1 MAP<INT, STRING>) STORED AS ORC;
Time taken: 0.196 seconds
spark-sql> create table parquet_map(c1 MAP<INT, STRING>) STORED AS PARQUET;
Time taken: 0.113 seconds
spark-sql> desc orc_map;
c1 map<int,string>
Time taken: 0.387 seconds, Fetched 1 row(s)
spark-sql> desc parquet_map;
c1 map<int,string>
Time taken: 0.077 seconds, Fetched 1 row(s){noformat}
> UnsupportedOperationException while creating table in Avro format if column schema contains MAP with INTEGER key
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-26531
> URL: https://issues.apache.org/jira/browse/HIVE-26531
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 3.1.2
> Reporter: xsys
> Priority: Major
> Attachments: Avro_Map_StackTrace.txt
--
This message was sent by Atlassian Jira
(v8.20.10#820010)