Posted to dev@hive.apache.org by "Sergio Peña (JIRA)" <ji...@apache.org> on 2014/10/30 18:20:34 UTC
[jira] [Commented] (HIVE-8419) Hive doesn't properly write NULL values in Parquet files when the type is struct<...>.
[ https://issues.apache.org/jira/browse/HIVE-8419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190410#comment-14190410 ]
Sergio Peña commented on HIVE-8419:
-----------------------------------
Hi [~Akryus]
Could you attach the Avro file (data + schema)?
By the way, this may already have been fixed by HIVE-8359, where Hive now accepts null values in array elements.
> Hive doesn't properly write NULL values in Parquet files when the type is struct<...>.
> --------------------------------------------------------------------------------------
>
> Key: HIVE-8419
> URL: https://issues.apache.org/jira/browse/HIVE-8419
> Project: Hive
> Issue Type: Bug
> Components: File Formats
> Affects Versions: 0.13.1
> Reporter: Frédéric TERRAZZONI
>
> Hive doesn't seem to be able to write NULL values in a column of type "struct". Instead, it replaces them with empty objects (non-NULL objects whose fields are all NULL).
> Here is a short example demonstrating the issue. We start with a small Avro table "avro_table".
> {code} SELECT * from avro_table {code}
> || mycol ||
> || struct<field1:string,field2:double> ||
> | {"field1":"blabla","field2":1.0} |
> | {"field1":"blabla","field2":2.0} |
> | NULL |
> | {"field1":"blabla","field2":4.0} |
> | {"field1":"blabla","field2":5.0} |
> As you can see, the third row contains a NULL cell. Next, we copy the table with Hive (INSERT OVERWRITE ...) into a Parquet table named "parquet_table".
> Finally, when you try to display it:
> {code} SELECT * from parquet_table {code}
> || mycol ||
> || struct<field1:string,field2:double> ||
> | {"field1":"blabla","field2":1.0} |
> | {"field1":"blabla","field2":2.0} |
> | {"field1":null,"field2":null} |
> | {"field1":"blabla","field2":4.0} |
> | {"field1":"blabla","field2":5.0} |
> I also generated a (correct) Parquet file with our own software (Dataiku), and Hive had no problem reading its null values, even when the column type was "struct".
> I therefore suspect the bug is in Hive's Parquet writer code.
> This bug also propagates recursively to nested types. For instance, a NULL cell of type {code} struct<field1:struct<field3:string>,field2:double> {code} will become {code} {"field1":{"field3":null},"field2":null} {code} when written to a Parquet file.
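The distinction the report turns on, a NULL struct versus a struct whose fields are all NULL, can be modelled in a few lines of plain Python. This is a hypothetical sketch, not Hive's actual Parquet writer: the schema encoding (a dict of field names, with None marking a primitive leaf) and the names `buggy_write` and `correct_write` are assumptions made for illustration. `buggy_write` imitates the reported symptom, including its recursive spread into nested structs, while `correct_write` keeps the struct-level null.

```python
from typing import Any, Dict, Optional

# Hypothetical schema model (an assumption, not Hive's API): a struct is a dict
# mapping field names to sub-schemas; a primitive leaf is represented by None.
Schema = Optional[Dict[str, Any]]


def buggy_write(value: Any, schema: Schema) -> Any:
    """Imitates the reported behaviour: a NULL struct is expanded into a
    struct whose fields are all NULL, recursively for nested structs."""
    if schema is None:  # primitive leaf: pass the value through unchanged
        return value
    if value is None:  # the reported symptom: no struct-level null check
        return {name: buggy_write(None, sub) for name, sub in schema.items()}
    return {name: buggy_write(value.get(name), sub) for name, sub in schema.items()}


def correct_write(value: Any, schema: Schema) -> Any:
    """Keeps a NULL struct as NULL instead of expanding it."""
    if schema is None or value is None:
        return value
    return {name: correct_write(value.get(name), sub) for name, sub in schema.items()}


# struct<field1:string,field2:double>
flat = {"field1": None, "field2": None}
# struct<field1:struct<field3:string>,field2:double>
nested = {"field1": {"field3": None}, "field2": None}

rows = [{"field1": "blabla", "field2": 1.0}, None]
print([buggy_write(r, flat) for r in rows])
print([correct_write(r, flat) for r in rows])
print(buggy_write(None, nested))
print(correct_write(None, nested))
```

Run on the flat schema, `buggy_write` turns the NULL row into `{'field1': None, 'field2': None}`, matching the third row of "parquet_table", and on the nested schema it produces `{'field1': {'field3': None}, 'field2': None}`, whereas `correct_write` returns `None` in both cases.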
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)