Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2016/01/26 16:51:40 UTC

[jira] [Resolved] (SPARK-12682) Hive will fail if the schema of a parquet table has a very wide schema

     [ https://issues.apache.org/jira/browse/SPARK-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai resolved SPARK-12682.
------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.1
                   2.0.0

Issue resolved by pull request 10826
[https://github.com/apache/spark/pull/10826]

> Hive will fail if the schema of a parquet table has a very wide schema
> ----------------------------------------------------------------------
>
>                 Key: SPARK-12682
>                 URL: https://issues.apache.org/jira/browse/SPARK-12682
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yin Huai
>             Fix For: 2.0.0, 1.6.1
>
>
> To reproduce it, create a table with a large number of columns. Make sure that the data type strings of all columns combined exceed 4000 characters (the strings are generated by HiveMetastoreTypes.toMetastoreType). Then save the table as parquet. Because we try to store the metadata in a Hive-compatible way, we set the serde to the parquet serde. When you then load the table, you will see a {{java.lang.IllegalArgumentException}} thrown from Hive's {{TypeInfoUtils}}. I believe the cause is the same as SPARK-6024: Hive's parquet support does not handle wide schemas well, and the data type string is truncated.
> Once you hit this problem, you will not be able to drop the table because Hive fails to evaluate the drop table command. To at least provide a better workaround, we should see if we should have a native drop table call to the metastore and if we should add a flag to disable saving a data source table's metadata in a Hive-compatible way.
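
As a rough illustration of the size condition in the reproduction steps, the sketch below estimates how many columns of one simple type push the combined type string past 4000 characters, and what a truncated string would end with. It does not call Spark or Hive. The type name "string", the ":" separator, and the helper names are assumptions made for illustration and may not match what HiveMetastoreTypes.toMetastoreType or the metastore actually produce.

```python
# Hypothetical sketch: the ":" separator and the helper names are assumptions,
# not Spark's or Hive's real serialization.

LIMIT = 4000  # character threshold mentioned in the report


def combined_type_string(type_names):
    """Join per-column type names into one string (assumed ':' separator)."""
    return ":".join(type_names)


def columns_needed(type_name, limit=LIMIT):
    """Smallest column count whose combined type string is longer than `limit`."""
    n = 1
    while len(combined_type_string([type_name] * n)) <= limit:
        n += 1
    return n


n_cols = columns_needed("string")
wide = combined_type_string(["string"] * n_cols)

# Cutting the string at the limit leaves a partial type name at the end,
# which a type-string parser could reasonably reject.
truncated = wide[:LIMIT]

print(n_cols, len(wide), repr(truncated[-10:]))
```

Under these assumptions, the truncated string ends in the middle of a type name, which is consistent with the parsing failure described in the report, although the real failure path inside Hive may differ.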



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
