Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2017/09/12 22:04:00 UTC

[jira] [Commented] (SPARK-15705) Spark won't read ORC schema from metastore for partitioned tables

    [ https://issues.apache.org/jira/browse/SPARK-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163747#comment-16163747 ] 

Dongjoon Hyun commented on SPARK-15705:
---------------------------------------

Hi, all.

I'm tracking this bug. It appears to have been fixed since 2.1.1; below are runs of the same reproduction on 2.0.2, 2.1.1, and 2.2.0.

{code}
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- _col0: long (nullable = true)
 |-- _col1: string (nullable = true)
 |-- state: string (nullable = true)

scala> sc.version
res3: String = 2.0.2
{code}

{code}
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sc.version
res3: String = 2.1.1
{code}

{code}
scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sql("set spark.sql.hive.convertMetastoreOrc=true")
res1: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)

scala> sc.version
res3: String = 2.2.0
{code}
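
For anyone still on 2.0.x, a minimal workaround sketch (my reading of the transcripts above, not an official fix): the metastore schema is returned as long as spark.sql.hive.convertMetastoreOrc stays at its 2.0.x default of false, so the conversion can simply be left off (or switched back off) for affected tables.

{code}
// Workaround sketch for Spark 2.0.x (unnecessary on 2.1.1 and later):
// with convertMetastoreOrc=false, reads go through the Hive SerDe, so
// column names come from the metastore rather than the ORC files.
scala> sql("set spark.sql.hive.convertMetastoreOrc=false")
res4: org.apache.spark.sql.DataFrame = [key: string, value: string]

scala> spark.table("default.test").printSchema
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)
 |-- state: string (nullable = true)
{code}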

> Spark won't read ORC schema from metastore for partitioned tables
> -----------------------------------------------------------------
>
>                 Key: SPARK-15705
>                 URL: https://issues.apache.org/jira/browse/SPARK-15705
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: HDP 2.3.4 (Hive 1.2.1, Hadoop 2.7.1)
>            Reporter: Nic Eggert
>            Assignee: Yin Huai
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> Spark does not seem to read the schema from the Hive metastore for partitioned tables stored as ORC files. It appears to read the schema from the files themselves, which, if they were created with Hive, does not match the metastore schema (at least not before Hive 2.0; see HIVE-4243). To reproduce:
> In Hive:
> {code}
> hive> create table default.test (id BIGINT, name STRING) partitioned by (state STRING) stored as orc;
> hive> insert into table default.test partition (state="CA") values (1, "mike"), (2, "steve"), (3, "bill");
> {code}
> In Spark
> {code}
> scala> spark.table("default.test").printSchema
> {code}
> Expected result: Spark should preserve the column names that were defined in Hive.
> Actual result:
> {code}
> root
>  |-- _col0: long (nullable = true)
>  |-- _col1: string (nullable = true)
>  |-- state: string (nullable = true)
> {code}
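> To see why, one can dump the physical schema of a partition file with Hive's ORC file dump utility (the path below is illustrative; output abridged). Files written by Hive before 2.0 carry the internal column names from HIVE-4243, while the partition column (state) lives only in the metastore and the directory layout, which is why it keeps its real name:
> {code}
> $ hive --orcfiledump /apps/hive/warehouse/test/state=CA/000000_0
> ...
> Type: struct<_col0:bigint,_col1:string>
> ...
> {code}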
> Possibly related to SPARK-14959?


