Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2016/07/19 18:29:20 UTC

[jira] [Created] (SPARK-16628) OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files

Yin Huai created SPARK-16628:
--------------------------------

             Summary: OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files
                 Key: SPARK-16628
                 URL: https://issues.apache.org/jira/browse/SPARK-16628
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Yin Huai


When {{spark.sql.hive.convertMetastoreOrc}} is enabled, we convert an ORC table represented by a MetastoreRelation to a HadoopFsRelation that uses Spark's OrcFileFormat internally. This conversion aims to improve table scanning performance, since the runtime code path for scanning a HadoopFsRelation is faster. However, OrcFileFormat's implementation assumes that ORC files store their schema with the correct column names. Before Hive 2.0, an ORC table created by Hive does not store column names correctly in the ORC files (HIVE-4243). So, for this kind of ORC dataset, we cannot safely convert the code path.
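
A minimal workaround sketch (not part of the original report): until the conversion checks that the metastore schema matches the schema stored in the ORC files, a user can keep the MetastoreRelation path by disabling the conversion, so the table is scanned through Hive's ORC reader instead of Spark's OrcFileFormat. The table name below is a placeholder for a table whose ORC files were written by Hive before 2.0.

{code:scala}
import org.apache.spark.sql.SparkSession

object OrcConversionWorkaround {
  def main(args: Array[String]): Unit = {
    // Disable the MetastoreRelation -> HadoopFsRelation conversion for ORC
    // tables so that Spark falls back to Hive's ORC reading code path.
    val spark = SparkSession.builder()
      .appName("orc-conversion-workaround")
      .config("spark.sql.hive.convertMetastoreOrc", "false")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder table name: an ORC table created by Hive < 2.0, whose files
    // may store columns as _col0, _col1, ... instead of the real column names.
    spark.sql("SELECT * FROM pre_hive_2_0_orc_table").show()
  }
}
{code}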



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org