Posted to issues@spark.apache.org by "Xin Wu (JIRA)" <ji...@apache.org> on 2016/07/18 21:17:20 UTC

[jira] [Commented] (SPARK-16605) Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports

    [ https://issues.apache.org/jira/browse/SPARK-16605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383082#comment-15383082 ] 

Xin Wu commented on SPARK-16605:
--------------------------------

The problem with reading ORC data inserted by Hive is that the schema stored in the ORC file uses dummy column names such as "_col1, _col2, ...". Hive knows how to read the data. Spark SQL, however, tries to convert the ORC table to its native ORC relation for better scan performance. In doing so it infers the schema from the ORC file directly, but gets the table schema from the Hive metastore. The two schemas then do not match, so a lookup of a metastore column name against the file schema fails.
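To make the mismatch concrete, here is a toy model in plain Python. It is not Spark's or Hive's actual code: the helper names, the second column "name", and the idea that a positional mapping is what makes the Hive read succeed are all assumptions for illustration. Only "nid" and the dummy "_colN" naming come from this issue.

```python
# Column names as stored in the ORC file written by Hive (dummy names,
# following the "_col1, _col2, ..." pattern described above).
file_columns = ["_col1", "_col2"]

# Column names as recorded in the Hive metastore. "nid" appears in the
# reported error; "name" is a made-up second column.
table_columns = ["nid", "name"]


def field_index(schema, name):
    """Find a column by name, failing the way a name-based reader would."""
    if name not in schema:
        raise ValueError('Field "%s" does not exist.' % name)
    return schema.index(name)


def resolve_by_name(file_schema, table_schema):
    """Map each table column to a file column by matching names."""
    return [field_index(file_schema, col) for col in table_schema]


def resolve_by_position(file_schema, table_schema):
    """Map each table column to the file column at the same position."""
    if len(file_schema) != len(table_schema):
        raise ValueError("column counts differ")
    return list(range(len(table_schema)))


if __name__ == "__main__":
    try:
        resolve_by_name(file_columns, table_columns)
    except ValueError as err:
        print("name-based lookup failed:", err)
    print("positional mapping:", resolve_by_position(file_columns, table_columns))
```

In this sketch the name-based lookup fails on the first metastore column, which is the same shape as the reported 'Field "nid" does not exist.' error, while a mapping that ignores the names stored in the file goes through.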

As a workaround, try turning off this conversion (it is only a performance optimization): 
{code}set spark.sql.hive.convertMetastoreOrc=false{code}

Then rerun the query and see if it works. 
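If the query is issued from a program rather than an interactive SQL shell, the same setting can be applied through the session before the select. The sketch below assumes `spark` is an existing session object with a `sql` method (for example a PySpark `SparkSession` created with Hive support); the function name is hypothetical and the table name is the "tborc" table from the report.

```python
def select_without_orc_conversion(spark, table):
    """Turn off the metastore ORC conversion, then select from the table.

    `spark` is assumed to be an existing session exposing `sql(query)`.
    `table` is interpolated as-is, so pass only trusted table names.
    """
    # Same setting as the workaround above, issued as a SQL SET command.
    spark.sql("SET spark.sql.hive.convertMetastoreOrc=false")
    return spark.sql("SELECT * FROM %s" % table)


# Example call, assuming a Hive-enabled session named `spark` exists:
#   rows = select_without_orc_conversion(spark, "tborc")
```

The setting applies to the session, so later queries in the same session also skip the conversion until it is set back to true.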

> Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-16605
>                 URL: https://issues.apache.org/jira/browse/SPARK-16605
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: marymwu
>         Attachments: screenshot-1.png
>
>
> Spark2.0 cannot "select" data from a table stored as an orc file which has been created by hive while hive or spark1.6 supports
> Steps:
> 1. Use hive to create a table "tbtxt" stored as txt and load data into it.
> 2. Use hive to create a table "tborc" stored as orc and insert the data from table "tbtxt" . Example, "create table tborc stored as orc as select * from tbtxt"
> 3. Use spark2.0 to "select * from tborc;".-->error occurs,java.lang.IllegalArgumentException: Field "nid" does not exist.


