Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/22 18:03:25 UTC
[jira] [Assigned] (SPARK-14070) Use ORC data source for SQL queries on ORC tables
[ https://issues.apache.org/jira/browse/SPARK-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-14070:
------------------------------------
Assignee: (was: Apache Spark)
> Use ORC data source for SQL queries on ORC tables
> -------------------------------------------------
>
> Key: SPARK-14070
> URL: https://issues.apache.org/jira/browse/SPARK-14070
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.6.1
> Reporter: Tejas Patil
> Priority: Minor
>
> Currently, if one queries ORC tables in Hive, the plan generated by Spark shows that it is using the `HiveTableScan` operator, which is generic across all file formats. We could instead use the ORC data source so that we get ORC-specific optimizations such as predicate pushdown.
> Current behaviour:
> ```
> scala> hqlContext.sql("SELECT * FROM orc_table").explain(true)
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*, None)]
> +- 'UnresolvedRelation `orc_table`, None
> == Analyzed Logical Plan ==
> key: string, value: string
> Project [key#171,value#172]
> +- MetastoreRelation default, orc_table, None
> == Optimized Logical Plan ==
> MetastoreRelation default, orc_table, None
> == Physical Plan ==
> HiveTableScan [key#171,value#172], MetastoreRelation default, orc_table, None
> ```
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org