Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/03/22 18:03:25 UTC

[jira] [Assigned] (SPARK-14070) Use ORC data source for SQL queries on ORC tables

     [ https://issues.apache.org/jira/browse/SPARK-14070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-14070:
------------------------------------

    Assignee:     (was: Apache Spark)

> Use ORC data source for SQL queries on ORC tables
> -------------------------------------------------
>
>                 Key: SPARK-14070
>                 URL: https://issues.apache.org/jira/browse/SPARK-14070
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.1
>            Reporter: Tejas Patil
>            Priority: Minor
>
> Currently, when querying ORC tables in Hive, the plan generated by Spark shows that it uses the generic `HiveTableScan` operator, which is shared across all file formats. We could instead use the ORC data source for these queries so that we get ORC-specific optimizations like predicate pushdown (see the sketch after the plan below).
> Current behaviour:
> ```
> scala>  hqlContext.sql("SELECT * FROM orc_table").explain(true)
> == Parsed Logical Plan ==
> 'Project [unresolvedalias(*, None)]
> +- 'UnresolvedRelation `orc_table`, None
> == Analyzed Logical Plan ==
> key: string, value: string
> Project [key#171,value#172]
> +- MetastoreRelation default, orc_table, None
> == Optimized Logical Plan ==
> MetastoreRelation default, orc_table, None
> == Physical Plan ==
> HiveTableScan [key#171,value#172], MetastoreRelation default, orc_table, None
> ```
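> For comparison, a minimal sketch of reading the same data through the ORC data source directly (the warehouse path and filter value below are illustrative, not taken from this ticket):
> ```
> scala> // Enable ORC predicate pushdown; it is off by default in 1.6.
> scala> hqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> // Load the table's files directly with the ORC data source
> scala> // (hypothetical path; substitute the table's actual location).
> scala> val df = hqlContext.read.format("orc").load("/user/hive/warehouse/orc_table")
> scala> df.filter(df("key") === "42").explain(true)
> ```
> With this route the physical plan should show a scan over an ORC relation (with pushed filters) rather than the generic `HiveTableScan`; this ticket proposes having SQL queries on metastore ORC tables take that path automatically.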



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org