Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/01/27 03:24:39 UTC
[jira] [Commented] (SPARK-12998) Enable OrcRelation when connecting via spark thrift server

    [ https://issues.apache.org/jira/browse/SPARK-12998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118486#comment-15118486 ] 

Apache Spark commented on SPARK-12998:
--------------------------------------

User 'rajeshbalamohan' has created a pull request for this issue:
https://github.com/apache/spark/pull/10938

> Enable OrcRelation when connecting via spark thrift server
> ----------------------------------------------------------
>
>                 Key: SPARK-12998
>                 URL: https://issues.apache.org/jira/browse/SPARK-12998
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>
> When a user connects through the Spark Thrift server to execute SQL, predicate pushdown (PPD) is not enabled for ORC tables. The query ends up creating a MetastoreRelation, which does not support ORC PPD. The purpose of this JIRA is to convert MetastoreRelation to OrcRelation in HiveMetastoreCatalog, so that users can benefit from PPD even when connecting through the Spark Thrift server.
> {noformat}
> For example, for "explain select count(1) from tpch_flat_orc_1000.lineitem where l_shipdate = '1990-04-18'", the current plan is:
> +------------------------------------------------------------------------------------------------------------------+--+
> |                                                       plan                                                       |
> +------------------------------------------------------------------------------------------------------------------+--+
> | == Physical Plan ==                                                                                              |
> | TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#17L])                  |
> | +- Exchange SinglePartition, None                                                                                |
> |    +- WholeStageCodegen                                                                                          |
> |       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#20L])  |
> |       :     +- Project                                                                                           |
> |       :        +- Filter (l_shipdate#11 = 1990-04-18)                                                            |
> |       :           +- INPUT                                                                                       |
> |       +- HiveTableScan [l_shipdate#11], MetastoreRelation tpch_1000, lineitem, None                     |
> +------------------------------------------------------------------------------------------------------------------+--+
> It would be good to change it to OrcRelation so that PPD is applied to ORC, which reduces the runtime by a large margin. The plan would then be:
>  
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> |                                                                                             plan                                                                                              |
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> | == Physical Plan ==                                                                                                                                                                           |
> | TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#70L])                                                                                               |
> | +- Exchange SinglePartition, None                                                                                                                                                             |
> |    +- WholeStageCodegen                                                                                                                                                                       |
> |       :  +- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#106L])                                                                              |
> |       :     +- Project                                                                                                                                                                        |
> |       :        +- Filter (_col10#64 = 1990-04-18)                                                                                                                                             |
> |       :           +- INPUT                                                                                                                                                                    |
> |       +- Scan OrcRelation[_col10#64] InputPaths: hdfs://nn:8020/apps/hive/warehouse/tpch_1000.db/lineitem, PushedFilters: [EqualTo(_col10,1990-04-18)]  |
> +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
> {noformat}
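
The difference between the two plans above can be illustrated with a toy model. This is only a sketch of the general idea behind predicate pushdown on a columnar format that keeps min/max statistics per stripe, as ORC does. It is not Spark's or ORC's actual implementation, and the names Stripe, make_stripe, count_equal_full_scan and count_equal_with_pushdown, as well as the sample dates, are hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Toy stand-in for an ORC stripe: one column's rows plus min/max stats."""
    min_value: str
    max_value: str
    rows: List[str]


def make_stripe(rows: List[str]) -> Stripe:
    # Statistics are computed once, when the stripe is written.
    return Stripe(min(rows), max(rows), list(rows))


def count_equal_full_scan(stripes: List[Stripe], target: str) -> Tuple[int, int]:
    """No pushdown: read every row, then filter. Returns (matches, rows_read)."""
    matches = 0
    rows_read = 0
    for stripe in stripes:
        for value in stripe.rows:
            rows_read += 1
            if value == target:
                matches += 1
    return matches, rows_read


def count_equal_with_pushdown(stripes: List[Stripe], target: str) -> Tuple[int, int]:
    """Pushdown: skip stripes whose [min, max] range cannot contain target."""
    matches = 0
    rows_read = 0
    for stripe in stripes:
        if target < stripe.min_value or target > stripe.max_value:
            continue  # the statistics rule this stripe out; its rows are never read
        for value in stripe.rows:
            rows_read += 1
            if value == target:
                matches += 1
    return matches, rows_read


# Made-up sample data; ISO-formatted dates compare correctly as strings.
stripes = [
    make_stripe(["1990-01-05", "1990-02-11", "1990-03-30"]),
    make_stripe(["1990-04-01", "1990-04-18", "1990-04-18", "1990-05-02"]),
    make_stripe(["1991-07-09", "1992-12-25"]),
]

full_scan_result = count_equal_full_scan(stripes, "1990-04-18")
pushdown_result = count_equal_with_pushdown(stripes, "1990-04-18")
print("full scan (matches, rows_read):", full_scan_result)
print("pushdown  (matches, rows_read):", pushdown_result)
```

Both functions return the same match count, but the pushdown variant reads only the rows of the one stripe whose range covers the target value. In this model the row-level comparison is still applied after the stripe check, which parallels the second plan above keeping its Filter node on top of the OrcRelation scan.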



