Posted to issues@spark.apache.org by "Yadong Qi (JIRA)" <ji...@apache.org> on 2015/12/07 08:22:11 UTC
[jira] [Closed] (SPARK-12167) Invoke the right sameResult function when plan is wrapped with SubQueries

     [ https://issues.apache.org/jira/browse/SPARK-12167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yadong Qi closed SPARK-12167.
-----------------------------
    Resolution: Duplicate

> Invoke the right sameResult function when plan is wrapped with SubQueries
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12167
>                 URL: https://issues.apache.org/jira/browse/SPARK-12167
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Yadong Qi
>
> I found this bug when using `cache table`:
> ```
> spark-sql> create table src_p(key int, value int) stored as parquet;
> OK
> Time taken: 3.144 seconds
> spark-sql> cache table src_p;
> Time taken: 1.452 seconds
> spark-sql> explain extended select count(*) from src_p;
> ```
> I got the wrong physical plan, which scans the Parquet files directly instead of reading the cached table:
> ```
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#28L])
>  TungstenExchange SinglePartition
>   TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#33L])
>    Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][]
> ```
> The right physical plan, which reads from the in-memory cache, is:
> ```
> == Physical Plan ==
> TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#47L])
>  TungstenExchange SinglePartition
>   TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[currentCount#62L])
>    InMemoryColumnarTableScan (InMemoryRelation [key#45,value#46], true, 10000, StorageLevel(true, true, false, true, 1), (Scan ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p][key#9,value#10]), Some(src_p))
> ```
> When implementations of `MultiInstanceRelation` (e.g. `LogicalRelation`, `LocalRelation`) are wrapped with SubQueries, the `sameResult` function in their own implementation is not invoked; the generic one in `LogicalPlan` is used instead. So we need to eliminate the SubQueries first and then invoke the `sameResult` function of the underlying plan.
> For example, when the plan is `Subquery(LogicalRelation(relation:ParquetRelation[hdfs://9.91.8.131:9000/user/hive/warehouse/src_p], expectedOutputAttributes:Some(ArrayBuffer(key#0, value#1))))`, we first eliminate the SubQueries, so that the `sameResult` function in `LogicalRelation` is invoked instead of the one in `LogicalPlan`.
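
The dispatch problem described above can be sketched with a toy model. This is not Spark's code: the Python classes below only borrow the names `LogicalPlan`, `LogicalRelation` and `Subquery`, the helpers `eliminate_subqueries` and `same_result_unwrapped` are hypothetical, and the generic check is assumed to be plain structural equality, which simplifies what Catalyst really does. The attribute IDs are taken from the plans shown in the report.

```python
from dataclasses import dataclass
from typing import Tuple


class LogicalPlan:
    """Toy stand-in for Catalyst's LogicalPlan (not the real Spark class)."""

    def same_result(self, other: "LogicalPlan") -> bool:
        # Assumption: the generic check is plain structural equality.
        return self == other


@dataclass(frozen=True)
class LogicalRelation(LogicalPlan):
    relation: str            # identifies the underlying data source
    output: Tuple[str, ...]  # attribute names, unique per plan instance

    def same_result(self, other: LogicalPlan) -> bool:
        # Relation-specific override: ignore the per-instance attribute IDs.
        return isinstance(other, LogicalRelation) and other.relation == self.relation


@dataclass(frozen=True)
class Subquery(LogicalPlan):
    alias: str
    child: LogicalPlan
    # No override here, so the generic LogicalPlan.same_result is inherited.


def eliminate_subqueries(plan: LogicalPlan) -> LogicalPlan:
    """Strip any Subquery wrappers from the top of the plan."""
    while isinstance(plan, Subquery):
        plan = plan.child
    return plan


def same_result_unwrapped(a: LogicalPlan, b: LogicalPlan) -> bool:
    """Unwrap both sides first, then dispatch on the underlying plan."""
    return eliminate_subqueries(a).same_result(eliminate_subqueries(b))


# The same table, seen once by the cache and once by the new query.
cached = Subquery("src_p", LogicalRelation("src_p", ("key#9", "value#10")))
query = Subquery("src_p", LogicalRelation("src_p", ("key#45", "value#46")))

print(cached.same_result(query))             # False: generic check on the wrapper
print(same_result_unwrapped(cached, query))  # True: LogicalRelation's own check
```

In this model the first call never reaches the override in `LogicalRelation`, which corresponds to the cached table not being matched in the wrong physical plan.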


