Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/11/12 16:39:30 UTC

[GitHub] spark pull request #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries t...

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22518#discussion_r232729788
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
    @@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
           assert(getNumSortsInQuery(query5) == 1)
         }
       }
    +
    +  test("SPARK-25482: Reuse same Subquery in order to execute it only once") {
    +    withTempView("t1", "t2") {
    +      sql("create temporary view t1(a int) using parquet")
    +      sql("create temporary view t2(b int) using parquet")
    +      val plan = sql("select * from t2 where b > (select max(a) from t1)")
    --- End diff --
    
    > The subquery should be executed anyway sooner or later, right?
    
    Yes, but we could execute the scan and the subquery at the same time (2 Spark jobs running together), instead of executing them serially.
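    
    For illustration, here is a minimal standalone sketch of the scenario from the test above (not part of the PR diff; the object name is a placeholder and the local SparkSession setup is an assumption, since the suite gets its session from SharedSQLContext). It only shows the shape of the query in question: when the scalar subquery stays as a separate plan node instead of being pushed into the scan, Spark can submit the subquery job and the scan job together so they overlap.
    
        import org.apache.spark.sql.SparkSession
        
        // Hypothetical standalone rendering of the test case above.
        object SubqueryOverlapSketch {
          def main(args: Array[String]): Unit = {
            // Local session just for the sketch; the test suite provides this
            // via SharedSQLContext instead.
            val spark = SparkSession.builder()
              .master("local[*]")
              .appName("SPARK-25482 sketch")
              .getOrCreate()
        
            // Same temporary views as in the test diff above.
            spark.sql("create temporary view t1(a int) using parquet")
            spark.sql("create temporary view t2(b int) using parquet")
        
            // The scalar subquery (select max(a) from t1) and the scan of t2
            // are planned as separate Spark jobs. If the subquery is not pushed
            // down into the scan of t2, the two jobs can run concurrently
            // rather than the scan waiting for the subquery result.
            val df = spark.sql("select * from t2 where b > (select max(a) from t1)")
            df.collect()
        
            spark.stop()
          }
        }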


---
