Posted to reviews@spark.apache.org by dilipbiswal <gi...@git.apache.org> on 2017/01/07 00:33:43 UTC

[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/16493

    [SPARK-19093][SQL] Cached tables are not used in SubqueryExpression

    ## What changes were proposed in this pull request?
    Consider the plans inside subquery expressions when looking up the cache manager, so that
    cached data is used there as well. Currently, CacheManager.useCachedData does not consider the
    subquery expressions in the plan.
    
    ## How was this patch tested?
    Added new tests in CachedTableSuite.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark SPARK-19093

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16493.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16493
    
----
commit f733f90325b975973e60272ba6708dff5059f9dd
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2017-01-07T00:18:23Z

    [SPARK-19093] Cached tables are not used in SubqueryExpression

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95051550
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +577,67 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    --- End diff --
    
    The same here.


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71027/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95050799
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    +
    +      // Scalar subquery and predicate subquery
    +      val cachedPlan2 =
    +        sql(
    +          """
    +            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
    +            |WHERE
    +            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
    +            |OR
    +            |EXISTS (SELECT c1 FROM t3)
    +            |OR
    +            |c1 IN (SELECT c1 FROM t4)
    +          """.stripMargin).queryExecution.optimizedPlan
    +
    +
    +      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
    +      cachedRelations += getCachedPlans(cachedPlan2)
    +      cachedPlan2 transformAllExpressions {
    +        case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan)
    +          e
    +      }
    +      assert(cachedRelations.flatten.size == 4)
    --- End diff --
    
    @gatorsmile Thanks... I will make the change


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Thank you very much @gatorsmile @hvanhovell 


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95051398
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -131,6 +132,12 @@ class CacheManager extends Logging {
     
       /** Replaces segments of the given logical plan with cached versions where possible. */
       def useCachedData(plan: LogicalPlan): LogicalPlan = {
    +    useCachedDataInternal(plan) transformAllExpressions {
    +      case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
    +    }
    +  }
    +
    +  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
    --- End diff --
    
    @gatorsmile Thank you very much. I have addressed your comments.


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71004/
    Test PASSed.


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71005/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71006/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70999/
    Test PASSed.


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95051546
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    +
    +      // Scalar subquery and predicate subquery
    +      val cachedPlan2 =
    +        sql(
    +          """
    +            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
    +            |WHERE
    +            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
    +            |OR
    +            |EXISTS (SELECT c1 FROM t3)
    +            |OR
    +            |c1 IN (SELECT c1 FROM t4)
    +          """.stripMargin).queryExecution.optimizedPlan
    +
    +
    +      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
    +      cachedRelations += getCachedPlans(cachedPlan2)
    +      cachedPlan2 transformAllExpressions {
    +        case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan)
    +          e
    +      }
    +      assert(cachedRelations.flatten.size == 4)
    +
    +      spark.catalog.uncacheTable("t1")
    +      spark.catalog.uncacheTable("t2")
    +      spark.catalog.uncacheTable("t3")
    +      spark.catalog.uncacheTable("t4")
    --- End diff --
    
    How about this? @dilipbiswal 


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    In the test suite, we can add a helper function like this to count `InMemoryRelation` nodes:
    ```Scala
      private def getNumInMemoryRelations(plan: LogicalPlan): Int = {
        var sum = plan.collect { case _: InMemoryRelation => 1 }.sum
        plan.transformAllExpressions {
          case e: SubqueryExpression =>
            sum += getNumInMemoryRelations(e.plan)
            e
        }
        sum
      }
    ```
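
A plain `collect` over a plan visits the plan's children, not the plans held inside subquery expressions, which is why the helper above recurses through `SubqueryExpression`. The following is a minimal sketch of that counting idea on a toy tree. It assumes nothing about Spark: `CountCachedSketch`, `Plan`, `Scan`, `CachedScan`, `Filter`, `countTopLevel`, and `countCached` are all hypothetical names, with `CachedScan` standing in for `InMemoryRelation` and `Filter.subqueries` standing in for the plans referenced by subquery expressions.

```scala
// Toy model only: none of these types are Spark classes.
object CountCachedSketch {
  sealed trait Plan
  case class Scan(table: String) extends Plan        // uncached table read
  case class CachedScan(table: String) extends Plan  // stands in for InMemoryRelation
  // `subqueries` stands in for plans referenced by subquery expressions in the condition.
  case class Filter(subqueries: Seq[Plan], child: Plan) extends Plan

  // Counts cached scans among the plan's children only, ignoring subquery plans.
  def countTopLevel(plan: Plan): Int = plan match {
    case CachedScan(_)    => 1
    case Scan(_)          => 0
    case Filter(_, child) => countTopLevel(child)
  }

  // Counts cached scans in the plan and, recursively, in every subquery plan.
  def countCached(plan: Plan): Int = plan match {
    case CachedScan(_)       => 1
    case Scan(_)             => 0
    case Filter(subs, child) => subs.map(countCached).sum + countCached(child)
  }

  // Shape of: SELECT * FROM t1 WHERE c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3))
  val nestedExample: Plan =
    Filter(Seq(Filter(Seq(CachedScan("t3")), CachedScan("t2"))), CachedScan("t1"))
}
```

On `nestedExample`, `countTopLevel` sees only the outer `t1` scan, while `countCached` also reaches the scans of `t2` and `t3` nested inside the subqueries.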


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71027/
    Test PASSed.


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95052038
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    +
    +      // Scalar subquery and predicate subquery
    +      val cachedPlan2 =
    +        sql(
    +          """
    +            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
    +            |WHERE
    +            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
    +            |OR
    +            |EXISTS (SELECT c1 FROM t3)
    +            |OR
    +            |c1 IN (SELECT c1 FROM t4)
    +          """.stripMargin).queryExecution.optimizedPlan
    +
    +
    +      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
    +      cachedRelations += getCachedPlans(cachedPlan2)
    +      cachedPlan2 transformAllExpressions {
    +        case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan)
    +          e
    +      }
    +      assert(cachedRelations.flatten.size == 4)
    +
    +      spark.catalog.uncacheTable("t1")
    +      spark.catalog.uncacheTable("t2")
    +      spark.catalog.uncacheTable("t3")
    +      spark.catalog.uncacheTable("t4")
    --- End diff --
    
    @gatorsmile sorry.. missed this one .. Will make the change.


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95050745
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -131,6 +132,12 @@ class CacheManager extends Logging {
     
       /** Replaces segments of the given logical plan with cached versions where possible. */
       def useCachedData(plan: LogicalPlan): LogicalPlan = {
    +    useCachedDataInternal(plan) transformAllExpressions {
    +      case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
    +    }
    +  }
    +
    +  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
    --- End diff --
    
    On second thought, we do not need to add a new function. We can combine them into a single function, like this:
    ```Scala
      /** Replaces segments of the given logical plan with cached versions where possible. */
      def useCachedData(plan: LogicalPlan): LogicalPlan = {
        val newPlan = plan transformDown {
          case currentFragment =>
            lookupCachedData(currentFragment)
              .map(_.cachedRepresentation.withOutput(currentFragment.output))
              .getOrElse(currentFragment)
        }
        
        newPlan transformAllExpressions {
          case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
        }
      }
    ```
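
The suggestion above rewrites the plan itself and then recurses into the plan carried by each subquery expression. As a rough illustration of that idea outside Spark, here is a minimal sketch on a toy tree. Every name in it (`CachedSubquerySketch`, `Plan`, `Expr`, `Scan`, `CachedScan`, `Filter`, `Exists`, `Not`, `True`, and the `cached` set) is hypothetical and not part of Catalyst. It also folds the two steps into one recursive match instead of using `transformDown` followed by `transformAllExpressions`.

```scala
// Toy model only: none of these types are Spark classes.
object CachedSubquerySketch {
  sealed trait Plan
  case class Scan(table: String) extends Plan         // reads a table directly
  case class CachedScan(table: String) extends Plan   // plays the role of a cached relation
  case class Filter(cond: Expr, child: Plan) extends Plan

  sealed trait Expr
  case class Exists(plan: Plan) extends Expr           // subquery expression holding its own plan
  case class Not(child: Expr) extends Expr
  case object True extends Expr

  // Swaps in cached scans in the plan and in every plan nested inside its expressions.
  def useCachedData(plan: Plan, cached: Set[String]): Plan = plan match {
    case Scan(t) if cached(t) => CachedScan(t)
    case Filter(c, child)     => Filter(rewriteExpr(c, cached), useCachedData(child, cached))
    case other                => other
  }

  // Walks an expression and rewrites the plan of each subquery it contains.
  private def rewriteExpr(e: Expr, cached: Set[String]): Expr = e match {
    case Exists(p) => Exists(useCachedData(p, cached))
    case Not(c)    => Not(rewriteExpr(c, cached))
    case True      => True
  }

  // Shape of: SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t1)
  val query: Plan = Filter(Not(Exists(Scan("t1"))), Scan("t1"))
}
```

Calling `CachedSubquerySketch.useCachedData(CachedSubquerySketch.query, Set("t1"))` replaces both scans of `t1`, the outer one and the one inside the `Exists` subquery. Dropping the `rewriteExpr` call would leave the inner scan untouched, which mirrors the behaviour this pull request sets out to fix.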


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95050805
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
    @@ -131,6 +132,12 @@ class CacheManager extends Logging {
     
       /** Replaces segments of the given logical plan with cached versions where possible. */
       def useCachedData(plan: LogicalPlan): LogicalPlan = {
    +    useCachedDataInternal(plan) transformAllExpressions {
    +      case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
    +    }
    +  }
    +
    +  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
    --- End diff --
    
    @gatorsmile Sure


[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95050711
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    --- End diff --
    
    Then, this can be simplified to 
    ```Scala
    assert(getNumInMemoryRelations(cachedPlan) == 3)
    ```


[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71027/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    test this please



[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16493



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Although the test cases can be improved, the code fix looks good to me. cc @JoshRosen @hvanhovell 
    




[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Also cc @rxin @cloud-fan 



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    retest this please




[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95049506
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    +
    +      // Scalar subquery and predicate subquery
    +      val cachedPlan2 =
    +        sql(
    +          """
    +            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
    +            |WHERE
    +            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
    +            |OR
    +            |EXISTS (SELECT c1 FROM t3)
    +            |OR
    +            |c1 IN (SELECT c1 FROM t4)
    +          """.stripMargin).queryExecution.optimizedPlan
    +
    +
    +      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
    +      cachedRelations += getCachedPlans(cachedPlan2)
    +      cachedPlan2 transformAllExpressions {
    +        case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan)
    +          e
    +      }
    +      assert(cachedRelations.flatten.size == 4)
    +
    +      spark.catalog.uncacheTable("t1")
    +      spark.catalog.uncacheTable("t2")
    +      spark.catalog.uncacheTable("t3")
    +      spark.catalog.uncacheTable("t4")
    --- End diff --
    
    ```Scala
      override def afterEach(): Unit = {
        try {
          clearCache()
        } finally {
          super.afterEach()
        }
      }
    ```



[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95051560
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +577,67 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +
    +
    --- End diff --
    
    Nit: remove these two lines



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #70999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70999/testReport)** for PR 16493 at commit [`f733f90`](https://github.com/apache/spark/commit/f733f90325b975973e60272ba6708dff5059f9dd).



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71005/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    retest this please




[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    LGTM



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    retest this please



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    @dilipbiswal Could you post the nested subquery in the PR description? It can help the other reviewers understand the fix. Thanks!



[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95049495
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    +
    +      // Scalar subquery and predicate subquery
    +      val cachedPlan2 =
    +        sql(
    +          """
    +            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
    +            |WHERE
    +            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
    +            |OR
    +            |EXISTS (SELECT c1 FROM t3)
    +            |OR
    +            |c1 IN (SELECT c1 FROM t4)
    +          """.stripMargin).queryExecution.optimizedPlan
    +
    +
    +      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
    +      cachedRelations += getCachedPlans(cachedPlan2)
    +      cachedPlan2 transformAllExpressions {
    +        case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan)
    +          e
    +      }
    +      assert(cachedRelations.flatten.size == 4)
    +
    +      spark.catalog.uncacheTable("t1")
    +      spark.catalog.uncacheTable("t2")
    +      spark.catalog.uncacheTable("t3")
    +      spark.catalog.uncacheTable("t4")
    --- End diff --
    
    You can call `clearCache()` instead; then there is no need to uncache each table individually.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Merging to master. Thanks!



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    retest this please




[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16493#discussion_r95050708
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
    @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
             case i: InMemoryRelation => i
           }.size == 1)
       }
    +
    +  test("SPARK-19093 Caching in side subquery") {
    +    withTempView("t1") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      spark.catalog.cacheTable("t1")
    +      val cachedPlan =
    +        sql(
    +          """
    +            |SELECT * FROM t1
    +            |WHERE
    +            |NOT EXISTS (SELECT * FROM t1)
    +          """.stripMargin).queryExecution.optimizedPlan
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 2)
    +      spark.catalog.uncacheTable("t1")
    +    }
    +  }
    +
    +  test("SPARK-19093 scalar and nested predicate query") {
    +    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
    +      plan collect {
    +        case i: InMemoryRelation => i
    +      }
    +    }
    +    withTempView("t1", "t2", "t3", "t4") {
    +      Seq(1).toDF("c1").createOrReplaceTempView("t1")
    +      Seq(2).toDF("c1").createOrReplaceTempView("t2")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t3")
    +      Seq(1).toDF("c1").createOrReplaceTempView("t4")
    +      spark.catalog.cacheTable("t1")
    +      spark.catalog.cacheTable("t2")
    +      spark.catalog.cacheTable("t3")
    +      spark.catalog.cacheTable("t4")
    +
    +      // Nested predicate subquery
    +      val cachedPlan =
    +        sql(
    +        """
    +          |SELECT * FROM t1
    +          |WHERE
    +          |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    +        """.stripMargin).queryExecution.optimizedPlan
    +
    +      assert(
    +        cachedPlan.collect {
    +          case i: InMemoryRelation => i
    +        }.size == 3)
    +
    +      // Scalar subquery and predicate subquery
    +      val cachedPlan2 =
    +        sql(
    +          """
    +            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
    +            |WHERE
    +            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
    +            |OR
    +            |EXISTS (SELECT c1 FROM t3)
    +            |OR
    +            |c1 IN (SELECT c1 FROM t4)
    +          """.stripMargin).queryExecution.optimizedPlan
    +
    +
    +      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
    +      cachedRelations += getCachedPlans(cachedPlan2)
    +      cachedPlan2 transformAllExpressions {
    +        case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan)
    +          e
    +      }
    +      assert(cachedRelations.flatten.size == 4)
    --- End diff --
    
    Then, this can be simplified to 
    ```Scala
    assert(getNumInMemoryRelations(cachedPlan2) == 4)
    ```
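
    For readers without the earlier comment at hand, here is a minimal, self-contained sketch of what a helper like `getNumInMemoryRelations` might look like. It is an assumption, not the code from this PR: `Plan`, `Expr`, `Literal`, `CachedRelation`, `Filter` and `SubqueryExpr` are hypothetical toy stand-ins for Spark's plan and expression classes (such as `LogicalPlan`, `InMemoryRelation` and `SubqueryExpression`), so the block runs without Spark. The idea is to count cached relations in the plan itself and in every plan nested inside a subquery expression.

    ```Scala
    object CacheCountSketch {
      // Toy expression tree (hypothetical, not Spark's Expression).
      sealed trait Expr
      final case class Literal(value: Int) extends Expr
      // Plays the role of SubqueryExpression: an expression that wraps a nested plan.
      final case class SubqueryExpr(plan: Plan) extends Expr

      // Toy plan tree (hypothetical, not Spark's LogicalPlan).
      sealed trait Plan {
        def children: Seq[Plan]
        def expressions: Seq[Expr]
      }
      // Plays the role of InMemoryRelation: a leaf that reads cached data.
      final case class CachedRelation(name: String) extends Plan {
        val children: Seq[Plan] = Nil
        val expressions: Seq[Expr] = Nil
      }
      final case class Filter(conditions: Seq[Expr], child: Plan) extends Plan {
        val children: Seq[Plan] = Seq(child)
        val expressions: Seq[Expr] = conditions
      }

      // Counts cached relations in the plan, its children, and any plan
      // reachable through a subquery expression on a plan node.
      def getNumInMemoryRelations(plan: Plan): Int = {
        val self = plan match {
          case _: CachedRelation => 1
          case _ => 0
        }
        val inSubqueries = plan.expressions.collect {
          case SubqueryExpr(sub) => getNumInMemoryRelations(sub)
        }.sum
        self + inSubqueries + plan.children.map(getNumInMemoryRelations).sum
      }

      // Mirrors the shape of the nested predicate query in the test:
      // t1 filtered by a subquery on t2, itself filtered by a subquery on t3.
      val t3: Plan = Filter(Seq(Literal(1)), CachedRelation("t3"))
      val t2: Plan = Filter(Seq(SubqueryExpr(t3)), CachedRelation("t2"))
      val nested: Plan = Filter(Seq(SubqueryExpr(t2)), CachedRelation("t1"))
    }
    ```

    With a helper of this shape, the manual `MutableList` bookkeeping in the test would reduce to a single assertion per query.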



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by hvanhovell <gi...@git.apache.org>.

Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    LGTM



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71005/
    Test FAILed.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #70999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70999/testReport)** for PR 16493 at commit [`f733f90`](https://github.com/apache/spark/commit/f733f90325b975973e60272ba6708dff5059f9dd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71004/testReport)** for PR 16493 at commit [`f9f0b01`](https://github.com/apache/spark/commit/f9f0b01e5cf6e8a6a212324686e82b0c4bf1b5fc).



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71006/
    Test FAILed.



[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16493
  
    **[Test build #71004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71004/testReport)** for PR 16493 at commit [`f9f0b01`](https://github.com/apache/spark/commit/f9f0b01e5cf6e8a6a212324686e82b0c4bf1b5fc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.

