Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/07/20 07:16:41 UTC

[GitHub] spark pull request #18687: [SPARK-21484][SQL] Fix inconsistent query plans o...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/18687

    [SPARK-21484][SQL] Fix inconsistent query plans of Dataset after persist/unpersist

    ## What changes were proposed in this pull request?
    
    After a call to persist/unpersist, the query plans of a `Dataset` should change accordingly. Currently, however, the query plans stay the same, so you will see inconsistent query plans like:
    
        scala> val x1 = Seq(1).toDF()
        x1: org.apache.spark.sql.DataFrame = [value: int]
        scala> println(x1.queryExecution.executedPlan) // query plans are materialized before persist()
        LocalTableScan [value#1]
        scala> x1.persist()
        scala> x1.count()
        scala> println(x1.queryExecution.executedPlan)
        LocalTableScan [value#1]
    
        scala> val x1 = Seq(1).toDF()
        x1: org.apache.spark.sql.DataFrame = [value: int]
        scala> x1.persist()
        scala> x1.count()
        scala> println(x1.queryExecution.executedPlan) // query plans are materialized after persist()
        InMemoryTableScan [value#24]
           +- InMemoryRelation [value#24], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                 +- LocalTableScan [value#1]
        scala> x1.unpersist()
        scala> println(x1.queryExecution.executedPlan)
        InMemoryTableScan [value#24]
           +- InMemoryRelation [value#24], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                 +- LocalTableScan [value#1]
    
    ## How was this patch tested?


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-21484

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18687
    
----
commit 6b21c6b2408dcbc6fec938dba94eeb5d70387b2e
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-07-20T07:12:00Z

    Change query execution instance after persist/unpersist.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    We need to define the cache behavior of Dataset clearly.
    
    I agree that once a table is cached, other Datasets should use the cached table without reading it again.
    
    But once a table is uncached, should other Datasets use the cached table or the uncached table?




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79885/
    Test PASSed.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Should `Dataset` be thread-safe? cc @rxin @gatorsmile 




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79878 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79878/testReport)** for PR 18687 at commit [`1aa0520`](https://github.com/apache/spark/commit/1aa0520ce9492545da23f62a709a3530fa253afd).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CacheWatcher(val queryExecution: QueryExecution) `




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79878/
    Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79881 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79881/testReport)** for PR 18687 at commit [`92cf928`](https://github.com/apache/spark/commit/92cf9288de8dcebe25578763c329ff8193203000).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CacheWatcherSuite extends QueryTest with SharedSQLContext `




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Ok. Sounds reasonable. I'm preparing a new fix for this case.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Since you said it is well defined, I suppose that the coded cache behavior:
    
    * Once a dataset is cached, the same fragment of logical plan in other datasets should use the cached plan.
    * Once a dataset is uncached, the same fragment of logical plan in other datasets should fall back to the uncached plan.
    
    is correct and we don't want to change it.
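
The two rules listed in this comment can be illustrated with a minimal toy model. This is only a sketch under stated assumptions: `Plan`, `Scan`, `Filter`, `Cached` and `ToyCacheManager` are hypothetical stand-ins invented for illustration, not Spark's classes, and the plan is recomputed on every call rather than memoized.

```scala
// Toy model only: these types are illustrative stand-ins, not Spark's classes.
object CacheRulesSketch {
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Filter(cond: String, child: Plan) extends Plan
  case class Cached(original: Plan) extends Plan // stands in for an in-memory relation

  // Hypothetical cache registry keyed by logical plan fragment.
  final class ToyCacheManager {
    private var cached = Set.empty[Plan]
    def cache(p: Plan): Unit = cached += p
    def uncache(p: Plan): Unit = cached -= p

    // Replace every fragment that matches a cached plan, walking top-down.
    def useCachedData(p: Plan): Plan =
      if (cached.contains(p)) Cached(p)
      else p match {
        case Filter(c, child) => Filter(c, useCachedData(child))
        case other            => other
      }
  }

  // Returns the plan of a second dataset while the first is cached,
  // and again after the first has been uncached.
  def demo(): (Plan, Plan) = {
    val cm = new ToyCacheManager
    val ds1 = Scan("t")
    val ds2 = Filter("value > 1", Scan("t")) // shares the Scan("t") fragment with ds1
    cm.cache(ds1)
    val whileCached = cm.useCachedData(ds2)
    cm.uncache(ds1)
    val afterUncache = cm.useCachedData(ds2)
    (whileCached, afterUncache)
  }
}
```

In this model, `CacheRulesSketch.demo()` yields a plan that reads through `Cached(Scan("t"))` while the first dataset is cached, and the plain `Scan("t")` again once it is uncached. The toy only behaves this way because the substitution runs afresh on each call.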
    





[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    @gatorsmile Thanks for reporting that.
    
    It is hard to argue that the reported case is semantically valid. ds1 and ds2 are actually two different Datasets. Semantically, if you cache one Dataset, why should another Dataset be cached too?





[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Thanks for fixing this, but this PR does not fix all the cases that are caused by our materialized plans in the QueryExecution. For example,
    ```Scala
          Seq("1", "2").toDF().write.saveAsTable("t")
          val ds1 = spark.table("t")
          val ds2 = spark.table("t")
          ds1.collect()  // materializes ds1's plans before anything is cached
          ds2.persist()
          ds1.collect()  // this still uses the uncached plan
    ```
    





[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79791/
    Test PASSed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79884 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79884/testReport)** for PR 18687 at commit [`92cf928`](https://github.com/apache/spark/commit/92cf9288de8dcebe25578763c329ff8193203000).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CacheWatcherSuite extends QueryTest with SharedSQLContext `




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79884/
    Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Except for fixing the weird consequences of caching, e.g. https://github.com/apache/spark/pull/18687#issuecomment-317159979 and https://github.com/apache/spark/pull/18687#issuecomment-317186128, I'd keep the current behavior described above in https://github.com/apache/spark/pull/18687#issuecomment-317220129 and https://github.com/apache/spark/pull/18687#issuecomment-317230592.





[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    @gatorsmile This raises another question: when a table is cached in one session and another session uncaches it, is the original table now cached or uncached?




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Do we have dynamic SQL statements?




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79880 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79880/testReport)** for PR 18687 at commit [`d036a29`](https://github.com/apache/spark/commit/d036a29d076698bde7a8232425f9a7e2951b52d5).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79880/
    Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79884 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79884/testReport)** for PR 18687 at commit [`92cf928`](https://github.com/apache/spark/commit/92cf9288de8dcebe25578763c329ff8193203000).




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    > Once a dataset is uncached, the same fragment of logical plan in other datasets should fall back to the uncached plan.
    
    This is true, although the plan looks the same. The execution, however, does not use the cached data.
    
    > Once a dataset is cached, the same fragment of logical plan in other datasets should use the cached plan.
    
    This is true, if and only if your first execution of `queryExecution` is after `persist()`. If you check most of our APIs, we are building a new `QueryExecution`. 
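
The "first execution after `persist()`" condition can be sketched with a small self-contained model. Everything here is an assumption made for illustration: `ToyCache` and `ToyQueryExecution` are hypothetical stand-ins, and the only point is that a plan held in a `lazy val` is decided once, at first access.

```scala
// Toy model only: names echo the discussion but none of this is Spark code.
object LazyPlanSketch {
  // Hypothetical shared cache state, keyed by table name.
  final class ToyCache {
    private var tables = Set.empty[String]
    def persist(t: String): Unit = tables += t
    def unpersist(t: String): Unit = tables -= t
    def isCached(t: String): Boolean = tables.contains(t)
  }

  // Stand-in for a per-Dataset query execution: the plan is a lazy val,
  // so it is decided once, at first access, and never recomputed.
  final class ToyQueryExecution(table: String, cache: ToyCache) {
    lazy val executedPlan: String =
      if (cache.isCached(table)) s"InMemoryTableScan($table)" else s"TableScan($table)"
  }

  // Returns (ds1 before persist, ds1 after persist, ds2 first accessed after persist).
  def demo(): (String, String, String) = {
    val cache = new ToyCache
    val ds1 = new ToyQueryExecution("t", cache)
    val ds2 = new ToyQueryExecution("t", cache)
    val ds1Before = ds1.executedPlan // ds1 materializes its plan before persist
    cache.persist("t")
    (ds1Before, ds1.executedPlan, ds2.executedPlan)
  }
}
```

In this model `ds1` keeps reporting `TableScan(t)` after the persist call, while `ds2`, whose plan is first accessed afterwards, reports `InMemoryTableScan(t)`. Building a new `ToyQueryExecution` is the only way to pick up the cache state here.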




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    It is well defined, but never well documented. So far, there are multiple issues in the existing cache mechanism. We hesitate to make any major change until we figure out the next step for cache management as a whole.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79881 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79881/testReport)** for PR 18687 at commit [`92cf928`](https://github.com/apache/spark/commit/92cf9288de8dcebe25578763c329ff8193203000).




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    @viirya Could you close this PR? 




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    I think you might push more changes in the next few hours. If the changes are clean and safe, we still need a SQLConf to ensure the behaviors can be changed back. We might not be able to accept any behavior change in 2.x.
    
    The changes in this PR (before this comment) change the fundamentals of the Catalyst design. More importantly, they could also hurt third-party libraries that are built on Spark SQL. Thus, writing a detailed design doc (SPIP) might be better.
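
The idea of guarding a behavior change behind a configuration switch can be sketched as follows. This is a minimal illustration under stated assumptions: `ToyConf`, `planFor` and the key `toy.sql.replanAfterCacheChange` are hypothetical names made up for this sketch and do not correspond to any actual Spark configuration.

```scala
// Toy model only: a hypothetical flag gating a behavior change.
object ConfGateSketch {
  // Hypothetical flag name, for illustration only; not an actual Spark configuration key.
  val ReplanAfterCacheChange = "toy.sql.replanAfterCacheChange"

  final class ToyConf(settings: Map[String, String]) {
    def getBoolean(key: String, default: Boolean): Boolean =
      settings.get(key).map(_.toBoolean).getOrElse(default)
  }

  // Defaulting to false keeps the existing behavior unless a user opts in.
  // `fresh` is passed by name so the new code path only runs when enabled.
  def planFor(conf: ToyConf, memoized: String, fresh: => String): String =
    if (conf.getBoolean(ReplanAfterCacheChange, default = false)) fresh else memoized
}
```

With an empty `ToyConf`, `planFor` returns the memoized plan, so existing callers see no change; setting the hypothetical key to `"true"` switches to the freshly computed plan.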




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79880 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79880/testReport)** for PR 18687 at commit [`d036a29`](https://github.com/apache/spark/commit/d036a29d076698bde7a8232425f9a7e2951b52d5).




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79790/testReport)** for PR 18687 at commit [`6b21c6b`](https://github.com/apache/spark/commit/6b21c6b2408dcbc6fec938dba94eeb5d70387b2e).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Btw, `QueryExecution.toRdd` executes the executed plan. Once that plan has been materialized by the Dataset before persist, `toRdd` still executes the uncached executed plan. There are a few places in `Dataset` that call `toRdd`.
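
A minimal sketch of why a plan that was materialized before persist stays uncached. It assumes the plan is held in a `lazy val`, as the plans in `QueryExecution` are; every class below (`ToyCacheManager`, `ToyQueryExecution`, and the `Plan` nodes) is a hypothetical stand-in, not Spark's real implementation.

```scala
// Hypothetical, simplified stand-ins; these are not Spark's real classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class InMemoryScan(child: Plan) extends Plan

// Toy cache manager: remembers which logical plans are currently persisted.
class ToyCacheManager {
  private val cached = scala.collection.mutable.Set.empty[Plan]
  def persist(p: Plan): Unit = cached += p
  def unpersist(p: Plan): Unit = cached -= p
  // Swap in a cached scan only if the plan is persisted right now.
  def useCachedData(p: Plan): Plan =
    if (cached.contains(p)) InMemoryScan(p) else p
}

// Toy query execution: the plan is computed once, on first access, then frozen.
class ToyQueryExecution(logical: Plan, cm: ToyCacheManager) {
  lazy val executedPlan: Plan = cm.useCachedData(logical)
}

object LazyPlanDemo {
  def run(): (Plan, Plan) = {
    val cm = new ToyCacheManager
    val logical = Scan("t")

    val early = new ToyQueryExecution(logical, cm)
    val beforePersist = early.executedPlan // forces the lazy val before persist
    cm.persist(logical)

    val late = new ToyQueryExecution(logical, cm)
    // `early` keeps its uncached plan; `late` sees the cached relation.
    (early.executedPlan, late.executedPlan)
  }
}
```

In this toy model the first element of `LazyPlanDemo.run()` stays `Scan("t")` while the second is `InMemoryScan(Scan("t"))`, which mirrors the inconsistency described in the PR.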




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    >> Once a dataset is uncached, the same fragment of logical plan in other datasets should fallback to uncached plan.
    
    > This is true, although the plan looks like the same. However, the execution is not using the cached data.
    
    There are a few places in `Dataset` that directly call `QueryExecution.toRdd`, which uses the materialized query plan that still contains the cached relation.





[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79877/testReport)** for PR 18687 at commit [`0f3a3c1`](https://github.com/apache/spark/commit/0f3a3c16d687e0843aa1fa6e36bfe910dfc8d0fb).




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Sure.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79877 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79877/testReport)** for PR 18687 at commit [`0f3a3c1`](https://github.com/apache/spark/commit/0f3a3c16d687e0843aa1fa6e36bfe910dfc8d0fb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CacheWatcher(val queryExecution: QueryExecution) `




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    @kiszk There are test failures. I'm trying to fix them.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Jenkins, retest this please




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79790/
    Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79885/testReport)** for PR 18687 at commit [`188fe8c`](https://github.com/apache/spark/commit/188fe8c88018f62ef68247fd3c4024c2ca44d7f1).




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    cc @cloud-fan 




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    So far, we do not support dynamic SQL statements, but this is a potential feature we can explore in the future. A global statement cache and its management can reduce optimization costs, especially when our CBO optimizer is more advanced. 
    
    At the same time, it also resolves a more general issue. We can invalidate all the physical plans that were built on stale info. The data cache is just one example. Stale info could also include outdated statistics. A previously optimized plan might not make sense any more after a large amount of data is inserted into or deleted from the underlying tables. 
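
A minimal sketch of the invalidation idea, assuming a single global epoch counter that is bumped whenever cache state or statistics change. `PlanRegistry`, `CompiledPlan`, and the epoch scheme are hypothetical illustrations, not part of Spark or of this PR.

```scala
// Hypothetical sketch; none of these names exist in Spark.
final case class CompiledPlan(description: String, builtAtEpoch: Long)

class PlanRegistry {
  private var epoch: Long = 0L       // bumped whenever cached data or statistics change
  private var compilations: Int = 0  // counts how often a plan had to be (re)built

  def invalidateAll(): Unit = epoch += 1
  def compileCount: Int = compilations

  private def compile(query: String): CompiledPlan = {
    compilations += 1
    CompiledPlan(s"plan($query)@$epoch", epoch)
  }

  // Reuse the previous plan if it was built in the current epoch; otherwise recompile.
  def current(query: String, previous: Option[CompiledPlan]): CompiledPlan =
    previous.filter(_.builtAtEpoch == epoch).getOrElse(compile(query))
}

object StalePlanDemo {
  def run(): (CompiledPlan, CompiledPlan, CompiledPlan, Int) = {
    val reg = new PlanRegistry
    val p1 = reg.current("select * from t", None)     // first compilation
    val p2 = reg.current("select * from t", Some(p1)) // still fresh, reused
    reg.invalidateAll()                               // e.g. after unpersist or a large insert
    val p3 = reg.current("select * from t", Some(p2)) // stale, recompiled
    (p1, p2, p3, reg.compileCount)
  }
}
```

A single global epoch is the bluntest possible design: it recompiles every plan after any change. A finer-grained scheme would track which tables or cached fragments each plan depends on.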




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    > So far, we do not support dynamic SQL statements, but this is a potential feature we can explore in the future. A global statement cache and its management can reduce optimization costs, especially when our CBO optimizer is more advanced.
    
    I agree. The dynamic statement cache reduces the costly preparation process by reusing prepared (parsed, analyzed and optimized) statements. IIUC, this only works for identical dynamic SQL statements. When the CBO optimizer becomes more advanced and more costly, the statement cache might help reduce the cost for identical statements.
    
    The current cache in Spark SQL is not a statement cache, but a cache of query plan fragments (and their execution results). A query doesn't need to be identical to the cached query plan. It can reuse the cached plan even when the cached one is just a fragment of it.
    
    So it seems to me they are orthogonal and can be complementary.
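
A minimal sketch of the distinction drawn above, assuming a toy plan tree. `StatementCache`, `FragmentCache`, and the `Node` classes are hypothetical and only illustrate the two lookup strategies: exact match on the statement text versus substitution of any matching subtree.

```scala
// Hypothetical toy model; not Spark's actual classes.
sealed trait Node
case class Table(name: String) extends Node
case class Filter(cond: String, child: Node) extends Node
case class Project(cols: List[String], child: Node) extends Node
case class Cached(original: Node) extends Node

// Statement cache: keyed by the exact statement text, so only identical statements hit.
class StatementCache {
  private val plans = scala.collection.mutable.Map.empty[String, Node]
  def put(sql: String, plan: Node): Unit = plans(sql) = plan
  def lookup(sql: String): Option[Node] = plans.get(sql)
}

// Fragment cache: any subtree equal to a cached fragment is replaced, top-down.
class FragmentCache {
  private val fragments = scala.collection.mutable.Set.empty[Node]
  def add(fragment: Node): Unit = fragments += fragment
  def rewrite(plan: Node): Node =
    if (fragments.contains(plan)) Cached(plan)
    else plan match {
      case Filter(c, child)   => Filter(c, rewrite(child))
      case Project(cs, child) => Project(cs, rewrite(child))
      case other              => other
    }
}

object CacheKindsDemo {
  val fragment: Node = Filter("x > 1", Table("t"))
  val bigger: Node = Project(List("x"), fragment)

  def run(): (Option[Node], Node) = {
    val stmts = new StatementCache
    stmts.put("SELECT * FROM t WHERE x > 1", fragment)
    val frags = new FragmentCache
    frags.add(fragment)
    // A different statement misses the statement cache,
    // but its plan still reuses the cached fragment.
    (stmts.lookup("SELECT x FROM t WHERE x > 1"), frags.rewrite(bigger))
  }
}
```

Here the second query misses the statement cache because its text differs, yet the fragment cache still rewrites the `Filter` subtree inside it to a `Cached` node.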
    
    
    
    





[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    > The changes in this PR (before this comment) alter the fundamentals of the Catalyst design. More importantly, they could also hurt third-party libraries that are built on Spark SQL. Thus, writing a detailed design doc (SPIP) might be better.
    
    If it's necessary, I'll write the doc. Currently the change is limited to the internals of `QueryExecution`.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    ping @cloud-fan Please help review this too. Thanks.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79876/
    Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    @gatorsmile Any suggestions for this issue? Leave it as is for now?




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79791/testReport)** for PR 18687 at commit [`3c16c3c`](https://github.com/apache/spark/commit/3c16c3c09ebc81f2f6b7ffb6430f9305bfe4030c).




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    There is another case we should fix. You can see that the storage level of `ds2` is `StorageLevel.NONE`, but its executed plan is still the cached version. 
    
        scala> Seq("1", "2").toDF().write.saveAsTable("t")
        scala> val ds1 = spark.table("t")
        ds1: org.apache.spark.sql.DataFrame = [value: string]
        scala> val ds2 = spark.table("t")
        ds2: org.apache.spark.sql.DataFrame = [value: string]
    
        scala> ds1.persist()
        res1: ds1.type = [value: string]
    
        scala> ds2.queryExecution.executedPlan
        res2: org.apache.spark.sql.execution.SparkPlan =
        InMemoryTableScan [value#11]
           +- InMemoryRelation [value#11], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                 +- *FileScan parquet default.t[value#7] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/root/repos/spark-1/spark-warehouse/t], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
    
        scala> ds1.unpersist()
        res3: ds1.type = [value: string]
    
        scala> ds2.queryExecution.executedPlan
        res4: org.apache.spark.sql.execution.SparkPlan =
        InMemoryTableScan [value#11]
           +- InMemoryRelation [value#11], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                 +- *FileScan parquet default.t[value#7] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/root/repos/spark-1/spark-warehouse/t], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
        scala> ds2.storageLevel
        res5: org.apache.spark.storage.StorageLevel = StorageLevel(1 replicas)
    
    





[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79877/
    Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79876/testReport)** for PR 18687 at commit [`6e84930`](https://github.com/apache/spark/commit/6e849309e8582730bcc4d831b7fd175fcc7602fd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class CacheWatcher(val queryExecution: QueryExecution) `




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    This really depends on how you implement the global statement cache and management. All the compiled plans can be stored in the cache. The plans can be reused, if possible (the reused plans might not be identical). 
    
    The plan recompilation in this PR can be part of global statement cache and management. Plan recompilation can be manual or automatic. In this specific case, it should be done automatically. 




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79885/testReport)** for PR 18687 at commit [`188fe8c`](https://github.com/apache/spark/commit/188fe8c88018f62ef68247fd3c4024c2ca44d7f1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class StreamingCacheWatcher(override val queryExecution: QueryExecution)`




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    After rethinking it, to fundamentally fix the issue, the ideal design is to introduce a mechanism like a global `dynamic statement cache`. This can resolve many of our existing issues and also improve the performance of our query compilation. 




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79790/testReport)** for PR 18687 at commit [`6b21c6b`](https://github.com/apache/spark/commit/6b21c6b2408dcbc6fec938dba94eeb5d70387b2e).




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    cc @cloud-fan Can you help review this too? Thanks.




[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79878 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79878/testReport)** for PR 18687 at commit [`1aa0520`](https://github.com/apache/spark/commit/1aa0520ce9492545da23f62a709a3530fa253afd).




[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    My above example is pretty common in many Spark SQL use cases. Many users rely on it. As long as one table is cached in one session, the other sessions can use the cached table without reading the table again. 
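
A minimal sketch of the cross-session behavior described here, assuming a single cache object that every session shares. `SharedCache` and `ToySession` are hypothetical stand-ins for illustration; they do not reproduce Spark's actual session or cache classes.

```scala
// Hypothetical toy model of sessions sharing one cache; not Spark's real classes.
class SharedCache {
  private val data = scala.collection.mutable.Map.empty[String, Seq[Int]]
  var tableReads: Int = 0 // counts how often the underlying table was actually read

  // Read and store the table once; later calls for the same name are no-ops.
  def cacheTable(name: String, read: () => Seq[Int]): Unit =
    if (!data.contains(name)) { tableReads += 1; data(name) = read() }

  // Serve from the cache when possible; fall back to reading the table.
  def scan(name: String, read: () => Seq[Int]): Seq[Int] =
    data.getOrElse(name, { tableReads += 1; read() })
}

class ToySession(val shared: SharedCache) {
  // A new session has its own identity but points at the same shared cache.
  def newSession(): ToySession = new ToySession(shared)
  private def readFromStorage(): Seq[Int] = Seq(1, 2, 3)
  def cacheTable(name: String): Unit = shared.cacheTable(name, () => readFromStorage())
  def table(name: String): Seq[Int] = shared.scan(name, () => readFromStorage())
}

object SharedCacheDemo {
  def run(): (Seq[Int], Int) = {
    val s1 = new ToySession(new SharedCache)
    s1.cacheTable("t")       // first session reads and caches the table
    val s2 = s1.newSession() // second session shares the same cache
    val rows = s2.table("t") // served from the cache, no second read
    (rows, s1.shared.tableReads)
  }
}
```

In this model the table is read exactly once even though two sessions touch it, which is the behavior a change to plan handling would need to preserve.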


[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79881/


[GitHub] spark issue #18687: [SPARK-21484][SQL] Fix inconsistent query plans of Datas...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79791/testReport)** for PR 18687 at commit [`3c16c3c`](https://github.com/apache/spark/commit/3c16c3c09ebc81f2f6b7ffb6430f9305bfe4030c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18687: [SPARK-21484][SQL][WIP] Fix inconsistent query plans of ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18687
  
    **[Test build #79876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79876/testReport)** for PR 18687 at commit [`6e84930`](https://github.com/apache/spark/commit/6e849309e8582730bcc4d831b7fd175fcc7602fd).


[GitHub] spark pull request #18687: [SPARK-21484][SQL] Fix inconsistent query plans o...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/18687

