You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/03/13 03:06:28 UTC

[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/11682

    [SPARK-13840] [SQL] Disable Project Pushdown Through Filter

    #### What changes were proposed in this pull request?
    
    Before this PR, two Optimizer rules `ColumnPruning` and `PushPredicateThroughProject` reverse each other's effects. Optimizer always reaches the max iteration when optimizing some queries. Thus, we need to disable `Project` push down through `Filter` in the rule `columnPruning`. 
    
    The issue becomes worse when having another rule `NullFiltering`, which could add extra Filters for `IsNotNull`. We have to be careful when introducing extra `Filter` if the benefit is not large enough. cc @sameeragarwal @marmbrus 
    
    In addition, `ColumnPruning` should not push `Project` through non-deterministic `Filter`. This could cause wrong results.
    
    cc @davies @cloud-fan @yhuai 
    
    #### How was this patch tested?
    
    Modified the existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark viewDuplicateNames

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11682
    
----
commit e128a0a3d23b7a6a37cdc77034a651d79e32c451
Author: gatorsmile <ga...@gmail.com>
Date:   2016-03-13T02:00:04Z

    Disable Project Push Down Through Filter

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196566777
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53119/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196939816
  
    @davies I might understand your points. We still prefer `PushPredicateThroughProject`. Let me submit another PR to address that issue and at you. 
    
    Let me explain my understanding. They still conflicts with each other. However, now the changes can converge now without extra changes. It depends the order of these two rules. 
    
    If we want to push `Project` deeper, we should keep the current order. Then, after finishing the batch `Batch("Operator Optimizations")`, we can add another batch for `PushPredicateThroughProject`. We just need to run it once to make sure `Filter` is below `Project` in the final optimized plan.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196425642
  
    I'm worried that we are adding rules that aren't stable.  Perhaps we should make the rule executor throw an error when the testing flag is set if we ever hit the maximum number of iterations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196442838
  
    Yeah, that is a good idea. Let me try it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196566772
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196431220
  
    @davies Will try to find a way to keep both. Will submit a separate PR for handling non-determisitic Filter. 
    
    @marmbrus Yeah, your concern is valid. Will submit a separate PR to do what you said above. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196531576
  
    **[Test build #53116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53116/consoleFull)** for PR 11682 at commit [`608b901`](https://github.com/apache/spark/commit/608b9014813174e2227389c7919074ed743fde47).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196628698
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53144/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196628692
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11682#discussion_r56459726
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -410,6 +401,24 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Eliminate no-op Project and Window.
    + *
    + * Note: this rule should be executed just after ColumnPruning.
    + */
    +object EliminateOperators extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    --- End diff --
    
    See the latest PR: https://github.com/apache/spark/pull/11745. It will enable it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196131489
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196569211
  
    **[Test build #53116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53116/consoleFull)** for PR 11682 at commit [`608b901`](https://github.com/apache/spark/commit/608b9014813174e2227389c7919074ed743fde47).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196537010
  
    **[Test build #53121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53121/consoleFull)** for PR 11682 at commit [`adf64da`](https://github.com/apache/spark/commit/adf64da3c3f86e6addb6cb813deba36bb4c7debf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-195862036
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53016/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196617839
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196560839
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196839488
  
    @davies In the case Filter(Project()), the second rule `EliminateOperators` removes the useless Project if it does not prune any column. 
    
    Could you explain any issue this PR misses? I can submit a follow-up PR to address it. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196732337
  
    @gatorsmile Have you missed the special rule for Filter(Project()) ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196110718
  
    **[Test build #53046 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53046/consoleFull)** for PR 11682 at commit [`e5e00ae`](https://github.com/apache/spark/commit/e5e00ae1d0f9885b05e3a81f8b084e1059151fba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196569575
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53116/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196560475
  
    **[Test build #53121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53121/consoleFull)** for PR 11682 at commit [`adf64da`](https://github.com/apache/spark/commit/adf64da3c3f86e6addb6cb813deba36bb4c7debf).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-197724073
  
    @gatorsmile @rxin @cloud-fan Since this PR does not solve the problem as expected, also introduce other problems (can't prune columns for Filter(Join(xx)), I have reverted this patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196569574
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11682


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11682#discussion_r56459598
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -410,6 +401,24 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Eliminate no-op Project and Window.
    + *
    + * Note: this rule should be executed just after ColumnPruning.
    + */
    +object EliminateOperators extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    --- End diff --
    
    This is not what we want, it cause that we can't prune columsn for Filter(Join()).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-195862015
  
    **[Test build #53016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53016/consoleFull)** for PR 11682 at commit [`e128a0a`](https://github.com/apache/spark/commit/e128a0a3d23b7a6a37cdc77034a651d79e32c451).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196565770
  
    **[Test build #53119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53119/consoleFull)** for PR 11682 at commit [`f1eee03`](https://github.com/apache/spark/commit/f1eee03aa8d745044f18fd97c2dca543c421497f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196594548
  
    **[Test build #53144 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53144/consoleFull)** for PR 11682 at commit [`bc4685a`](https://github.com/apache/spark/commit/bc4685aaabeeb4e58eb4aacccc4915d30349481e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196380248
  
    @gatorsmile Is this behavior new (introduced by a recent change) or previous versions of the optimizer also have the same behavior?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196131490
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53046/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196560843
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53121/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-195862035
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196627725
  
    **[Test build #53144 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53144/consoleFull)** for PR 11682 at commit [`bc4685a`](https://github.com/apache/spark/commit/bc4685aaabeeb4e58eb4aacccc4915d30349481e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196433895
  
    After some offline discussions with @marmbrus , we realized that these two rules may still conflict with each other if the Filter can'be pushed throughout the child (for example, outer join). A better solution could be split the ColumnPruning as two rules: a) the first one add new Project  2) the second one remove unnecessary projects, include the Project under Filter (the Project should only prune some columns). The second rule ran just after the first role.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196388382
  
    @yhuai This was added in the recent change in `ColumnPruning`. 1.6 does not have such an issue. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-197004091
  
    After writing more test cases, I found a couple of issues. In addition, we need another split of the rule `ColumnPruning`. Have a new rule `PushProjectThroughPredicate` before starting `ColumnPruning`.
    
    Anyway, will show the details in the PR. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196535654
  
    **[Test build #53119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53119/consoleFull)** for PR 11682 at commit [`f1eee03`](https://github.com/apache/spark/commit/f1eee03aa8d745044f18fd97c2dca543c421497f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-197725609
  
    @davies Sorry for that. 
    
    Could you review another PR: https://github.com/apache/spark/pull/11745 ? That is built on this PR to resolve all the issues. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196922074
  
    @gatorsmile This latest changes does not address the problem in the PR description, ColumnPruning and PushPredicateThroughProject still conflict with each other, right?
    
    Could you also add a regression test for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11682#discussion_r56459774
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -410,6 +401,24 @@ object ColumnPruning extends Rule[LogicalPlan] {
     }
     
     /**
    + * Eliminate no-op Project and Window.
    + *
    + * Note: this rule should be executed just after ColumnPruning.
    + */
    +object EliminateOperators extends Rule[LogicalPlan] {
    +  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
    --- End diff --
    
    I realize it after this PR. Thus, I submitted a new one to fix the issue. Sorry for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196131389
  
    **[Test build #53046 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53046/consoleFull)** for PR 11682 at commit [`e5e00ae`](https://github.com/apache/spark/commit/e5e00ae1d0f9885b05e3a81f8b084e1059151fba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Split Optimizer Rule Colum...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196701013
  
    Merging in master.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-195850299
  
    **[Test build #53016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53016/consoleFull)** for PR 11682 at commit [`e128a0a`](https://github.com/apache/spark/commit/e128a0a3d23b7a6a37cdc77034a651d79e32c451).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13840] [SQL] Disable Project Pushdown T...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/11682#issuecomment-196423960
  
    @gatorsmile Thanks for finding this. In general, both PushPredicateThroughProject and ColumnPruning are useful, we should keep both and fix the conflict (make them stable).
    
    I think they are only conflicted with each other when they are on opt of a LeafNode (can't be pushed further), so we don't need to insert a Project between Filter and LeafNode, because PhysicalOperation can already handle Project(Filter(LeafNode())) very well.
    
    For the non-deterministic Filter or Project, we may do things wrong in many places, could you create a separate JIRA for that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org