Posted to reviews@spark.apache.org by saucam <gi...@git.apache.org> on 2015/04/07 14:27:12 UTC

[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

GitHub user saucam opened a pull request:

    https://github.com/apache/spark/pull/5390

    [SQL][SPARK-6742]: Don't push down predicates which reference partition column(s)

    cc @liancheng  

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/saucam/spark fpush

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/5390.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5390
    
----
commit 8592acc665241e2304c77427df35221fa7bfc020
Author: Yash Datta <ya...@guavus.com>
Date:   2015-04-07T12:09:20Z

    SPARK-6742: Don't push down predicates which reference partition column(s)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91936716
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30087/




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91927790
  
    ok to test




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-90533449
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/5390




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5390#discussion_r28043392
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -223,8 +229,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
                   // can result in "A OR C" being pushed down. Here we are conservative in the sense
                   // that even if "A" was pushed and we check for "A AND B" we still want to keep
                   // "A AND B" in the higher-level filter, not just "B".
    -              predicates.map(p => p -> ParquetFilters.createFilter(p)).collect {
    +              predicates.map(p => p -> ParquetFilters.createFilter(p)).collect { 
    --- End diff --
    
    Remove the trailing space please.




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5390#discussion_r28043391
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -214,6 +214,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
           case logical.InsertIntoTable(table: ParquetRelation, partition, child, overwrite) =>
             InsertIntoParquetTable(table, planLater(child), overwrite) :: Nil
           case PhysicalOperation(projectList, filters: Seq[Expression], relation: ParquetRelation) =>
    +        val partitionColNames = relation.partitioningAttributes.map(_.name).toSet
    +        val filtersToPush = filters
    +          .filter { pred =>
    +            val referencedColNames = pred.references.map(_.name).toSet
    +            referencedColNames.intersect(partitionColNames).isEmpty
    +          }
    --- End diff --
    
    Nit, we'd prefer the following style:
    
    ```
    val filtersToPush = filters.filter { pred =>
      ...
    }
    ```
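
The partition-column check in the diff above can be illustrated in isolation. The following is a minimal sketch rather than the patch itself: `Pred` is a hypothetical stand-in for a Catalyst `Expression` that records only the column names a predicate references, and `PartitionFilterSketch` is an illustrative name, not part of Spark.

```scala
// Hypothetical stand-in for a Catalyst expression: only the names of the
// columns a predicate references matter for this check.
final case class Pred(description: String, references: Set[String])

object PartitionFilterSketch {
  // Keep only predicates that reference no partition column; those are the
  // candidates for push-down into the Parquet reader.
  def filtersToPush(filters: Seq[Pred], partitionColNames: Set[String]): Seq[Pred] =
    filters.filter { pred =>
      pred.references.intersect(partitionColNames).isEmpty
    }

  def main(args: Array[String]): Unit = {
    val filters = Seq(
      Pred("a > 1", Set("a")),           // data column only: kept
      Pred("part = 3", Set("part")),     // partition column only: dropped
      Pred("a < part", Set("a", "part")) // mixes both: dropped
    )
    // Prints only "a > 1".
    filtersToPush(filters, Set("part")).foreach(p => println(p.description))
  }
}
```

A predicate that mixes data and partition columns is excluded as well, since the intersection with the partition column names is non-empty.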




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91997467
  
      [Test build #30105 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30105/consoleFull) for   PR 5390 at commit [`3f026d6`](https://github.com/apache/spark/commit/3f026d63104e5fb4ce11aa35dc21cc348dfe097b).




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-92006095
  
      [Test build #30105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30105/consoleFull) for   PR 5390 at commit [`3f026d6`](https://github.com/apache/spark/commit/3f026d63104e5fb4ce11aa35dc21cc348dfe097b).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5390#discussion_r28199234
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetFilterSuite.scala ---
    @@ -22,7 +22,8 @@ import parquet.filter2.predicate.Operators._
     import parquet.filter2.predicate.{FilterPredicate, Operators}
     
     import org.apache.spark.sql.catalyst.dsl.expressions._
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Literal, Predicate, Row}
    +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference, Cast, 
    --- End diff --
    
    [Don't wrap import statements](https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-LineLength).  After 6+ imports consider using `_`.




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91149021
  
    Good catch! #5210 fixed this for the new Parquet data source but missed the old Parquet code path. Would you please add a test case for this? Thanks!




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-92006098
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30105/




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by saucam <gi...@git.apache.org>.
Github user saucam commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91925973
  
    Added a test case. Please test.




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91936711
  
      [Test build #30087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30087/consoleFull) for   PR 5390 at commit [`ce3d702`](https://github.com/apache/spark/commit/ce3d702625c98b1ad60da21fb16c2802e2766347).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.
     * This patch does not change any dependencies.




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-92509822
  
    Thanks!  Merged to master.




[GitHub] spark pull request: [SQL][SPARK-6742]: Don't push down predicates ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/5390#issuecomment-91928278
  
      [Test build #30087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30087/consoleFull) for   PR 5390 at commit [`ce3d702`](https://github.com/apache/spark/commit/ce3d702625c98b1ad60da21fb16c2802e2766347).

