Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2014/05/23 13:32:41 UTC

[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/863

    [SPARK-1913][SQL] Bug fix: column pruning error in Parquet support

    JIRA issue: [SPARK-1913](https://issues.apache.org/jira/browse/SPARK-1913)
    
    When scanning Parquet tables, attributes referenced only in predicates that are pushed down are not passed to the `ParquetTableScan` operator, which causes an exception.
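
A minimal sketch of the failure mode described above, using a toy model rather than the real Catalyst classes. `Predicate`, `requiredColumns`, and `requiredColumnsBuggy` are hypothetical names introduced only for illustration, and the buggy variant is an assumption about how such a column could get dropped, not the code that this patch changes.

```scala
// Toy model: an attribute is a plain column name, and a predicate
// only records which columns it references.
final case class Predicate(description: String, references: Set[String])

object ColumnPruningSketch {
  // Columns the scan must read: everything projected plus everything
  // referenced by any predicate, including predicates that are pushed down.
  def requiredColumns(projected: Set[String], predicates: Seq[Predicate]): Set[String] =
    projected ++ predicates.flatMap(_.references)

  // Assumed buggy variant: pushed-down predicates are discarded before the
  // referenced columns are collected, so a column used only by such a
  // predicate never reaches the scan.
  def requiredColumnsBuggy(
      projected: Set[String],
      predicates: Seq[Predicate],
      pushedDown: Predicate => Boolean): Set[String] =
    projected ++ predicates.filterNot(pushedDown).flatMap(_.references)
}
```

For a query that projects only `b` but filters on `a`, the first function asks the scan for both columns, while the second asks for `b` alone once the filter on `a` is treated as pushed down.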

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark spark-1913

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #863
    
----
commit ae60ab38e729c977cac83e956c553e53254dd01f
Author: Cheng Lian <li...@gmail.com>
Date:   2014-05-23T10:57:29Z

    [SPARK-1913] Attributes referenced only in predicates pushed down should remain in ParquetTableScan operator

commit f5b257dc7830b663f1a52c37371e0821da0a684b
Author: Cheng Lian <li...@gmail.com>
Date:   2014-05-23T11:25:26Z

    Added back comments deleted by mistake

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44084020
  
     Merged build triggered. 



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44085430
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15179/



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-43998612
  
    Merged build started. 



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44004048
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44007103
  
    @marmbrus @rxin This bug should be a blocker for the Spark 1.0 release. Please help review, thanks.



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44004052
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15166/



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/863



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44110667
  
    Thanks. I've merged this.



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/863#discussion_r13025912
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -206,17 +206,24 @@ class SQLContext(@transient val sparkContext: SparkContext)
          * final desired output requires complex expressions to be evaluated or when columns can be
          * further eliminated out after filtering has been done.
          *
    +     * The `prunePushedDownFilter` is used to remove those filters that can be removed by the filter
    +     * pushdown optimization.
    +     *
          * The required attributes for both filtering and expression evaluation are passed to the
          * provided `scanBuilder` function so that it can avoid unnecessary column materialization.
          */
         def pruneFilterProject(
             projectList: Seq[NamedExpression],
             filterPredicates: Seq[Expression],
    +        prunePushedDownFilter: Option[Expression => Boolean],
             scanBuilder: Seq[Attribute] => SparkPlan): SparkPlan = {
     
           val projectSet = projectList.flatMap(_.references).toSet
           val filterSet = filterPredicates.flatMap(_.references).toSet
    -      val filterCondition = filterPredicates.reduceLeftOption(And)
    +      val filterCondition = prunePushedDownFilter
    +        .map(filterPredicates.filter)
    --- End diff --
    
    Removed the `Option`, now it should be clear :)



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44084024
  
    Merged build started. 



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-44085429
  
    Merged build finished. All automated tests passed.



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/863#issuecomment-43998583
  
     Merged build triggered. 



[GitHub] spark pull request: [SPARK-1913][SQL] Bug fix: column pruning erro...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/863#discussion_r13019060
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -206,17 +206,24 @@ class SQLContext(@transient val sparkContext: SparkContext)
          * final desired output requires complex expressions to be evaluated or when columns can be
          * further eliminated out after filtering has been done.
          *
    +     * The `prunePushedDownFilter` is used to remove those filters that can be removed by the filter
    +     * pushdown optimization.
    +     *
          * The required attributes for both filtering and expression evaluation are passed to the
          * provided `scanBuilder` function so that it can avoid unnecessary column materialization.
          */
         def pruneFilterProject(
             projectList: Seq[NamedExpression],
             filterPredicates: Seq[Expression],
    +        prunePushedDownFilter: Option[Expression => Boolean],
             scanBuilder: Seq[Attribute] => SparkPlan): SparkPlan = {
     
           val projectSet = projectList.flatMap(_.references).toSet
           val filterSet = filterPredicates.flatMap(_.references).toSet
    -      val filterCondition = filterPredicates.reduceLeftOption(And)
    +      val filterCondition = prunePushedDownFilter
    +        .map(filterPredicates.filter)
    --- End diff --
    
    I find this slightly hard to understand because `prunePushedDownFilter` is an `Option`. Can we write this out as a full closure?
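
To make the readability concern concrete, here is a hedged sketch that contrasts the two shapes on toy types. `Expr`, `Pred`, `withOption`, and `withFunction` are hypothetical, and this `And` is a local stand-in for the Catalyst expression of the same name. The tail of the `Option` chain (`getOrElse` followed by `reduceLeftOption`) is an assumption, since the quoted diff is cut off after `.map(filterPredicates.filter)`.

```scala
// Toy expression tree standing in for Catalyst's Expression and And.
sealed trait Expr
final case class Pred(name: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object FilterConditionSketch {
  // Option-based shape: None means "keep every predicate"; Some(keep)
  // retains only the predicates for which keep returns true.
  def withOption(predicates: Seq[Expr], keep: Option[Expr => Boolean]): Option[Expr] =
    keep
      .map(predicates.filter)
      .getOrElse(predicates)
      .reduceLeftOption[Expr](And(_, _))

  // Plain-function shape: the caller passes `_ => true` to keep everything,
  // so there is no Option to unwrap at the use site.
  def withFunction(predicates: Seq[Expr], keep: Expr => Boolean): Option[Expr] =
    predicates.filter(keep).reduceLeftOption[Expr](And(_, _))
}
```

Both functions combine the surviving predicates into a single conjunction, or return `None` when nothing survives. The second version moves the "no pruning" case into the argument instead of encoding it as `None`.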

