Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/02/01 14:40:54 UTC

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20476

    [SPARK-23301][SQL] data source column pruning should work for arbitrary expressions

    ## What changes were proposed in this pull request?
    
    This PR fixes a mistake in the `PushDownOperatorsToDataSource` rule: the column pruning logic handles `Project` incorrectly.
    
    ## How was this patch tested?
    
    Added a new test case for column pruning with arbitrary expressions, and improved the existing tests to make sure `PushDownOperatorsToDataSource` really works.
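
The traversal this PR ends up with can be sketched on a toy plan model. This is a minimal illustration under stated assumptions, not Spark's code: `Plan`, `NamedExpr`, `Project`, `Filter`, `Relation`, and `requiredColumns` below are hypothetical stand-ins that track column names only, and the plan is assumed to contain a single relation.

```scala
object ColumnPruningSketch {
  // Hypothetical stand-in for a named expression: an output name plus the
  // input columns it reads (for example `i + j AS x` reads i and j).
  final case class NamedExpr(name: String, references: Set[String])

  // Hypothetical, heavily simplified logical plan nodes.
  sealed trait Plan
  final case class Project(projectList: Seq[NamedExpr], child: Plan) extends Plan
  final case class Filter(conditionRefs: Set[String], child: Plan) extends Plan
  final case class Relation(output: Seq[String]) extends Plan

  // Walk down to the relation and return the columns it has to produce.
  def requiredColumns(plan: Plan, requiredByParent: Set[String]): Seq[String] =
    plan match {
      case Project(projectList, child) =>
        // Everything any project expression reads is required from the child.
        requiredColumns(child, projectList.flatMap(_.references).toSet)
      case Filter(conditionRefs, child) =>
        // A filter passes the parent's requirements through and adds its own.
        requiredColumns(child, requiredByParent ++ conditionRefs)
      case Relation(output) =>
        // Keep the relation's own column order.
        output.filter(requiredByParent.contains)
    }

  def main(args: Array[String]): Unit = {
    // SELECT i + j AS x FROM t WHERE k > 0, with t having columns i, j, k, l.
    val plan = Project(
      Seq(NamedExpr("x", Set("i", "j"))),
      Filter(Set("k"), Relation(Seq("i", "j", "k", "l"))))
    println(requiredColumns(plan, Set("x")))
  }
}
```

In this sketch the relation is asked for `i`, `j`, and `k` only; column `l` is pruned even though the projection is an arbitrary expression rather than a bare column.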

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark push-down

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20476.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20476
    
----
commit 353dd6bc60ce7123c392d7b51a496d45b1d7ab5c
Author: Wenchen Fan <we...@...>
Date:   2018-02-01T12:02:23Z

    data source column pruning should work for arbitrary expressions

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/481/
    Test PASSed.


---

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20476#discussion_r165375177
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -81,35 +81,34 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
         // TODO: add more push down rules.
     
    -    // TODO: nested fields pruning
    -    def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: Seq[Attribute]): Unit = {
    -      plan match {
    -        case Project(projectList, child) =>
    -          val required = projectList.filter(requiredByParent.contains).flatMap(_.references)
    --- End diff --
    
    This line is wrong; I fixed it at https://github.com/apache/spark/pull/20476/files#diff-b7f3810e65a2bb1585de9609ea491469R93
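
One plausible reading of why the removed line misbehaves, sketched with toy types: the parent requires the output attribute `x`, while the project list holds an alias such as `j + k AS x`, so a `contains` check never matches and the alias's inputs are dropped. `Expr`, `Attr`, `Add`, `Alias`, `oldRequired`, and `newRequired` are hypothetical stand-ins, not Catalyst classes.

```scala
object ProjectPruningSketch {
  // Hypothetical, simplified expression tree that tracks column names only.
  sealed trait Expr { def references: Set[String] }
  final case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }
  final case class Alias(child: Expr, name: String) extends Expr {
    def references: Set[String] = child.references
  }

  // Shape of the removed line: only project-list entries that literally
  // appear in requiredByParent contribute their references.
  def oldRequired(projectList: Seq[Expr], requiredByParent: Seq[Attr]): Set[String] =
    projectList.filter(e => requiredByParent.contains(e)).flatMap(_.references).toSet

  // Shape of the replacement: every project expression contributes.
  def newRequired(projectList: Seq[Expr]): Set[String] =
    projectList.flatMap(_.references).toSet

  def main(args: Array[String]): Unit = {
    // SELECT i, j + k AS x
    val projectList = Seq(Attr("i"), Alias(Add(Attr("j"), Attr("k")), "x"))
    val requiredByParent = Seq(Attr("i"), Attr("x"))
    println(oldRequired(projectList, requiredByParent)) // misses j and k
    println(newRequired(projectList))
  }
}
```

In this toy model the old logic requests only `i`, because the alias is never equal to the attribute `x`, whereas the new logic requests `i`, `j`, and `k`.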


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86956/
    Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @gatorsmile, thanks for the context. If we need to redesign push-down, then I think we should do that separately and with a design plan.
    
    **I don't think it's a good idea to bundle it into an unrelated API update.**
    
    For one thing, we want to be able to use the existing tests for the redesigned push-down strategy, not reimplement them in pieces. We also don't want to conflate the two changes for early adopters of the new API. V2 should be as reliable as possible by minimizing new behavior.
    
    This just isn't the right place to test out experimental designs for push-down operations.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    cc @gatorsmile @rdblue most of the changes are tests.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/500/
    Test PASSed.


---

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20476#discussion_r165374987
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -81,35 +81,34 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
         // TODO: add more push down rules.
     
    -    // TODO: nested fields pruning
    -    def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: Seq[Attribute]): Unit = {
    --- End diff --
    
    make it a private method instead of an inline method


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Since you are becoming more and more familiar with our code, I believe you can offer us more useful input.
    
    Let me merge this PR to fix the bugs. Then we can have a more detailed discussion later?


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    LGTM.
    
    Thanks! Merged to master/2.3


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    **[Test build #86956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86956/testReport)** for PR 20476 at commit [`12c8035`](https://github.com/apache/spark/commit/12c8035704d4feb48e91e01566386a9fe522397b).


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @cloud-fan, @gatorsmile, this PR demonstrates why we should use PhysicalOperation. I ported the tests from this PR over to our branch and they pass without modifying the push-down code. That's because it reuses code that we already trust.
    
    I see no benefit to using a brand new code path for push-down when we can use what is already well tested. I know you want to push other operations, but I've already raised concerns about the design of this new code: it is brittle because it requires matching specific plan nodes.
    
    Push-down should work as it always has: by pushing nodes that are adjacent to relations in the logical plan and relying on the optimizer to push projections and filters down as far as possible. The separation of concerns into simple rules is fundamental to the design of the optimizer. I don't think there is a good argument for new code that breaks how the optimizer is intended to work.
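
The "adjacent to the relation" approach described above can be illustrated roughly as follows. This is a loose sketch and not Spark's `PhysicalOperation`: the plan types, the `Scan` result, and `collectAdjacent` are hypothetical, projections are plain column names, and alias substitution is left out entirely.

```scala
object AdjacentPushDownSketch {
  // Hypothetical, simplified logical plan nodes.
  sealed trait Plan
  final case class Relation(columns: Seq[String]) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan
  final case class Project(columns: Seq[String], child: Plan) extends Plan
  final case class Limit(n: Int, child: Plan) extends Plan

  // What could be handed to the source: an optional projection, the filter
  // conditions, and the relation they sit directly on top of.
  final case class Scan(
      projection: Option[Seq[String]],
      filters: Seq[String],
      relation: Relation)

  // Succeeds only when the plan is a chain of Project/Filter nodes ending in
  // a relation; any other operator in the chain makes the match fail.
  def collectAdjacent(plan: Plan): Option[Scan] = plan match {
    case r: Relation =>
      Some(Scan(None, Nil, r))
    case Filter(condition, child) =>
      collectAdjacent(child).map(s => s.copy(filters = s.filters :+ condition))
    case Project(columns, child) =>
      // The outermost projection decides the final output columns.
      collectAdjacent(child).map(s => s.copy(projection = Some(columns)))
    case _ =>
      None
  }

  def main(args: Array[String]): Unit = {
    val relation = Relation(Seq("a", "b", "c"))
    println(collectAdjacent(Project(Seq("a"), Filter("b > 1", relation))))
    println(collectAdjacent(Limit(10, relation)))
  }
}
```

The point of the sketch is the division of labor: earlier optimizer rules are relied on to move projections and filters next to the relation, and the push-down step only has to recognize that one shape.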
    
    cc @henryr, who might want to chime in.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @rdblue I know you wanna use `PhysicalOperation` to replace the current operator pushdown rule, but before we reach a consensus, I think we should still fix bugs in the existing code.


---

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20476


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    To everyone, this is a bug fix we should merge before the next RC of Spark 2.3. 


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    **[Test build #86933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86933/testReport)** for PR 20476 at commit [`353dd6b`](https://github.com/apache/spark/commit/353dd6bc60ce7123c392d7b51a496d45b1d7ab5c).


---

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20476#discussion_r165375489
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -81,35 +81,34 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
         // TODO: add more push down rules.
     
    -    // TODO: nested fields pruning
    -    def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: Seq[Attribute]): Unit = {
    -      plan match {
    -        case Project(projectList, child) =>
    -          val required = projectList.filter(requiredByParent.contains).flatMap(_.references)
    -          pushDownRequiredColumns(child, required)
    -
    -        case Filter(condition, child) =>
    -          val required = requiredByParent ++ condition.references
    -          pushDownRequiredColumns(child, required)
    -
    -        case DataSourceV2Relation(fullOutput, reader) => reader match {
    -          case r: SupportsPushDownRequiredColumns =>
    -            // Match original case of attributes.
    -            val attrMap = AttributeMap(fullOutput.zip(fullOutput))
    -            val requiredColumns = requiredByParent.map(attrMap)
    -            r.pruneColumns(requiredColumns.toStructType)
    -          case _ =>
    -        }
    +    pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
    +    // After column pruning, we may have redundant PROJECT nodes in the query plan, remove them.
    +    RemoveRedundantProject(filterPushed)
    +  }
    +
    +  // TODO: nested fields pruning
    +  private def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: AttributeSet): Unit = {
    +    plan match {
    +      case Project(projectList, child) =>
    +        val required = projectList.flatMap(_.references)
    +        pushDownRequiredColumns(child, AttributeSet(required))
    +
    +      case Filter(condition, child) =>
    +        val required = requiredByParent ++ condition.references
    +        pushDownRequiredColumns(child, required)
     
    -        // TODO: there may be more operators can be used to calculate required columns, we can add
    -        // more and more in the future.
    -        case _ => plan.children.foreach(child => pushDownRequiredColumns(child, child.output))
    +      case relation: DataSourceV2Relation => relation.reader match {
    +        case reader: SupportsPushDownRequiredColumns =>
    +          val requiredColumns = relation.output.filter(requiredByParent.contains)
    --- End diff --
    
    a cleaner way to retain the original case of attributes.
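
A small sketch of why filtering the relation's own output keeps the original names: if set membership is decided by attribute identity rather than by name (which is how Catalyst's `AttributeSet` is understood to behave), the attributes that come back are the relation's instances, spelled as the source declared them. `Attr`, `AttrSet`, and `prune` are hypothetical stand-ins.

```scala
object OriginalCaseSketch {
  // Hypothetical attribute: a display name plus a unique id.
  final case class Attr(name: String, exprId: Long)

  // Hypothetical set whose membership test looks at exprId only, so a
  // differently cased name still counts as the same attribute.
  final class AttrSet(attrs: Seq[Attr]) {
    private val ids: Set[Long] = attrs.map(_.exprId).toSet
    def contains(a: Attr): Boolean = ids.contains(a.exprId)
  }

  // Same shape as `relation.output.filter(requiredByParent.contains)`.
  def prune(relationOutput: Seq[Attr], requiredByParent: AttrSet): Seq[Attr] =
    relationOutput.filter(requiredByParent.contains)

  def main(args: Array[String]): Unit = {
    val relationOutput = Seq(Attr("UserId", 1L), Attr("Name", 2L), Attr("Age", 3L))
    // The query refers to the same attributes with different casing.
    val required = new AttrSet(Seq(Attr("userid", 1L), Attr("age", 3L)))
    println(prune(relationOutput, required).map(_.name))
  }
}
```

Because the result is drawn from `relationOutput`, the pruned schema carries `UserId` and `Age` in the source's spelling and order, with no separate attribute map needed.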


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86933/
    Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    https://github.com/apache/spark/pull/19424 is the original PR that introduced the new rule `PushDownOperatorsToDataSource`. Both of us reviewed it. : )
    
    Thank you for your understanding! We can have more design discussion in the next few months, once you have tried the new data source APIs. Code quality is always critical for Spark. We are trying to add more test cases to ensure the code is stable and well tested, even as we introduce new APIs.
    



---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @gatorsmile, Do you mean this?
    
    > Extensibility is not good and operator push-down capabilities are limited.
    
    If so, that's very open to interpretation. I would assume it means that the V2 interfaces should support more than just projection and filter push-down, but not a redesign of how push-down happens in the optimizer. Even if it is called out as a goal, I now see it as a misguided choice.
    
    But either way, you make a good point about changing things for a release. I'll defer to your judgement about what should be done for the release. But for the long term, I think this issue underscores my point about reusing code that already works. Let's separate DSv2 from a push-down redesign and get it working reliably without introducing more risk and unknown problems.


---

[GitHub] spark pull request #20476: [SPARK-23301][SQL] data source column pruning sho...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20476#discussion_r165437683
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -81,35 +81,34 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
         // TODO: add more push down rules.
     
    -    // TODO: nested fields pruning
    -    def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: Seq[Attribute]): Unit = {
    -      plan match {
    -        case Project(projectList, child) =>
    -          val required = projectList.filter(requiredByParent.contains).flatMap(_.references)
    -          pushDownRequiredColumns(child, required)
    -
    -        case Filter(condition, child) =>
    -          val required = requiredByParent ++ condition.references
    -          pushDownRequiredColumns(child, required)
    -
    -        case DataSourceV2Relation(fullOutput, reader) => reader match {
    -          case r: SupportsPushDownRequiredColumns =>
    -            // Match original case of attributes.
    -            val attrMap = AttributeMap(fullOutput.zip(fullOutput))
    -            val requiredColumns = requiredByParent.map(attrMap)
    -            r.pruneColumns(requiredColumns.toStructType)
    -          case _ =>
    -        }
    +    pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
    +    // After column pruning, we may have redundant PROJECT nodes in the query plan, remove them.
    +    RemoveRedundantProject(filterPushed)
    +  }
    +
    +  // TODO: nested fields pruning
    +  private def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: AttributeSet): Unit = {
    +    plan match {
    +      case Project(projectList, child) =>
    +        val required = projectList.flatMap(_.references)
    +        pushDownRequiredColumns(child, AttributeSet(required))
    +
    +      case Filter(condition, child) =>
    +        val required = requiredByParent ++ condition.references
    +        pushDownRequiredColumns(child, required)
     
    -        // TODO: there may be more operators can be used to calculate required columns, we can add
    -        // more and more in the future.
    -        case _ => plan.children.foreach(child => pushDownRequiredColumns(child, child.output))
    +      case relation: DataSourceV2Relation => relation.reader match {
    +        case reader: SupportsPushDownRequiredColumns =>
    +          val requiredColumns = relation.output.filter(requiredByParent.contains)
    +          reader.pruneColumns(requiredColumns.toStructType)
    +
    +        case _ =>
           }
    -    }
     
    -    pushDownRequiredColumns(filterPushed, filterPushed.output)
    -    // After column pruning, we may have redundant PROJECT nodes in the query plan, remove them.
    -    RemoveRedundantProject(filterPushed)
    +      // TODO: there may be more operators can be used to calculate required columns, we can add
    +      // more and more in the future.
    --- End diff --
    
    Nit.  `there may be more operators that can be used to calculate the required columns. We can add more and more in the future.`


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by rdblue <gi...@git.apache.org>.

Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Yeah, I did review it, but at the time I wasn't familiar with how the other code paths worked and assumed that it was necessary to introduce this. I wasn't very familiar with how it *should* work, so I didn't +1 it.
    
    There are a few telling comments though:
    
    > How do we know that there aren't more cases that need to be supported?
    
    > What are the guarantees made by the previous batches in the optimizer? The work done by FilterAndProject seems redundant to me because the optimizer should already push filters below projection. Is that not guaranteed by the time this runs?
    
    In any case, I now think that we should not introduce a new push-down design in conjunction with DSv2. Let's get DSv2 working properly and redesign push-down separately. In parallel is fine by me.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @rdblue To be honest, the push-down solution in the current code base is not well designed. We got a lot of feedback from the community (e.g., SAP and IBM Research). One proposed a bottom-up solution and another proposed a top-down solution. No solution is perfect.
    
    In this release, we want to introduce a new solution that enhances the capability of operator push-down. The new code path is not stable yet. We welcome the community to try it and provide more feedback on it.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/498/
    Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    **[Test build #86933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86933/testReport)** for PR 20476 at commit [`353dd6b`](https://github.com/apache/spark/commit/353dd6bc60ce7123c392d7b51a496d45b1d7ab5c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    **[Test build #86956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86956/testReport)** for PR 20476 at commit [`12c8035`](https://github.com/apache/spark/commit/12c8035704d4feb48e91e01566386a9fe522397b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #20476: [SPARK-23301][SQL] data source column pruning should wor...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20476
  
    @rdblue Operator pushdown is part of the [data source API V2 SPIP](https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit#): https://issues.apache.org/jira/browse/SPARK-15689
    
    Based on the PR review history, it sounds like you also reviewed the proposal and the prototype. Since we are trying to finish the release of Spark 2.3, it might be too late to rewrite everything at the last minute. 
    
    When more users try it, we might get more feedback about this. Then we can have more discussion. Hopefully, in the next release, the community can reach consensus on the design of operator push-down.


---
