Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/02/02 07:57:45 UTC

[GitHub] spark pull request #20485: [SPARK-23315][SQL] failed to get output from cano...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20485

    [SPARK-23315][SQL] failed to get output from canonicalized data source v2 related plans

    ## What changes were proposed in this pull request?
    
    `DataSourceV2Relation` keeps a `fullOutput` and resolves the real output on demand by column name lookup. This breaks after we canonicalize the plan, because all attribute names become "None".
    
    To fix this, `DataSourceV2Relation` should just keep `output`, and update the `output` when doing column pruning.
    
    ## How was this patch tested?
    
    A new test case.
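
The failure mode described in the pull request text above can be illustrated with a minimal sketch. `Attr`, `LookupRelation`, and `DirectRelation` are simplified stand-ins invented for this illustration, not Spark's real classes, and the renaming to "None" only mimics the behaviour the description mentions.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
final case class Attr(name: String, id: Int)

// Before the fix: the output is resolved on demand by name lookup in fullOutput.
final case class LookupRelation(fullOutput: Seq[Attr], readSchema: Seq[String]) {
  def output: Seq[Attr] = {
    val byName = fullOutput.map(a => a.name -> a).toMap
    readSchema.map(byName) // throws NoSuchElementException when a name is missing
  }

  // Mimics canonicalization as described above: every attribute name becomes "None".
  def canonicalized: LookupRelation =
    copy(fullOutput = fullOutput.zipWithIndex.map { case (_, i) => Attr("None", i) })
}

// After the fix: the relation stores its (already pruned) output directly.
final case class DirectRelation(output: Seq[Attr]) {
  def canonicalized: DirectRelation =
    copy(output = output.zipWithIndex.map { case (_, i) => Attr("None", i) })
}

object CanonicalizeDemo {
  def main(args: Array[String]): Unit = {
    val attrs = Seq(Attr("i", 10), Attr("j", 11))

    val lookup = LookupRelation(attrs, Seq("i"))
    println(lookup.output)                                          // lookup by name works
    println(scala.util.Try(lookup.canonicalized.output).isFailure)  // lookup fails after renaming

    val direct = DirectRelation(attrs.take(1))
    println(direct.canonicalized.output)                            // no lookup, so no failure
  }
}
```

In this sketch the name lookup has nothing to match once the names are rewritten, while the relation that stores `output` directly never performs a lookup.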

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark canonicalize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20485
    
----
commit 75950a1725f01c31764ac31d16acd6e2078956c6
Author: Wenchen Fan <we...@...>
Date:   2018-02-02T07:53:07Z

    failed to get output from canonicalized data source v2 related plans

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/549/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Sounds fine to me, then.
    
    My focus is on the long-term design issues. I still think that the changes to make plans immutable and to use the existing push-down code as much as possible are the best way to get a reliable 2.3.0, but it is fine if they don't make the release.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #86999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86999/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20485: [SPARK-23315][SQL] failed to get output from cano...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20485#discussion_r165579903
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -81,33 +81,44 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
         // TODO: add more push down rules.
     
    -    pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
    +    val columnPruned = pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
         // After column pruning, we may have redundant PROJECT nodes in the query plan, remove them.
    -    RemoveRedundantProject(filterPushed)
    +    RemoveRedundantProject(columnPruned)
       }
     
       // TODO: nested fields pruning
    -  private def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: AttributeSet): Unit = {
    +  private def pushDownRequiredColumns(
    +      plan: LogicalPlan, requiredByParent: AttributeSet): LogicalPlan = {
         plan match {
    -      case Project(projectList, child) =>
    +      case p @ Project(projectList, child) =>
             val required = projectList.flatMap(_.references)
    -        pushDownRequiredColumns(child, AttributeSet(required))
    +        p.copy(child = pushDownRequiredColumns(child, AttributeSet(required)))
     
    -      case Filter(condition, child) =>
    +      case f @ Filter(condition, child) =>
             val required = requiredByParent ++ condition.references
    -        pushDownRequiredColumns(child, required)
    +        f.copy(child = pushDownRequiredColumns(child, required))
     
           case relation: DataSourceV2Relation => relation.reader match {
             case reader: SupportsPushDownRequiredColumns =>
    +          // TODO: Enable the below assert after we make `DataSourceV2Relation` immutable. Fow now
    +          // it's possible that the mutable reader being updated by someone else, and we need to
    +          // always call `reader.pruneColumns` here to correct it.
    +          // assert(relation.output.toStructType == reader.readSchema(),
    +          //  "Schema of data source reader does not match the relation plan.")
    +
               val requiredColumns = relation.output.filter(requiredByParent.contains)
               reader.pruneColumns(requiredColumns.toStructType)
     
    -        case _ =>
    +          val nameToAttr = relation.output.map(_.name).zip(relation.output).toMap
    +          val newOutput = reader.readSchema().map(_.name).map(nameToAttr)
    +          relation.copy(output = newOutput)
    --- End diff --
    
    @rdblue This is the bug I mentioned before. I finally figured out a way to fix it surgically: always run column pruning, even if no column needs to be pruned. This lets us correct the required schema of the reader if it has been updated by someone else.
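
A minimal sketch of the idea in the comment above, assuming a shared mutable reader. `Attr`, `MutableReader`, `Relation`, and `PruneDemo` are hypothetical names made up for this illustration and are not Spark's real API; only the shape of the logic follows the diff.

```scala
// Hypothetical stand-ins for illustration only; NOT Spark's real API.
final case class Attr(name: String, id: Int)

// A mutable reader: anyone holding the reference can change its required schema.
final class MutableReader(private var required: Seq[String]) {
  def pruneColumns(names: Seq[String]): Unit = { required = names }
  def readSchema(): Seq[String] = required
}

final case class Relation(output: Seq[Attr], reader: MutableReader)

object PruneDemo {
  // Always push the required columns to the reader, even when nothing is pruned,
  // so that a schema changed by someone else is corrected. Then rebuild the
  // relation's output from the schema the reader reports, matching by name.
  def pruneColumns(relation: Relation, requiredByParent: Set[Attr]): Relation = {
    val requiredColumns = relation.output.filter(requiredByParent.contains)
    relation.reader.pruneColumns(requiredColumns.map(_.name))

    val nameToAttr = relation.output.map(a => a.name -> a).toMap
    relation.copy(output = relation.reader.readSchema().map(nameToAttr))
  }
}
```

Because the call to `pruneColumns` is unconditional in this sketch, a reader whose schema was narrowed elsewhere is reset to what the current plan requires before its schema is read back.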


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/536/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #87086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87086/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/522/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #87013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87013/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #86999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86999/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87086/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/607/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #87013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87013/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    LGTM Thanks! Merged to master/2.3


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86999/
    Test FAILed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    retest this please


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20485: [SPARK-23315][SQL] failed to get output from cano...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20485#discussion_r166436103
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -81,33 +81,44 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
         // TODO: add more push down rules.
     
    -    pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
    +    val columnPruned = pushDownRequiredColumns(filterPushed, filterPushed.outputSet)
         // After column pruning, we may have redundant PROJECT nodes in the query plan, remove them.
    -    RemoveRedundantProject(filterPushed)
    +    RemoveRedundantProject(columnPruned)
       }
     
       // TODO: nested fields pruning
    -  private def pushDownRequiredColumns(plan: LogicalPlan, requiredByParent: AttributeSet): Unit = {
    +  private def pushDownRequiredColumns(
    +      plan: LogicalPlan, requiredByParent: AttributeSet): LogicalPlan = {
         plan match {
    -      case Project(projectList, child) =>
    +      case p @ Project(projectList, child) =>
             val required = projectList.flatMap(_.references)
    -        pushDownRequiredColumns(child, AttributeSet(required))
    +        p.copy(child = pushDownRequiredColumns(child, AttributeSet(required)))
     
    -      case Filter(condition, child) =>
    +      case f @ Filter(condition, child) =>
             val required = requiredByParent ++ condition.references
    -        pushDownRequiredColumns(child, required)
    +        f.copy(child = pushDownRequiredColumns(child, required))
     
           case relation: DataSourceV2Relation => relation.reader match {
             case reader: SupportsPushDownRequiredColumns =>
    +          // TODO: Enable the below assert after we make `DataSourceV2Relation` immutable. Fow now
    --- End diff --
    
    Typo: Fow


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #86979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86979/testReport)** for PR 20485 at commit [`75950a1`](https://github.com/apache/spark/commit/75950a1725f01c31764ac31d16acd6e2078956c6).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #87086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87086/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).


---



[GitHub] spark pull request #20485: [SPARK-23315][SQL] failed to get output from cano...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20485


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    @gatorsmile @rdblue please review and LGTM this. This will unblock my PR - https://github.com/apache/spark/pull/20445


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #86981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86981/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    @rdblue This is another bug we found during the code review. The goal is to ensure Data Source API V2 is usable with at least the same feature set as Data Source API V1.
    
    After getting more feedback about Data Source API V2 from the community, we will restart the discussion about the data source API design in the next release.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/518/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    cc @tdas @jose-torres @rdblue @gatorsmile 


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by rdblue <gi...@git.apache.org>.
Github user rdblue commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    To be clear, the purpose of this commit, like #20476, is just to get something working for the 2.3.0 release?
    
    I just want to make sure since I think we should be approaching these problems with a better initial design for the integration. I'm fine getting this in to unblock a release, but if it isn't for that purpose then I think we should fix the design problems first.


---



[GitHub] spark pull request #20485: [SPARK-23315][SQL] failed to get output from cano...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20485#discussion_r165578555
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushDownOperatorsToDataSource.scala ---
    @@ -99,15 +100,22 @@ object PushDownOperatorsToDataSource extends Rule[LogicalPlan] with PredicateHel
     
           case relation: DataSourceV2Relation => relation.reader match {
             case reader: SupportsPushDownRequiredColumns =>
    +          // TODO: Enable the below assert after we make `DataSourceV2Relation` immutable. Fow now
    +          // it's possible that the mutable reader being updated by someone else, and we need to
    +          // always call `reader.pruneColumns` here to correct it.
    +          // assert(relation.output.toStructType == reader.readSchema(),
    +          //  "Schema of data source reader does not match the relation plan.")
    +
               val requiredColumns = relation.output.filter(requiredByParent.contains)
               reader.pruneColumns(requiredColumns.toStructType)
    +          relation.copy(output = requiredColumns)
    --- End diff --
    
    @rdblue This is the bug I mentioned before. I finally figured out a way to fix it surgically: always run column pruning, even if no column needs to be pruned. This lets us correct the required schema of the reader if it has been updated by someone else.
    



---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87013/
    Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86981/
    Test FAILed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    retest this please


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #86981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86981/testReport)** for PR 20485 at commit [`3aa0438`](https://github.com/apache/spark/commit/3aa043897bea5de1c230db6386d832e9b2993df3).


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86979/
    Test FAILed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    **[Test build #86979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86979/testReport)** for PR 20485 at commit [`75950a1`](https://github.com/apache/spark/commit/75950a1725f01c31764ac31d16acd6e2078956c6).


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    jenkins retest this please


---



[GitHub] spark issue #20485: [SPARK-23315][SQL] failed to get output from canonicaliz...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20485
  
    Merged build finished. Test PASSed.


---
