You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2016/03/25 10:03:42 UTC

[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/11957

    [SPARK-14156][SQL] Use executedPlan in HiveComparisonTest for the messages of computed tables

    ## What changes were proposed in this pull request?
    JIRA: https://issues.apache.org/jira/browse/SPARK-14156
    
    In HiveComparisonTest, when catalyst results are different to hive results, we will collect the messages for computed tables during the test. During creating the message, we use sparkPlan. But we actually run the query with executedPlan. So the error message is sometimes confusing.
    
    ## How was this patch tested?
    Existing tests.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 use-executedplan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11957.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11957
    
----
commit 0c6b687819a8495519c97ac894b4d2e32ca3e026
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-03-25T08:56:57Z

    Use executedPlan for HiveComparisonTest.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11957


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201979883
  
    cc @yhuai @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201982155
  
    @yhuai I added it to PR description. Please let me know if it is clear now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201236251
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54167/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201981360
  
    Can you attach an example showing the message before and the message after the change? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-202338885
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54303/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11957#discussion_r57542534
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
                     val executions = queryList.map(new TestHive.QueryExecution(_))
                     executions.foreach(_.toRdd)
                     val tablesGenerated = queryList.zip(executions).flatMap {
    -                  case (q, e) => e.sparkPlan.collect {
    +                  case (q, e) => e.executedPlan.collect {
    --- End diff --
    
    OK. Let me do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-202338881
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201212757
  
    **[Test build #54167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54167/consoleFull)** for PR 11957 at commit [`0c6b687`](https://github.com/apache/spark/commit/0c6b687819a8495519c97ac894b4d2e32ca3e026).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201236054
  
    **[Test build #54167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54167/consoleFull)** for PR 11957 at commit [`0c6b687`](https://github.com/apache/spark/commit/0c6b687819a8495519c97ac894b4d2e32ca3e026).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11957#discussion_r57536680
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
                     val executions = queryList.map(new TestHive.QueryExecution(_))
                     executions.foreach(_.toRdd)
                     val tablesGenerated = queryList.zip(executions).flatMap {
    -                  case (q, e) => e.sparkPlan.collect {
    +                  case (q, e) => e.executedPlan.collect {
    --- End diff --
    
    Here we just collect `InsertIntoHiveTable` node to verify its table name is included in `tablesRead`. Next, for each `InsertIntoHiveTable` node, we will call `insert.child.execute().collect()` to run the its child plan.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11957#discussion_r57523346
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
                     val executions = queryList.map(new TestHive.QueryExecution(_))
                     executions.foreach(_.toRdd)
                     val tablesGenerated = queryList.zip(executions).flatMap {
    -                  case (q, e) => e.sparkPlan.collect {
    +                  case (q, e) => e.executedPlan.collect {
    --- End diff --
    
    We do extra processing for `sparkPlan` in `prepareForExecution`. It includes four rules now:
    
        override val batches: Seq[Batch] = Seq(
          Batch("Subquery", Once, PlanSubqueries(SessionState.this)),
          Batch("Add exchange", Once, EnsureRequirements(conf)),
          Batch("Whole stage codegen", Once, CollapseCodegenStages(conf)),
          Batch("Reuse duplicated exchanges", Once, ReuseExchange(conf))
        )
    
    The error shown in PR description is happened because `sparkPlan` lacks the processing of `EnsureRequirements` (i.e., "Add exchange" rule). We will add additional node in `sparkPlan` to enable exchanging.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-202318706
  
    **[Test build #54303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54303/consoleFull)** for PR 11957 at commit [`afecd4a`](https://github.com/apache/spark/commit/afecd4ad3a740101aa5331813eb75be64f62e297).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11957#discussion_r57541486
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
                     val executions = queryList.map(new TestHive.QueryExecution(_))
                     executions.foreach(_.toRdd)
                     val tablesGenerated = queryList.zip(executions).flatMap {
    -                  case (q, e) => e.sparkPlan.collect {
    +                  case (q, e) => e.executedPlan.collect {
    --- End diff --
    
    OK. Then, let's explain why we have to make the change at here and how it affects the code that is not shown in the diff.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-202502268
  
    Thanks. Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11957#discussion_r57523270
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
                     val executions = queryList.map(new TestHive.QueryExecution(_))
                     executions.foreach(_.toRdd)
                     val tablesGenerated = queryList.zip(executions).flatMap {
    -                  case (q, e) => e.sparkPlan.collect {
    +                  case (q, e) => e.executedPlan.collect {
    --- End diff --
    
    At here, we were using `sparkPlan` to do the collect. Does using `executedPlan` make any difference? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-202338620
  
    **[Test build #54303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54303/consoleFull)** for PR 11957 at commit [`afecd4a`](https://github.com/apache/spark/commit/afecd4ad3a740101aa5331813eb75be64f62e297).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11957#issuecomment-201236250
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14156][SQL] Use executedPlan in HiveCom...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11957#discussion_r57533507
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveComparisonTest.scala ---
    @@ -480,7 +480,7 @@ abstract class HiveComparisonTest
                     val executions = queryList.map(new TestHive.QueryExecution(_))
                     executions.foreach(_.toRdd)
                     val tablesGenerated = queryList.zip(executions).flatMap {
    -                  case (q, e) => e.sparkPlan.collect {
    +                  case (q, e) => e.executedPlan.collect {
    --- End diff --
    
    Yea. But, at here, we are only collecting `InsertIntoHiveTable` nodes. Why using executedPlan at here solve the problem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org