Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2017/02/15 01:08:49 UTC

[GitHub] spark pull request #16934: [SPARK-19603][SS]Fix StreamingQuery explain comma...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/16934

    [SPARK-19603][SS]Fix StreamingQuery explain command

    ## What changes were proposed in this pull request?
    
    `StreamingQuery.explain` doesn't show the correct streaming physical plan right now because `ExplainCommand` receives a runtime batch plan and its `logicalPlan.isStreaming` is always false.
    
    This PR adds a `streaming` parameter to `ExplainCommand` so that `StreamExecution` can specify that the plan is a streaming plan.
    
    ## How was this patch tested?
    
    The updated unit test.
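
The behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Plan`, `StreamingSource`, `BatchData`, `Aggregate`, and `explain` are hypothetical stand-ins invented for this example and are not Spark's classes or signatures. It shows why a plan whose sources have been replaced by per-batch data reports `isStreaming == false`, and how an explicit `streaming` flag lets the caller override that.

```scala
// Toy model only: every type here is a hypothetical stand-in, not Spark's API.
object ExplainSketch {
  sealed trait Plan { def isStreaming: Boolean }

  // A source as written in the user's query: still a stream.
  case class StreamingSource(name: String) extends Plan {
    val isStreaming = true
  }

  // The same source after a batch run has swapped in concrete data.
  case class BatchData(name: String) extends Plan {
    val isStreaming = false
  }

  // An operator is streaming only if its child is.
  case class Aggregate(child: Plan) extends Plan {
    def isStreaming: Boolean = child.isStreaming
  }

  // Chooses a planner from the plan's own flag unless the caller forces streaming.
  def explain(plan: Plan, streaming: Boolean = false): String =
    if (streaming || plan.isStreaming) "incremental (stateful) planner"
    else "batch planner"
}
```

In this model, `explain(Aggregate(BatchData("input")))` picks the batch planner because the runtime plan no longer looks like a stream, while passing `streaming = true` selects the incremental planner for the same plan.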

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-19603

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16934.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16934
    
----
commit 3b6c86a5581df4bdb9a94eac095c9c1ee1363f47
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-02-15T01:04:21Z

    Fix StreamingQuery explain command

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    **[Test build #72909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72909/testReport)** for PR 16934 at commit [`3b6c86a`](https://github.com/apache/spark/commit/3b6c86a5581df4bdb9a94eac095c9c1ee1363f47).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    **[Test build #72909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72909/testReport)** for PR 16934 at commit [`3b6c86a`](https://github.com/apache/spark/commit/3b6c86a5581df4bdb9a94eac095c9c1ee1363f47).




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72962/
    Test PASSed.




[GitHub] spark pull request #16934: [SPARK-19603][SS]Fix StreamingQuery explain comma...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16934#discussion_r101398714
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
    @@ -121,3 +121,25 @@ case class ExplainCommand(
         ("Error occurred during query planning: \n" + cause.getMessage).split("\n").map(Row(_))
       }
     }
    +
    +/** An explain command for users to see how a streaming batch is executed. */
    +case class StreamingExplainCommand(
    +    queryExecution: IncrementalExecution,
    +    extended: Boolean) extends RunnableCommand {
    +
    +  override val output: Seq[Attribute] =
    --- End diff --
    
    Is this required? Just asking because the one above doesn't have it.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Thanks! Merging to master and 2.1.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    **[Test build #72957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72957/testReport)** for PR 16934 at commit [`ca268b9`](https://github.com/apache/spark/commit/ca268b99a6e55aebb1b35e3f7ce9ae60ef427838).




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    **[Test build #72957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72957/testReport)** for PR 16934 at commit [`ca268b9`](https://github.com/apache/spark/commit/ca268b99a6e55aebb1b35e3f7ce9ae60ef427838).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    This solution is okay. I'm just not sure why we use `lastExecution.logicalPlan` instead of using `logicalPlan` directly. In that case, we wouldn't need the `streaming = true` option for `ExplainCommand`.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #16934: [SPARK-19603][SS]Fix StreamingQuery explain comma...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16934#discussion_r101398532
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -277,10 +279,22 @@ class StreamSuite extends StreamTest {
     
       test("explain") {
         val inputData = MemoryStream[String]
    -    val df = inputData.toDS().map(_ + "foo")
    -    // Test `explain` not throwing errors
    -    df.explain()
    -    val q = df.writeStream.queryName("memory_explain").format("memory").start()
    +    val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
    +
    +    // Test `df.explain`
    +    val explain = ExplainCommand(df.queryExecution.logical, extended = false)
    +    val explainString =
    +      spark.sessionState
    +        .executePlan(explain)
    +        .executedPlan
    +        .executeCollect()
    +        .map(_.getString(0))
    +        .mkString("\n")
    +    assert(explainString.contains("StateStoreRestore"))
    --- End diff --
    
    I would also check that this doesn't have a `LocalTableScan` but has a `StreamingRelation`
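
The extra check suggested here could be sketched as a small string predicate. This is an assumption-laden illustration, not code from the PR: `ExplainAssertions` and `looksLikeStreamingPlan` are hypothetical names, and the only strings taken from the thread are the operator names `StreamingRelation` and `LocalTableScan`.

```scala
// Hypothetical helper for a test; not part of Spark or of this pull request.
object ExplainAssertions {
  // True when the explain output still names the streaming source and
  // has not been collapsed into a local batch scan.
  def looksLikeStreamingPlan(explainString: String): Boolean =
    explainString.contains("StreamingRelation") &&
      !explainString.contains("LocalTableScan")
}
```

A test along the lines of the diff above could then add `assert(ExplainAssertions.looksLikeStreamingPlan(explainString))` next to the existing `StateStoreRestore` assertion.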




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    **[Test build #72962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72962/testReport)** for PR 16934 at commit [`20eae3d`](https://github.com/apache/spark/commit/20eae3d06e521570d1e36af1b738d1cba8c0e5e7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #16934: [SPARK-19603][SS]Fix StreamingQuery explain comma...

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16934#discussion_r101387562
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -673,7 +673,7 @@ class StreamExecution(
         if (lastExecution == null) {
           "No physical plan. Waiting for data."
         } else {
    -      val explain = ExplainCommand(lastExecution.logical, extended = extended)
    +      val explain = ExplainCommand(lastExecution.logical, extended = extended, streaming = true)
    --- End diff --
    
    So this means that this code will always return an updated plan for the last batch showing which data files were read instead of just referring to it as a StreamingRelation. We wouldn't have the bug if we had just used `logicalPlan` instead of `lastExecution.logicalPlan`, right? Then the problem would be that the `logicalPlan` may contain errors though?




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by brkyvz <gi...@git.apache.org>.
Github user brkyvz commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    @zsxwing Left one comment for tests and a question.
    It would be great if you could also post a snippet of the plans after this fix.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72909/
    Test PASSed.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72957/
    Test PASSed.




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    **[Test build #72962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72962/testReport)** for PR 16934 at commit [`20eae3d`](https://github.com/apache/spark/commit/20eae3d06e521570d1e36af1b738d1cba8c0e5e7).




[GitHub] spark pull request #16934: [SPARK-19603][SS]Fix StreamingQuery explain comma...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16934




[GitHub] spark issue #16934: [SPARK-19603][SS]Fix StreamingQuery explain command

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/16934
  
    Good question. It made me realize my fix was wrong. We cannot use `logicalPlan` here because it's not the runtime batch plan. The purpose of adding `StreamingQuery.explain` is to allow the user to see the runtime plan.

