You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zsxwing <gi...@git.apache.org> on 2016/04/04 23:00:02 UTC

[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

GitHub user zsxwing opened a pull request:

    https://github.com/apache/spark/pull/12155

    [SPARK-14382][SQL]QueryProgress should be post after committedOffsets is updated

    ## What changes were proposed in this pull request?
    
    Make sure QueryProgress is post after committedOffsets is updated. If QueryProgress is post before committedOffsets is updated, the listener may see a wrong sinkStatus (created from committedOffsets).
    
    See https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/644/testReport/junit/org.apache.spark.sql.util/ContinuousQueryListenerSuite/single_listener/ for an example of the failure.
    
    ## How was this patch tested?
    
    Existing unit tests.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zsxwing/spark SPARK-14382

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12155.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12155
    
----
commit 9717c559d884534eee6ed30539334874a172eea9
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2016-04-04T20:58:12Z

    QueryProgress should be post after committedOffsets is updated

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58472736
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -331,7 +338,6 @@ class StreamExecution(
     
         val batchTime = (System.nanoTime() - startTime).toDouble / 1000000
         logInfo(s"Completed up to $availableOffsets in ${batchTime}ms")
    --- End diff --
    
    Yeah.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205622521
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205525138
  
    **[Test build #54894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54894/consoleFull)** for PR 12155 at commit [`9717c55`](https://github.com/apache/spark/commit/9717c559d884534eee6ed30539334874a172eea9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58453569
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/util/ContinuousQueryListenerSuite.scala ---
    @@ -212,7 +211,7 @@ class ContinuousQueryListenerSuite extends StreamTest with SharedSQLContext with
     
       case class QueryStatus(
         active: Boolean,
    -    expection: Option[Exception],
    --- End diff --
    
    OMG! How did we miss this the first time!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205622373
  
    **[Test build #54925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54925/consoleFull)** for PR 12155 at commit [`41b4996`](https://github.com/apache/spark/commit/41b4996dee42bb5687564cb7257aba6363a6e030).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58472923
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -225,17 +228,21 @@ class StreamExecution(
       }
     
       /**
    -   * Queries all of the sources to see if any new data is available. When there is new data the
    -   * batchId counter is incremented and a new log entry is written with the newest offsets.
    -   *
    -   * Note that committing the offsets for a new batch implicitly marks the previous batch as
    -   * finished and thus this method should only be called when all currently available data
    -   * has been written to the sink.
    +   * Commit the batch. Note that committing the offsets for a new batch implicitly marks the
    +   * previous batch as finished and thus this method should only be called when all currently
    +   * available data has been written to the sink.
        */
    -  private def commitAndConstructNextBatch(): Boolean = {
    +  private def commitBatch(): Unit = {
    --- End diff --
    
    Why does this need to be a separate function anyways, and not be part of runBatch(). It gets called only after runBatch().


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12155


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205588160
  
    **[Test build #54925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54925/consoleFull)** for PR 12155 at commit [`41b4996`](https://github.com/apache/spark/commit/41b4996dee42bb5687564cb7257aba6363a6e030).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-206518254
  
    LGTM. Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58454634
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -331,7 +338,6 @@ class StreamExecution(
     
         val batchTime = (System.nanoTime() - startTime).toDouble / 1000000
         logInfo(s"Completed up to $availableOffsets in ${batchTime}ms")
    --- End diff --
    
    > This log line should move to commit
    
    This log contains a local `startTime`. Maybe I can just move codes in `commitBatch` here. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58472994
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -225,17 +228,21 @@ class StreamExecution(
       }
     
       /**
    -   * Queries all of the sources to see if any new data is available. When there is new data the
    -   * batchId counter is incremented and a new log entry is written with the newest offsets.
    -   *
    -   * Note that committing the offsets for a new batch implicitly marks the previous batch as
    -   * finished and thus this method should only be called when all currently available data
    -   * has been written to the sink.
    +   * Commit the batch. Note that committing the offsets for a new batch implicitly marks the
    +   * previous batch as finished and thus this method should only be called when all currently
    +   * available data has been written to the sink.
        */
    -  private def commitAndConstructNextBatch(): Boolean = {
    +  private def commitBatch(): Unit = {
    --- End diff --
    
    And in that case, the current log line at the end of runBatch is sufficient.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205496005
  
    **[Test build #54894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54894/consoleFull)** for PR 12155 at commit [`9717c55`](https://github.com/apache/spark/commit/9717c559d884534eee6ed30539334874a172eea9).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205622525
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54925/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58454505
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -331,7 +338,6 @@ class StreamExecution(
     
         val batchTime = (System.nanoTime() - startTime).toDouble / 1000000
         logInfo(s"Completed up to $availableOffsets in ${batchTime}ms")
    --- End diff --
    
    There is a log in `constructNextBatch` when `dataAvailable` is true: `logInfo(s"Committed offsets for batch $currentBatchId.")`. And I think we should not log anything when `dataAvailable` is false. Otherwise, when we run the batches as fast as possible, there will be too many meaningless logs. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58488224
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -331,7 +338,6 @@ class StreamExecution(
     
         val batchTime = (System.nanoTime() - startTime).toDouble / 1000000
         logInfo(s"Completed up to $availableOffsets in ${batchTime}ms")
    --- End diff --
    
    Updated


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205525360
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54894/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58453837
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -331,7 +338,6 @@ class StreamExecution(
     
         val batchTime = (System.nanoTime() - startTime).toDouble / 1000000
         logInfo(s"Completed up to $availableOffsets in ${batchTime}ms")
    --- End diff --
    
    This log line should move to `commit`
    And there should be a new log line to log the construction of the new batch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12155#issuecomment-205525359
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14382][SQL]QueryProgress should be post...

Posted by tdas <gi...@git.apache.org>.
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12155#discussion_r58453973
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -207,7 +210,7 @@ class StreamExecution(
           case None => // We are starting this stream for the first time.
             logInfo(s"Starting new continuous query.")
             currentBatchId = 0
    -        commitAndConstructNextBatch()
    --- End diff --
    
    I like this change. Its weird to commitAndConstruct when there is nothing to commit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org