You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kanzhang <gi...@git.apache.org> on 2014/04/09 06:58:07 UTC

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

GitHub user kanzhang opened a pull request:

    https://github.com/apache/spark/pull/366

    SPARK-1407 drain event queue before stopping event logger

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kanzhang/spark SPARK-1407

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/366.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #366
    
----
commit b073ee6136d978c1367d1ed81e29729eca21d53d
Author: Kan Zhang <kz...@apache.org>
Date:   2014-04-09T04:34:02Z

    SPARK-1407 drain event queue before stopping event logger

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39931893
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39931680
  
    Hi @kanzhang, we have a long-running discussion about this at #251. Joining on the thread is a simple alternative to what is proposed in the other PR, but at the same time preserves clear semantics (i.e. all events before STOP will be processed to completion, and no events after STOP will be processed). Both @pwendell and I are inclined towards this approach.
    
    Could you add a test for this? #251 provided an excellent example of this. @concretevitamin could we use your test case here? We'll be sure to credit both of you.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40319077
  
    @pwendell @andrewor14 , I think the unit test depends on a race condition that the stopper thread is run before the listener thread does (but we don't know the actual order). Otherwise, the queue would have drained before stopper started and the test wouldn't be effective. I tried to fix in the following PR. pls take a look. thx. 
    
    https://github.com/apache/spark/pull/401


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by concretevitamin <gi...@git.apache.org>.
Github user concretevitamin commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39931839
  
    This does look much simpler. Also, free free to use the test there ;)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40121574
  
    Yes, could you open one that describes the listener bus draining part of this PR specifically? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/366#discussion_r11423705
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkHdfsLR.scala ---
    @@ -73,6 +73,7 @@ object SparkHdfsLR {
         }
     
         println("Final w: " + w)
    +    sc.stop()
    --- End diff --
    
    By the way, is the System.exit(0) that follows necessary? Can we just remove it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40024403
  
    Thanks - I've merged this. I think SPARK-1407 is an incorrect label btw. I also re-wrote the unit test with @andrewor14 on merge to make it a bit simpler.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39935451
  
    Merged build finished. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40031940
  
    Thanks for getting rid of the 100ms wait. I noticed it but didn't have better idea.
    
    I think this patch addresses SPARK-1407. Without this patch, even if sc.stop() is called, it will only close the logger and flush whatever is already in its write buffer. If, at this point, some events are yet to be drained from event queue, when listenerThread tries to post them to logger later on, the same Filesystem is closed exception will be encountered. Consequently event log file will be incomplete (same as in SPARK-1407). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/366#discussion_r11423726
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -884,6 +883,7 @@ class SparkContext(
           cleaner.foreach(_.stop())
           dagSchedulerCopy.stop()
           listenerBus.stop()
    +      eventLogger.foreach(_.stop())
    --- End diff --
    
    `listenerBus.stop()` should be moved all the way to the end of `sc.stop()`, in case there are listeners that block for a long time and prevent other components of Spark to stop. Perhaps `eventLogger.foreach(_.stop())` should go right after the listener bus.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39939019
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13940/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/366#discussion_r11423695
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkHdfsLR.scala ---
    @@ -73,6 +73,7 @@ object SparkHdfsLR {
         }
     
         println("Final w: " + w)
    +    sc.stop()
    --- End diff --
    
    Good call, the lack of this is a cause for bugs in other places of Spark.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39936421
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40019447
  
    thanks for your feedback. Made changes based on reviewer comments and merged test from #251.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39936400
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39931899
  
    Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39931703
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39939018
  
    Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/366#discussion_r11423681
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala ---
    @@ -36,6 +36,7 @@ private[spark] class LiveListenerBus extends SparkListenerBus with Logging {
       private val eventQueue = new LinkedBlockingQueue[SparkListenerEvent](EVENT_QUEUE_CAPACITY)
       private var queueFullErrorMessageLogged = false
       private var started = false
    +  private var sparkListenerBus: Option[Thread] = _
    --- End diff --
    
    This doesn't have to be an option, you can move the declaration of the thread up here, and do thread.start() in `start()`. Also, a better name for this would be something like `listenerThread`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40039003
  
    Or I can open a separate JIRA to track it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/366


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39928894
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by kanzhang <gi...@git.apache.org>.
Github user kanzhang commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-40262882
  
    I opened SPARK-1475 to track this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39936412
  
     Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: SPARK-1407 drain event queue before stopping e...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/366#issuecomment-39935453
  
    
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13935/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---