Posted to reviews@spark.apache.org by tdas <gi...@git.apache.org> on 2015/04/03 09:46:56 UTC

[GitHub] spark pull request: [SPARK-5681][Streaming] Add tracker status and...

Github user tdas commented on the pull request:

    https://github.com/apache/spark/pull/4467#issuecomment-89207216
  
    @viirya Sorry for slacking on this, I have been busy. I think I understand your explanation. But I also spent some more time thinking about this from the ground up.
    
    Correct me if I am wrong, but the stop was getting stuck at this line:
    https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L248
    What happens at this line? `receiverInfo` should be empty, because every receiver that had registered would deregister and leave the map empty. So that cannot be the cause of the hang. The only reason it would stay stuck at that line is that `running` is still true. `running` is set to false only when the job that runs the receivers completes (this [line](https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala#L310)). The fact that it is stuck indefinitely means that this job never completes.
    
    That can happen only if a receiver that had not registered by the time "stop gracefully" was called (C in your examples) somehow keeps running indefinitely. Ideally, it should never have registered or started at all! That is, the ReceiverTracker should refuse any further registration as soon as it gets a stop signal. In that case the sequence of events should be:
    
    
    time | tracker | receivers
    t = 1 | started | registered:{A, B}; starting, not registered: {C}
    t = 2 | stopping | got stop msg:{A, B}; starting, not registered: {C}
    t = 3 | stopping | stopped:{A, B}; attempts to register but denied by tracker, never starts itself: {C}
    
    Since C is not allowed to start, it stops itself and its task completes. The job completes, running = false, and the tracker stops.
    
    t = 4 | stopped | stopped:{A, B}; stopped: {C}  
    
    Isn't this a viable solution? If so, I think this is simpler than introducing another state.
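
To make the proposed sequence concrete, here is a minimal sketch of the idea in Scala. It is not the actual `ReceiverTracker` code: `SketchTracker`, `Timeline`, and every method name below are hypothetical, and the model ignores actors, threads, and the real Spark job. It only shows that once registration is refused after a stop signal, the wait condition (`registered` empty and the job no longer running) can become true.

```scala
import scala.collection.mutable

/** Hypothetical, simplified model of a tracker; NOT Spark's ReceiverTracker API. */
class SketchTracker {
  private val registered = mutable.Set.empty[String]
  private var stopRequested = false // set once "stop gracefully" is called
  private var jobRunning = true     // stands in for the `running` flag

  /** Returns true if the receiver may start; false once a stop was requested. */
  def register(id: String): Boolean = synchronized {
    if (stopRequested) false
    else { registered += id; true }
  }

  def deregister(id: String): Unit = synchronized { registered -= id }

  /** Records the stop signal and returns the receivers that must be told to stop. */
  def requestStop(): Set[String] = synchronized {
    stopRequested = true
    registered.toSet
  }

  /** Called when the job hosting the receiver tasks finishes. */
  def jobCompleted(): Unit = synchronized { jobRunning = false }

  /** The condition a graceful-stop wait loop would poll. */
  def done: Boolean = synchronized { registered.isEmpty && !jobRunning }
}

object Timeline {
  /** Replays t = 1..4 from the table; returns (was C accepted?, is the tracker done?). */
  def run(): (Boolean, Boolean) = {
    val tracker = new SketchTracker
    // t = 1: A and B register; C is still starting.
    tracker.register("A")
    tracker.register("B")
    // t = 2: stop signal arrives; A and B are told to stop.
    val toStop = tracker.requestStop()
    // t = 3: A and B stop and deregister; C tries to register and is denied.
    toStop.foreach(tracker.deregister)
    val cAccepted = tracker.register("C")
    // C never starts, so its task ends and (in this simplified model) the job completes.
    if (!cAccepted) tracker.jobCompleted()
    // t = 4: nothing is registered and the job is over.
    (cAccepted, tracker.done)
  }
}
```

Under these assumptions `Timeline.run()` returns `(false, true)`: C is denied, and the tracker's wait condition is satisfied. The only addition is a boolean checked at registration time, which is why no extra tracker state is needed in this sketch.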

