You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by scwf <gi...@git.apache.org> on 2014/03/30 17:42:49 UTC

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

GitHub user scwf opened a pull request:

    https://github.com/apache/spark/pull/273

    DAGScheduler NullPointerException bug fix

    DAGScheduler throws this NullPointerException below  occasionally  when running spark apps.
    【
    java.lang.NullPointerException
    	at org.apache.spark.scheduler.DAGScheduler.executorAdded(DAGScheduler.scala:186)
    	at org.apache.spark.scheduler.TaskSchedulerImpl.executorAdded(TaskSchedulerImpl.scala:409)
    	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:210)
    	at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:206)
    	at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:73)
    	at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:206)
    	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:130)
    	at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrEls
    】
        This exception actually happens in the situation that eventProcessActor in DAGScheduler has not being initialized when ExecutorBackend register with SchedulerBackend. so the executorAdded method in DAGScheduler will throw the exception.  We can replay this  exception by adding this line of code  to start method of DAGScheduler
    
          def start() {
              Thread.sleep(10000)  // add this line
              eventProcessActor = env.actorSystem.actorOf(Props(new Actor {
          ...
    
    
    So fix this problem by judging eventProcessActor initialized or not when adding executor.
    	
      
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/scwf/spark dagbugfix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #273
    
----
commit ac2e0ded2f886fc4b1c10ef393bf222d5d81ded4
Author: scwf <root@kf-thinkpad-sl410.(none)>
Date:   2014-03-30T15:00:37Z

    dagscheduler bug fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39053529
  
    @markhamstra , I think createTaskScheduler initialize the ClusterSchedulerBackend, which registers the executors, even we adjust the order, it still has this problem?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39053957
  
    To facilitate your debugging:
    
    time 0: createTaskScheduler (initializes ClusterSchedulerBackEnd, which calls TaskSchedulerImpl.resourceOffers() to add Executor in ReviveOffers event) 
    
    time 0: dagScheduler constructor is pending there (for some weird reason, e.g. I just realized that I once met this when Travis is pretty busy)
    
    time 1: ReviveOffers message is received, TaskScheduler.ExecutorAdded is called, TaskScheduler.setDAGScheduler hasn't been called) NPE is thrown
    
    en, maybe we could separate the initialization from createTaskScheduler, let taskScheduler initialize it after it has set DAGScheduler (in #186 , also after eventProcessActor is ready)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39050945
  
    i have check this and on #186 it alse has the same problem that 
    DAGScheduler has not initialized when ExecutorBackend register with 
    SchedulerBackend.
    i will update my patch for better way
    
    
    On 2014/3/31 1:05, Mark Hamstra wrote:
    > There isn't actually any need to separate DAGScheduler construction from
    > eventProcessActor initialization anymore; so that is already being
    > eliminated as part of the work-in-progress on #186
    > <https://github.com/apache/spark/pull/186>, which will also fix this
    > bug, I believe. In any event, the better fix to this is just to make
    > sure that taskScheduler.setDAGScheduler(this) doesn't get called from
    > within the DAGScheduler until after the eventProcessActor is valid, so
    > please make sure that that happens, @CodingCat
    > <https://github.com/CodingCat>.
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/273#issuecomment-39031618>.
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf closed the pull request at:

    https://github.com/apache/spark/pull/273


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39081093
  
    @scwf Ah, I see, backend actor is started within TaskSchedulerImpl.start(), then adjusting order should be enough


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39032959
  
    yes, I agree with @markhamstra , the new dagScheduler will initialize the eventProcessingVector in its constructor (and manages it with a supervisor actor), and it will call taskScheduler.setDAGScheduler after the supervisor actor is valid, so this problem should be fixed.....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39050276
  
    ok, i will create a jira for this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39031441
  
    Do you mind creating a JIRA for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39081816
  
    @CodingCat  right, it is an order issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39028814
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39053458
  
    @CodingCat there is the matter of the order of construction and starting of the taskScheduler and the dagScheduler in SparkContext (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L224.)  I believe that https://github.com/apache/spark/pull/186 and the following should do things in the right order:
    ```scala
      // Create and start the scheduler
      private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master)
      @volatile private[spark] var dagScheduler = new DAGScheduler(this)
      taskScheduler.start()
    
      postEnvironmentUpdate()
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39053688
  
    Could be -- I didn't check it too closely yet.  @scwf is correct that we need some means to avoid this race condition, but I'm not certain yet what is the cleanest way to do that.  cc/ @kayousterhout 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39079212
  
    @scwf how did you reproduce this bug? I think what you are fixing is different with what you show in the original PR description, where your eventProcessActor is null....


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39354778
  
    Hm, this patch seems to secretly include merge commits, making it hard to tell which part you actually changed. Would you mind doing a git rebase on top of master?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39065809
  
    adjusting the order works. executorbackend register executor after 
    SchedulerBackend start.
    
    
    
    
    On 2014/3/31 12:38, Nan Zhu wrote:
    > @markhamstra <https://github.com/markhamstra> , I think
    > createTaskScheduler initialize the ClusterSchedulerBackend, which
    > registers the executors, even we adjust the order, it still has this
    > problem?
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/273#issuecomment-39053529>.
    >


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39319966
  
    I noticed that you added explicit return type to all public methods...I think there is a PR about this (but I cannot find it now.....)....en...is that merged? @pwendell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39053984
  
    style fix is required....file=/home/travis/build/apache/spark/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala message=File line length exceeds 100 characters line=59


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39420452
  
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39051150
  
    in #186, the eventProcessActor is initialized in the constructor of the DAGSchedulor and the reference is passed in preStart hook of the eventProcessActor, can you give more details on your thoughts that  #186 still has this problem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39031618
  
    There isn't actually any need to separate DAGScheduler construction from eventProcessActor initialization anymore; so that is already being eliminated as part of the work-in-progress on https://github.com/apache/spark/pull/186, which will also fix this bug, I believe.  In any event, the better fix to this is just to make sure that taskScheduler.setDAGScheduler(this) doesn't get called from within the DAGScheduler until after the eventProcessActor is valid, so please make sure that that happens, @CodingCat. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: DAGScheduler NullPointerException bug fix

Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:

    https://github.com/apache/spark/pull/273#issuecomment-39082090
  
    i have create an JIRA for this @pwendell  
    [SPARK-1361]



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---