You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by scwf <gi...@git.apache.org> on 2014/03/30 17:42:49 UTC
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
GitHub user scwf opened a pull request:
https://github.com/apache/spark/pull/273
DAGScheduler NullPointerException bug fix
DAGScheduler throws this NullPointerException below occasionally when running spark apps.
【
java.lang.NullPointerException
at org.apache.spark.scheduler.DAGScheduler.executorAdded(DAGScheduler.scala:186)
at org.apache.spark.scheduler.TaskSchedulerImpl.executorAdded(TaskSchedulerImpl.scala:409)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:210)
at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$resourceOffers$1.apply(TaskSchedulerImpl.scala:206)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:73)
at org.apache.spark.scheduler.TaskSchedulerImpl.resourceOffers(TaskSchedulerImpl.scala:206)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:130)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receive$1.applyOrEls
】
This exception actually happens in the situation that eventProcessActor in DAGScheduler has not being initialized when ExecutorBackend register with SchedulerBackend. so the executorAdded method in DAGScheduler will throw the exception. We can replay this exception by adding this line of code to start method of DAGScheduler
def start() {
Thread.sleep(10000) // add this line
eventProcessActor = env.actorSystem.actorOf(Props(new Actor {
...
So fix this problem by judging eventProcessActor initialized or not when adding executor.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/scwf/spark dagbugfix
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/273.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #273
----
commit ac2e0ded2f886fc4b1c10ef393bf222d5d81ded4
Author: scwf <root@kf-thinkpad-sl410.(none)>
Date: 2014-03-30T15:00:37Z
dagscheduler bug fix
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39053529
@markhamstra , I think createTaskScheduler initialize the ClusterSchedulerBackend, which registers the executors, even we adjust the order, it still has this problem?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39053957
To facilitate your debugging:
time 0: createTaskScheduler (initializes ClusterSchedulerBackEnd, which calls TaskSchedulerImpl.resourceOffers() to add Executor in ReviveOffers event)
time 0: dagScheduler constructor is pending there (for some weird reason, e.g. I just realized that I once met this when Travis is pretty busy)
time 1: ReviveOffers message is received, TaskScheduler.ExecutorAdded is called, TaskScheduler.setDAGScheduler hasn't been called) NPE is thrown
en, maybe we could separate the initialization from createTaskScheduler, let taskScheduler initialize it after it has set DAGScheduler (in #186 , also after eventProcessActor is ready)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39050945
i have check this and on #186 it alse has the same problem that
DAGScheduler has not initialized when ExecutorBackend register with
SchedulerBackend.
i will update my patch for better way
On 2014/3/31 1:05, Mark Hamstra wrote:
> There isn't actually any need to separate DAGScheduler construction from
> eventProcessActor initialization anymore; so that is already being
> eliminated as part of the work-in-progress on #186
> <https://github.com/apache/spark/pull/186>, which will also fix this
> bug, I believe. In any event, the better fix to this is just to make
> sure that taskScheduler.setDAGScheduler(this) doesn't get called from
> within the DAGScheduler until after the eventProcessActor is valid, so
> please make sure that that happens, @CodingCat
> <https://github.com/CodingCat>.
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/273#issuecomment-39031618>.
>
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf closed the pull request at:
https://github.com/apache/spark/pull/273
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39081093
@scwf Ah, I see, backend actor is started within TaskSchedulerImpl.start(), then adjusting order should be enough
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39032959
yes, I agree with @markhamstra , the new dagScheduler will initialize the eventProcessingVector in its constructor (and manages it with a supervisor actor), and it will call taskScheduler.setDAGScheduler after the supervisor actor is valid, so this problem should be fixed.....
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39050276
ok, i will create a jira for this
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39031441
Do you mind creating a JIRA for this?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39081816
@CodingCat right, it is an order issue.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39028814
Can one of the admins verify this patch?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39053458
@CodingCat there is the matter of the order of construction and starting of the taskScheduler and the dagScheduler in SparkContext (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L224.) I believe that https://github.com/apache/spark/pull/186 and the following should do things in the right order:
```scala
// Create and start the scheduler
private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master)
@volatile private[spark] var dagScheduler = new DAGScheduler(this)
taskScheduler.start()
postEnvironmentUpdate()
```
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39053688
Could be -- I didn't check it too closely yet. @scwf is correct that we need some means to avoid this race condition, but I'm not certain yet what is the cleanest way to do that. cc/ @kayousterhout
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39079212
@scwf how did you reproduce this bug? I think what you are fixing is different with what you show in the original PR description, where your eventProcessActor is null....
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39354778
Hm, this patch seems to secretly include merge commits, making it hard to tell which part you actually changed. Would you mind doing a git rebase on top of master?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39065809
adjusting the order works. executorbackend register executor after
SchedulerBackend start.
On 2014/3/31 12:38, Nan Zhu wrote:
> @markhamstra <https://github.com/markhamstra> , I think
> createTaskScheduler initialize the ClusterSchedulerBackend, which
> registers the executors, even we adjust the order, it still has this
> problem?
>
> —
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/273#issuecomment-39053529>.
>
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39319966
I noticed that you added explicit return type to all public methods...I think there is a PR about this (but I cannot find it now.....)....en...is that merged? @pwendell
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39053984
style fix is required....file=/home/travis/build/apache/spark/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala message=File line length exceeds 100 characters line=59
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39420452
ok
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39051150
in #186, the eventProcessActor is initialized in the constructor of the DAGSchedulor and the reference is passed in preStart hook of the eventProcessActor, can you give more details on your thoughts that #186 still has this problem?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39031618
There isn't actually any need to separate DAGScheduler construction from eventProcessActor initialization anymore; so that is already being eliminated as part of the work-in-progress on https://github.com/apache/spark/pull/186, which will also fix this bug, I believe. In any event, the better fix to this is just to make sure that taskScheduler.setDAGScheduler(this) doesn't get called from within the DAGScheduler until after the eventProcessActor is valid, so please make sure that that happens, @CodingCat.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: DAGScheduler NullPointerException bug fix
Posted by scwf <gi...@git.apache.org>.
Github user scwf commented on the pull request:
https://github.com/apache/spark/pull/273#issuecomment-39082090
i have create an JIRA for this @pwendell
[SPARK-1361]
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---