Posted to user@spark.apache.org by Sandy Ryza <sa...@cloudera.com> on 2015/03/09 08:25:15 UTC

Re: No executors allocated on yarn with latest master branch

You would have needed to configure it by
setting yarn.scheduler.capacity.resource-calculator to something ending in
DominantResourceCalculator.  If you haven't configured it, there's a high
probability that the recently committed
https://issues.apache.org/jira/browse/SPARK-6050 will fix your problem.
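
For reference, a minimal sketch of that setting in capacity-scheduler.xml
(the file location varies by distribution; /etc/hadoop/conf is a common
default):

    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>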

On Wed, Feb 25, 2015 at 1:36 AM, Anders Arpteg <ar...@spotify.com> wrote:

> We're using the capacity scheduler, to the best of my knowledge. Unsure if
> multi-resource scheduling is used, but if you know of an easy way to figure
> that out, then let me know (see the sketch just below this message).
>
> Thanks,
> Anders
>
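One quick way to check, as a sketch (the config paths here are assumptions
and vary by distribution), is to grep the scheduler configuration on the
ResourceManager host:

    # Which scheduler is the ResourceManager running?
    grep -A1 'yarn.resourcemanager.scheduler.class' /etc/hadoop/conf/yarn-site.xml

    # If it is the capacity scheduler, is the dominant-resource calculator set?
    grep -A1 'resource-calculator' /etc/hadoop/conf/capacity-scheduler.xml
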
> On Sat, Feb 21, 2015 at 12:05 AM, Sandy Ryza <sa...@cloudera.com>
> wrote:
>
>> Are you using the capacity scheduler, or the FIFO scheduler without
>> multi-resource scheduling, by any chance?
>>
>> On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg <ar...@spotify.com>
>> wrote:
>>
>>> The NM logs only seem to contain entries similar to the following, and
>>> nothing else in the same time range. Any help?
>>>
>>> 2015-02-12 20:47:31,245 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Event EventType: KILL_CONTAINER sent to absent container
>>> container_1422406067005_0053_01_000002
>>> 2015-02-12 20:47:31,246 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Event EventType: KILL_CONTAINER sent to absent container
>>> container_1422406067005_0053_01_000012
>>> 2015-02-12 20:47:31,246 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Event EventType: KILL_CONTAINER sent to absent container
>>> container_1422406067005_0053_01_000022
>>> 2015-02-12 20:47:31,246 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Event EventType: KILL_CONTAINER sent to absent container
>>> container_1422406067005_0053_01_000032
>>> 2015-02-12 20:47:31,246 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Event EventType: KILL_CONTAINER sent to absent container
>>> container_1422406067005_0053_01_000042
>>> 2015-02-12 21:24:30,515 WARN
>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
>>> Event EventType: FINISH_APPLICATION sent to absent application
>>> application_1422406067005_0053
>>>
>>> On Thu, Feb 12, 2015 at 10:38 PM, Sandy Ryza <sa...@cloudera.com>
>>> wrote:
>>>
>>>> It seems unlikely to me that it would be a 2.2 issue, though not
>>>> entirely impossible.  Are you able to find any of the container logs?  Is
>>>> the NodeManager launching containers and reporting some exit code?
>>>>
>>>> -Sandy
>>>>
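As a sketch of how one might pull those container logs, using the
application id from the NodeManager log quoted above (this requires YARN
log aggregation to be enabled; the local path is an assumption):

    # Aggregated logs for the whole application, via the YARN CLI
    yarn logs -applicationId application_1422406067005_0053

    # Or look directly under the NodeManager's log dir on each worker host
    # (controlled by yarn.nodemanager.log-dirs)
    ls /var/log/hadoop-yarn/containers/application_1422406067005_0053/
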
>>>> On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg <ar...@spotify.com>
>>>> wrote:
>>>>
>>>>> No, not submitting from Windows; from a Debian distribution. Had a
>>>>> quick look at the RM logs, and it seems some containers are allocated but
>>>>> then released again for some reason. It's not easy to make sense of the
>>>>> logs, but here is a snippet (from a test in our small test cluster) if
>>>>> you'd like to have a closer look: http://pastebin.com/8WU9ivqC
>>>>>
>>>>> Sandy, sounds like it could possibly be a 2.2 issue then, or what do
>>>>> you think?
>>>>>
>>>>> Thanks,
>>>>> Anders
>>>>>
>>>>> On Thu, Feb 12, 2015 at 3:11 PM, Aniket Bhatnagar <
>>>>> aniket.bhatnagar@gmail.com> wrote:
>>>>>
>>>>>> This is tricky to debug. Check the logs of the YARN NodeManager and
>>>>>> ResourceManager to see if you can trace the error. In the past I have had
>>>>>> to look closely at the arguments passed to the YARN container (they get
>>>>>> logged before containers are launched). If that still didn't give me a
>>>>>> clue, I had to check the script generated by YARN to execute the container
>>>>>> and even run it manually to trace the line where the error occurred (a
>>>>>> sketch follows below this message).
>>>>>>
>>>>>> BTW, are you submitting the job from Windows?
>>>>>>
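A rough sketch of those last two steps (all paths are illustrative; the NM
local dir is set by yarn.nodemanager.local-dirs, and the generated scripts
are cleaned up quickly unless yarn.nodemanager.delete.debug-delay-sec is
raised):

    # Find the launch script YARN generated for a container, on the NM host
    find /tmp/hadoop-yarn/nm-local-dir -name launch_container.sh

    # Run it by hand, tracing each command, to see where it fails
    bash -x /tmp/hadoop-yarn/nm-local-dir/usercache/<user>/appcache/<app-id>/<container-id>/launch_container.sh
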
>>>>>> On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg <ar...@spotify.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Interesting to hear that it works for you. Are you using YARN 2.2 as
>>>>>>> well? No strange log messages during startup, and I can't see any other
>>>>>>> log messages since no executor gets launched. It does not seem to work in
>>>>>>> yarn-client mode either, failing with the exception below.
>>>>>>>
>>>>>>> Exception in thread "main" org.apache.spark.SparkException: Yarn
>>>>>>> application has already ended! It might have been killed or unable to
>>>>>>> launch application master.
>>>>>>>         at
>>>>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:119)
>>>>>>>         at
>>>>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
>>>>>>>         at
>>>>>>> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>>>>         at
>>>>>>> org.apache.spark.SparkContext.<init>(SparkContext.scala:370)
>>>>>>>         at
>>>>>>> com.spotify.analytics.AnalyticsSparkContext.<init>(AnalyticsSparkContext.scala:8)
>>>>>>>         at
>>>>>>> com.spotify.analytics.DataSampler$.main(DataSampler.scala:42)
>>>>>>>         at com.spotify.analytics.DataSampler.main(DataSampler.scala)
>>>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>>>>>>> Method)
>>>>>>>         at
>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>         at
>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>         at
>>>>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:551)
>>>>>>>         at
>>>>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:155)
>>>>>>>         at
>>>>>>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
>>>>>>>         at
>>>>>>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:99)
>>>>>>>         at
>>>>>>> org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>>>
>>>>>>> /Anders
>>>>>>>
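When the application "has already ended" like this, the ResourceManager
usually records a diagnostics message explaining why. A sketch of pulling
it with the YARN CLI (the application id is illustrative):

    # Final status and diagnostics as recorded by the ResourceManager
    yarn application -status application_1422406067005_0053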
>>>>>>>
>>>>>>> On Thu, Feb 12, 2015 at 1:33 AM, Sandy Ryza <sandy.ryza@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Anders,
>>>>>>>>
>>>>>>>> I just tried this out and was able to successfully acquire
>>>>>>>> executors.  Any strange log messages or additional color you can provide on
>>>>>>>> your setup?  Does yarn-client mode work?
>>>>>>>>
>>>>>>>> -Sandy
>>>>>>>>
>>>>>>>> On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg <ar...@spotify.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Compiled the latest master of Spark yesterday (2015-02-10) for
>>>>>>>>> Hadoop 2.2, and jobs fail to execute in yarn-cluster mode with that
>>>>>>>>> build. It works successfully with Spark 1.2 (and also with master from
>>>>>>>>> 2015-01-16), so something has changed since then that prevents the job
>>>>>>>>> from receiving any executors on the cluster.
>>>>>>>>>
>>>>>>>>> The basic symptoms are that the job fires up the AM, but on the
>>>>>>>>> "executors" page in the web UI only the driver is listed; no executors
>>>>>>>>> are ever allocated, and the driver keeps waiting forever. Has anyone
>>>>>>>>> seen similar problems?
>>>>>>>>>
>>>>>>>>> Thanks for any insights,
>>>>>>>>> Anders
>>>>>>>>>
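For reference, a sketch of the kind of build described here, assuming the
Maven flags the Spark build docs listed for Hadoop 2.2 at the time (profile
names varied across versions):

    # Build Spark master for YARN against Hadoop 2.2
    mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package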
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>