Posted to user@spark.apache.org by Umesh Kacha <um...@gmail.com> on 2015/08/03 17:59:45 UTC

Re: How to control Spark Executors from getting Lost when using YARN client mode?

Hi all, any help will be much appreciated. My Spark job runs fine at first,
but partway through it starts losing executors with a shuffle
MetadataFetchFailedException saying the shuffle output could not be found
at its location, because the executor that wrote it was lost.
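
Would enabling the external shuffle service help keep shuffle output
reachable after an executor dies? A sketch of what I mean (assuming Spark
1.4 on YARN with the spark_shuffle auxiliary service already configured on
each NodeManager; the overhead value is illustrative, and the class and jar
are the ones from the command further down this thread):

  ./spark-submit --master yarn-client --class com.xyz.MySpark \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.yarn.executor.memoryOverhead=2048 \
    /home/myuser/myspark-1.0.jar

With the external shuffle service, shuffle files can still be served after
the executor that wrote them is gone, and the extra memory overhead gives
YARN headroom for off-heap allocations.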
On Jul 31, 2015 11:41 PM, "Umesh Kacha" <um...@gmail.com> wrote:

> Hi, thanks for the response. It looks like the YARN container is being
> killed, but I don't know why; I see the shuffle MetadataFetchFailedException
> described in the SO link below. I have enough memory: 8 nodes, each with 8
> cores and 30 GB of RAM. Because of this exception YARN kills the container
> running the executor, but how can it overrun memory? I tried giving each
> executor 25 GB and it is still not sufficient; the job fails. Please guide
> me, I don't understand what is going on. I am using Spark 1.4.0 with
> spark.shuffle.memoryFraction set to 0.0 and spark.storage.memoryFraction
> set to 0.5. I have set almost all the recommended properties: the Kryo
> serializer, an Akka frame size of 500, and 20 Akka threads (a sketch of the
> exact flags follows the link below). I am stuck and have been trying to
> recover from this issue for two days.
>
>
> http://stackoverflow.com/questions/29850784/what-are-the-likely-causes-of-org-apache-spark-shuffle-metadatafetchfailedexcept
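>
> For clarity, a sketch of the flags I am passing (reconstructed from my
> script, so treat the exact values as illustrative; the class and jar are
> the ones from my submit command quoted below):
>
>   ./spark-submit --master yarn-client --class com.xyz.MySpark \
>     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
>     --conf spark.shuffle.memoryFraction=0.0 \
>     --conf spark.storage.memoryFraction=0.5 \
>     --conf spark.akka.frameSize=500 \
>     --conf spark.akka.threads=20 \
>     /home/myuser/myspark-1.0.jar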
>
>
>
> On Thu, Jul 30, 2015 at 9:56 PM, Ashwin Giridharan <ashwin.focus@gmail.com> wrote:
>
>> What is your cluster configuration (size and resources)?
>>
>> If you do not have enough resources, your executors will not run.
>> Moreover, allocating 8 cores to a single executor is too much.
>>
>> If you have a cluster with four nodes running NodeManagers, each equipped
>> with 4 cores and 8 GB of memory, then an optimal configuration would be:
>>
>> --num-executors 8 --executor-cores 2 --executor-memory 2G
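>>
>> To make the arithmetic explicit (a rough sketch; 384 MB is Spark 1.4's
>> default floor for spark.yarn.executor.memoryOverhead, and the class and
>> jar are the ones from your command):
>>
>>   # 4 nodes x 4 cores = 16 cores -> 8 executors of 2 cores each, 2 per node
>>   # each container requests 2048 MB heap + 384 MB overhead = 2432 MB,
>>   # so 2 containers per node use ~4.75 GB of 8 GB, leaving headroom for
>>   # the NodeManager and the OS
>>   ./spark-submit --master yarn-client --class com.xyz.MySpark \
>>     --num-executors 8 --executor-cores 2 --executor-memory 2G \
>>     /home/myuser/myspark-1.0.jar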
>>
>> Thanks,
>> Ashwin
>>
>> On Thu, Jul 30, 2015 at 12:08 PM, unk1102 <um...@gmail.com> wrote:
>>
>>> Hi, I have a Spark job which runs fine locally with less data, but when I
>>> schedule it on YARN I keep getting the following ERROR, and slowly all
>>> executors get removed from the UI until my job fails:
>>>
>>> 15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 8 on
>>> myhost1.com: remote Rpc client disassociated
>>> 15/07/30 10:18:13 ERROR cluster.YarnScheduler: Lost executor 6 on
>>> myhost2.com: remote Rpc client disassociated
>>>
>>> I use the following command to submit the Spark job in yarn-client mode:
>>>
>>> ./spark-submit --class com.xyz.MySpark \
>>>   --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512M" \
>>>   --driver-java-options -XX:MaxPermSize=512m \
>>>   --driver-memory 3g --master yarn-client \
>>>   --executor-memory 2G --executor-cores 8 --num-executors 12 \
>>>   /home/myuser/myspark-1.0.jar
>>>
>>> I don't know what the problem is; please guide me. I am new to Spark.
>>> Thanks in advance.
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Ashwin Giridharan
>>
>
>

Re: How to control Spark Executors from getting Lost when using YARN client mode?

Posted by Jeff Zhang <zj...@gmail.com>.
Please check the NodeManager logs to see why the container was killed.
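
For example, something like this (a sketch: the application ID is a
placeholder, and it assumes YARN log aggregation is enabled so that "yarn
logs" can fetch the aggregated container logs):

  yarn logs -applicationId application_1438592999999_0001 \
    | grep -i -B 2 -A 5 "beyond physical memory"

A container killed for exceeding its memory limit usually leaves a "running
beyond physical memory limits" message there, including the usage at the
time it was killed.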


-- 
Best Regards

Jeff Zhang