Posted to user@spark.apache.org by Guillaume Pitel <gu...@exensa.com> on 2014/01/08 14:28:17 UTC

Dying workers since migration to 0.8.1

Hi,

We migrated from 0.8.0 to 0.8.1 on Monday. Since then we have observed a high 
rate of workers disappearing (they are no longer in the list) or dying 
(marked DEAD).

This is particularly strange since the processes often continue running.

Any ideas or advice about this particular problem?

Thanks

Guillaume
-- 
eXenSa

	
*Guillaume PITEL, Président*
+33(0)6 25 48 86 80

eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05


Re: Dying workers since migration to 0.8.1

Posted by Guillaume Pitel <gu...@exensa.com>.
Hi,

The problem appears to be triggered when I Ctrl-C the driver during 
compute-heavy tasks. The slaves continue running (long after the driver has 
been stopped) and are marked as dead.

Guillaume




Re: Dying workers since migration to 0.8.1

Posted by Guillaume Pitel <gu...@exensa.com>.
Hi, sorry for the poor information I initially gave :)

Cluster is standalone on premise, 4 nodes.

We had no problem before with 0.8.0 (more precisely, it happened once or 
twice, not several times a day).

No exceptions, just these warnings on the master (t1.exensa.loc had 
disappeared, while its process continued working):

WARN Master: Got heartbeat from unregistered worker worker-20140108170426-t1.exensa.loc-38178
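
To see how often each node is affected, the master log can be scanned for this 
warning. Below is a minimal sketch in Python, not an official Spark tool. It 
assumes the raw log lines contain the message as plain text and that worker 
IDs follow the worker-<timestamp>-<host>-<port> layout seen in the single 
example above. The function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Message text copied from the warning in this thread. The ID layout
# (worker-<14-digit timestamp>-<host>-<port>) is an assumption inferred
# from that one example, not a documented format.
PATTERN = re.compile(
    r"Got heartbeat from unregistered worker "
    r"(worker-(\d{14})-(\S+)-(\d+))"
)


def unregistered_heartbeats(lines):
    """Count 'unregistered worker' heartbeat warnings per host."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            # Group 3 is the host part of the worker ID.
            counts[match.group(3)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass an open log file instead.
    sample = [
        "WARN Master: Got heartbeat from unregistered worker "
        "worker-20140108170426-t1.exensa.loc-38178",
        "INFO Master: an unrelated line",
    ]
    for host, n in unregistered_heartbeats(sample).items():
        print(host, n)
```

Passing an open file object (for example, the master log opened with 
`open(path)`) works the same way, since the function only iterates over lines.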

Guillaume





Re: Dying workers since migration to 0.8.1

Posted by Andrew Ash <an...@andrewash.com>.
Any exceptions you see in the worker machine's logs would be particularly
useful too.


On Wed, Jan 8, 2014 at 6:00 AM, Prashant Sharma <sc...@gmail.com> wrote:

> Hi,
>
> Can you give a little more detail about the problem, beyond a few hints?
> That would be great! I would like to know exactly what you did and how you
> ended up with those stuck executors. This can be due to the network too.
> Are you on EC2? In that case, the EC2 network is often unpredictable.

Re: Dying workers since migration to 0.8.1

Posted by Prashant Sharma <sc...@gmail.com>.
Hi,

Can you give a little more detail about the problem, beyond a few hints?
That would be great! I would like to know exactly what you did and how you
ended up with those stuck executors. This can be due to the network too.
Are you on EC2? In that case, the EC2 network is often unpredictable.


On Wed, Jan 8, 2014 at 6:58 PM, Guillaume Pitel
<gu...@exensa.com> wrote:

>  Hi,
>
> We migrated from 0.8.0 to 0.8.1 on Monday. Since then we have observed a
> high rate of workers disappearing (they are no longer in the list) or dying
> (marked DEAD).
>
> This is particularly strange since the processes often continue running.
>
> Any ideas or advice about this particular problem?
>
> Thanks
>
> Guillaume



-- 
Prashant