Posted to user@spark.apache.org by Guillaume Pitel <gu...@exensa.com> on 2014/01/08 14:28:17 UTC
Dying workers since migration to 0.8.1
Hi,
We migrated from 0.8.0 to 0.8.1 on Monday. Since then, we have observed a high
rate of workers disappearing (they are no longer in the list) or dying (marked
DEAD).
This is particularly strange since the processes often continue running.
Any ideas or advice about this problem?
Thanks
Guillaume
--
eXenSa
*Guillaume PITEL, Président*
+33(0)6 25 48 86 80
eXenSa S.A.S. <http://www.exensa.com/>
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)1 84 16 36 77 / Fax +33(0)9 72 28 37 05
Re: Dying workers since migration to 0.8.1
Posted by Guillaume Pitel <gu...@exensa.com>.
Hi,
The problem appears to be triggered when I Ctrl-C the driver during
compute-heavy tasks. The slaves keep running (long after the driver has been
stopped) and are marked as dead.
Guillaume
> Hi, sorry for the poor information I initially gave :)
>
> The cluster is standalone, on premise, with 4 nodes.
>
> There was no problem before with 0.8.0 (more exactly, it happened once or
> twice, not several times a day).
>
> No exceptions, just these warnings on the master (t1.exensa.loc had
> disappeared while its process continued working):
>
> *WARN Master: Got heartbeat from unregistered worker*
> worker-20140108170426-*t1.exensa.loc*-38178
>
> Guillaume
>
>> Hi,
>>
>> Could you give a few more details about the problem, beyond a few hints?
>> That would be great! I would like to know exactly what you did and how you
>> ended up with those stuck executors. This could also be due to the network.
>> Are you on EC2? If so, the EC2 network is often unpredictable.
>>
Re: Dying workers since migration to 0.8.1
Posted by Guillaume Pitel <gu...@exensa.com>.
Hi, sorry for the poor information I initially gave :)
The cluster is standalone, on premise, with 4 nodes.
There was no problem before with 0.8.0 (more exactly, it happened once or
twice, not several times a day).
No exceptions, just these warnings on the master (t1.exensa.loc had
disappeared while its process continued working):
*WARN Master: Got heartbeat from unregistered worker*
worker-20140108170426-*t1.exensa.loc*-38178
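One way to picture how a worker can drop out of the list while its process keeps running is a master that tracks workers purely by heartbeat. The sketch below is a toy model under that assumption, not Spark's actual implementation: the `Master` class, its methods, and the 60-second timeout are all hypothetical and only illustrate how a warning like the one above could arise.

```python
class Master:
    """Toy heartbeat registry (hypothetical; not Spark's code)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}   # worker id -> time of last heartbeat
        self.warnings = []    # warnings the master would have logged

    def register(self, worker_id, now):
        self.last_seen[worker_id] = now

    def heartbeat(self, worker_id, now):
        if worker_id in self.last_seen:
            self.last_seen[worker_id] = now
        else:
            # The worker process is alive, but the master no longer knows it.
            self.warnings.append(
                "Got heartbeat from unregistered worker " + worker_id)

    def expire(self, now):
        """Remove workers whose last heartbeat is older than the timeout."""
        dead = [w for w, t in self.last_seen.items()
                if now - t > self.timeout_s]
        for w in dead:
            del self.last_seen[w]
        return dead


# Illustrative run: the timeout value is an assumption.
master = Master(timeout_s=60)
wid = "worker-20140108170426-t1.exensa.loc-38178"
master.register(wid, now=0)
master.heartbeat(wid, now=10)
dead = master.expire(now=100)    # worker was too busy to heartbeat in time
master.heartbeat(wid, now=101)   # process still alive, but now unregistered
```

In this model, a worker that is too busy to send a heartbeat within the timeout is dropped, and every later heartbeat from the still-running process only produces the warning.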
Guillaume
> Hi,
>
> Could you give a few more details about the problem, beyond a few hints?
> That would be great! I would like to know exactly what you did and how you
> ended up with those stuck executors. This could also be due to the network.
> Are you on EC2? If so, the EC2 network is often unpredictable.
>
Re: Dying workers since migration to 0.8.1
Posted by Andrew Ash <an...@andrewash.com>.
Any exceptions you see in the worker machine's logs would be particularly
useful too.
On Wed, Jan 8, 2014 at 6:00 AM, Prashant Sharma <sc...@gmail.com> wrote:
> Hi,
>
> Could you give a few more details about the problem, beyond a few hints?
> That would be great! I would like to know exactly what you did and how you
> ended up with those stuck executors. This could also be due to the network.
> Are you on EC2? If so, the EC2 network is often unpredictable.
>
>
> On Wed, Jan 8, 2014 at 6:58 PM, Guillaume Pitel <
> guillaume.pitel@exensa.com> wrote:
>
>> Hi,
>>
>> We migrated from 0.8.0 to 0.8.1 on Monday. Since then, we have observed a
>> high rate of workers disappearing (they are no longer in the list) or dying
>> (marked DEAD).
>>
>> This is particularly strange since the processes often continue running.
>>
>> Any ideas or advice about this problem?
>>
>> Thanks
>>
>> Guillaume
>
>
>
> --
> Prashant
>
Re: Dying workers since migration to 0.8.1
Posted by Prashant Sharma <sc...@gmail.com>.
Hi,
Could you give a few more details about the problem, beyond a few hints?
That would be great! I would like to know exactly what you did and how you
ended up with those stuck executors. This could also be due to the network.
Are you on EC2? If so, the EC2 network is often unpredictable.
On Wed, Jan 8, 2014 at 6:58 PM, Guillaume Pitel <gu...@exensa.com> wrote:
> Hi,
>
> We migrated from 0.8.0 to 0.8.1 on Monday. Since then, we have observed a
> high rate of workers disappearing (they are no longer in the list) or dying
> (marked DEAD).
>
> This is particularly strange since the processes often continue running.
>
> Any ideas or advice about this problem?
>
> Thanks
>
> Guillaume
--
Prashant