Posted to user@spark.apache.org by twinkle sachdeva <tw...@gmail.com> on 2015/03/03 12:16:06 UTC

delay between removing the block manager of an executor, and marking that as lost

Hi,

Is there any relation between removing the block manager of an executor and
marking that executor as lost?

In my setup, even after the block manager is removed (after it fails some
operation), it takes more than 20 minutes for the executor to be marked as lost.

Following are the logs:

*15/03/03 10:26:49 WARN storage.BlockManagerMaster: Failed to remove
broadcast 20 with removeFromMaster = true - Ask timed out on
[Actor[akka.tcp://sparkExecutor@TMO-DN73:54363/user/BlockManagerActor1#-966525686]]
after [30000 ms]}*

*15/03/03 10:27:41 WARN storage.BlockManagerMasterActor: Removing
BlockManager BlockManagerId(1, TMO-DN73, 47777) with no recent heart beats:
76924ms exceeds 45000ms*

*15/03/03 10:27:41 INFO storage.BlockManagerMasterActor: Removing block
manager BlockManagerId(1, TMO-DN73, 47777)*

*15/03/03 10:49:10 ERROR cluster.YarnClusterScheduler: Lost executor 1 on
TMO-DN73: remote Akka client disassociated*
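
For reference, the gap between the "Removing block manager" line and the "Lost executor" line can be worked out directly from the two timestamps in the logs above. This is only a small sketch of that arithmetic; the variable names are illustrative and not part of Spark.

```python
from datetime import datetime

# Timestamps copied from the log lines above (format: yy/MM/dd HH:mm:ss).
LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"

# Illustrative names only; they do not correspond to anything in Spark.
block_manager_removed = datetime.strptime("15/03/03 10:27:41", LOG_TIME_FORMAT)
executor_marked_lost = datetime.strptime("15/03/03 10:49:10", LOG_TIME_FORMAT)

# Difference between the two events, split into minutes and seconds.
delay = executor_marked_lost - block_manager_removed
minutes, seconds = divmod(int(delay.total_seconds()), 60)
print(f"{minutes} min {seconds} s")  # prints "21 min 29 s"
```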

How can I make this happen faster?

Thanks,
Twinkle

Re: delay between removing the block manager of an executor, and marking that as lost

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can look at the following settings:

- spark.akka.timeout
- spark.akka.heartbeat.pauses

from http://spark.apache.org/docs/1.2.0/configuration.html
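
As a rough sketch of how these two properties might be passed to a job, the snippet below renders them as `--conf key=value` flags for spark-submit. The helper name `to_conf_flags` is hypothetical, and the numbers are placeholders rather than recommendations; check the configuration page above for the defaults and units in your Spark version before changing anything.

```python
def to_conf_flags(properties):
    """Hypothetical helper: render Spark properties as spark-submit --conf flags."""
    flags = []
    # Sort the keys so the generated command line is stable between runs.
    for key, value in sorted(properties.items()):
        flags.extend(["--conf", f"{key}={value}"])
    return flags


# Placeholder values, NOT recommendations (assumed for illustration only).
akka_settings = {
    "spark.akka.timeout": 60,
    "spark.akka.heartbeat.pauses": 120,
}

# "<your-app>" stands in for the application jar or script and its arguments.
print(" ".join(["spark-submit"] + to_conf_flags(akka_settings) + ["<your-app>"]))
```

The same keys can also be set in code on the SparkConf object or in conf/spark-defaults.conf; the flags are just the quickest way to experiment with a single run.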

Thanks
Best Regards
