Posted to user@spark.apache.org by twinkle sachdeva <tw...@gmail.com> on 2015/03/03 12:16:06 UTC
delay between removing the block manager of an executor, and marking
that as lost
Hi,
Is there any relation between removing the block manager of an executor and
marking that executor as lost?
In my setup, even after the block manager is removed (after some operation
fails), it takes more than 20 minutes for the executor to be marked as lost.
Following are the logs:
*15/03/03 10:26:49 WARN storage.BlockManagerMaster: Failed to remove
broadcast 20 with removeFromMaster = true - Ask timed out on
[Actor[akka.tcp://sparkExecutor@TMO-DN73:54363/user/BlockManagerActor1#-966525686]]
after [30000 ms]}*
*15/03/03 10:27:41 WARN storage.BlockManagerMasterActor: Removing
BlockManager BlockManagerId(1, TMO-DN73, 47777) with no recent heart beats:
76924ms exceeds 45000ms*
*15/03/03 10:27:41 INFO storage.BlockManagerMasterActor: Removing block
manager BlockManagerId(1, TMO-DN73, 47777)*
*15/03/03 10:49:10 ERROR cluster.YarnClusterScheduler: Lost executor 1 on
TMO-DN73: remote Akka client disassociated*
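The size of the gap can be checked directly from the timestamps in the log lines above. The following is a minimal sketch, assuming the timestamps use the yy/MM/dd HH:mm:ss layout; the variable names are illustrative only.

```python
from datetime import datetime

# Assumed timestamp layout of the log lines: yy/MM/dd HH:mm:ss
LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"

# Timestamps copied from the "Removing block manager" and "Lost executor" lines.
removed_at = datetime.strptime("15/03/03 10:27:41", LOG_TIME_FORMAT)
lost_at = datetime.strptime("15/03/03 10:49:10", LOG_TIME_FORMAT)

# Time between the block manager removal and the executor being marked lost.
delay = lost_at - removed_at
minutes, seconds = divmod(int(delay.total_seconds()), 60)
print(f"{minutes} min {seconds} s")
```

This gives the delay between the block manager being removed and the executor being reported as lost, which is the interval the question is about.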
How can I make this happen faster?
Thanks,
Twinkle
Re: delay between removing the block manager of an executor, and
marking that as lost
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can look at the following settings:
- spark.akka.timeout
- spark.akka.heartbeat.pauses
Both are documented at http://spark.apache.org/docs/1.2.0/configuration.html
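As a rough sketch of how these two properties could be passed to a job, the snippet below builds a spark-submit command line using --conf flags. The values and the your_app.jar name are hypothetical placeholders, not recommendations, and the helper to_submit_args is made up for illustration. Whether lowering them actually shortens the delay seen above is untested here.

```python
# Hypothetical values for illustration only; the Spark 1.2.0 configuration
# page describes both settings as being expressed in seconds.
akka_settings = {
    "spark.akka.timeout": "100",
    "spark.akka.heartbeat.pauses": "600",
}


def to_submit_args(settings):
    """Turn a dict of Spark properties into spark-submit --conf arguments."""
    args = []
    for key, value in sorted(settings.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# "your_app.jar" is a placeholder for the real application jar.
command = ["spark-submit"] + to_submit_args(akka_settings) + ["your_app.jar"]
print(" ".join(command))
```

The same properties could equally be set in spark-defaults.conf or on a SparkConf object; the command-line form is used here only because it keeps the example self-contained.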
Thanks
Best Regards