
Posted to user@hadoop.apache.org by Lars Francke <la...@gmail.com> on 2019/05/13 20:31:54 UTC

Failover & Cold start time and block reports

Hi,

I'm working with a few clusters of 100+ nodes and I've been wondering how
exactly failover, as well as a cold start, works with respect to block
reports.

I sometimes see failover times of 15-45 minutes, spent waiting in safe mode
for all blocks to report in.
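
To make the waiting condition concrete, here is a rough sketch of the block
threshold check as I understand it. It assumes the stock
dfs.namenode.safemode.threshold-pct of 0.999 from hdfs-default.xml; the
function and variable names are made up for illustration, and the real
Namenode also applies dfs.namenode.safemode.extension (30000 ms by default)
once the threshold is met, which this simplified model ignores.

```python
# Simplified model of the Namenode safe-mode block threshold check.
# The property name and default come from hdfs-default.xml; the function
# and variable names are hypothetical and exist only for this example.

SAFEMODE_THRESHOLD_PCT = 0.999  # dfs.namenode.safemode.threshold-pct (default)


def blocks_needed(total_blocks, threshold_pct=SAFEMODE_THRESHOLD_PCT):
    """Number of blocks that must be reported before the threshold is met."""
    return int(total_blocks * threshold_pct)


def threshold_reached(reported_blocks, total_blocks,
                      threshold_pct=SAFEMODE_THRESHOLD_PCT):
    """True once enough blocks have been reported by the Datanodes."""
    return reported_blocks >= blocks_needed(total_blocks, threshold_pct)
```

So "all blocks" above really means roughly 99.9% of them under the default
setting, which still requires nearly every Datanode to have reported.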

Datanodes usually send a full block report every six hours, I believe, so
there must be something else going on.
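
As a quick sanity check on that number, assuming the stock
dfs.blockreport.intervalMsec of 21600000 from hdfs-default.xml (the helper
name below is made up for illustration):

```python
# Default periodic full block report interval from hdfs-default.xml.
BLOCKREPORT_INTERVAL_MSEC = 21_600_000  # dfs.blockreport.intervalMsec


def msec_to_hours(msec):
    """Convert milliseconds to hours."""
    return msec / (1000 * 60 * 60)


interval_hours = msec_to_hours(BLOCKREPORT_INTERVAL_MSEC)

# The waits mentioned above (15-45 minutes), expressed in hours.
observed_wait_hours = (15 / 60, 45 / 60)

# The periodic interval is far longer than the observed waits, so the
# periodic schedule alone cannot explain when the reports arrive.
periodic_schedule_explains_wait = interval_hours <= max(observed_wait_hours)
```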

How are Datanodes informed of the new Namenode?
How do they know that they should send a full block report (assuming this
is what happens)?
-> I assume the answer to both lies in Heartbeats?

Are there any guidelines on how long recovery should take and are there any
options that can be used to decrease the time?

Thank you!

Re: Failover & Cold start time and block reports

Posted by Lars Francke <la...@gmail.com>.

Just pinging to see if anyone has any insight here?
