Posted to user@hadoop.apache.org by Chen Song <ch...@gmail.com> on 2015/02/10 18:37:24 UTC

missing data blocks after active name node crashes

When the active name node crashes, it seems there is always a chance that
data blocks in flight will go missing.

My understanding is that when the active name node crashes, the metadata
for data blocks in transition, which exists only in the active name node's
memory, has not been captured by the journal nodes and is therefore not
available on the standby name node when zkfc promotes it to active.

Is my understanding correct? Any way to mitigate this problem or race
condition?

-- 
Chen Song
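Chen's hypothesis can be restated as a toy model. The promoted standby only knows the edits it can replay from the shared journal, so anything that lived only in the old active's memory is gone after failover. This is a minimal Python sketch, not HDFS code: `ToyNameNode`, `add_block`, `failover` and the `journaled` flag are hypothetical names. It assumes the "apply in memory, journal later" ordering that the question suspects, which is exactly the point Ravi's reply disputes.

```python
# Toy model, not HDFS code: the standby can only rebuild the namespace
# by replaying what actually reached the shared journal.
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    namespace: set = field(default_factory=set)  # in-memory block metadata
    journal: list = field(default_factory=list)  # edits visible to the standby

    def add_block(self, block_id: str, journaled: bool = True) -> None:
        # Under the hypothesis being tested, an allocation can exist in
        # memory without (yet) having reached the journal.
        self.namespace.add(block_id)
        if journaled:
            self.journal.append(("ADD_BLOCK", block_id))


def failover(journal: list) -> set:
    """The promoted standby replays the journal; the old active's memory is lost."""
    return {block_id for op, block_id in journal if op == "ADD_BLOCK"}


active = ToyNameNode()
active.add_block("blk_1")
active.add_block("blk_2", journaled=False)  # still "in flight" when the active dies
standby_view = failover(active.journal)
print(sorted(active.namespace))  # ['blk_1', 'blk_2']
print(sorted(standby_view))      # ['blk_1']
```

If every allocation is journaled before anything else happens (the write-ahead ordering Ravi describes), the un-journaled branch never occurs and the two views agree.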

Re: missing data blocks after active name node crashes

Posted by Chen Song <ch...@gmail.com>.
Thanks for the reply, Ravi.

In my case, I consistently see missing blocks every time the active name
node crashes. The active name node crashes because of a timeout on the
journal nodes.

Could this specific case lead to missing blocks?

Chen

On Tue, Feb 10, 2015 at 2:20 PM, Ravi Prakash <ra...@ymail.com> wrote:

> Hi Chen!
>
> From my understanding, every operation on the Namenode is logged (and
> flushed) to disk / QJM / shared storage. This includes the addBlock
> operation. So when a client requests to write a new block, the metadata is
> logged by the active NN, so even if it crashes later on, the new active NN
> would still see the creation of the block.
>
> HTH
> Ravi
>
>
>   On Tuesday, February 10, 2015 9:38 AM, Chen Song <ch...@gmail.com>
> wrote:
>
>
> When the active name node crashes, it seems there is always a chance that
> the data blocks in flight will be missing.
> My understanding is that when the active name node crashes, the metadata
> of data blocks in transition which exist in active name node memory is not
> successfully captured by journal nodes and thus not available on standby
> name node when it is promoted to active by zkfc.
> Is my understanding correct? Any way to mitigate this problem or race
> condition?
>
> --
> Chen Song
>


-- 
Chen Song

Re: missing data blocks after active name node crashes

Posted by Ravi Prakash <ra...@ymail.com>.
Hi Chen!
From my understanding, every operation on the Namenode is logged (and flushed) to disk / QJM / shared storage. This includes the addBlock operation. So when a client requests to write a new block, the active NN logs the block's metadata, and even if it crashes later on, the new active NN would still see the creation of the block.
HTH
Ravi

On Tuesday, February 10, 2015 9:38 AM, Chen Song <ch...@gmail.com> wrote:

When the active name node crashes, it seems there is always a chance that the data blocks in flight will be missing.

My understanding is that when the active name node crashes, the metadata of data blocks in transition which exist in active name node memory is not successfully captured by journal nodes and thus not available on standby name node when it is promoted to active by zkfc.

Is my understanding correct? Any way to mitigate this problem or race condition?
-- 
Chen Song
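Ravi's description amounts to write-ahead logging: an edit such as addBlock is flushed to the journal before the client hears back. The sketch below is a toy model in Python, not HDFS code. `ToyJournalNode`, `log_sync`, `add_block`, `JournalQuorumTimeout`, the simple majority rule and the timeout values are illustrative assumptions. It also connects to Chen's follow-up: if too few journal nodes answer in time, the writer gives up instead of acknowledging the edit. As far as I know, a real NameNode that cannot reach a quorum of journal nodes within its write timeout exits rather than continuing, which would match the crash Chen describes.

```python
# Toy model, not HDFS code: a write-ahead edit that must reach a
# majority of journal nodes before the client is answered.
from dataclasses import dataclass, field


class JournalQuorumTimeout(Exception):
    """Raised when fewer than a majority of journal nodes ack in time."""


@dataclass
class ToyJournalNode:
    latency_ms: int                            # simulated time to persist and ack
    edits: list = field(default_factory=list)  # edits this node has persisted


def log_sync(edit: str, nodes: list, timeout_ms: int) -> None:
    """Send `edit` to every node; in this toy, nodes slower than the
    timeout neither persist nor ack it."""
    acked = 0
    for node in nodes:
        if node.latency_ms <= timeout_ms:
            node.edits.append(edit)
            acked += 1
    if acked < len(nodes) // 2 + 1:
        # A minority may still hold the edit; this toy does not model
        # how recovery on failover treats such partially written edits.
        raise JournalQuorumTimeout(f"{acked}/{len(nodes)} acks for {edit!r}")


def add_block(block_id: str, nodes: list, timeout_ms: int) -> str:
    log_sync(f"ADD_BLOCK {block_id}", nodes, timeout_ms)  # journal first...
    return block_id                                        # ...then answer the client


# One slow node out of three: a majority still acks, so blk_1 is durable.
healthy = [ToyJournalNode(5), ToyJournalNode(8), ToyJournalNode(30_000)]
print(add_block("blk_1", healthy, timeout_ms=20_000))  # blk_1

# Two slow nodes: no quorum, so the writer gives up before answering.
slow = [ToyJournalNode(25_000), ToyJournalNode(30_000), ToyJournalNode(5)]
try:
    add_block("blk_2", slow, timeout_ms=20_000)
except JournalQuorumTimeout as err:
    print("abort:", err)  # abort: 1/3 acks for 'ADD_BLOCK blk_2'
```

In this model, failing loudly when the quorum cannot be reached means a client is never told about a block that a majority of journal nodes did not persist, at the cost of availability for the writer.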



    
