Posted to hdfs-dev@hadoop.apache.org by Cooper Bethea <co...@siftscience.com> on 2014/01/08 21:00:37 UTC

persistent under-replicated blocks

Hi HDFS developers,

I have a worrying problem in a 2.0.0-cdh4.4.0 HDFS cluster I am running. 9
blocks in the cluster are persistently reported to be under-replicated per
"hdfs fsck".

I am able to fetch the files that contain these blocks, so I know that the
data is there, but for some reason replication is not taking effect. In
hopes of getting the cluster to notice that there were under-replicated
blocks I tried using "hdfs dfs -setrep" to raise the replication factor,
but the cluster continues to report a single replica for each of these
blocks. When viewing master logs I see that the replication factor change
is respected, but there are no messages that refer to the under-replicated
blocks.

Thanks for your time. Please let me know what I can do to investigate
further.
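
For anyone retracing this, the under-replicated entries can be pulled out of
the fsck report with a short script. The following is a minimal Python sketch,
not something from the original post. It assumes report lines of the form
"<path>: Under replicated <block>. Target Replicas is N but found M
replica(s)."; the exact wording differs between Hadoop releases, so the
pattern may need adjusting. The function name is hypothetical.

```python
import re

# Assumed line shape (may vary by Hadoop version):
# /path/file:  Under replicated <pool>:<blk>. Target Replicas is 3 but found 1 replica(s).
UNDER_REPLICATED = re.compile(
    r"^(?P<path>/.*?):\s+Under replicated (?P<block>\S+?)\.\s+"
    r"Target Replicas is (?P<target>\d+) but found (?P<found>\d+)"
)


def parse_under_replicated(fsck_output):
    """Return (path, block, target_replicas, found_replicas) tuples."""
    results = []
    for line in fsck_output.splitlines():
        match = UNDER_REPLICATED.match(line.strip())
        if match:
            results.append(
                (
                    match["path"],
                    match["block"],
                    int(match["target"]),
                    int(match["found"]),
                )
            )
    return results
```

The text passed in would typically be the captured standard output of
"hdfs fsck <path>"; lines that do not match the assumed shape are ignored.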

Re: persistent under-replicated blocks

Posted by Cooper Bethea <co...@siftscience.com>.
+ hdfs-dev for posterity

Thanks, Andrew. I've been able to manually replicate one of the
under-replicated blocks by scp-ing the block file and its .meta file to
other datanodes and restarting them as you suggest. Once I get all the data
fully replicated I'll try to retrieve the information you've asked for.
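
Before copying, the block file and its .meta companion have to be located on
the datanode that holds the single replica. A rough Python sketch of that
lookup is below. It assumes block files are named like "blk_<id>" and their
metadata like "blk_<id>_<genstamp>.meta" somewhere under the datanode data
directory; the function name and the directory passed in are hypothetical,
and the copy and restart steps themselves are left out.

```python
from pathlib import Path


def find_block_files(data_dir, block_id):
    """Return (block_file, meta_file) for block_id under data_dir.

    Either element is None when the corresponding file is not found.
    """
    block_file = None
    meta_file = None
    for path in Path(data_dir).rglob(f"{block_id}*"):
        if path.name == block_id:
            # Exact name match: the block data file itself.
            block_file = path
        elif path.name.startswith(block_id + "_") and path.suffix == ".meta":
            # blk_<id>_<genstamp>.meta: the checksum metadata file.
            meta_file = path
    return block_file, meta_file
```

The exact-name and prefix checks are there so that a search for "blk_123"
does not pick up "blk_1234" by accident.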


On Thu, Jan 9, 2014 at 1:56 PM, Andrew Wang <an...@cloudera.com> wrote:

> Hi Chris,
>
> BCC'ing hdfs-dev@ since you're using CDH, moving us to cdh-user@.
>
> You should be able to manually copy the under-replicated block files and
> their .meta checksum files to a different datanode and restart it. I'm
> curious that you're having this issue, though; I haven't encountered it
> before. Can you send your NN logs to me, either as an attachment or a file
> drop? Also, what version of CDH are you using?
>
> Here are also a few ideas for things you can check:
>
> * There are a number of block replication stats available in the NN /jmx
> webui, e.g. PendingReplicationBlocks, UnderReplicatedBlocks,
> ScheduledReplicationBlocks. This will let you know if the NN is at least
> attempting to replicate your blocks (pending and scheduled).
> * Look in the NN log for BlockPlacementPolicy errors. It'll help to enable
> DEBUG level output here.
>
> Best,
> Andrew

Re: persistent under-replicated blocks

Posted by Andrew Wang <an...@cloudera.com>.
Hi Chris,

BCC'ing hdfs-dev@ since you're using CDH, moving us to cdh-user@.

You should be able to manually copy the under-replicated block files and
their .meta checksum files to a different datanode and restart it. I'm
curious that you're having this issue, though; I haven't encountered it
before. Can you send your NN logs to me, either as an attachment or a file
drop? Also, what version of CDH are you using?

Here are also a few ideas for things you can check:

* There are a number of block replication stats available in the NN /jmx
webui, e.g. PendingReplicationBlocks, UnderReplicatedBlocks,
ScheduledReplicationBlocks. This will let you know if the NN is at least
attempting to replicate your blocks (pending and scheduled).
* Look in the NN log for BlockPlacementPolicy errors. It'll help to enable
DEBUG level output here.

Best,
Andrew
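
The replication counters mentioned above can also be read programmatically
from the NameNode's /jmx endpoint, which returns JSON with a top-level
"beans" list. The following is a small Python sketch under stated
assumptions: the host name is a placeholder, 50070 is assumed as the NameNode
HTTP port (the usual default in Hadoop 2 era clusters), and the function
names are hypothetical.

```python
import json
from urllib.request import urlopen

REPLICATION_KEYS = (
    "PendingReplicationBlocks",
    "UnderReplicatedBlocks",
    "ScheduledReplicationBlocks",
)


def replication_stats(jmx_payload):
    """Collect the replication counters from a parsed /jmx JSON document."""
    stats = {}
    for bean in jmx_payload.get("beans", []):
        for key in REPLICATION_KEYS:
            if key in bean:
                stats[key] = bean[key]
    return stats


def fetch_replication_stats(namenode_http="http://namenode.example.com:50070"):
    """Fetch /jmx from the NameNode web UI (placeholder address) and extract the counters."""
    with urlopen(namenode_http + "/jmx", timeout=10) as response:
        return replication_stats(json.load(response))
```

If UnderReplicatedBlocks stays non-zero while the pending and scheduled
counters stay at zero, that would suggest the NN knows about the blocks but
is not scheduling any replication work for them.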


On Thu, Jan 9, 2014 at 10:46 AM, Cooper Bethea <co...@siftscience.com> wrote:

> I have only 9 under-replicated blocks on the cluster, and it is very
> important that I restore my cluster to a fully-replicated state. Is there a
> way I can manually copy these blocks to other datanodes, or perhaps new
> datanodes?

Re: persistent under-replicated blocks

Posted by Cooper Bethea <co...@siftscience.com>.
I have only 9 under-replicated blocks on the cluster, and it is very
important that I restore my cluster to a fully-replicated state. Is there a
way I can manually copy these blocks to other datanodes, or perhaps new
datanodes?


On Thu, Jan 9, 2014 at 10:34 AM, Cooper Bethea <co...@siftscience.com> wrote:

> Chris, Steve, thanks for responding.
>
> Overnight I ran a script to bump replication, then lower it, as Chris
> suggested. There has been no effect--all underreplicated blocks still have
> only 1 replica.
>
> Steve, I am running the rebalancer.

Re: persistent under-replicated blocks

Posted by Cooper Bethea <co...@siftscience.com>.
Chris, Steve, thanks for responding.

Overnight I ran a script to bump replication, then lower it, as Chris
suggested. There has been no effect--all underreplicated blocks still have
only 1 replica.

Steve, I am running the rebalancer.


On Thu, Jan 9, 2014 at 1:33 AM, Steve Loughran <st...@hortonworks.com> wrote:

> Are you running the rebalancer?

Re: persistent under-replicated blocks

Posted by Steve Loughran <st...@hortonworks.com>.
Are you running the rebalancer?


On 9 January 2014 04:40, Chris Embree <ce...@gmail.com> wrote:

> It's too bad that this hasn't been corrected in HDFS 2.0. I have a
> script that I run several times a day to ensure that blocks are replicated
> correctly. Here is a link to an article about it:
> http://dataforprofit.com/?p=427

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: persistent under-replicated blocks

Posted by Chris Embree <ce...@gmail.com>.
It's too bad that this hasn't been corrected in HDFS 2.0. I have a
script that I run several times a day to ensure that blocks are replicated
correctly. Here is a link to an article about it:
http://dataforprofit.com/?p=427


On Wed, Jan 8, 2014 at 9:00 PM, Cooper Bethea <co...@siftscience.com> wrote:

> Following on--is there a way that I can forcibly replicate these blocks,
> perhaps by rsyncing the underlying files to other datanodes? As you might
> imagine under-replicated data makes me very uneasy.

Re: persistent under-replicated blocks

Posted by Chris Embree <ce...@gmail.com>.
Hm, I had hoped this would have been fixed in HDFS 2. I have a script that
I run several times per day that identifies under-replicated blocks and
increases the replication factor by 1. It then reduces the replication
factor back to normal.
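
The bump-then-restore step can be sketched roughly as follows in Python. This
is an illustration only and not the script referred to above: the function
names are hypothetical, a normal replication factor of 3 is assumed, and the
only CLI call used is "hdfs dfs -setrep <replicas> <path>". Elsewhere in this
thread the same approach is reported to have had no effect on the stuck
blocks, so treat it as something to try, not a guaranteed fix.

```python
import subprocess


def setrep_commands(path, normal_replication):
    """Build the two setrep invocations: raise by one, then restore."""
    bump = ["hdfs", "dfs", "-setrep", str(normal_replication + 1), path]
    restore = ["hdfs", "dfs", "-setrep", str(normal_replication), path]
    return [bump, restore]


def bump_and_restore(path, normal_replication=3):
    """Run both commands in order, stopping if either one fails."""
    for command in setrep_commands(path, normal_replication):
        subprocess.run(command, check=True)
```

The -w (wait) flag is deliberately not used here, since waiting for a block
that never replicates could leave the script hanging.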

I can dig up a link if you need it.

On Jan 8, 2014 9:00 PM, "Cooper Bethea" <co...@siftscience.com> wrote:

> Following on--is there a way that I can forcibly replicate these blocks,
> perhaps by rsyncing the underlying files to other datanodes? As you might
> imagine under-replicated data makes me very uneasy.

Re: persistent under-replicated blocks

Posted by Cooper Bethea <co...@siftscience.com>.
Following on--is there a way that I can forcibly replicate these blocks,
perhaps by rsyncing the underlying files to other datanodes? As you might
imagine, under-replicated data makes me very uneasy.


On Wed, Jan 8, 2014 at 12:00 PM, Cooper Bethea <co...@siftscience.com> wrote:

> Hi HDFS developers,
>
> I have a worrying problem in a 2.0.0-cdh4.4.0 HDFS cluster I am running. 9
> blocks in the cluster are persistently reported to be under-replicated per
> "hdfs fsck".
>
> I am able to fetch the files that contain these blocks, so I know that the
> data is there, but for some reason replication is not taking effect. In
> hopes of getting the cluster to notice that there were under-replicated
> blocks I tried using "hdfs dfs -setrep" to raise the replication factor,
> but the cluster continues to report a single replica for each of these
> blocks. When viewing master logs I see that the replication factor change
> is respected, but there are no messages that refer to the under-replicated
> blocks.
>
> Thanks for your time. Please let me know what I can do to investigate
> further.
>