Posted to user@hbase.apache.org by Billy Pearson <sa...@pearsonwholesale.com> on 2009/04/12 02:02:08 UTC

WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

I'm getting a bunch of WARNs:
WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

This is only happening on the hlogs on the servers while under heavy import,
30K/sec across 7 servers.
I tried bumping the hlog size between rolls to 100K instead of 10K, thinking
that would help, but the problem is still there, just not as often since the
logs are not rolling as frequently.

Not sure if hbase.regionserver.flushlogentries would help. Has anyone else
seen this? I am running the 0.19.2-dev branch.
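(For anyone wanting to try the same knobs, a minimal hbase-site.xml sketch is
below. The property names are my best recollection of the 0.19-era names and
the values are examples only, so verify against your hbase-default.xml before
using them.)

  <!-- hbase-site.xml sketch (example values, not recommendations) -->
  <property>
    <name>hbase.regionserver.maxlogentries</name>
    <!-- WAL entries written before the hlog is rolled; the "100K instead of 10K" above -->
    <value>100000</value>
  </property>
  <property>
    <name>hbase.regionserver.flushlogentries</name>
    <!-- WAL entries appended before the log is flushed out to HDFS -->
    <value>100</value>
  </property>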

Billy



Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Grepping the datanode log, it looks like I get these messages when it happens:

[root@server-5 hadoop]# tail -n500 -f hadoop-root-datanode-server-5.log | 
grep WARN
2009-04-12 01:06:51,099 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.0.1.5:50010, 
storageID=DS-234949010-10.0.1.5-50010-1237522267977, infoPort=50075, 
ipcPort=50020):Failed to transfer blk_9059760482849889388_248126 to 
10.0.1.4:50010 got java.net.SocketException: Original Exception : 
java.io.IOException: Connection reset by peer
2009-04-12 01:06:53,033 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.0.1.5:50010, 
storageID=DS-234949010-10.0.1.5-50010-1237522267977, infoPort=50075, 
ipcPort=50020):Got exception while serving blk_865228606208483123_247552 to 
/10.0.1.5:
2009-04-12 01:06:58,400 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.0.1.5:50010, 
storageID=DS-234949010-10.0.1.5-50010-1237522267977, infoPort=50075, 
ipcPort=50020):Got exception while serving blk_-2405312561519352544_247560 
to /10.0.1.5:
2009-04-12 01:07:06,154 WARN 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.0.1.5:50010, 
storageID=DS-234949010-10.0.1.5-50010-1237522267977, infoPort=50075, 
ipcPort=50020):Got exception while serving blk_-7270111728792506289_247565 
to /10.0.1.5:

[root@server-5 hadoop]# tail -n500 -f hadoop-root-datanode-server-5.log | 
grep ERROR
2009-04-12 01:06:58,400 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.0.1.5:50010, 
storageID=DS-234949010-10.0.1.5-50010-1237522267977, infoPort=50075, 
ipcPort=50020):DataXceiver
2009-04-12 01:07:06,154 ERROR 
org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.0.1.5:50010, 
storageID=DS-234949010-10.0.1.5-50010-1237522267977, infoPort=50075, 
ipcPort=50020):DataXceiver



Not sure if I need to bump the handler count for the datanodes or not?
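(If the datanodes do turn out to be the bottleneck, the two limits usually
raised for HBase are the xceiver limit and the datanode handler count. A
minimal hadoop-site.xml sketch follows; the values are examples only, not
settings taken from this thread.)

  <!-- hadoop-site.xml sketch (example values only) -->
  <property>
    <!-- per-datanode limit on concurrent block readers/writers;
         note the historical misspelling of the property name -->
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>
  <property>
    <!-- server threads handling datanode IPC requests -->
    <name>dfs.datanode.handler.count</name>
    <value>10</value>
  </property>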

Billy



"Andrew Purtell" <ap...@apache.org> wrote in 
message news:479243.71024.qm@web65516.mail.ac4.yahoo.com...
>
> The "Blocks not replicated yet" is a HDFS problem.
> Maybe I am not understanding what you are saying?
>
> So you have not increased the number of xceivers in
> the datanode configs? Are there any messages of
> interest in the datanode logs?
>
>   - Andy
>
>
>> From: Billy Pearson
>> Subject: Re: WARN org.apache.hadoop.hdfs.DFSClient: 
>> NotReplicatedYetException sleeping
>> To: hbase-user@hadoop.apache.org
>> Date: Saturday, April 11, 2009, 8:00 PM
>> Everything is default on them except max open files, which is some
>> really high number.
>> The only change I know of that could be affecting it is the nice
>> level of hbase and hadoop:
>> hadoop nice = 5
>> hbase nice = 10
>>
>> That way hbase runs slower than the rest when we get a load.
>> I run other stuff on the nodes about 6 hours out of the day,
>> but this is happening when there is spare cpu.
>>
>> Running dual 2.4GHz with 4GB mem and dual 250GB 7200 RPM RAID 0
>> drives; 3 running with a 147GB 15K RPM SCSI drive.
>> About 8 regions average; heap on datanodes and regionservers
>> is still 1GB.
>>
>> Flushing is happening often with these high import speeds,
>> so could that be blocking the hlog?
>> Since flushing is happening often, minor compactions
>> are running almost all the time, keeping up.
>>
>> Billy
>>
>>
>> "Andrew Purtell" <ap...@apache.org>
>> wrote in message
>> news:899617.77011.qm@web65501.mail.ac4.yahoo.com...
>> >
>> > Hi Billy,
>> >
>> > It makes sense to me that you'd see this on the
>> HLogs
>> > first. HDFS blocks are allocated most frequently for
>> > them, except during compaction.
>> >
>> > Seems like a classic sign of DFS stress to me. What
>> are
>> > your configuration details in terms of max open files,
>> > maximum xceiver limit, and datanode handlers?
>> >
>> >   - Andy
>> >
>> >> From: Billy Pearson
>> >> Subject: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping
>> >> To: hbase-user@hadoop.apache.org
>> >> Date: Saturday, April 11, 2009, 5:02 PM
>> >> I'm getting a bunch of WARNs:
>> >> WARN org.apache.hadoop.hdfs.DFSClient:
>> >> NotReplicatedYetException sleeping
>> >>
>> >> This is only happening on the hlogs on the servers while
>> >> under heavy import, 30K/sec across 7 servers.
>> >> I tried bumping the hlog size between rolls to 100K
>> >> instead of 10K, thinking that would help, but the problem
>> >> is still there, just not as often since the logs are not
>> >> rolling as frequently.
>> >>
>> >> Not sure if hbase.regionserver.flushlogentries would help.
>> >> Has anyone else seen this? I am running the 0.19.2-dev branch.
>> >>
>> >> Billy
>> >
>> >
>> >
>> >
>
>
>
> 



Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

Posted by Andrew Purtell <ap...@apache.org>.
The "Blocks not replicated yet" is a HDFS problem.
Maybe I am not understanding what you are saying?

So you have not increased the number of xceivers in
the datanode configs? Are there any messages of
interest in the datanode logs?

   - Andy


> From: Billy Pearson
> Subject: Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping
> To: hbase-user@hadoop.apache.org
> Date: Saturday, April 11, 2009, 8:00 PM
> Everything is default on them except max open files, which is some
> really high number.
> The only change I know of that could be affecting it is the nice
> level of hbase and hadoop:
> hadoop nice = 5
> hbase nice = 10
> 
> That way hbase runs slower than the rest when we get a load.
> I run other stuff on the nodes about 6 hours out of the day,
> but this is happening when there is spare cpu.
> 
> Running dual 2.4GHz with 4GB mem and dual 250GB 7200 RPM RAID 0
> drives; 3 running with a 147GB 15K RPM SCSI drive.
> About 8 regions average; heap on datanodes and regionservers
> is still 1GB.
> 
> Flushing is happening often with these high import speeds,
> so could that be blocking the hlog?
> Since flushing is happening often, minor compactions
> are running almost all the time, keeping up.
> 
> Billy
> 
> 
> "Andrew Purtell" <ap...@apache.org>
> wrote in message
> news:899617.77011.qm@web65501.mail.ac4.yahoo.com...
> > 
> > Hi Billy,
> > 
> > It makes sense to me that you'd see this on the
> HLogs
> > first. HDFS blocks are allocated most frequently for
> > them, except during compaction.
> > 
> > Seems like a classic sign of DFS stress to me. What
> are
> > your configuration details in terms of max open files,
> > maximum xceiver limit, and datanode handlers?
> > 
> >   - Andy
> > 
> >> From: Billy Pearson
> >> Subject: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping
> >> To: hbase-user@hadoop.apache.org
> >> Date: Saturday, April 11, 2009, 5:02 PM
> >> I'm getting a bunch of WARNs:
> >> WARN org.apache.hadoop.hdfs.DFSClient:
> >> NotReplicatedYetException sleeping
> >> 
> >> This is only happening on the hlogs on the servers while
> >> under heavy import, 30K/sec across 7 servers.
> >> I tried bumping the hlog size between rolls to 100K
> >> instead of 10K, thinking that would help, but the problem
> >> is still there, just not as often since the logs are not
> >> rolling as frequently.
> >> 
> >> Not sure if hbase.regionserver.flushlogentries would help.
> >> Has anyone else seen this? I am running the 0.19.2-dev branch.
> >> 
> >> Billy
> > 
> > 
> > 
> >


      

Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Everything is default on them except max open files, which is some really
high number.
The only change I know of that could be affecting it is the nice level of
hbase and hadoop:
hadoop nice = 5
hbase nice = 10

That way hbase runs slower than the rest when we get a load. I run other
stuff on the nodes about 6 hours out of the day, but this is happening when
there is spare cpu.

Running dual 2.4GHz with 4GB mem and dual 250GB 7200 RPM RAID 0 drives;
3 running with a 147GB 15K RPM SCSI drive.
About 8 regions average; heap on datanodes and regionservers is still 1GB.

Flushing is happening often with these high import speeds, so could that be
blocking the hlog?
Since flushing is happening often, minor compactions are running almost
all the time, keeping up.
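(If the frequent flushes themselves are the problem, the knob that controls
how large a region's memcache grows before a flush is, to the best of my
recollection of the 0.19-era configs, the one sketched below; the property
name and value are assumptions, so verify against hbase-default.xml.)

  <!-- hbase-site.xml sketch (assumed 0.19-era property name; example value) -->
  <property>
    <name>hbase.hregion.memcache.flush.size</name>
    <!-- bytes a region's memcache may hold before a flush is triggered (~64MB here) -->
    <value>67108864</value>
  </property>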

Billy


"Andrew Purtell" <ap...@apache.org> wrote in 
message news:899617.77011.qm@web65501.mail.ac4.yahoo.com...
>
> Hi Billy,
>
> It makes sense to me that you'd see this on the HLogs
> first. HDFS blocks are allocated most frequently for
> them, except during compaction.
>
> Seems like a classic sign of DFS stress to me. What are
> your configuration details in terms of max open files,
> maximum xceiver limit, and datanode handlers?
>
>   - Andy
>
>> From: Billy Pearson
>> Subject: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException 
>> sleeping
>> To: hbase-user@hadoop.apache.org
>> Date: Saturday, April 11, 2009, 5:02 PM
>> I'm getting a bunch of WARNs:
>> WARN org.apache.hadoop.hdfs.DFSClient:
>> NotReplicatedYetException sleeping
>>
>> This is only happening on the hlogs on the servers while
>> under heavy import, 30K/sec across 7 servers.
>> I tried bumping the hlog size between rolls to 100K
>> instead of 10K, thinking that would help, but the problem
>> is still there, just not as often since the logs are not
>> rolling as frequently.
>>
>> Not sure if hbase.regionserver.flushlogentries would help.
>> Has anyone else seen this? I am running the 0.19.2-dev branch.
>>
>> Billy
>
>
>
> 



Re: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping

Posted by Andrew Purtell <ap...@apache.org>.
Hi Billy,

It makes sense to me that you'd see this on the HLogs
first. HDFS blocks are allocated most frequently for
them, except during compaction. 

Seems like a classic sign of DFS stress to me. What are
your configuration details in terms of max open files,
maximum xceiver limit, and datanode handlers?

   - Andy

> From: Billy Pearson
> Subject: WARN org.apache.hadoop.hdfs.DFSClient: NotReplicatedYetException sleeping
> To: hbase-user@hadoop.apache.org
> Date: Saturday, April 11, 2009, 5:02 PM
> I'm getting a bunch of WARNs:
> WARN org.apache.hadoop.hdfs.DFSClient:
> NotReplicatedYetException sleeping
> 
> This is only happening on the hlogs on the servers while
> under heavy import, 30K/sec across 7 servers.
> I tried bumping the hlog size between rolls to 100K
> instead of 10K, thinking that would help, but the problem
> is still there, just not as often since the logs are not
> rolling as frequently.
> 
> Not sure if hbase.regionserver.flushlogentries would help.
> Has anyone else seen this? I am running the 0.19.2-dev branch.
> 
> Billy