Posted to user@hbase.apache.org by Wayne <wa...@gmail.com> on 2011/01/10 18:20:51 UTC

HDFS Errors Deleting Blocks

We are seeing a lot of warnings and errors in the HDFS logs (examples
below), and I am looking for any help or recommendations anyone can provide.
It almost looks as if compactions/splits run and the errors show up afterwards,
when something goes looking for the old data. Could this be true? Are these
warnings and errors "normal"?

2011-01-10 16:55:48,920 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing
datanode Command java.io.IOException: Error in deleting blocks.



2011-01-10 05:33:25,353 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected error
trying to delete block blk_6074091327320053721_1406212. BlockInfo not
found in volumeMap.

We also see a lot of messages like the one below.

2011-01-10 17:04:51,593 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(x.x.x.x1:50010,
storageID=DS-1353352923-127.0.0.1-50010-1293566862209, infoPort=50075,
ipcPort=50020):Got exception while serving
blk_-6095972096875566659_1323729 to /x.x.x.x8:
java.io.IOException: Block blk_-6095972096875566659_1323729 is not valid.
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:931)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:894)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:904)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
	at java.lang.Thread.run(Unknown Source)

2011-01-10 17:04:51,593 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(x.x.x.x1:50010,
storageID=DS-1353352923-127.0.0.1-50010-1293566862209, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-6095972096875566659_1323729 is not valid.
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:931)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:894)
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:904)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
	at java.lang.Thread.run(Unknown Source)


In the logs on node 8 I can see that the block in question was deleted
several hours earlier.

2011-01-10 05:32:04,500 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block
blk_-6095972096875566659_1323729 file
/mainraid/hadoop/hbase/data/current/subdir2/subdir15/blk_-6095972096875566659
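
For what it's worth, a quick scan of the datanode's data directory can show
whether any block or .meta files are actually still on disk for that block.
A rough, untested sketch (the data directory and block id are just copied
from the log line above and will differ per node):

#!/usr/bin/env python
# Rough sketch: walk a datanode data directory and list any files still on
# disk for a given block id, to confirm whether the "Deleting block" line
# above really removed everything. Path and block id are copied from the
# log excerpt and will differ on your nodes.
import os

DATA_ROOT = "/mainraid/hadoop/hbase/data"   # dfs.data.dir on this node (from the log)
BLOCK_ID = "blk_-6095972096875566659"       # block id, without the generation stamp

def find_block_files(root, block_id):
    # Yield every block or .meta file whose name starts with block_id.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(block_id):
                yield os.path.join(dirpath, name)

leftovers = list(find_block_files(DATA_ROOT, BLOCK_ID))
if leftovers:
    print("Block files still on disk:")
    for path in leftovers:
        print("  " + path)
else:
    print("No files found for %s under %s" % (BLOCK_ID, DATA_ROOT))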



Thanks.

Re: HDFS Errors Deleting Blocks

Posted by Wayne <wa...@gmail.com>.
We are rebuilding our cluster and starting from scratch. We are trying to
set up every layer correctly (removing RAID, moving to ext4 + noatime, etc.)
to get rid of the large number of java.io.IOExceptions we see per day on
every node (deletes being part of the problem). The hope is that the current
setup is the root cause of some/most of the issues we have been having. Once
the rebuild is done I will watch for this problem again.

Thanks for your help.


On Mon, Jan 10, 2011 at 2:55 PM, Stack <st...@duboce.net> wrote:

> Our mailing list strips attachments.  Want to pastebin it, and what
> version of hbase are you on?
> Thanks Wayne,
> St.Ack

Re: HDFS Errors Deleting Blocks

Posted by Stack <st...@duboce.net>.
Our mailing list strips attachments.  Want to pastebin it, and what
version of hbase are you on?
Thanks Wayne,
St.Ack

On Mon, Jan 10, 2011 at 11:34 AM, Wayne <wa...@gmail.com> wrote:
> Yes, it appears blocks are being referenced that were deleted hours before.
> The log excerpts in my first message come from tracing that history.

Re: HDFS Errors Deleting Blocks

Posted by Wayne <wa...@gmail.com>.
Yes, it appears blocks are being referenced that were deleted hours before.
The log excerpts in my first message come from tracing that history.

On Mon, Jan 10, 2011 at 2:31 PM, Stack <st...@duboce.net> wrote:

> Grep for the blocks in the namenode log.  This will associate them with
> files.  The file association should help you build a history.  Is hbase
> trying to use files it's already deleted elsewhere?
>
> St.Ack

Re: HDFS Errors Deleting Blocks

Posted by Stack <st...@duboce.net>.
Grep for the blocks in the namenode log.  This will associate them with
files.  The file association should help you build a history.  Is hbase
trying to use files it's already deleted elsewhere?
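
Something like the rough sketch below (untested; the log path is a
placeholder, and the block id is just the one from your datanode error)
will pull every namenode mention of a block into one place, so you can see
which file it belonged to and trace its history:

#!/usr/bin/env python
# Rough sketch: print every namenode log line that mentions a block id, in
# file order, so the block's history and owning file can be read top to
# bottom. LOG_GLOB and BLOCK_ID are placeholders -- point them at your own
# namenode log (plus any rolled copies) and the block from the datanode error.
import glob
import sys

LOG_GLOB = "/var/log/hadoop/*namenode*.log*"   # placeholder; use your log dir
BLOCK_ID = "blk_-6095972096875566659"          # placeholder block id

def block_history(log_glob, block_id):
    # Return (file, line) pairs for every log line containing block_id.
    hits = []
    for path in sorted(glob.glob(log_glob)):
        with open(path) as f:
            for line in f:
                if block_id in line:
                    hits.append((path, line.rstrip()))
    return hits

block = sys.argv[1] if len(sys.argv) > 1 else BLOCK_ID
for path, line in block_history(LOG_GLOB, block):
    print("%s: %s" % (path, line))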

St.Ack
