Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2009/03/19 12:06:16 UTC

Yet another Hbase failure

Hi, all

    We are running a small HBase 0.18 cluster. It has now stopped working.
The master log contains "java.io.IOException: HStoreScanner failed
construction" errors.
Restarting HBase (stop-hbase.sh/start-hbase.sh) does not help. Hadoop fsck
reports that everything is fine.
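
(The fsck check was along these lines, with /hbase being our HBase root
directory; the exact flags may have differed:)

    bin/hadoop fsck /hbase -files -blocks -locations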

Has anybody run into such a problem?

Thank you for your cooperation,
M.

2009-03-19 10:57:01,306 WARN org.apache.hadoop.hbase.master.BaseScanner:
Scan one META region: {regionname: .META.,,1, startKey: <>, server:
10.251.142.47:60020}
java.io.IOException: HStoreScanner failed construction
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:70)
	at org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:70)
	at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1916)
	at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:1954)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1345)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1175)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
Caused by: java.io.IOException: Could not obtain block:
blk_-2893342489105927361_2846854
file=/hbase/.META./1028785192/info/mapfiles/3882153324238090640/index
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1462)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1312)
	at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1417)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1453)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
	at org.apache.hadoop.io.MapFile$Reader.open(MapFile.java:292)
	at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:632)
	at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:714)
	at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:413)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:96)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:67)
	... 11 more

	at sun.reflect.GeneratedConstructorAccessor11.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
	at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:191)
	at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:74)
	at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
	at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
	at org.apache.hadoop.hbase.Chore.run(Chore.java:62)

Re: Yet another Hbase failure

Posted by Andrew Purtell <ap...@apache.org>.
Michael,

But can you claim this is "yet another" HBase failure? Or are
these DFS problems related to running with too small a cluster?
It's been some time, so my recollection is hazy, but didn't you
mention you have a cluster of only 4 nodes?

I found that most of my DFS issues were caused by attempting
to host too much load on too few physical resources, and that
adding nodes to distribute the load solved my problems. 

Running with dfs.datanode.max.xcievers=2048 helped for a while,
but that was an indication that the per-DataNode load was too high.
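
(For reference, the stanza lives in hadoop-site.xml on each DataNode,
roughly as below; 2048 is the value that helped me, and the DataNodes
need a restart to pick it up:)

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>2048</value>
    </property>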

Best regards,

   - Andy


> From: Michael Dagaev <mi...@gmail.com>
> Subject: Re: Yet another Hbase failure
> To: hbase-user@hadoop.apache.org
> Date: Friday, March 20, 2009, 7:04 AM
> Hi, stack
> 
> See the hadoop-site.xml in the attachment.
> dfs.datanode.socket.write.timeout = 0,
> dfs.datanode.max.xcievers=1023
> 
> The hbase-site.xml is not interesting.
> It contains only "hbase.rootdir" and
> "hbase.master"
> 
> I checked the logs (I did not know hbase logged ulimit).
> On all region server hosts ulimit is 32768. On the master
> host ulimit is 1024.
> 
> Thank you for your cooperation,
> M.


Re: Yet another Hbase failure

Posted by Michael Dagaev <mi...@gmail.com>.
Hi, stack

See the hadoop-site.xml in the attachment.
dfs.datanode.socket.write.timeout = 0, dfs.datanode.max.xcievers=1023

The hbase-site.xml is not interesting.
It contains only "hbase.rootdir" and "hbase.master".
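
(Roughly like this; the host names and ports below are placeholders,
not our real ones:)

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode.example.org:9000/hbase</value>
    </property>
    <property>
      <name>hbase.master</name>
      <value>master.example.org:60000</value>
    </property>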

I checked the logs (I did not know hbase logged ulimit).
On all region server hosts ulimit is 32768. On the master host ulimit is 1024.
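
(For anyone checking their own setup: we verified with "ulimit -n" run as
the user that launches the daemons. To raise it persistently on a typical
Linux box, entries like these go in /etc/security/limits.conf; the "hadoop"
user name is just an example:)

    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768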

Thank you for your cooperation,
M.

On Thu, Mar 19, 2009 at 7:52 PM, stack <st...@duboce.net> wrote:
> Can you pull this file from your hdfs?
>
> /hbase/.META./1028785192/info/mapfiles/3882153324238090640/index
>
> Michael, these seem to be issues related to ulimit -n, to the xceivers
> count, or to the DFS client timeout. Can you confirm for us that you do
> indeed have the correct configurations in place? Tell us also how you have
> these configurations deployed.
>
> Can you paste the hbase-site.xml, the hadoop-site.xml, and the first few
> lines of your startup log, where it shows the JVM and ulimit settings?
>
> Thanks,
> St.Ack

Re: Yet another Hbase failure

Posted by stack <st...@duboce.net>.
Can you pull this file from your hdfs?

/hbase/.META./1028785192/info/mapfiles/3882153324238090640/index
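
(Something like this should do it; the local destination path is arbitrary:)

    bin/hadoop fs -get \
      /hbase/.META./1028785192/info/mapfiles/3882153324238090640/index \
      /tmp/meta-index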

Michael, these seem to be issues related to ulimit -n, to the xceivers
count, or to the DFS client timeout. Can you confirm for us that you do
indeed have the correct configurations in place? Tell us also how you have
these configurations deployed.

Can you paste the hbase-site.xml, the hadoop-site.xml, and the first few
lines of your startup log, where it shows the JVM and ulimit settings?

Thanks,
St.Ack


On Thu, Mar 19, 2009 at 10:00 AM, Michael Dagaev
<mi...@gmail.com> wrote:

> Yes, unfortunately it is semi-production data.
> I will try to get on IRC.
>
> I am afraid, though, that it will be difficult to fix the data,
> since I won't be able to stop and start HDFS/HBase.
>
> I changed "dfs.data.dir" to point to another file system, and
> HDFS/HBase are running now.
>
> Since I switched dfs.data.dir to another file system, Hadoop and HBase
> are no longer working with the old one. Can we fix the meta region while
> Hadoop and HBase are working with the other file system?
>
> On Thu, Mar 19, 2009 at 2:59 PM, Jean-Daniel Cryans <jd...@apache.org>
> wrote:
> > Michael,
> >
> > This may be related to your other problem. We need to find the root
> > cause of what seems to be probable data loss.
> >
> > Is it production data? If so, we may have to fix your meta region. Any
> > chance you drop on IRC?
> >
> > J-D
> >
> > On Thu, Mar 19, 2009 at 7:06 AM, Michael Dagaev
> > <mi...@gmail.com> wrote:
> >> Hi, all
> >>
> >>    We are running a small HBase 0.18 cluster. It has now stopped working.
> >> The master log contains "java.io.IOException: HStoreScanner failed
> >> construction" errors.
> >> Restarting HBase (stop-hbase.sh/start-hbase.sh) does not help. Hadoop fsck
> >> reports that everything is fine.
> >>
> >> Has anybody run into such a problem?
> >>
> >> Thank you for your cooperation,
> >> M.
> >>
> >> [stack trace snipped; same as in the first message]
> >>
> >
>

Re: Yet another Hbase failure

Posted by Michael Dagaev <mi...@gmail.com>.
Yes, unfortunately it is semi-production data.
I will try to get on IRC.

I am afraid, though, that it will be difficult to fix the data,
since I won't be able to stop and start HDFS/HBase.

I changed "dfs.data.dir" to point to another file system, and
HDFS/HBase are running now.

Since I switched dfs.data.dir to another file system, Hadoop and HBase
are no longer working with the old one. Can we fix the meta region while
Hadoop and HBase are working with the other file system?
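
(For context, the property I changed is the DataNode storage directory in
hadoop-site.xml; the path below is illustrative, not our real one:)

    <property>
      <name>dfs.data.dir</name>
      <value>/mnt2/hadoop/dfs/data</value>
    </property>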

On Thu, Mar 19, 2009 at 2:59 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Michael,
>
> This may be related to your other problem. We need to find the root
> cause of what seems to be probable data loss.
>
> Is it production data? If so, we may have to fix your meta region. Any
> chance you drop on IRC?
>
> J-D
>
> On Thu, Mar 19, 2009 at 7:06 AM, Michael Dagaev
> <mi...@gmail.com> wrote:
>> Hi, all
>>
>>    We are running a small HBase 0.18 cluster. It has now stopped working.
>> The master log contains "java.io.IOException: HStoreScanner failed
>> construction" errors.
>> Restarting HBase (stop-hbase.sh/start-hbase.sh) does not help. Hadoop fsck
>> reports that everything is fine.
>>
>> Has anybody run into such a problem?
>>
>> Thank you for your cooperation,
>> M.
>>
>> [stack trace snipped; same as in the first message]
>>
>

Re: Yet another Hbase failure

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Michael,

This may be related to your other problem. We need to find the root cause
of what seems to be probable data loss.

Is it production data? If so, we may have to fix your meta region. Any
chance you drop on IRC?

J-D

On Thu, Mar 19, 2009 at 7:06 AM, Michael Dagaev
<mi...@gmail.com> wrote:
> Hi, all
>
>    We are running a small HBase 0.18 cluster. It has now stopped working.
> The master log contains "java.io.IOException: HStoreScanner failed
> construction" errors.
> Restarting HBase (stop-hbase.sh/start-hbase.sh) does not help. Hadoop fsck
> reports that everything is fine.
>
> Has anybody run into such a problem?
>
> Thank you for your cooperation,
> M.
>
> [stack trace snipped; same as in the first message]
>