Posted to user@hbase.apache.org by Michal Medvecky <me...@pexe.so> on 2016/04/20 05:29:43 UTC

Failed open of region

Hello,

after several network outages in AWS (never ever run HBase there!), my
HBase was seriously damaged. After steps like restarting the namenodes,
running hdfs fsck, and restarting all regionservers and the HBase master, I
still have 8 offline regions that I am unable to bring back online.

When running hbck with any combination of repair parameters, it always
gets stuck on messages like:

2016-04-20 03:26:16,812 INFO  [hbasefsck-pool1-t45]
util.HBaseFsckRepair: Region
still in transition, waiting for it to become assigned: {ENCODED =>
8fe9d66a1f4c4739dd1929e3c38bf951, NAME =>
'MEDIA,\x01rvkUDKIuye0\x00YT,1460997677820.8fe9d66a1f4c4739dd1929e3c38bf951.',
STARTKEY => '\x01rvkUDKIuye0\x00YT', ENDKEY =>
'\x01stefanonoferini/club-edition-17'}
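
(For reference, the hbck invocations I tried were along these lines; the
exact flag combinations varied between attempts:

  hbase hbck -details
  hbase hbck -repair
  hbase hbck -fixAssignments -fixMeta
)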

When looking into the regionserver logs, I see messages like:

2016-04-19 23:27:54,969 ERROR
[RS_OPEN_REGION-prod-aws-hbase-data-0010:16020-80]
handler.OpenRegionHandler: Failed open of
region=MEDIA,\x05JEklcNpOKos\x00YT,1461001150488.20d48fd40c94c7c81049cbc506de4ad4.,
starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File
does not exist:
/hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145
        at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
        at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
        at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
2016-04-19 23:27:54,957 INFO
 [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] hfile.CacheConfig:
blockCache=LruBlockCache{blockCount=2, currentSize=3285448,
freeSize=3198122040, maxSize=3201407488, heapSize=3285448,
minSize=3041337088, minFactor=0.95, multiSize=1520668544, multiFactor=0.5,
singleSize=760334272, singleFactor=0.25}, cacheDataOnRead=true,
cacheDataOnWrite=false, cacheIndexesOnWrite=false, cacheBloomsOnWrite=false,
cacheEvictOnClose=false, cacheDataCompressed=false, prefetchOnOpen=false
2016-04-19 23:27:54,957 INFO
 [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1]
compactions.CompactionConfiguration: size [134217728, 9223372036854775807);
files [3, 10); ratio 1.200000; off-peak ratio 5.000000; throttle point
2684354560; major period 604800000, major jitter 0.500000, min locality to
compact 0.700000
2016-04-19 23:27:54,962 INFO  [StoreFileOpenerThread-F-1]
regionserver.StoreFile$Reader: Loaded Delete Family Bloom
(CompoundBloomFilter) metadata for 5eacfeb8a2eb419cb6fe348df0540145
2016-04-19 23:27:54,969 ERROR
[RS_OPEN_REGION-prod-aws-hbase-data-0010:16020-80] regionserver.HRegion:
Could not initialize all stores for the
region=MEDIA,\x05JEklcNpOKos\x00YT,1461001150488.20d48fd40c94c7c81049cbc506de4ad4.
2016-04-19 23:27:54,969 WARN
 [StoreOpener-20d48fd40c94c7c81049cbc506de4ad4-1] ipc.Client: interrupted
waiting to send rpc request to server
java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1054)
        at org.apache.hadoop.ipc.Client.call(Client.java:1449)
        at org.apache.hadoop.ipc.Client.call(Client.java:1407)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
        at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
        at
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
        at
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
        at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
        at
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createStoreDir(HRegionFileSystem.java:171)
        at
org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:220)
        at
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:4973)
        at
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:925)
        at
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:922)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I did all kinds of recovery magic, like restarting all components and
cleaning ZK.
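
(By "cleaning ZK" I mean roughly the following, assuming the default
zookeeper.znode.parent of /hbase:

  hbase zkcli
  rmr /hbase/region-in-transition
  rmr /hbase/recovering-regions
)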

I found this thread:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/31308 which
suggests creating empty HFiles, but I'm a bit afraid to do that.

I'm using HBase 1.1.3 with Hadoop 2.7.1 (both binary downloads from their
websites) on Ubuntu 14.04.

Thank you for any help

Michal

Re: Failed open of region

Posted by Stack <st...@duboce.net>.
On Wed, Apr 20, 2016 at 2:11 PM, Michal Medvecky <me...@pexe.so> wrote:

> >
> > NameNode is telling hbase they exist. Can you get some DEBUG in there?
> > You know how to set log levels? Could give us a clue.
> >
> >
> I finally removed some lingering reference files and HBCK managed to fix
> the issue.
>
>
You have more on your ugly issue, Michal? Was it hbase references to missing
files?
Thanks,
St.Ack




> Michal
>

Re: Failed open of region

Posted by Michal Medvecky <me...@pexe.so>.
>
> NameNode is telling hbase they exist. Can you get some DEBUG in there? You
> know how to set log levels? Could give us a clue.
>
>
I finally removed some lingering reference files and HBCK managed to fix
the issue.
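
(For the archives, roughly what I did; paths illustrative. Reference files
sit in the daughter region's column family directory and are named like
<store file>.<parent region encoded name>:

  hdfs dfs -ls -R /hbase/data/default/MEDIA   # spot the lingering references
  hdfs dfs -mv <reference-file> /hbase-sideline/
  hbase hbck -repair                          # flags from memory
)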

Michal

Re: Failed open of region

Posted by Stack <st...@duboce.net>.
On Tue, Apr 19, 2016 at 9:09 PM, Michal Medvecky <me...@pexe.so> wrote:

> On Tue, Apr 19, 2016 at 9:02 PM, Stack <st...@duboce.net> wrote:
>
> > What happens if you try to copy
> > /hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145
> > to local filesystem from HDFS (hdfs dfs -copyToLocal)?
> >
>
> Nothing, these files (and even those directories) do not exist.
>
>
NameNode is telling hbase they exist. Can you get some DEBUG in there? You
know how to set log levels? Could give us a clue.
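
(One way, assuming the default regionserver info port: the logLevel servlet
on the RS UI,

  http://<regionserver>:16030/logLevel

and bump org.apache.hadoop.hbase to DEBUG.)
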
Thanks,
St.Ack



> Your AWS giving you grief?
> >
>
> AWS is driving me crazy, but that's not for this mailinglist.
>
> Michal
>

Re: Failed open of region

Posted by Michal Medvecky <me...@pexe.so>.
On Tue, Apr 19, 2016 at 9:02 PM, Stack <st...@duboce.net> wrote:

> If you run hdfs fsck it shows missing blocks?
>

No, HDFS reports healthy filesystem.
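
(For reference, I ran something along the lines of

  hdfs fsck /hbase -files -blocks

and it ends with "The filesystem under path '/hbase' is HEALTHY".)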


> What happens if you try to copy
> /hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145
> to local filesystem from HDFS (hdfs dfs -copyToLocal)?
>

Nothing, these files (and even those directories) do not exist.
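
(E.g.:

  hdfs dfs -ls /hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539

just comes back with "No such file or directory".)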

> Your AWS giving you grief?
>

AWS is driving me crazy, but that's not for this mailinglist.

Michal

Re: Failed open of region

Posted by Stack <st...@duboce.net>.
If you run hdfs fsck it shows missing blocks?

What happens if you try to copy
/hbase/data/default/MEDIA/ecd1e565ab8a8bfba77cab46ed023539/F/5eacfeb8a2eb419cb6fe348df0540145
to local filesystem from HDFS (hdfs dfs -copyToLocal)?

Try moving aside the problematic files?
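
(Something along these lines; the sideline location is arbitrary:

  hdfs dfs -mkdir -p /hbase-sideline
  hdfs dfs -mv /hbase/data/default/MEDIA/<region>/F/<storefile> /hbase-sideline/
)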

Your AWS giving you grief?
St.Ack

On Tue, Apr 19, 2016 at 8:29 PM, Michal Medvecky <me...@pexe.so> wrote:

> Hello,
>
> [...original message and logs quoted in full; snipped, see the top of the
> thread...]
>
> Michal