Posted to user@hbase.apache.org by Henning Blohm <he...@zfabrik.de> on 2016/05/10 16:22:46 UTC
Region stuck in transition - Cannot repair
While running with dfs.client.read.shortcircuit set to true, I ran into
an out-of-memory error (OOM) on a region server, which subsequently died.
This was probably due to too little direct memory being configured.
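To see how much direct memory a JVM is actually using, and whether it was started with an explicit limit, you can query the standard java.lang.management API. The sketch below is a minimal example that inspects its own process. For a live region server, the same data is exposed over JMX as the MBean java.nio:type=BufferPool,name=direct. The class and method names are hypothetical.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Reports this JVM's direct buffer usage and any explicit -XX:MaxDirectMemorySize flag. */
public class DirectMemoryReport {

    /** Bytes currently held by direct ByteBuffers, or -1 if the "direct" pool is not found. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    /** The value passed via -XX:MaxDirectMemorySize, or null if the flag was not given. */
    static String maxDirectMemoryFlag() {
        String prefix = "-XX:MaxDirectMemorySize=";
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        // Not set: on HotSpot the limit then defaults to roughly the maximum heap size.
        return null;
    }

    public static void main(String[] args) {
        System.out.println("direct bytes used: " + directBytesUsed());
        String flag = maxDirectMemoryFlag();
        System.out.println("MaxDirectMemorySize: " + (flag == null ? "not set (JVM default)" : flag));
    }
}
```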
However, after I brought the cluster up again, one region of a table got
stuck in transition. More specifically, the master says:
---
6400e1626085724ae20b2a6fa1914db8 tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8.
state=FAILED_CLOSE, ts=Tue May 10 17:58:29 CEST 2016 (0s ago),
server=hb-desktop,16201,1462895637261
---
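The master's entry packs several fields together. A region name has the form table,startKey,regionId.encodedName. and a server name has the form host,port,startcode. The empty start key here (two adjacent commas) marks the first region of tt_locks. The following is an illustrative sketch that splits both names into their parts. The class and method names are hypothetical, and the regex is a simplification that assumes printable keys and a 32-character lowercase hex encoded name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Splits region and server names, as printed by the HBase master, into their parts. */
public class RegionNameParts {

    // <table>,<startKey>,<regionId>.<encodedName>.
    // Table names contain no commas, but start keys may, so the start-key group is
    // greedy and the trailing region id plus encoded name anchor the end.
    private static final Pattern REGION_NAME =
            Pattern.compile("^([^,]+),(.*),(\\d+)\\.([0-9a-f]{32})\\.$");

    /** Returns {table, startKey, regionId, encodedName}, or null if the text does not match. */
    static String[] parseRegionName(String regionName) {
        Matcher m = REGION_NAME.matcher(regionName);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    /** Returns {host, port, startCode} for a name of the form host,port,startcode, or null. */
    static String[] parseServerName(String serverName) {
        String[] parts = serverName.split(",");
        return parts.length == 3 ? parts : null;
    }

    public static void main(String[] args) {
        String[] r = parseRegionName("tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8.");
        String[] s = parseServerName("hb-desktop,16201,1462895637261");
        System.out.println("table=" + r[0] + " startKey='" + r[1] + "' regionId=" + r[2]
                + " encoded=" + r[3]);
        System.out.println("host=" + s[0] + " port=" + s[1] + " startCode=" + s[2]);
    }
}
```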
Running hbase hbck, I get:
---
ERROR: Region { meta =>
tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8., hdfs =>
hdfs://localhost:9000/hbase/data/default/tt_locks/6400e1626085724ae20b2a6fa1914db8,
deployed => , replicaId => 0 } not deployed on any region server.
ERROR: There is a hole in the region chain between and . You need to
create a new .regioninfo and region dir in hdfs to plug the hole.
---
But all tables are listed as "ok".
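The hole "between and" has an empty start and an empty end, which in HBase stand for the beginning and the end of the table's key space. That suggests the whole key range of tt_locks is uncovered, consistent with its only region being undeployed. The following is a simplified model of such a region-chain check. It is not hbck's actual implementation. It uses Java string order in place of HBase's unsigned byte-array order (the two agree for ASCII keys), and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified region-chain hole check over the key ranges of deployed regions. */
public class RegionChainCheck {

    /** A region's key range: "" as start means table start, "" as end means table end. */
    static final class Range {
        final String start;
        final String end;

        Range(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one message per gap in the key space not covered by any deployed region. */
    static List<String> findHoles(List<Range> deployed) {
        List<Range> sorted = new ArrayList<>(deployed);
        sorted.sort((a, b) -> a.start.compareTo(b.start));
        List<String> holes = new ArrayList<>();
        String covered = ""; // every key below this is covered; "" = nothing covered yet
        for (Range r : sorted) {
            if (r.start.compareTo(covered) > 0) {
                holes.add("hole between '" + covered + "' and '" + r.start + "'");
            }
            if (r.end.isEmpty()) {
                return holes; // this region extends to the end of the table
            }
            if (r.end.compareTo(covered) > 0) {
                covered = r.end;
            }
        }
        holes.add("hole between '" + covered + "' and ''");
        return holes;
    }

    public static void main(String[] args) {
        // The situation in the hbck output: the table's only region is not deployed.
        System.out.println(findHoles(new ArrayList<>()));
        // Regions ["", "m"), ["m", "t"), ["t", "") with the first one missing.
        List<Range> partial = new ArrayList<>();
        partial.add(new Range("m", "t"));
        partial.add(new Range("t", ""));
        System.out.println(findHoles(partial));
    }
}
```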
Any attempt to repair it seems to have no effect. Worse, the region server
keeps trying to open that region and runs into an OOM after a few
minutes. (It keeps logging "Started memstore flush for..." but never seems
to get anywhere.)
There is really very little load: 76 regions and 212 store files, and I
allowed 1.5 GB of heap and 1.5 GB of direct memory.
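To put the 1.5 GB heap in perspective, the back-of-the-envelope calculation below works out the memstore budget. It assumes the stock HBase 1.x defaults for hbase.regionserver.global.memstore.size (0.4 of the heap) and hbase.hregion.memstore.flush.size (128 MB). The cluster's actual settings may differ. Under those assumptions, about 614 MB is available to all memstores combined, so fewer than five regions can reach the flush size at the same time. The class name is hypothetical.

```java
/** Memstore budget for one region server, assuming HBase 1.x default settings. */
public class MemstoreBudget {

    // Assumed default of hbase.regionserver.global.memstore.size (fraction of heap).
    static final double GLOBAL_MEMSTORE_FRACTION = 0.4;
    // Assumed default of hbase.hregion.memstore.flush.size (128 MB).
    static final long FLUSH_SIZE_BYTES = 128L * 1024 * 1024;

    /** Upper bound on heap usable by all memstores together. */
    static long globalMemstoreBytes(long heapBytes) {
        return (long) (heapBytes * GLOBAL_MEMSTORE_FRACTION);
    }

    /** How many regions could hold a full-size memstore at once. */
    static double regionsAtFlushSize(long heapBytes) {
        return (double) globalMemstoreBytes(heapBytes) / FLUSH_SIZE_BYTES;
    }

    public static void main(String[] args) {
        long heap = 1536L * 1024 * 1024; // 1.5 GB heap, as in the post
        int regions = 76;
        long global = globalMemstoreBytes(heap);
        System.out.printf("global memstore limit: %d MB%n", global / (1024 * 1024));
        System.out.printf("regions at full flush size: %.1f%n", regionsAtFlushSize(heap));
        System.out.printf("average per region across %d regions: %.1f MB%n",
                regions, global / (1024.0 * 1024) / regions);
    }
}
```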
After disabling dfs.client.read.shortcircuit, at least there is no OOM
anymore.
I have a vague suspicion that the stuck region should simply be
dropped, but I have no idea how to fix this.
As we will go into production with this system shortly, any help would
be greatly appreciated!
Thanks,
Henning