Posted to user@hbase.apache.org by Henning Blohm <he...@zfabrik.de> on 2016/05/10 16:22:46 UTC

Region stuck in transition - Cannot repair

While running with dfs.client.read.shortcircuit set to true I ran into 
an OOM on a region server that subsequently died.

This was probably caused by too small a direct memory setting.

However, after bringing the cluster up again, one region of a table got
stuck in transition. More specifically, the master says:

---
6400e1626085724ae20b2a6fa1914db8tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8. 
state=FAILED_CLOSE, ts=Tue May 10 17:58:29 CEST 2016 (0s ago), 
server=hb-desktop,16201,1462895637261
---

Running hbase hbck, I get:

---
ERROR: Region { meta => 
tt_locks,,1461919149434.6400e1626085724ae20b2a6fa1914db8., hdfs => 
hdfs://localhost:9000/hbase/data/default/tt_locks/6400e1626085724ae20b2a6fa1914db8, 
deployed => , replicaId => 0 } not deployed on any region server.
ERROR: There is a hole in the region chain between  and .  You need to 
create a new .regioninfo and region dir in hdfs to plug the hole.
---

But all the tables are listed as "ok".

Any attempt to repair seems to have no effect. Worse, the region server 
is trying like crazy to get that region opened and runs into an OOM 
after a few minutes.

(It keeps saying "Started memstore flush for..."  but never seems to get 
anywhere).

There is very little load really: 76 regions, 212 store files and I 
allowed for 1.5G heap and 1.5G direct memory.
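
For reference, limits like these are usually set through the region
server JVM options in hbase-env.sh. The following is only a minimal
sketch, assuming the standard HBASE_REGIONSERVER_OPTS variable and the
standard JVM flags -Xmx and -XX:MaxDirectMemorySize; the values mirror
the numbers above and are not copied from the actual configuration.

```shell
# hbase-env.sh (illustrative sketch, values are assumptions)
# -Xmx1536m                      -> 1.5G Java heap for the region server
# -XX:MaxDirectMemorySize=1536m  -> 1.5G off-heap (direct) memory
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} -Xmx1536m -XX:MaxDirectMemorySize=1536m"
```

The region server has to be restarted for a change like this to take
effect.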

After disabling dfs.client.read.shortcircuit at least there is no OOM 
anymore.

I have the vague suspicion that that stupid region should be simply 
dropped, but I have no idea how to fix this.

As we will go into production with this system shortly, any help would 
be great!!

Thanks,
Henning