Posted to user@hbase.apache.org by Steve Boyle <st...@connexity.com> on 2011/12/12 23:57:33 UTC

regions get 'stuck' in transition

Hi,

I'm running hbase-0.90.4-cdh3u2. For the last week or so, whenever the balancer runs, some regions get stuck in transition. I see this in the logs:

11/12/12 14:17:30 INFO master.LoadBalancer: Calculated a load balance in 14ms. Moving 49 regions off of 4 overloaded servers onto 1 less loaded servers
...
11/12/12 14:17:30 INFO master.HMaster: balance hri=PROD_trans,45ee86490,1323419692487.bc97561f48b131788949bfa65e409621., src=hb1.prod1.connexity.net,60020,1322805540274, dest=hb4.prod1.connexity.net,60020,1323724708437
11/12/12 14:17:30 INFO master.AssignmentManager: Server serverName=hb1.prod1.connexity.net,60020,1322805540274, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for PROD_trans,45ee86490,1323419692487.bc97561f48b131788949bfa65e409621. but we are not serving it for bc97561f48b131788949bfa65e409621
...
11/12/12 14:17:31 INFO master.HMaster: balance hri=PROD_trans,8fe437b80,1323434014530.1e000f925b4dfab14b39f43119e39b99., src=hb1.prod1.connexity.net,60020,1322805540274, dest=hb4.prod1.connexity.net,60020,1323724708437
11/12/12 14:17:31 INFO master.AssignmentManager: Server serverName=hb1.prod1.connexity.net,60020,1322805540274, load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Received close for PROD_trans,8fe437b80,1323434014530.1e000f925b4dfab14b39f43119e39b99. but we are not serving it for 1e000f925b4dfab14b39f43119e39b99

At that point I can go into the hbase shell and force unassign the region to clear things up; it never seems to leave this state without manual intervention. It seems like the HMaster (or ZooKeeper?) and the RegionServer are out of sync about where the regions are currently located. The question is: what causes this problem?
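To find which regions need the manual unassign, the rejected close requests can be pulled out of the master log. The following is only a minimal sketch: the regular expression is inferred from the log lines quoted above rather than from the HBase source, the function name find_rejected_closes is made up for illustration, and it assumes region names contain no whitespace.

```python
import re

# Pattern inferred from the sample AssignmentManager lines in this post
# (an assumption, not taken from HBase source). It assumes the region
# name has no whitespace and the encoded name is a 32-char hex string.
NSRE_PATTERN = re.compile(
    r"NotServingRegionException: Received close for (?P<region>\S+) "
    r"but we are not serving it for (?P<encoded>[0-9a-f]{32})"
)


def find_rejected_closes(log_lines):
    """Return unique (region_name, encoded_name) pairs in first-seen order."""
    seen = []
    for line in log_lines:
        match = NSRE_PATTERN.search(line)
        if match:
            pair = (match.group("region"), match.group("encoded"))
            if pair not in seen:
                seen.append(pair)
    return seen


# Two lines copied from the log excerpt above: one balance line (ignored)
# and one rejected close (reported).
SAMPLE = [
    "11/12/12 14:17:30 INFO master.HMaster: balance "
    "hri=PROD_trans,45ee86490,1323419692487.bc97561f48b131788949bfa65e409621., "
    "src=hb1.prod1.connexity.net,60020,1322805540274, "
    "dest=hb4.prod1.connexity.net,60020,1323724708437",
    "11/12/12 14:17:30 INFO master.AssignmentManager: Server "
    "serverName=hb1.prod1.connexity.net,60020,1322805540274, "
    "load=(requests=0, regions=0, usedHeap=0, maxHeap=0) returned "
    "org.apache.hadoop.hbase.NotServingRegionException: "
    "org.apache.hadoop.hbase.NotServingRegionException: Received close for "
    "PROD_trans,45ee86490,1323419692487.bc97561f48b131788949bfa65e409621. "
    "but we are not serving it for bc97561f48b131788949bfa65e409621",
]

if __name__ == "__main__":
    for region, encoded in find_rejected_closes(SAMPLE):
        print(region, encoded)
```

In practice the list would be built from the master log file instead of SAMPLE; the region names it prints are the ones I then force unassign from the shell.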

Thanks,
Steve Boyle