You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Prakash Khemani (JIRA)" <ji...@apache.org> on 2011/04/29 01:32:03 UTC

[jira] [Resolved] (HBASE-3822) region server stuck in waitOnAllRegionsToClose

     [ https://issues.apache.org/jira/browse/HBASE-3822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Khemani resolved HBASE-3822.
------------------------------------

      Resolution: Invalid
    Release Note: The description is invalid. Will open a new one.

> region server stuck in waitOnAllRegionsToClose
> ----------------------------------------------
>
>                 Key: HBASE-3822
>                 URL: https://issues.apache.org/jira/browse/HBASE-3822
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Prakash Khemani
>
> The regionserver is not able to exit because the rs thread is stuck here
> "regionserver60020" prio=10 tid=0x00002ab2b039e000 nid=0x760a waiting on condition [0x000000004365e000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.util.Threads.sleep(Threads.java:126)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.waitOnAllRegionsToClose(HRegionServer.java:736)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:689)
>         at java.lang.Thread.run(Thread.java:619)
> ===
> In CloseRegionHandler.process() we do not call removeFromOnlineRegions() if there is an exception. (In this case I suspect there was a log-rolling exception because of another issue)
>     // Close the region
>     try {
>       // TODO: If we need to keep updating CLOSING stamp to prevent against
>       // a timeout if this is long-running, need to spin up a thread?
>       if (region.close(abort) == null) {
>         // This region got closed.  Most likely due to a split. So instead
>         // of doing the setClosedState() below, let's just ignore and continue.
>         // The split message will clean up the master state.
>         LOG.warn("Can't close region: was already closed during close(): " +
>           regionInfo.getRegionNameAsString());
>         return;
>       }
>     } catch (IOException e) {
>       LOG.error("Unrecoverable exception while closing region " +
>         regionInfo.getRegionNameAsString() + ", still finishing close", e);
>     }
>     this.rsServices.removeFromOnlineRegions(regionInfo.getEncodedName());
> ===
> I think we set the closing flag on the region, it won't be taking any more requests, it is as good as offline.
> Either we should refine the check in waitOnAllRegionsToClose() or CloseRegionHandler.process() should remove the region from online-regions set.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira