Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/08/30 01:19:37 UTC

[jira] [Created] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

If region opening fails, try to transition region back to "offline" in ZK
-------------------------------------------------------------------------

                 Key: HBASE-4287
                 URL: https://issues.apache.org/jira/browse/HBASE-4287
             Project: HBase
          Issue Type: Improvement
          Components: master, regionserver
    Affects Versions: 0.90.4
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
             Fix For: 0.94.0


In the case that region-opening fails, we currently just close the region again, but don't do anything to the node in ZK. Instead, we should attempt to transition it from the OPENING state back to an OFFLINE state, or perhaps a new FAILED_OPEN state. Otherwise, we have to wait for the full timeoutMonitor period to elapse, which can be quite a long time.
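
As an illustration of the proposed behaviour, here is a minimal in-memory sketch in Java. It is not HBase code: RegionOpenSketch, its State enum, handleOpen and masterShouldReassignNow are hypothetical names, and an AtomicReference stands in for the ZK node. It only shows the idea that a failed open is reported as FAILED_OPEN instead of being left in OPENING until the timeout monitor fires.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

/** Hypothetical in-memory model of a region-in-transition node; not HBase code. */
class RegionOpenSketch {
    enum State { OFFLINE, OPENING, OPENED, FAILED_OPEN }

    // Stand-in for the znode contents (assumption: only the state is stored).
    private final AtomicReference<State> znode = new AtomicReference<>(State.OFFLINE);

    State state() {
        return znode.get();
    }

    /** Region-server side: attempt the open and always publish the outcome. */
    void handleOpen(BooleanSupplier openRegion) {
        if (!znode.compareAndSet(State.OFFLINE, State.OPENING)) {
            return; // the node is not ours to transition
        }
        boolean opened;
        try {
            opened = openRegion.getAsBoolean();
        } catch (RuntimeException e) {
            opened = false; // treat any failure during open as a failed open
        }
        // Publish the result so the node does not linger in OPENING.
        znode.compareAndSet(State.OPENING, opened ? State.OPENED : State.FAILED_OPEN);
    }

    /** Master side: a FAILED_OPEN node can be retried without waiting for a timeout. */
    boolean masterShouldReassignNow() {
        return znode.get() == State.FAILED_OPEN;
    }
}
```

In this sketch the master can react as soon as it observes FAILED_OPEN; how the real master is notified and what it does next is outside the scope of the example.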

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-4287:
-------------------------

    Fix Version/s:     (was: 0.94.0)
                   0.92.0

Pulling into 0.92.  Patch has two +1s.  Any reason not to have it in 0.92?


        

[jira] [Updated] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4287:
-------------------------------

    Attachment: 0004-HBASE-4287.-FAILED_OPEN-state-in-ZK-v2.patch

The previous patch was more of a 0.90-branch change. This one applies and compiles on trunk (or should, at least).


        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093557#comment-13093557 ] 

ramkrishna.s.vasudevan commented on HBASE-4287:
-----------------------------------------------

@Todd
{code}
region = openRegion();
if (region == null) return;
{code}
Maybe we can update the ZK node state here. Otherwise the node stays in the OPENING state and only the timeout monitor will detect it. Correct me if I am wrong, Todd?




        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093463#comment-13093463 ] 

Todd Lipcon commented on HBASE-4287:
------------------------------------

bq. Why a new state? Don't we have enough to manage already?

My personal preference is that each state is only used for one specific thing -- this makes it easier to understand when looking at logs. We'll see that it transitioned from OPENING to FAILED_OPEN, rather than OPENING to OFFLINE or CLOSED -- which makes it more obvious what actually happened. In the future we could also include some more data in there (e.g., the exception message).
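
To make the logging argument concrete, here is a small hedged sketch in Java. TransitionLogSketch and describe are hypothetical names, and the log format is an assumption for illustration only. It shows how a dedicated FAILED_OPEN state, optionally with an attached detail string such as an exception message, reads differently in a log from a generic CLOSED transition.

```java
/** Hypothetical log formatter for state transitions; not HBase code. */
class TransitionLogSketch {
    enum State { OFFLINE, OPENING, OPENED, CLOSED, FAILED_OPEN }

    /** Formats one transition; detail is an optional payload and may be null. */
    static String describe(State from, State to, String detail) {
        String line = from + " -> " + to;
        return detail == null ? line : line + " (" + detail + ")";
    }
}
```

With a shared state the log line would read "OPENING -> CLOSED" for both a deliberate close and a failed open, whereas the dedicated state plus a payload can carry the cause.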

bq. Because here, too, there is a chance of failure due to some ZK exception.

If it fails due to a ZK exception, then the options are (a) this RS lost its ZK lease, in which case it will be handled by the server shutdown handler, or (b) someone else ended up opening the region and thus the version numbers didn't match. In that latter case, the transition to FAILED_OPEN would fail for the same reason.
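
Case (b) can be sketched with a version-checked update, loosely modelled on the conditional setData(path, data, version) call in ZooKeeper. The class below is a hypothetical in-memory stand-in, not HBase or ZooKeeper code; VersionedNodeSketch and transition are illustrative names. It shows that once another writer has bumped the version, a caller holding a stale version fails both the OPENED and the FAILED_OPEN transition.

```java
/** Hypothetical versioned node with compare-and-set semantics; not HBase code. */
class VersionedNodeSketch {
    private String state = "OPENING";
    private int version = 0;

    synchronized int version() {
        return version;
    }

    synchronized String state() {
        return state;
    }

    /** Applies the transition only if the caller saw the latest version and state. */
    synchronized boolean transition(String from, String to, int expectedVersion) {
        if (version != expectedVersion || !state.equals(from)) {
            return false; // stale view: someone else changed the node
        }
        state = to;
        version++; // every successful write bumps the version
        return true;
    }
}
```

A caller that read version 0 and then lost a race sees false from both transition("OPENING", "OPENED", 0) and transition("OPENING", "FAILED_OPEN", 0), so the node is left to whoever holds the current version.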


        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104893#comment-13104893 ] 

Hudson commented on HBASE-4287:
-------------------------------

Integrated in HBase-TRUNK #2209 (See [https://builds.apache.org/job/HBase-TRUNK/2209/])
    HBASE-4406  TestOpenRegionHandler failing after HBASE-4287

todd : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java



        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093398#comment-13093398 ] 

stack commented on HBASE-4287:
------------------------------

Why a new state?  Don't we have enough to manage already?  Looks like it's processed the same as RS_ZK_REGION_CLOSED except for the different logging.  Otherwise, having the RS update ZK on failed close seems fine (as long as the timeout monitor is tolerant of the znode changing under it).


        

[jira] [Updated] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4287:
-------------------------------

    Attachment: hbase-4287.txt

Updated patch with some unit tests, plus addressing Ram's idea above.


        

[jira] [Updated] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4287:
-------------------------------

    Status: Patch Available  (was: Open)


        

[jira] [Updated] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4287:
-------------------------------

    Attachment: 0004-HBASE-4287.-FAILED_OPEN-state-in-ZK.patch

Here's a patch which I've tested and basically works on a cluster (well, an 0.90.x cluster).

I didn't follow the 4015 work closely, but I think this state makes sense.


        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104633#comment-13104633 ] 

Hudson commented on HBASE-4287:
-------------------------------

Integrated in HBase-TRUNK #2208 (See [https://builds.apache.org/job/HBase-TRUNK/2208/])
    HBASE-4287  If region opening fails, change region in transition into a FAILED_OPEN state so that it can be retried quickly.

todd : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/executor/ExecutorService.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/handler/TestOpenRegionHandler.java



        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103322#comment-13103322 ] 

ramkrishna.s.vasudevan commented on HBASE-4287:
-----------------------------------------------

+1 on patch. 


        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103316#comment-13103316 ] 

stack commented on HBASE-4287:
------------------------------

Patch looks good.  The new state I'm not mad about, not without our first having done a review in toto of all possible states to figure what new states we're missing, but I'm grand with this going in.


        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093337#comment-13093337 ] 

Ted Yu commented on HBASE-4287:
-------------------------------

In the discussion of HBASE-4015, introduction of a new state was not considered the best choice.


        

[jira] [Updated] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-4287:
-------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

OK. You win. Committed to trunk for 0.92. Thanks for reviewing.


        

[jira] [Commented] (HBASE-4287) If region opening fails, try to transition region back to "offline" in ZK

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093450#comment-13093450 ] 

ramkrishna.s.vasudevan commented on HBASE-4287:
-----------------------------------------------

@Todd
The state should also be updated on region-open failure here:
{code}
      if (!transitionToOpened(region)) {
        cleanupFailedOpen(region);
        return;
      }
{code}
Because here, too, there is a chance of failure due to some ZK exception.

