Posted to issues@hbase.apache.org by "Amitanand Aiyer (JIRA)" <ji...@apache.org> on 2012/11/30 01:15:58 UTC

[jira] [Created] (HBASE-7242) Use Runtime.exit() instead of Runtime.halt() upon HLog flush failures

Amitanand Aiyer created HBASE-7242:
--------------------------------------

Summary: Use Runtime.exit() instead of Runtime.halt() upon HLog flush failures
Key: HBASE-7242
URL: https://issues.apache.org/jira/browse/HBASE-7242
Project: HBase
Issue Type: Brainstorming
Reporter: Amitanand Aiyer
Priority: Minor

Hey Guys,
Should we use Runtime.exit() instead of Runtime.halt() when we fail an HLog sync?

The key difference is that Runtime.exit() invokes the shutdown hooks, while Runtime.halt() does not.
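
To make the difference concrete, here is a minimal standalone sketch (not HBase code; the class name, hook name, and message are made up for illustration). Run with no arguments it goes through exit() and the hook prints its line; run with the argument "halt" the JVM stops without running the hook.

```java
class ExitVsHalt {
    /** Registers a shutdown hook that prints a marker line, and returns it. */
    static Thread registerHook() {
        Thread hook = new Thread(
            () -> System.out.println("shutdown hook ran"),
            "demo-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerHook();
        if (args.length > 0 && args[0].equals("halt")) {
            // halt() terminates the JVM immediately; registered hooks are not run.
            Runtime.getRuntime().halt(1);
        } else {
            // exit() starts the shutdown sequence, which runs registered hooks
            // before the JVM terminates.
            Runtime.getRuntime().exit(1);
        }
    }
}
```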

Why we might need this:
We had an HDFS name node reboot today on one of our cells, and this caused multiple region servers to abort because they could not sync the HLog.

However, since multiple RS died simultaneously, this seemed like a correlated failure to the master. The master waits for the znode to expire, but this could take up to a few minutes after RS death (this setting is in place so that we can withstand rack switch reboots, lasting a couple of minutes, without region movement).

If the shutdown hooks are called, the RS will close the ZK connection, causing an immediate znode expiry. This might help cut down the unavailability, as regions can begin to get assigned faster.

While we do want to abort on HLog failure, I do not think it would hurt to give the JVM a few seconds to shut down gracefully. Please let me know if I am missing something.
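
One possible way to bound that grace period is sketched below. This is an assumption about how it could be done, not what HBase does today: the class, the method names, and the grace-period parameter are all hypothetical. A daemon watchdog thread is started first and then exit() is called; if the shutdown hooks have not finished within the grace period, the watchdog falls back to halt().

```java
class BoundedShutdown {
    /**
     * Starts a daemon thread that waits graceMillis and then runs forceStop.
     * Interrupting the returned thread cancels the forced stop.
     */
    static Thread startWatchdog(long graceMillis, Runnable forceStop) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(graceMillis);
            } catch (InterruptedException e) {
                return; // treated as a cancel request
            }
            forceStop.run();
        }, "shutdown-watchdog");
        // Daemon threads keep running while the shutdown hooks execute.
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }

    /** Hypothetical abort path: try a graceful exit, halt if it takes too long. */
    static void abortWithGrace(int status, long graceMillis) {
        // halt() may be called while shutdown is in progress; it does not wait
        // for running hooks to finish.
        startWatchdog(graceMillis, () -> Runtime.getRuntime().halt(status));
        // Runs the shutdown hooks (for example, one that closes the ZK
        // connection) and does not return normally.
        Runtime.getRuntime().exit(status);
    }
}
```

The watchdog calls halt() rather than exit() because a second exit() call made while the shutdown sequence is already running would block instead of terminating the process.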

Thanks,
-Amit

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7242) Use Runtime.exit() instead of Runtime.halt() upon HLog Sync failures

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507089#comment-13507089 ] 

stack commented on HBASE-7242:
------------------------------

What Kannan said (though if the abort flag is set, they might skip doing this)?
                

[jira] [Commented] (HBASE-7242) Use Runtime.exit() instead of Runtime.halt() upon HLog Sync failures

Posted by "Amitanand Aiyer (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508131#comment-13508131 ] 

Amitanand Aiyer commented on HBASE-7242:
----------------------------------------

Yes, that's true.

But we only try to do that if we are not requesting an abort and the fs is OK. (Although this state is maintained by HRegionServer, we can update it in this code path if we want to skip immediately.)
                

[jira] [Updated] (HBASE-7242) Use Runtime.exit() instead of Runtime.halt() upon HLog Sync failures

Posted by "Amitanand Aiyer (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amitanand Aiyer updated HBASE-7242:
-----------------------------------

    Summary: Use Runtime.exit() instead of Runtime.halt() upon HLog Sync failures  (was: Use Runtime.exit() instead of Runtime.halt() upon HLog flush failures)
    

[jira] [Commented] (HBASE-7242) Use Runtime.exit() instead of Runtime.halt() upon HLog Sync failures

Posted by "Kannan Muthukkaruppan (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506993#comment-13506993 ] 

Kannan Muthukkaruppan commented on HBASE-7242:
----------------------------------------------

Currently, don't the shutdown hooks also try to flush/close the regions before closing the ZK connection?
                