Posted to issues@hbase.apache.org by "Anirudh Todi (JIRA)" <ji...@apache.org> on 2011/08/05 19:11:27 UTC

[jira] [Created] (HBASE-4168) A client continues to try and connect to a powered down regionserver

A client continues to try and connect to a powered down regionserver
--------------------------------------------------------------------

                 Key: HBASE-4168
                 URL: https://issues.apache.org/jira/browse/HBASE-4168
             Project: HBase
          Issue Type: Bug
            Reporter: Anirudh Todi
            Assignee: Anirudh Todi
            Priority: Minor


Experiment-1

Started a dev cluster - META is on the same regionserver as my key-value. I kill the regionserver process but do not power down the machine.
The META is able to migrate to a new regionserver and the regions are also able to reopen elsewhere.
The client is able to talk to the META, find the new kv location, and get it.

Experiment-2

Started a dev cluster - META is on a different regionserver than my key-value. I kill the regionserver process but do not power down the machine.
The META remains where it is and the regions are also able to reopen elsewhere.
The client is able to talk to the META, find the new kv location, and get it.

Experiment-3

Started a dev cluster - META is on a different regionserver than my key-value. I power down the machine hosting this regionserver.
The META remains where it is and the regions are also able to reopen elsewhere.
The client is able to talk to the META, find the new kv location, and get it.

Experiment-4 (This is the problematic one)

Started a dev cluster - META is on the same regionserver as my key-value. I power down the machine hosting this regionserver.
The META is able to migrate to a new regionserver; however, it takes a really long time (~30 minutes).
The regions on that regionserver DO NOT reopen (I waited for 1 hour).
The client is able to find the new location of the META; however, the META keeps redirecting the client to the powered-down regionserver as the location of the key-value it is trying to get. Thus the client's get is unsuccessful.
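
For illustration, the difference between killing the process and powering down the machine can be sketched from a TCP client's point of view. This is only a sketch under assumptions: ServerProbe, Outcome, classify and probe are hypothetical names and not HBase code, and the mapping from exception type to failure mode is what the JDK typically reports, not a guarantee. When only the process is killed, the operating system is still up and usually refuses the connection right away. When the machine is powered down, nothing answers, so the attempt tends to end in a timeout or a "No route to host" error.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical probe (not part of HBase) showing how the two failure modes look to a client. */
final class ServerProbe {

    enum Outcome { REACHABLE, PROCESS_DOWN, HOST_DOWN, OTHER_FAILURE }

    private ServerProbe() {}

    /** Maps a connect-time exception to the failure mode it most likely indicates. */
    static Outcome classify(IOException e) {
        if (e instanceof ConnectException) {
            // Typically "Connection refused": the host is up but nothing listens on the port.
            return Outcome.PROCESS_DOWN;
        }
        if (e instanceof NoRouteToHostException || e instanceof SocketTimeoutException) {
            // Nothing answered at all, which is what a powered-down machine looks like.
            return Outcome.HOST_DOWN;
        }
        return Outcome.OTHER_FAILURE;
    }

    /** Attempts a TCP connect with an explicit timeout and classifies the result. */
    static Outcome probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Outcome.REACHABLE;
        } catch (IOException e) {
            return classify(e);
        }
    }
}
```

The explicit timeout in probe matters in the powered-down case: without it, the connect call can block for as long as the operating system keeps retrying.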

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168.patch

Attaching a patch for this issue.

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168.patch
>
>


        

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: hbase-hadoop-master-msgstore232.snc4.facebook.com.log

Hi Ted,

I have attached the master log. It is from when I ran experiment-4 on an hbase-90 branch.

The patch I submitted was made against the hbase-92 version.

I'm not sure whether the IOException unwrapped from RemoteException needs to be handled in the same way. Everything seems to work without handling it that way.

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082081#comment-13082081 ] 

Ted Yu commented on HBASE-4168:
-------------------------------

+1 on patch version 6.
Indentation for the following shouldn't be changed:
{code}
+                 && cause.getMessage().contains("Connection reset")) {
{code}

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168(5).patch, HBASE-4168(6).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Resolved] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-4168.
--------------------------

       Resolution: Fixed
    Fix Version/s: 0.90.5
     Hadoop Flags: [Reviewed]

Committed to branch and trunk. Thanks for the patch, Anirudh (and for the review, Ted).

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168(5).patch, HBASE-4168(6).patch, HBASE-4168(7).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081796#comment-13081796 ] 

Ted Yu commented on HBASE-4168:
-------------------------------

This happened in our staging cluster this morning.
System event log:
{code}
Tue Aug 09 2011 14:52:54		System Software event: OS Stop sensor, run-time critical stop was asserted	0.000010
{code}
The master came down after that. Here is a snippet of the master log:
{code}
2011-08-09 15:12:13,147 FATAL org.apache.hadoop.hbase.master.HMaster: verifyAndAssignRoot failed after10 times retries, aborting
java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy8.getRegionInfo(Unknown Source)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:473)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:91)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:110)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:163)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
2011-08-09 15:12:13,147 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-08-09 15:12:13,147 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.io.IOException: Aborting
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:119)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:163)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.NoRouteToHostException: No route to host
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy8.getRegionInfo(Unknown Source)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:426)
        at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:473)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRoot(ServerShutdownHandler.java:91)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.verifyAndAssignRootWithRetries(ServerShutdownHandler.java:110)
        ... 5 more
2011-08-09 15:12:13,809 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads
{code}
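
In the trace above, the connection attempt itself fails with NoRouteToHostException, and the master aborts once the retries run out. A minimal sketch of one alternative, assuming a connection-level failure can be read as "the region is no longer served at this location", so that the caller can go on to reassign instead of retrying the same dead address. LocationVerifier and RegionCheck are hypothetical names for illustration; this is not the CatalogTracker code or the committed patch.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

/** Hypothetical helper, not HBase code. */
final class LocationVerifier {

    /** Hypothetical stand-in for the RPC that asks a server whether it still hosts a region. */
    interface RegionCheck {
        boolean hostsRegion() throws IOException;
    }

    private LocationVerifier() {}

    /**
     * Returns true only if the server answered and confirmed it hosts the region.
     * Connection-level failures are reported as "not verified" instead of being rethrown.
     * Any other IOException still propagates to the caller.
     */
    static boolean verifyLocation(RegionCheck check) throws IOException {
        try {
            return check.hostsRegion();
        } catch (ConnectException | NoRouteToHostException | SocketTimeoutException e) {
            // The server could not be reached at all, so treat the cached location as stale.
            return false;
        }
    }
}
```

With this shape, a caller that gets false back can move on to reassignment, while unexpected I/O errors still surface as exceptions.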

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168(2).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081876#comment-13081876 ] 

Ted Yu commented on HBASE-4168:
-------------------------------

{code}
-      } else if (cause != null && cause.getMessage() != null
-          && cause.getMessage().contains("Connection reset")) {
{code}
I think we should keep the condition for assigning cause to t.
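
To make the point concrete, here is a rough sketch of how such a condition can sit in an exception handler. It assumes the surrounding code looks roughly like this; CauseInspector and unwrapOrRethrow are hypothetical names, and only the "Connection reset" check is taken from the diff above.

```java
import java.io.IOException;

/** Hypothetical helper, not the actual CatalogTracker code. */
final class CauseInspector {

    private CauseInspector() {}

    /**
     * Returns the throwable to record as the reason the server is considered gone.
     * Rethrows the original exception when the cause does not match the condition.
     */
    static Throwable unwrapOrRethrow(IOException e) throws IOException {
        Throwable t = null;
        Throwable cause = e.getCause();
        if (cause != null && cause.getMessage() != null
                && cause.getMessage().contains("Connection reset")) {
            // This is the branch under discussion: keep assigning cause to t.
            t = cause;
        } else {
            throw e;
        }
        return t;
    }
}
```

Dropping the branch would send a wrapped "Connection reset" down the rethrow path instead of being recorded in t.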

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168-revised.patch

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168-revised.patch, HBASE-4168.patch
>
>


        

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082851#comment-13082851 ] 

Hudson commented on HBASE-4168:
-------------------------------

Integrated in HBase-TRUNK #2108 (See [https://builds.apache.org/job/HBase-TRUNK/2108/])
    HBASE-4168 A client continues to try and connect to a powered down regionserver

stack : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>             Fix For: 0.90.5
>
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168(5).patch, HBASE-4168(6).patch, HBASE-4168(7).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168(5).patch

Updated - HBASE-4168(5).patch

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168(5).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>


        

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080195#comment-13080195 ] 

Ted Yu commented on HBASE-4168:
-------------------------------

Looking at CatalogTracker in the 0.90 branch, I see this at line 434:
{code}
        throw e;
{code}
Please state which version of HBase you used.
Attaching the master log containing the above stack trace would help us understand the issue better.

Should we handle the IOException unwrapped from RemoteException in a similar manner?

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168-revised.patch, HBASE-4168.patch

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168(6).patch

I attached the wrong file earlier; I should have attached HBASE-4168(6).patch.

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168(5).patch, HBASE-4168(6).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080679#comment-13080679 ] 

Ted Yu commented on HBASE-4168:
-------------------------------

Without knowing the specific 0.90 version of HBase, I haven't been able to figure out which variable was null in verifyRegionLocation().
The revised patch doesn't seem to provide any more clues about that.
Can you check the log of the master that had your fix to see whether there was any exception there?

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080088#comment-13080088 ] 

Anirudh Todi commented on HBASE-4168:
-------------------------------------

When the experiment described in the Description was failing, I inspected the logs of the master.

The master had finished splitting the logs. It opened region .META. on a new regionserver. It said that it detected completed assignment of META and that it was notifying the catalog tracker, and then it threw an NPE while processing event M_META_SERVER_SHUTDOWN. Below is the stack trace that it gave:

java.lang.NullPointerException
at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRegionLocation(CatalogTracker.java:434)
at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:271)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMeta(CatalogTracker.java:323)
at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForMetaServerConnectionDefault(CatalogTracker.java:363)
at org.apache.hadoop.hbase.catalog.MetaReader.getServerUserRegions(MetaReader.java:566)
at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:125)
at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168(7).patch

Updated.

However, Ted - it seems that that line's original indentation is off.

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168(5).patch, HBASE-4168(6).patch, HBASE-4168(7).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168(4).patch
                HBASE-4168(3).patch

Hi Ted,

I have attached two revised patches.

In both patches, I don't throw an exception; instead, I return false.

In HBASE-4168(3).patch, I have stuck to the original structure of the code for parity.

In HBASE-4168(4).patch, the structure is changed a little. Since we're always returning false, I've removed some of the if-else clauses.
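
For readers without the patches at hand, the idea of returning false instead of throwing can be sketched roughly as below. This is only an illustration under stated assumptions, not the contents of HBASE-4168(3).patch or HBASE-4168(4).patch: LocationVerifier, Probe, verify and waitUntilVerified are hypothetical names, and the retry loop is an assumption about how a caller might use the boolean result.

```java
import java.io.IOException;

// Hypothetical sketch; none of these names come from the HBase source.
class LocationVerifier {
    // Stand-in for a remote call that checks whether a region is served at a location.
    interface Probe {
        void check() throws IOException;
    }

    // Returns false on any IOException so the caller can retry, instead of rethrowing.
    static boolean verify(Probe probe) {
        try {
            probe.check();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Retries up to maxAttempts times; returns the attempt that succeeded, or -1 if none did.
    static int waitUntilVerified(Probe probe, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (verify(probe)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails twice (as if the old server were unreachable), then succeeds.
        Probe flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("server unreachable");
            }
        };
        System.out.println(waitUntilVerified(flaky, 5)); // prints 3
    }
}
```

With a boolean result, a failed verification becomes a condition the caller can loop on, rather than an exception that ends the handler.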

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168(3).patch, HBASE-4168(4).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log

[jira] [Commented] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081383#comment-13081383 ] 

Ted Yu commented on HBASE-4168:
-------------------------------

I think reconnecting to the META makes sense.

Can you add 'cause != null && ' to the above line and show us what IOException was thrown?

Thanks

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168(2).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Anirudh Todi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168(2).patch

@Ted - the patch I submitted is from the open-source trunk which I checked out here - https://svn.apache.org/repos/asf/hbase/trunk/

I see the source of the confusion now. In trunk's CatalogTracker, line 469 is:

{noformat}
} else if (cause != null && cause.getMessage() != null
{noformat}

The internal branch had:

{noformat}
} else if (cause.getMessage() != null)
{noformat}

When I conducted Experiment-4 using the internal branch, cause turned out to be null, and I received a NullPointerException at that line.

However, would it still be better to return false and retry connecting to the META instead of throwing an exception there?
I have uploaded a new patch in which I am handling the IOException unwrapped from RemoteException in a similar manner.
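
To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. It is not the CatalogTracker code: the class and method names (CauseGuard, unguardedHasMessage, guardedHasMessage) are made up for illustration. It only shows why dereferencing a null cause fails and how the extra null check avoids that.

```java
import java.io.IOException;

// Hypothetical illustration; not taken from CatalogTracker.
class CauseGuard {
    // Same shape as the internal-branch condition: reads the message without a null check.
    static boolean unguardedHasMessage(Throwable cause) {
        return cause.getMessage() != null; // NullPointerException when cause is null
    }

    // Same shape as the trunk condition: checks cause before reading its message.
    static boolean guardedHasMessage(Throwable cause) {
        return cause != null && cause.getMessage() != null;
    }

    public static void main(String[] args) {
        // An exception created without a cause: getCause() returns null.
        IOException noCause = new IOException("connection failed");
        Throwable cause = noCause.getCause();

        System.out.println(guardedHasMessage(cause)); // prints false
        try {
            unguardedHasMessage(cause);
        } catch (NullPointerException e) {
            System.out.println("NPE from unguarded check");
        }
    }
}
```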

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168(2).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log

[jira] [Updated] (HBASE-4168) A client continues to try and connect to a powered down regionserver

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4168:
--------------------------

    Priority: Critical  (was: Minor)

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Critical
>         Attachments: HBASE-4168(2).patch, HBASE-4168-revised.patch, HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log