You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2011/07/12 18:12:00 UTC

[jira] [Created] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

HConnectionManager should perform validation of connection it hands out
-----------------------------------------------------------------------

                 Key: HBASE-4087
                 URL: https://issues.apache.org/jira/browse/HBASE-4087
             Project: HBase
          Issue Type: Bug
            Reporter: Ted Yu
            Priority: Critical
             Fix For: 0.92.0


Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
There're at least two ways.
1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.

The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Status: Patch Available  (was: Open)

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065286#comment-13065286 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

bq. 4. deleteStaleConnection destroys the good connection created in step 2, from underneath T1.
In patch versions 2 and 3, I have:
{code}
  public static void deleteStaleConnection(HConnection connection) {
{code}
When T2 tries to destroy old handle, nothing would happen because the loop in deleteConnection() wouldn't find the old connection.

My description about client B was not accurate. But we now agree that not waiting is a better solution.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Summary: HBaseAdmin should perform validation of connection it holds  (was: HConnectionManager should perform validation of connection it hands out)

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065308#comment-13065308 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

Missed that, thanks for pointing it out. Looks like we don't need a connection#state either.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066322#comment-13066322 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

I will upload a new patch with retry logic.

If I change the signature to throwing IOE, then I also have to change the signature for this method which constructs a new HBA:
{code}
  public static void checkHBaseAvailable(Configuration conf)
  throws MasterNotRunningException, ZooKeeperConnectionException {
{code}
I wanted to limit the scope of the patch since I only handle the exception in the ctor.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment: 4087.txt

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment: 4087-v3.txt

Version 3 narrows the scope of change in HBA.

TestMasterRestartAfterDisablingTable passes based on this patch.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064688#comment-13064688 ] 

stack commented on HBASE-4087:
------------------------------

@Srivas Yes, we should bury it under the API if possible; add an isGood method to Connection Interface. When connection goes bad, it marks itself so and recourse is to go get a new one (This may not be possible).

@Ted so we need to change our APIs so that they all can catch IOE?  How then to tell difference between retryable IOE and one that makes the connection go 'bad'?  We have already the Retryable exception interface that we have some forms of IOE implement.

Is it true that the code previous to hbase-3777 had this same issue?

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment:     (was: 4087.txt)

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087-v4.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065310#comment-13065310 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

If I get +1 vote(s) for version 3, I would check in that fix which would unblock HBASE-4052.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu reassigned HBASE-4087:
-----------------------------

    Assignee: Ted Yu

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066303#comment-13066303 ] 

stack commented on HBASE-4087:
------------------------------

This looks good.  What you think about the retry Srivas suggests?  And is this patch for 0.90 or for trunk?  If trunk, can we change the signature to be IOE so exception doesn't arrive at client as a UndeclaredThrowableException?

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063983#comment-13063983 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

I prefer option 2. When the HBaseClient fails to connect to the master, make it come back once more to the HConnectionManager (via a different API that also provides the bad handle), so the HConnectionManager can reset its cache. Periodic checks in a bg thread is not scalable (causes needless load on the server), and does not really work. For example, what happens if the server crashed right after the check? We would still need option 2 to say "invalidate now and get me a new handle" in the client in order to make immediate progress.


> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066087#comment-13066087 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

+1 for the stuff about clearing the stale connection inside the HConnectionManager. Whether we should retry only once, or in a loop until we get a good connection ... I will defer to Stack.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066366#comment-13066366 ] 

Hudson commented on HBASE-4087:
-------------------------------

Integrated in HBase-TRUNK #2036 (See [https://builds.apache.org/job/HBase-TRUNK/2036/])
    HBASE-4087  HBaseAdmin should perform validation of connection it holds

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/CHANGES.txt


> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087-v4.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064403#comment-13064403 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

@stack: yes, its ugly but necessary. If the multi-master failover needs to work, the hbase client must behave in a "nfs hard-mount" manner, ie, keep trying till a master comes online. Perhaps the retry can be buried under the API? The retry should fail if ZK itself is not reachable.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13063985#comment-13063985 ] 

stack commented on HBASE-4087:
------------------------------

-1 on option 1.  Why can't we just let the exception bubble up when master connection is bad.  I suppose this means that we need to invalidate the object that is in cache and renew it on exception -- but not all IOEs?

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment: 4087.txt
                4087.txt

First attempt at solving this issue.
When connection.getMaster() throws UndeclaredThrowableException (corresponding to Connection refused error), we close the stale connection and reestablish new connection.
I have to declare HBaseAdmin.connection as not final.

TestMasterRestartAfterDisablingTable (from HBASE-4052) passes with this patch.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064629#comment-13064629 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

Hi Ted,  the two versions seem very similar. I was expecting some code like the following in HConnectionManager#deleteConnection:

{noformat}
     synchronized (HBASE_INSTANCES) {
       HConnectionImplementation connection = HBASE_INSTANCES
           .get(connectionKey);
       if (connection != null) {
         connection.decCount();
+        HBASE_INSTANCES.notify();
         if (connection.isZeroReference() || staleConnection) {
+          while (! connection.isZeroReference()) {
+             HBASE_INSTANCES.wait();
+          }
           HBASE_INSTANCES.remove(connectionKey);
           connection.close(stopProxy);
         } else if (stopProxy) {

{noformat}

Is there a problem with the call to HBASE_INSTANCES.wait()? I mean, will it hang the system as the waiting thread may hold other synchronized-blocks while calling into HConnectionManager#deleteConnection? If so, my change above is bogus, and we will need some sort of  "connection#state" to keep track of the fact that the connection is bad, but there are still references to it.



> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment: 4087-v4.txt

Patch version 4 added retry logic.

TestMasterRestartAfterDisablingTable passed.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087-v4.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064364#comment-13064364 ] 

stack commented on HBASE-4087:
------------------------------

This issue is ugly.  This patch is incomplete no?  Any use of connection needs to catch this undeclared exception and be able to mark it bad with HConnectionManager.  Right?

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066325#comment-13066325 ] 

stack commented on HBASE-4087:
------------------------------

I think v4 good for now.  Go commit Ted.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087-v4.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065278#comment-13065278 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

@Ted:
{quote}
First, waiting for reference count to reach zero wasn't part of the original semantics of obtaining connection. Suppose there were two clients A and B. Client A is waiting in the new while loop for reference count to reach zero. What if client B had a bug and crashed ?
{quote}

If many threads share a connection, the reference count protects the connection from being destroyed until the last reference is dropped. We need to continue to preserve that guarantee. With the change being proposed, if any thread calls HConnectionManager#dropStaleConnection, the connection gets destroyed from underneath other threads. The sequence of steps that causes this is as follows:

1. threads T1 and T2 both get connections using the same conf, thus are returned the same connection.
2. T1 gets a failure, so it deletes the stale connection and creates a new one (thus HBASE_INSTANCES.get() will return the new good connection, since the HConnectionKey is identical).
3. T2 still has a handle on the old one, and gets a failure, and calls deleteStaleConnection.
4. deleteStaleConnection destroys the good connection created in step 2, from underneath T1.

So either we wait for the last ref to become zero, or set some sort of connection#state == INVALID if you'd rather not wait. I think the second choice (not waiting) seems better.

I didn't follow your second comment about client A and client B. Both are in the same JVM, so if one crashes, the other would too, would it not?


> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HBaseAdmin should perform validation of connection it holds

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066334#comment-13066334 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

Integrated to TRUNK.

Thanks for the review Srivas and Stack.

> HBaseAdmin should perform validation of connection it holds
> -----------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087-v3.txt, 4087-v4.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment: 4087-v2.txt

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment:     (was: 4087-v2.txt)

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064723#comment-13064723 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

There is one enhancement I can make on top of patch version 2.
We can add the following method to HCM:
{code}
HConnection replaceStaleConnection(this.conf);
{code}
This method would retrieve a new connection and replace the old connection corresponding to the passed conf object with the new one.
Such optimization is minor, considering that we have to wrap every connection call in client in a similar manner to that in patch version 2.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment:     (was: 4087.txt)

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-4087:
--------------------------

    Attachment: 4087-v2.txt

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064652#comment-13064652 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

First, waiting for reference count to reach zero wasn't part of the original semantics of obtaining connection.
Suppose there were two clients A and B. Client A is waiting in the new while loop for reference count to reach zero. What if client B had a bug and crashed ?

Patch version 2 directly removes the stale connection when the first client tried to connect to a dead server. Other clients wouldn't remove the new connection handed out to the first client. Since they share same config with the first client, they would retrieve the new connection directly.

Bookkeeping using connection#state would also work. But the above handling seems cleaner.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064461#comment-13064461 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

Patch version 2 addresses Srivas' comment.
It is incomplete. I want to get confirmation that similar changes should be made to the files I mentioned in my email to dev@

The undeclaredThrowable field of UndeclaredThrowableException is an IOException, similar to the following:
{code}
java.io.IOException: Call to /192.168.2.107:60697 failed on local exception: java.io.IOException: Connection reset by peer
{code}

A test for this JIRA has been added in HBASE-4052: TestMasterRestartAfterDisablingTable. I used that test to obtain the above information.
Without fix for this JIRA, TestMasterRestartAfterDisablingTable wouldn't pass.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064362#comment-13064362 ] 

stack commented on HBASE-4087:
------------------------------

Why we getting UndeclaredThrowableException?  Whats being thrown that is not part of the interface?  Connection refused?  Can we add that?

Any chance of a test Ted?  A unit test where we do some function on master with HBA and then we kill and start new master?  See if HBA keeps going?



> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065000#comment-13065000 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

In TRUNK build 2027, TestMasterFailover passed.
This means we don't need to perform full-scale surgery.

Keeping the changes in HCM along with HBA constructor should be enough to unblock test in HBASE-4052.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "M. C. Srivas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064346#comment-13064346 ] 

M. C. Srivas commented on HBASE-4087:
-------------------------------------

There seems to be a race in HConnectionManager#deleteConnection

   if (connection.isZeroReference() || staleConnection) { delete the connection ... } 

When staleConnection is true, the code should wait for reference to become zero, else the connection gets deleted from underneath another thread.

> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-4087) HConnectionManager should perform validation of connection it hands out

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064705#comment-13064705 ] 

Ted Yu commented on HBASE-4087:
-------------------------------

In case of server outage/restart, declared exceptions wouldn't be thrown:
{code}
  public HBaseAdmin(Configuration c)
  throws MasterNotRunningException, ZooKeeperConnectionException {
{code}
I need to dig deeper into current failure in TestMasterFailover to answer the last question above.

I could use the following structure in place of current:
{code}
  Foo result = new RetryOnce<Foo>() {
    Foo doIt(HConnection connection) {
      return connection.foo();
    }
  }.run();

  abstract class RetryOnce<T> {
    T run() {
      try {
        return doIt(connection);
      }
      catch (UndeclaredThrowableException ute) {
        HConnectionManager.deleteStaleConnection(connection);
        connection = HConnectionManager.getConnection(conf);
        return doIt(connection);
      }
    }
    abstract T doIt(HConnection connection);
  }
{code}
I believe Karthick proposed similar construct when he was implementing HBASE-3777 but the response was lukewarm. Further only two lines were saved in the above construct.

The challenge here is that we should expect client to cache the connection handed out to them, making burying it under the API very hard to achieve.


> HConnectionManager should perform validation of connection it hands out
> -----------------------------------------------------------------------
>
>                 Key: HBASE-4087
>                 URL: https://issues.apache.org/jira/browse/HBASE-4087
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 4087-v2.txt, 4087.txt
>
>
> Through HBASE-3777, HConnectionManager reuses the connection to HBase servers.
> One challenge, discovered in troubleshooting HBASE-4052, is how we invalidate connection(s) to server which gets restarted.
> There're at least two ways.
> 1. HConnectionManager utilizes background thread(s) to periodically perform validation of connections in HBASE_INSTANCES and remove stale connection(s).
> 2. Allow HBaseClient (including HBaseAdmin) to provide feedback to HConnectionManager.
> The solution can be a combination of both of the above.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira