You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "Jieshan Bean (Created) (JIRA)" <ji...@apache.org> on 2012/01/09 09:52:39 UTC

[jira] [Created] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

HConnection re-creation in HTable after HConnection abort
---------------------------------------------------------

                 Key: HBASE-5153
                 URL: https://issues.apache.org/jira/browse/HBASE-5153
             Project: HBase
          Issue Type: Bug
          Components: client
    Affects Versions: 0.90.4
            Reporter: Jieshan Bean
            Assignee: Jieshan Bean
             Fix For: 0.90.6


HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.

It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.

Actually, there's two aproach to solve this:
1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
2. In HBase Client side, catch this exception, and re-create connection.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185601#comment-13185601 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Thanks, Ted.

Without HBASE-3065, 0.90 doesn't handle the ConnectionLossException correctly. Consider the below case:
1. Somewhere trigger a HConnection#abort. 
2. Suppose the check of "if (t instanceof KeeperException.SessionExpiredException)" is true. Then called the resetZooKeeperTrackers().
3. A ConnectionLossException occur during ZookeeperNodeTracker#start. then trigger a new HConnection#abort. At this scenario, the previous abort may print a log of 
"Reconnected successfully. This disconnect could have been caused by a network partition or a long-running GC pause......"
4. The new abort carry a Throwable with a type which is not KeeperException.SessionExpiredException. so this time abort directly.

It seems a recursion here.
 
Either re-use the old connection by resetZooKeeperTrackers, or re-create the connection, the ZookeeperWatcher will be a new one. So  I still think the patch for 0.90 is reasonable.

Trunk patch will be made big changes.

So any other good suggestions? Thanks.

                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194426#comment-13194426 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Thanks, Ted...I will take a look at that stack trace.
 
If a ZooKeeperConnectionException thrown by the below code:
{noformat}
       try {
          if (setupZookeeperTrackers(isLastTime)) {
            break;
          }
        } catch (ZooKeeperConnectionException zkce) {
          if (isLastTime) {
            throw zkce;
          }
        }
{noformat}

If will be catched in abort method, then calling LOG.fatal(msg, t);

No problem here. Don't know whether I get you correctly:(.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-trunk-v2.patch
                HBASE-5153-V4-90.patch

I updated the patches. 
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment:     (was: HBASE-5153.patch)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V6-90.txt

Changes:
1. Add a method "unregisterListener" in ZookeeperWatcher. This will be called in ZookeeperNodeTracker's stop method.
2. Stop the trackers if start failed during the retries.

Thank you , Ted.

@Lars, I will make the patches for trunk and 92, patch for 90 seems more urgent:)..Thank you.  
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194801#comment-13194801 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

+1 on addendum for 0.90
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184092#comment-13184092 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

@Jieshan:
Can you prepare a patch for trunk ?
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195445#comment-13195445 ] 

ramkrishna.s.vasudevan commented on HBASE-5153:
-----------------------------------------------

@Lars
I understand your point.  But as this JIRA is now committed in 0.90 so this addendum is needed.
I think we can ensure such things doesn't happen for other JIRAs.
Good on you Lars.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBase-5153-90-addendum.patch

This is the addendum for 90.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510806/5153-92.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/787//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186271#comment-13186271 ] 

Lars Hofhansl edited comment on HBASE-5153 at 1/14/12 5:33 PM:
---------------------------------------------------------------

Thanks Jieshan. This patch is great.
+1 after Ted's comments are fixed.

Are you planning to make a 0.92 and trunk patch as well?
(After Stacks changes in trunk that might be not completely trivial).
                
      was (Author: lhofhansl):
    Thanks Jieshan. This patch is great.
+1 after Ted's comments are fixed.

Are you planning to make a 0.92 and trunk patch as well?
(Aft
                  
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: TestResults-hbase5153.out

Ran the tests again, got the same results: 5 tests failed due to the hostName problem. Please find the results from the attachment "TestResults-hbase5153.out".
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187684#comment-13187684 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-0.92-security #81 (See [https://builds.apache.org/job/HBase-0.92-security/81/])
    HBASE-5153  Add retry logic in HConnectionImplementation#resetZooKeeperTrackers (Jieshan)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182467#comment-13182467 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

Can you add a unit test ?
Testing this in cluster is also desirable. 
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195444#comment-13195444 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

@Ram: Sure. I now feel the whole logic is too complicated, but we'll not fix that any time soon. So +1 on addendum.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195241#comment-13195241 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

That is just guard against closed = true in HConnectionImplementation.abort().
abort probably not set closed to true, but then a bunch of other places would need to be changed too.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187495#comment-13187495 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-TRUNK-security #79 (See [https://builds.apache.org/job/HBase-TRUNK-security/79/])
    HBASE-5153  Add retry logic in HConnectionImplementation#resetZooKeeperTrackers (Jieshan)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185456#comment-13185456 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Is the only time when this happens a loss of the ZK connection?
All that a new HConnectionImplementation is doing is a call to setupZookeeperTrackers.
HConnectionImplementation.abort already does this. Why would that succeed in the new Connection? Is it because we retry?

Maybe HConnectionImplementation.resetZooKeeperTrackers should have the retry logic.

-1 on current patches.

                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187367#comment-13187367 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510779/5153-trunk.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 18 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -144 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 83 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/783//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/783//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/783//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5153:
------------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.90.7)
                   0.90.6
           Status: Resolved  (was: Patch Available)

This got committed to 0.90.6.
Sorry for not resolving it at that time.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194805#comment-13194805 ] 

ramkrishna.s.vasudevan commented on HBASE-5153:
-----------------------------------------------

@Ted
Thanks for the review.
@Lars
Can you review the 0.90 patch and provide your comments.
If it is ok i can integrate to 0.90 and take another RC. :)

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Fix Version/s: 0.92.1
                   0.94.0
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186172#comment-13186172 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510573/HBASE-5153-V5-90.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/759//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187585#comment-13187585 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-0.92 #246 (See [https://builds.apache.org/job/HBase-0.92/246/])
    HBASE-5153  Add retry logic in HConnectionImplementation#resetZooKeeperTrackers (Jieshan)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment:     (was: HBASE-5153-V4-90.patch)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184705#comment-13184705 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

@Ted, the patch for TRUNK seems very different, and i still need some time to check it. hope i can provide today:)

@Stack, I think ConnectionUtils is reasonable. I can add it:). I will update the patch.

Thank you all.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194414#comment-13194414 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

See the stack trace I pasted here:
https://issues.apache.org/jira/browse/HBASE-5153?focusedCommentId=13187774&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13187774

bq. If this exception happened for long time, Zookeeper must has some problem.
We should be prepared when the above indeed happens. I am sure this scenario is possible.

See also this part of the code:
{code}
+        } catch (ZooKeeperConnectionException zkce) {
+          if (isLastTime) {
+            throw zkce;
+          }
+        }
{code}

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194357#comment-13194357 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

So here's the problem. This is hanging while validating that HBase is not running via HBaseAdmin.checkHBaseAvailable, which just attempts to create a new HBaseAdmin after it sets hbase.client.retries.number to 1. However HConnectionImpl caches hbase.client.retries.number in numRetries, and hence if ZK is not running resetZooKeeperTrackersWithRetries will retry for a while.
The simplest fix would be for resetZooKeeperTrackersWithRetries to ignore he cached setting and to retrieve the value again from the setting. While I am at it, I'll also add another option to a different number of retries here.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184211#comment-13184211 ] 

stack commented on HBASE-5153:
------------------------------

Patch looks good.  I like your addition of a specific Exception for closed state.

Does this have to be public Jieshan?

{code}
getRegionServerWithRetries
{code}

Same for processBatch and getRegionLocation.

If public should be in HTableInterface but they seem implementation methods rather than something that should be part of public interface.

A style nit -- i.e. not important but if you are going to redo the patch you miight want to address it -- is that you do this in handleConnectionClosedException....

{code}
+    if (ioe instanceof ConnectionClosedException) {
{code}

and the whole method is dealing with the case where above is true.  I'd suggest that you might do:

{code}
if (!(ioe instanceof ConnectionClosedException)) return;
{code}

... then you save a whole indent and its clear that the method is all about dealing with ConnectionClosedException.

Is it right including this in HTable?

{code}
getPauseTime
{code}

In trunk that is in a new ConnectionUtils class.  Maybe you have to do it for 0.90?

I'm wondering if the class ConnectionClosedException needs to be public also?  Its only used in this package, right?



                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12512100/HBase-5153-90-addendum.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/858//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510179/HBASE-5153-V3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/727//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194519#comment-13194519 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

@Lars:
During the retry, if we get any exceptions, Zookeeper and the Trackers also need to close.
{noformat}
+          try {
+            setupZookeeperTrackers();
+            break;
+          } catch (ZooKeeperConnectionException zkce) {
+            if (tries >= this.numRetries) {
+              throw zkce;
+            }
+          }
{noformat}
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194904#comment-13194904 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

+1 on addendum. Technically we should change ZooKeeperNodeTracker.start back to be void, but that is not necessary.
I'll update the trunk patch soon.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194437#comment-13194437 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

There was also some weird stuff in HBaseAdmin.checkHBaseAvailable. It did set the client retry to 1, but left the the retry count on in RecoverableZookeeper, which leads to long, unnecessary waits.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V5-90.patch

This is the new patch for 90 according to Lars' suggestion. Tell me if anything still need to change.

Thank you Ted, Lars & Ram.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187277#comment-13187277 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

+1 on latest patch
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Fix Version/s:     (was: 0.92.1)
                       (was: 0.94.0)

Dropping Fix versions as Lars suggested.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185455#comment-13185455 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

The trunk and 0.92 patch won't work with HBASE-4805.
I would rather see a way to be able to reuse the connection rather than getting a new one as this defeats the whole idea of HBASE-4805.

                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187845#comment-13187845 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Was that repeatable? Might be a problem with the test (relying on the fact that an HConnection would just give up after one try).
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194189#comment-13194189 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Also it occurred to me that another nice change would be to be able specify the retry count for resetZooKeeperTrackersWithRetries different from the other operations. 
The thinking is this:
While the ZK is not reachable the HConnection (and any other HConnection) is essentially not usable. In some settings it might be good to have the connection just sit there, and retry until the connection is bad. Maybe for another jira.

Where are we with this generally?
Is it just TestMergeTool hanging? If so I'll have a look at it today.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187848#comment-13187848 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

TestMergeTool hung in several builds.
That was why I ran it on MacBook.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Hadoop Flags: Reviewed
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153.patch

Thank you, Rama. I update the patch:)
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194193#comment-13194193 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

There were two failed tests:
https://builds.apache.org/job/HBase-0.92-security/81/

If you can resolve the hanging TestMergeTool, that would be great.

I am on-call this week, FYI
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182433#comment-13182433 ] 

ramkrishna.s.vasudevan commented on HBASE-5153:
-----------------------------------------------

Your idea is correct.
Few comments
{code}
Copyright 2010 The Apache Software Foundation
{code}
This can be removed.
Also add javadoc for the new class added.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185453#comment-13185453 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

+1 on latest patches.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185040#comment-13185040 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510383/HBASE-5153-V4-90.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/747//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V3.patch
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187347#comment-13187347 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

+1 on trunk patch. 0.92 is similar?
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509917/HBASE-5153-V2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/706//console

This message is automatically generated.)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186719#comment-13186719 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

Minor comment:
In setupZookeeperTrackers(), after calling stop(), the masterAddressTracker / rootRegionTracker should be set to null.

Please run patch v6 over test suite.

Thanks
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185716#comment-13185716 ] 

ramkrishna.s.vasudevan commented on HBASE-5153:
-----------------------------------------------

No problem Lars.  Its good to get the best suggestions and ideas. :)
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195240#comment-13195240 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12512248/5153-trunk-minimal.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -140 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestAdmin
                  org.apache.hadoop.hbase.io.hfile.TestHFileBlock
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/861//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/861//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/861//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5153:
---------------------------------

    Attachment: 5153-trunk-v2.txt

Please let me know what you think of 5153-trunk-v2.txt... Thanks
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Summary: Add retry logic in HConnectionImplementation#resetZooKeeperTrackers  (was: HConnection re-creation in HTable after HConnection abort)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Krystian Nowak (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194176#comment-13194176 ] 

Krystian Nowak commented on HBASE-5153:
---------------------------------------

FYI: [^HBASE-5153-V6-90-minorchange.patch] also fixes HBASE-5289 (linked)
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185619#comment-13185619 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

I think we can solve HBASE-5084 first by not bundling the HConnection and ExecutorService.
After that, this JIRA's patch for TRUNK can be modified to take effect when HConnection is managed internally.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185135#comment-13185135 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510397/5153-trunk.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -147 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplicationPeer
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/748//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/748//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/748//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187810#comment-13187810 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

Reverted from 0.92 and TRUNK due to failed Jenkins builds.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Status: Patch Available  (was: Open)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187439#comment-13187439 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510806/5153-92.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 15 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/787//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194451#comment-13194451 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

@Lars:
Adding a "isResettingZKTrackers" sounds good to me. One doubt: Is it necessary to add the keyword of "volatile"?
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194406#comment-13194406 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

It should not leading to an endless loop. Unless, each retry will get a ZookeeperLossException. If this exception happened for long time, Zookeeper must has some problem. so when create a new Zookeeper instance, it already thrown a Exception. So it won't be an endless loop:
{noformat}
if ((t instanceof KeeperException.SessionExpiredException)
          || (t instanceof KeeperException.ConnectionLossException)) {
        try {
          LOG.info("This client just lost it's session with ZooKeeper, trying" +
              " to reconnect.");
          resetZooKeeperTrackersWithRetries();
          LOG.info("Reconnected successfully. This disconnect could have been" +
              " caused by a network partition or a long-running GC pause," +
              " either way it's recommended that you verify your environment.");
          return;
        } catch (ZooKeeperConnectionException e) {
          LOG.error("Could not reconnect to ZooKeeper after session" +
              " expiration, aborting");
          t = e;
        }
      }
      if (t != null) LOG.fatal(msg, t);
      else LOG.fatal(msg);
      HConnectionManager.deleteStaleConnection(this);
{noformat}
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194347#comment-13194347 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

So is this change in 0.90 now? I'm confused. Should revert it from there too, I guess.
I will see what's up with TestMergeTool in trunk now.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment:     (was: HBASE-5153-V2.patch)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188384#comment-13188384 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-TRUNK-security #81 (See [https://builds.apache.org/job/HBase-TRUNK-security/81/])
    HBASE-5153 revert due to failed Jenkins builds

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186238#comment-13186238 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

For patch v5, setupZookeeperTrackers():
{code}
-      this.rootRegionTracker.start();
+      return this.rootRegionTracker.start(allowAbort)
+          && this.masterAddressTracker.start(allowAbort);
{code}
If rootRegionTracker.start() returns false, masterAddressTracker.start() would be skipped.
Since setupZookeeperTrackers() is called by HConnectionImplementation ctor, I suggest making setupZookeeperTrackers() reentrant (checking masterAddressTracker and rootRegionTracker against null) and calling it in HConnectionImplementation ctor.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187774#comment-13187774 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

When I ran TestMergeTool on TRUNK, I got:
{code}
"main" prio=5 tid=102801000 nid=0x100601000 waiting on condition [1005fb000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at java.lang.Thread.sleep(Thread.java:302)
	at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:328)
	at org.apache.hadoop.hbase.util.RetryCounter.sleepUntilNextRetry(RetryCounter.java:55)
	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:171)
	at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:230)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:80)
	- locked <784a83930> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <7854b9898> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <7854b98e0> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <7854b9928> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78f991ad0> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790b9f540> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790b9f560> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790b9f580> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790b9f5a0> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78f98ae20> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78f98ae40> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78f98ae60> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78e8b6ac8> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78e8b6ae8> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <78e8b6b08> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e640> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e660> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e600> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e620> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e5c0> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e5e0> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e580> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e5a0> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e540> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e560> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.resetZooKeeperTrackersWithRetries(HConnectionManager.java:625)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.abort(HConnectionManager.java:1711)
	at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:93)
	- locked <790d0e500> (a org.apache.hadoop.hbase.MasterAddressTracker)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:590)
	- locked <790b70180> (a org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:577)
	at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:184)
	- locked <790c46258> (a org.apache.hadoop.hbase.client.HConnectionManager$1)
	at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:98)
	at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1570)
	at org.apache.hadoop.hbase.util.Merge.run(Merge.java:94)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.hbase.util.TestMergeTool.mergeAndVerify(TestMergeTool.java:190)
	at org.apache.hadoop.hbase.util.TestMergeTool.testMergeTool(TestMergeTool.java:268)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at junit.framework.TestCase.runTest(TestCase.java:168)
	at junit.framework.TestCase.runBare(TestCase.java:134)
	at junit.framework.TestResult$1.protect(TestResult.java:110)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.framework.TestResult.run(TestResult.java:113)
	at junit.framework.TestCase.run(TestCase.java:124)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
	at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
	at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:81)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
{code}
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Attachment: 5153-92.txt

Patch for 0.92 branch
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195188#comment-13195188 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Noticed one flaw in the 0.90 addendum patch. Should be
{code}
} catch (ZooKeeperConnectionException zkce) {
  if (tries+1 >= this.numRetries) {
    throw zkce;
  }
}
{code}
(note the +1 in there)
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186721#comment-13186721 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

The test is still running. Before I get the results, I want your comments:).
Thank you.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182481#comment-13182481 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509890/HBASE-5153.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/704//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Fix Version/s:     (was: 0.90.6)
                   0.90.7
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184033#comment-13184033 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510179/HBASE-5153-V3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/727//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182538#comment-13182538 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509917/HBASE-5153-V2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/705//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "stack (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5153:
-------------------------

    Fix Version/s:     (was: 0.90.6)
                   0.90.7
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194412#comment-13194412 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

As discussed with Ted. Trunk and 92 already including a retry logic in RecoverableZooKeeper. So that makes the retry logic in resetZooKeeperTrackersWithRetries less important.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194913#comment-13194913 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Actually then the addendum is not right either. If ZooKeeperNodeTracker.start gets an exception it'll call abort (and hence not return false)
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V2.patch
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185709#comment-13185709 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Thanks Ram (and Jieshan and Ted).
In my previous comments I did not mean to be abrasive. Thanks for finding this problem and working on a patch for it (I completely missed this when I worked on HBASE-4805).

                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186241#comment-13186241 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

{code}
+   * Close the original connection and create a new one.
+   * @throws ZooKeeperConnectionException  if unable to connect to zookeeper
+   */
+  public void resetConnection() throws ZooKeeperConnectionException;
{code}
The first line should read 'Closes the original connection and creates a new one.'
Since resetZooKeeperTrackersWithRetries() is called underneath, I think naming this new method resetZooKeeperTrackersWithRetries() seems better.
{code}
+/**
+ * Thrown when HConnection has been closed.
+ */
+public class InvalidConnectionException extends IOException {
{code}
Going over the patch, the new exception is indeed used to indicate closed connection. I think naming it ClosedConnectionException is better.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194541#comment-13194541 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

@Lars:
One more doubt basing on the previous comment, regading on the below code:
{noformat}
+          try {
+            setupZookeeperTrackers();
+            break;
+          } catch (ZooKeeperConnectionException zkce) {
+            if (tries >= this.numRetries) {
+              throw zkce;
+            }
+          }
{noformat}
if ZookeeperNodeTracker#start get an exception, then catches and calls abort(Doesn't throw it). The above retries will be break under this situation. So this retries takes no effects.Correct me if am wrong. 
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188350#comment-13188350 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-0.92-security #83 (See [https://builds.apache.org/job/HBase-0.92-security/83/])
    HBASE-5153 revert due to failed Jenkins builds

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194453#comment-13194453 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

@Jieshan: Hmm... I do see the endless loop in the debugger. It happens from HBaseAdmin.checkHBaseAvailable when HBase is actually down. We get into resetZooKeeperTrackersWithRetries, which calls setupZookeeperTrackers, which causes abort to be called, which calls resetZooKeeperTrackersWithRetries.
I am not sure getZooKeeperWatcher() would throw if ZK is not available.

@Ted: calling this.zooKeeper.close() (if it is not null) first seems prudent.

@Jieshan: Good catch, yes it should be volatile.

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5153:
------------------------------------------

    Attachment: HBASE-5153_addendum_0.90_1.patch

@Lars
Addressing your comments.
Based on your feedback will commit the patch if it is ok.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186274#comment-13186274 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Oh, and should probably change the title of this jira.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194445#comment-13194445 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

"The endless loop happens when ZK is actually down."
If ZK is actually down, the below code will throw a Exception:
 this.zooKeeper = getZooKeeperWatcher();
Then catched by the below code:
{noformat}
  try {
          LOG.info("This client just lost it's session with ZooKeeper, trying" +
              " to reconnect.");
          resetZooKeeperTrackersWithRetries();
          LOG.info("Reconnected successfully. This disconnect could have been" +
              " caused by a network partition or a long-running GC pause," +
              " either way it's recommended that you verify your environment.");
          return;
        } catch (ZooKeeperConnectionException e) {
          LOG.error("Could not reconnect to ZooKeeper after session" +
              " expiration, aborting");
          t = e;
        }
      if (t != null) LOG.fatal(msg, t);
      else LOG.fatal(msg);
      HConnectionManager.deleteStaleConnection(this);
{noformat}

It should not be a endless loop. Does that make sense?

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187011#comment-13187011 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

@Jieshan:
Please find a machine which has access to internet to run the test suite.
maven needs to download artifacts.

I ran the 5 tests on MacBook and they passed.
{code}
  839  mt -Dtest=TestClockSkewDetection
  840  mt -Dtest=TestScanner
  841  mt -Dtest=TestCatalogTrackerOnCluster
  842  mt -Dtest=TestCatalogTracker
{code}
+1 on latest patch.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194373#comment-13194373 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

Thanks for tracking down the issue, Lars.
If you can upload the latest 5153-trunk.txt to reviewboard first followed by your new patch, that would help us know your changes easily.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187989#comment-13187989 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-0.92 #248 (See [https://builds.apache.org/job/HBase-0.92/248/])
    HBASE-5153 revert due to failed Jenkins builds

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/ConnectionUtils.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195443#comment-13195443 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12512291/HBASE-5153_addendum_0.90_1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/864//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBASE-5153_addendum_0.90_1.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182542#comment-13182542 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509917/HBASE-5153-V2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/706//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-5153:
---------------------------------

    Attachment: 5153-trunk-minimal.txt

Here's a minimal patch for trunk.

If the HConnection is not usable for any reason, the using application can attempt to reset it.
I kept CloseConnectionException and the idea that ZooKeeperWatcher can unregister a listener.

If ZK is really not available that is probably an event that the application has to deal with anyway.

Long term HConnection should *not* hold persistent connections to ZK anyway, but only establish a connection when needed.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194457#comment-13194457 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12512077/5153-trunk-v2.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -140 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 162 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.io.hfile.TestHFileBlock
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/857//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/857//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/857//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186271#comment-13186271 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Thanks Jieshan. This patch is great.
+1 after Ted's comments are fixed.

Are you planning to make a 0.92 and trunk patch as well?
(Aft
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Attachment: 5153-trunk.txt

Patch for TRUNK.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186718#comment-13186718 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510667/HBASE-5153-V6-90.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/771//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186809#comment-13186809 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510683/HBASE-5153-V6-90-minorchange.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/775//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189005#comment-13189005 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-TRUNK #2639 (See [https://builds.apache.org/job/HBase-TRUNK/2639/])
    HBASE-5153 revert due to failed Jenkins builds

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186989#comment-13186989 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510715/TestResults-hbase5153.out
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/778//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509890/HBASE-5153.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/704//console

This message is automatically generated.)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185700#comment-13185700 ] 

ramkrishna.s.vasudevan commented on HBASE-5153:
-----------------------------------------------

@Lars
Ok Lars.  Tomorrow may be we can upload a new patch with a retry logic in resetZooKeeperTrackers.  I will discuss with Jieshan tomorrow. Thanks Lars.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13183290#comment-13183290 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Only do that in flushCommits maybe not enough. I'll go though the code and give a more considerate approach. and also will give a patch for TRUNK.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186730#comment-13186730 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

I will upload the new patch after the tests finish(Including this minor change), Thanks, Ted.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194447#comment-13194447 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

bq. Are you guys saying we do not need this in 0.92+?

We need this, but the patch for 0.92+ should take notice of this:)
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153.patch

Plz share your comments, thank you.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185426#comment-13185426 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510458/HBASE-5153-trunk-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 javadoc.  The javadoc tool appears to have generated -147 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 80 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/751//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/751//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/751//console

This message is automatically generated.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510715/TestResults-hbase5153.out
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 1 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/778//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510573/HBASE-5153-V5-90.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/759//console

This message is automatically generated.)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187440#comment-13187440 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

Integrated to 0.92 branch.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195019#comment-13195019 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

I take this back... The 0.90 addendum is good. If abort is called from the reset method it'll return right away, and in ZooKeeperNodeTracker.start fall through to return false.

This entire thing is too complicated. Let's hold off on the 0.92 and trunk patches, but leave it in 0.90 (although I am not a big fan of diverging code bases either).
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194450#comment-13194450 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

{code}
+        if (isResettingZKTrackers) {
+          return;
+        }
{code}
I was thinking about something similar to the above.

In resetZooKeeperTrackersWithRetries(), shall we call zooKeeper.close() before resetting zooKeeper?
{code}
        this.zooKeeper.close();
        this.zooKeeper = null;
{code}

Thanks for the help.

I think we need an addendum for 0.90
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V6-90-minorchange.patch
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V2.patch
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Attachment: 5153-trunk.txt

Jieshan's patch for TRUNK.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187442#comment-13187442 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Thank you, Ted.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187475#comment-13187475 ] 

Hudson commented on HBASE-5153:
-------------------------------

Integrated in HBase-TRUNK #2635 (See [https://builds.apache.org/job/HBase-TRUNK/2635/])
    HBASE-5153  Add retry logic in HConnectionImplementation#resetZooKeeperTrackers (Jieshan)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ClosedConnectionException.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterSchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/SchemaChangeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187446#comment-13187446 ] 

ramkrishna.s.vasudevan commented on HBASE-5153:
-----------------------------------------------

Thanks a lot Ted. Nice of you .:)
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.7
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185571#comment-13185571 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

@Lars:
Hbase-4805 is not in 0.90
So in spirit, this Jira should be able to go into 0.90
What do you think ?
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194906#comment-13194906 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

@Jieshan... I see your point. Need to think about that.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185487#comment-13185487 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Maybe retry in resetZooKeeperTrackers is better. We try our best to re-use the original connection. The worst case is even if we retried max times and still fail, then abort, but i think we're responsible for letting the user level know this. 
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185668#comment-13185668 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

@Ted: This could go into 0.90. Downside then is that 0.90 would behave quite differently.
I feel ambivalent about HBASE-5084 (seems we're over engineering this, but if you think we need that, then let's fix that first and then revisit this).

@Jieshan & Ted: Maybe have some retry logic resetZooKeeperTrackers *and* give HConnection an extra "reset" method? That way app code can redo the reset if needed.

                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185117#comment-13185117 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

For the trunk patch:
{code}
+   * @param actions
+   *          The collection of actions.
+   * @param tableName
+   *          Name of the hbase table
{code}
The explanation should be on the same line as parameter name.
{code}
+   * Check the connection. If it has been closed, we should recreate the
+   * Connection. See HBASE-5153.
+   * 
+   * @param ioe
+   *          The IOE throwed by the HConnection methods.
{code}
Connection on second line above should start with lower case 'c'.
throwed should be thrown.
{code}
+        throw new RetriesExhaustedException(
+            "Retry to recreate connection in HTable, but failed.");
{code}
Please include the value of tries in the exception message.

Good work.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jieshan Bean updated HBASE-5153:
--------------------------------

    Attachment: HBASE-5153-V4-90.patch
                HBASE-5153-trunk.patch
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195210#comment-13195210 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

For resetConnection() in the new patch, I see:
{code}
+      this.closed = false;
{code}
In HConnectionImplementation.close(), we have:
{code}
      if (master != null) {
        if (stopProxy) {
          HBaseRPC.stopProxy(master);
        }
        master = null;
{code}
This seems to imply that once a connection is closed, we cannot simply change this.closed to false in resetConnection().
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk-minimal.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194443#comment-13194443 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

bq. As discussed with Ted. Trunk and 92 already including a retry logic in RecoverableZooKeeper. So that makes the retry logic in resetZooKeeperTrackersWithRetries less important.

Are you guys saying we do not need this in 0.92+?
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Fix Version/s:     (was: 0.90.7)
                   0.90.6
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510667/HBASE-5153-V6-90.txt
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/771//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182552#comment-13182552 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

{code}
+      if (this.closed)
+        throw new ConnectionClosedException(toString() + " closed");
{code}
Please enclose the throw statement in curly braces.

Please also prepare patch for TRUNK.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194441#comment-13194441 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

@Jieshan: The endless loop happens when ZK is actually down.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194435#comment-13194435 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

I have a new patch... Testing it now.
It is mostly what you have Jieshan, but it does not need all the changes to the ZookeeperNodeTracker and subclasses.
I'll attach it soon... Then please let me know what you think.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12509917/HBASE-5153-V2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/705//console

This message is automatically generated.)
    
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153-V2.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194599#comment-13194599 ] 

Hadoop QA commented on HBASE-5153:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12512100/HBase-5153-90-addendum.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/858//console

This message is automatically generated.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, HBase-5153-90-addendum.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Lars Hofhansl (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194388#comment-13194388 ] 

Lars Hofhansl commented on HBASE-5153:
--------------------------------------

Sure... There's a bit more to this too. resetZooKeeperTrackersWithRetries on its last try calls setupZookeeperTrackers with allow aborts, which will call resetZooKeeperTrackersWithRetries again. Leading to an endless loop. Need to think about how to refactor this.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187429#comment-13187429 ] 

Zhihong Yu commented on HBASE-5153:
-----------------------------------

Integrated to 0.90 and TRUNK.

Waiting for 0.92 security build to pass before integrating to 0.92

Thanks for the patch Jieshan.

Thanks for the review Lars.
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.92.1, 0.90.6
>
>         Attachments: 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182472#comment-13182472 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Thanks, Ted. I'll add the unit test code immediately.
                
> HConnection re-creation in HTable after HConnection abort
> ---------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5153:
------------------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510683/HBASE-5153-V6-90-minorchange.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/775//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Jieshan Bean (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194444#comment-13194444 ] 

Jieshan Bean commented on HBASE-5153:
-------------------------------------

Thanks Lars. It's nice of you:)
                
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.94.0, 0.90.6, 0.92.1
>
>         Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, TestResults-hbase5153.out
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5153:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12510383/HBASE-5153-V4-90.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/747//console

This message is automatically generated.)
    
> Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
> -------------------------------------------------------------------
>
>                 Key: HBASE-5153
>                 URL: https://issues.apache.org/jira/browse/HBASE-5153
>             Project: HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.90.4
>            Reporter: Jieshan Bean
>            Assignee: Jieshan Bean
>             Fix For: 0.90.6
>
>         Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch
>
>
> HBASE-4893 is related to this issue. In that issue, we know, if multi-threads share a same connection, once this connection got abort in one thread, the other threads will got a "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
> It solve the problem of "stale connection can't removed". But the orignal HTable instance cann't be continue to use. The connection in HTable should be recreated.
> Actually, there's two aproach to solve this:
> 1. In user code, once catch an IOE, close connection and re-create HTable instance. We can use this as a workaround.
> 2. In HBase Client side, catch this exception, and re-create connection.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira