Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/10/20 02:06:59 UTC

[jira] Created: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Client with cached region location pointing at dead server does not recover
---------------------------------------------------------------------------

                 Key: HBASE-1920
                 URL: https://issues.apache.org/jira/browse/HBASE-1920
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.1
            Reporter: Jonathan Gray
            Priority: Critical
             Fix For: 0.20.2, 0.21.0


Testing in HBASE-1908 uncovered an issue where I had a client that had cached a region location.  I killed that regionserver, waited for recovery/reassignment, and then tried to scan the table again from the same client w/o restarting it.

A client normally gets a NotServingRegionException when a region is reassigned, but since this server is dead the client just got Connection Refused type exceptions.  These didn't seem to trigger the client to ask META for a new region location.
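
To make the expected behaviour concrete, here is a minimal, language-neutral sketch in Python. It is not HBase's client code. It assumes a hypothetical RegionClient with a location cache, a lookup_in_meta callback standing in for the META lookup, and a send callback standing in for the RPC. The point it illustrates is that a connection failure should drop the cached location just as a "not serving" error does, because a dead server can never send the latter.

```python
class NotServingRegionError(Exception):
    """Hypothetical stand-in for the 'region is no longer here' signal."""


class RegionClient:
    """Illustrative client that caches region locations (all names are assumptions)."""

    def __init__(self, lookup_in_meta, send, max_retries=3):
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self._lookup_in_meta = lookup_in_meta  # region -> server name
        self._send = send                      # (server, request) -> result
        self._max_retries = max_retries
        self._cache = {}

    def _locate(self, region):
        # Ask the meta lookup only when nothing is cached for this region.
        if region not in self._cache:
            self._cache[region] = self._lookup_in_meta(region)
        return self._cache[region]

    def call(self, region, request):
        last_error = None
        for _ in range(self._max_retries):
            server = self._locate(region)
            try:
                return self._send(server, request)
            except (NotServingRegionError, ConnectionError) as err:
                # Connection failures (including "connection refused") are
                # treated like a "not serving" reply: forget the cached
                # location so the next attempt asks the meta lookup again.
                self._cache.pop(region, None)
                last_error = err
        raise last_error


# Toy cluster state, for illustration only.
meta = {"region-a": "rs1"}
alive = {"rs1", "rs2"}


def send(server, request):
    if server not in alive:
        raise ConnectionRefusedError(server)
    return f"{request}@{server}"


client = RegionClient(meta.__getitem__, send)
first = client.call("region-a", "scan")   # location of region-a is now cached

alive.discard("rs1")                      # the regionserver is killed
meta["region-a"] = "rs2"                  # the region is reassigned

second = client.call("region-a", "scan")  # same client, no restart
```

In this sketch the second call first fails against the cached server, drops the cache entry, and succeeds on the retry against the reassigned server. The behaviour reported above corresponds to leaving ConnectionError out of the except clause.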

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1920:
-------------------------

    Fix Version/s:     (was: 0.20.2)

Moving out of 0.20.2 after chatting w/ JGray.


[jira] Commented: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773189#action_12773189 ] 

Jean-Daniel Cryans commented on HBASE-1920:
-------------------------------------------

Well, it seems I can no longer reproduce it. Since I run Java 7 on that cluster, my master had crashed and could not reassign the region, so the client kept retrying because .META. was never updated. I have now retried scanning, killing the RS (sometimes waiting for reassignment, sometimes not), then scanning again at least 10 times, and it works fine every time.


[jira] Commented: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773154#action_12773154 ] 

Jean-Daniel Cryans commented on HBASE-1920:
-------------------------------------------

I was able to reproduce the problem by using the shell to scan, killing the RS, then scanning again. I'll try to capture the exact trace.


[jira] Commented: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773287#action_12773287 ] 

Jonathan Gray commented on HBASE-1920:
--------------------------------------

Will need to test more.  I believe I did have only one regionserver remaining but need to verify.


[jira] Commented: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773196#action_12773196 ] 

stack commented on HBASE-1920:
------------------------------

@jgray, does this require there to be one RS only?


[jira] Commented: (HBASE-1920) Client with cached region location pointing at dead server does not recover

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774430#action_12774430 ] 

stack commented on HBASE-1920:
------------------------------

@jgray Should we move this out of 0.20.2?
