Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2010/06/11 20:06:15 UTC

[jira] Created: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Cached region location that went stale won't recover if asking for first row
----------------------------------------------------------------------------

                 Key: HBASE-2712
                 URL: https://issues.apache.org/jira/browse/HBASE-2712
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.20.4
            Reporter: Jean-Daniel Cryans
            Assignee: Jean-Daniel Cryans
            Priority: Blocker
             Fix For: 0.20.5, 0.21.0


Let's say that:

 - A client cached the location of some region, not the first one in the table
 - The RS that was holding it fails
 - The first thing the client does after the failure is to try to reach the first row of that region

The client will never recover from this, since HCM.deleteCachedLocation (HConnectionManager) doesn't delete the stale entry when the row we asked for is the first row of a region. This looks a lot like HBASE-1920, but there isn't enough information in that JIRA to say that it's the same thing.
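To make the failure mode concrete, here is a minimal sketch of a client-side location cache kept in a sorted map keyed by region start key. It assumes (this is not stated in the report) that the delete path picks its candidate with an exclusive `headMap(row)` search, while a correct lookup uses an inclusive floor search. The class and its `Location`, `lookup` and `buggyDelete` members are hypothetical; this models the symptom and is not HBase's `HConnectionManager` code.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified model of a client-side region location cache;
// not HBase's actual HConnectionManager code. Keys are region start keys.
class StaleLocationDemo {
  static final class Location {
    final String startKey, endKey, server; // endKey "" means "end of table"
    Location(String startKey, String endKey, String server) {
      this.startKey = startKey; this.endKey = endKey; this.server = server;
    }
    boolean contains(String row) {
      return startKey.compareTo(row) <= 0
          && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }
  }

  final TreeMap<String, Location> cache = new TreeMap<>();

  void put(Location loc) { cache.put(loc.startKey, loc); }

  // Inclusive lookup: floorEntry finds the region whose start key is <= row,
  // so a region starting exactly at row is found.
  Location lookup(String row) {
    Map.Entry<String, Location> e = cache.floorEntry(row);
    return (e != null && e.getValue().contains(row)) ? e.getValue() : null;
  }

  // Delete in the style of the reported bug: headMap(row) only holds keys
  // strictly less than row, so when row is a region's start key the
  // candidate is the previous region, which does not contain row.
  boolean buggyDelete(String row) {
    SortedMap<String, Location> head = cache.headMap(row);
    if (head.isEmpty()) return false;
    Location candidate = head.get(head.lastKey());
    if (!candidate.contains(row)) return false; // stale entry survives
    cache.remove(candidate.startKey);
    return true;
  }

  public static void main(String[] args) {
    StaleLocationDemo d = new StaleLocationDemo();
    d.put(new Location("", "m", "rs1"));
    d.put(new Location("m", "", "rs2")); // suppose rs2 then fails
    System.out.println(d.buggyDelete("m"));   // false: nothing removed
    System.out.println(d.lookup("m").server); // rs2: still the dead server
  }
}
```

With regions `["", "m")` on `rs1` and `["m", end)` on `rs2`, deleting for row `"m"` inspects the `""` region, sees it does not contain `"m"`, and removes nothing, so `lookup("m")` keeps returning the dead `rs2`. A row strictly inside the second region, such as `"p"`, does get its entry removed, which is why only first-row requests get stuck.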

This is a blocker, and it kills 0.20.5 RC2 (sorry).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877929#action_12877929 ] 

HBase Review Board commented on HBASE-2712:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jd...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/170/
-----------------------------------------------------------

(Updated 2010-06-11 12:14:26.490410)


Review request for hbase.


Changes
-------

Fixed the potential NPE, and this one is against trunk.


Summary
-------

Patch against branch that does some refactoring and that fixes the bug at the same time. Adds a unit test that doesn't pass without the refactoring.

Urgent to review.


This addresses bug HBASE-2712.
    http://issues.apache.org/jira/browse/HBASE-2712


Diffs (updated)
-----

  /trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 953830 
  /trunk/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java PRE-CREATION 

Diff: http://review.hbase.org/r/170/diff


Testing
-------


Thanks,

Jean-Daniel






[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877922#action_12877922 ] 

HBase Review Board commented on HBASE-2712:
-------------------------------------------

Message from: "Jonathan Gray" <jg...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/170/#review189
-----------------------------------------------------------



/branches/0.20/src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/170/#comment866>

    This looks like a much better approach :)
    
    Isn't there an NPE if this returns null?


- Jonathan







[jira] Updated: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-2712:
--------------------------------------

    Attachment: HBASE-2712-trunk.patch

Final patch from Review Board.



[jira] Resolved: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-2712.
---------------------------------------

    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed to branch and trunk.



[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877924#action_12877924 ] 

HBase Review Board commented on HBASE-2712:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jd...@apache.org>


bq.  On 2010-06-11 11:52:48, stack wrote:
bq.  > /branches/0.20/src/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 850
bq.  > <http://review.hbase.org/r/170/diff/1/?file=1246#file1246line850>
bq.  >
bq.  >     getCachedLocation does the right thing?

Yep, that one did, so that's why we should reuse it instead.


bq.  On 2010-06-11 11:52:48, stack wrote:
bq.  > /branches/0.20/src/java/org/apache/hadoop/hbase/client/HConnectionManager.java, line 851
bq.  > <http://review.hbase.org/r/170/diff/1/?file=1246#file1246line851>
bq.  >
bq.  >     rl will never be null?

Doh


- Jean-Daniel


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/170/#review188
-----------------------------------------------------------
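The review converged on reusing the cached-location lookup inside the delete path and guarding against a null result. Below is a minimal sketch of that shape under a simplified model: the class is hypothetical, and although the method names echo the ones discussed in the review, their signatures and bodies are illustrative assumptions, not the committed patch.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified location cache (not HBase code). Keys are region
// start keys; an empty end key means "last region of the table".
class ReuseLookupDelete {
  static final class Location {
    final String startKey, endKey, server;
    Location(String startKey, String endKey, String server) {
      this.startKey = startKey; this.endKey = endKey; this.server = server;
    }
    boolean contains(String row) {
      return startKey.compareTo(row) <= 0
          && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }
  }

  final TreeMap<String, Location> cache = new TreeMap<>();

  void put(Location loc) { cache.put(loc.startKey, loc); }

  // Inclusive lookup: the region whose start key is <= row, provided it also
  // ends after row. Returns null when nothing cached covers the row.
  Location getCachedLocation(String row) {
    Map.Entry<String, Location> e = cache.floorEntry(row);
    return (e != null && e.getValue().contains(row)) ? e.getValue() : null;
  }

  // Delete by reusing the lookup, so a row equal to a region's start key is
  // handled the same way as any other row in that region.
  boolean deleteCachedLocation(String row) {
    Location loc = getCachedLocation(row);
    if (loc == null) {
      return false; // nothing cached; dereferencing loc here would throw an NPE
    }
    cache.remove(loc.startKey);
    return true;
  }

  public static void main(String[] args) {
    ReuseLookupDelete c = new ReuseLookupDelete();
    c.put(new Location("", "m", "rs1"));
    c.put(new Location("m", "", "rs2"));
    System.out.println(c.deleteCachedLocation("m")); // true: stale entry evicted
    System.out.println(c.deleteCachedLocation("m")); // false: null guard, no NPE
  }
}
```

Sharing one lookup between the read and delete paths keeps them from disagreeing about region boundaries, and the null check covers the case where the entry was already evicted.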







[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877921#action_12877921 ] 

HBase Review Board commented on HBASE-2712:
-------------------------------------------

Message from: stack@duboce.net

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/170/#review188
-----------------------------------------------------------



/branches/0.20/src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/170/#comment865>

    getCachedLocation does the right thing?



/branches/0.20/src/java/org/apache/hadoop/hbase/client/HConnectionManager.java
<http://review.hbase.org/r/170/#comment864>

    rl will never be null?


- stack







[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "Jonathan Gray (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877899#action_12877899 ] 

Jonathan Gray commented on HBASE-2712:
--------------------------------------

This could very much be what I tripped over in HBASE-1920.  Let's keep the other jira open for now.  Good catch JD.



[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877933#action_12877933 ] 

HBase Review Board commented on HBASE-2712:
-------------------------------------------

Message from: "Jonathan Gray" <jg...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/170/#review193
-----------------------------------------------------------

Ship it!


looks good to me

- Jonathan







[jira] Commented: (HBASE-2712) Cached region location that went stale won't recover if asking for first row

Posted by "HBase Review Board (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877914#action_12877914 ] 

HBase Review Board commented on HBASE-2712:
-------------------------------------------

Message from: "Jean-Daniel Cryans" <jd...@apache.org>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/170/
-----------------------------------------------------------

Review request for hbase.


Summary
-------

Patch against branch that does some refactoring and that fixes the bug at the same time. Adds a unit test that doesn't pass without the refactoring.

Urgent to review.


This addresses bug HBASE-2712.
    http://issues.apache.org/jira/browse/HBASE-2712


Diffs
-----

  /branches/0.20/src/java/org/apache/hadoop/hbase/client/HConnectionManager.java 953796 
  /branches/0.20/src/test/org/apache/hadoop/hbase/client/TestHCM.java PRE-CREATION 

Diff: http://review.hbase.org/r/170/diff


Testing
-------


Thanks,

Jean-Daniel



