Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/01/17 07:06:43 UTC

[jira] Created: (HBASE-3447) Split parents ending up deployed along with daughters

Split parents ending up deployed along with daughters
-----------------------------------------------------

                 Key: HBASE-3447
                 URL: https://issues.apache.org/jira/browse/HBASE-3447
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.0
            Reporter: Todd Lipcon
            Priority: Blocker


While testing rc3, I got several regions in this state, as reported by hbck:
ERROR: Region UNKNOWN_REGION on haus02.sf.cloudera.com:57020, key=9f2822a04028c86813fe71264da5c167, not on HDFS or in META but deployed on haus02.sf.cloudera.com:57020
(This was without any injected failures or anything else.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] [Commented] (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072092#comment-13072092 ] 

stack commented on HBASE-3447:
------------------------------

I think we should close this issue.  Todd suggests:

bq. But even if we fixed HBASE-3446, we'd have this same bug if the master had crashed in the middle of processing the shutdown, right?

This is now in place.  See AssignmentManager#joinCluster.  See how it calls processDeadServers.  This in turn calls the same code used when recovering shutdown servers over in ServerShutdownHandler.processDeadRegion.

Is the above enough to close out this issue?
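
The recovery path described above can be sketched as a toy model. This is not HBase code: the names below (ToyMaster, process_dead_region, handle_server_shutdown, join_cluster) are hypothetical stand-ins that only mirror the shape of the flow, in which a master joining a running cluster pushes each dead server's regions through the same per-region fixup used by normal shutdown handling. The daughter handling is simplified to "always assign the daughters", which is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    name: str
    offline: bool = False   # set on a parent once its split is recorded
    split: bool = False
    daughters: list = field(default_factory=list)


class ToyMaster:
    """Hypothetical model of the flow; not the HBase AssignmentManager."""

    def __init__(self, meta, live_servers):
        self.meta = meta                      # server name -> list of Region
        self.live_servers = set(live_servers)
        self.assigned = []                    # region names queued for assignment

    def process_dead_region(self, region):
        # Shared fixup: a split, offlined parent is never redeployed;
        # its daughters are queued for assignment instead.
        if region.offline and region.split:
            for daughter in region.daughters:
                self.assigned.append(daughter.name)
            return False
        self.assigned.append(region.name)
        return True

    def handle_server_shutdown(self, server):
        # Normal path: a region server died while the master was running.
        for region in self.meta.pop(server, []):
            self.process_dead_region(region)

    def join_cluster(self):
        # Failover path: a new master finds servers listed in META that are
        # no longer live and reuses the same shutdown handling for them.
        for server in sorted(set(self.meta) - self.live_servers):
            self.handle_server_shutdown(server)


# Usage: rs1 died and the previous master never finished processing it.
parent = Region("parent", offline=True, split=True,
                daughters=[Region("daughter_a"), Region("daughter_b")])
master = ToyMaster({"rs1": [parent, Region("other")], "rs2": [Region("live")]},
                   live_servers=["rs2"])
master.join_cluster()
print(master.assigned)
```

The point of the design is that there is one fixup routine with two callers, so a master that crashes in the middle of shutdown processing does not leave a path on which a split parent can be redeployed.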

> Split parents ending up deployed along with daughters
> -----------------------------------------------------
>
>                 Key: HBASE-3447
>                 URL: https://issues.apache.org/jira/browse/HBASE-3447
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: unknown-region.log
>
>
> Testing rc3 got several regions in this state as reported by hbck:
> ERROR: Region UNKNOWN_REGION on haus02.sf.cloudera.com:57020, key=9f2822a04028c86813fe71264da5c167, not on HDFS or in META but deployed on haus02.sf.cloudera.com:57020
> (this without any injected failures or anything)


[jira] Commented: (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983022#action_12983022 ] 

stack commented on HBASE-3447:
------------------------------

bq. But even if we fixed HBASE-3446, we'd have this same bug if the master had crashed in the middle of processing the shutdown, right?

You mean the fixup needs to run when the master joins a cluster too?  That seems reasonable; it's a rare case, but yeah, something we should plug.



[jira] Updated: (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3447:
-------------------------------

    Fix Version/s:     (was: 0.90.1)
                   0.92.0

Bumping to 0.92; we'll move it back to 0.90 if people are hitting this.


[jira] [Commented] (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026787#comment-13026787 ] 

Jean-Daniel Cryans commented on HBASE-3447:
-------------------------------------------

svarma (Suraj, I believe) mentioned on IRC that they hit something like this issue on their dev cluster. It turns out they hit HBASE-3669 instead and then disabled/dropped the table, which left the regions assigned but not existing anywhere else. The confusion came from hbck saying "not on HDFS or in META but deployed on" for those few regions. They fixed it by bouncing the region servers.
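
The hbck message quoted above describes a region that a server reports as deployed but that has no entry in META and no directory on HDFS. A minimal sketch of that kind of three-way cross-check follows. It is an illustration under assumed inputs (three plain sets of encoded region names), not hbck's actual implementation; the function name classify_regions and every label other than the quoted message are made up for this example. It also ignores legitimate offline states such as split parents and disabled tables.

```python
def classify_regions(in_meta, on_hdfs, deployed):
    """Return {encoded_region_name: description} for inconsistent regions.

    Hypothetical helper: the inputs are sets of encoded region names taken
    from META, from the HDFS directory listing, and from region servers.
    """
    problems = {}
    for name in sorted(in_meta | on_hdfs | deployed):
        m, h, d = name in in_meta, name in on_hdfs, name in deployed
        if m and h and d:
            continue  # consistent: known everywhere and being served
        if d and not m and not h:
            problems[name] = "not on HDFS or in META but deployed"
        elif m and h and not d:
            problems[name] = "in META and on HDFS but not deployed"
        else:
            problems[name] = "other inconsistency"
    return problems


# Usage: one healthy region plus the region from this report.
report = classify_regions(
    in_meta={"aaaa"},
    on_hdfs={"aaaa"},
    deployed={"aaaa", "9f2822a04028c86813fe71264da5c167"},
)
print(report)
```

Both situations in this thread, a redeployed split parent and a dropped table whose regions stayed open, would land in the same bucket, which is why the hbck line alone could not tell them apart.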


[jira] [Commented] (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13072160#comment-13072160 ] 

Ted Yu commented on HBASE-3447:
-------------------------------

HBASE-3446 is a blocker which will be fixed for 0.92.

I think it is fine to close this JIRA.


[jira] [Resolved] (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3447.
--------------------------

    Resolution: Fixed

Resolving.  Reopen if you think otherwise.


[jira] Commented: (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982809#action_12982809 ] 

stack commented on HBASE-3447:
------------------------------

Any more in the log from around here?

{code}
logs/HM-haus01.sf.cloudera.com.log:2011-01-16 18:03:26,164 DEBUG oahh.M.handler.ServerShutdownHandler: Offlined and split region usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167.; checking daughter presence
{code}

The RS seems to report fine that the parent is split/offlined:

{code}
logs/RS-haus04.sf.cloudera.com.log:2011-01-16 17:51:44,112 INFO oahh.RS.CompactSplitThread: Region split, META updated, and report to master. Parent=usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167., new regions: usertable,user136857679,1295229096477.ea26a71066b5617d9d8c9a4394209895., usertable,user1372334755,1295229096477.9c05ac3446fd2ff2d0cc0ac78f063867.. Split took 7sec
{code}

... and the master on shutdown processing seems to recognize it as an offlined parent.

Is this the one you figured when we talked on IRC?
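
To tie the two log lines above back to the hbck report, it helps to pull the encoded name off the end of the full region name and compare it with the key hbck printed. The sketch below assumes the name layout visible in these logs, "table,startkey,regionid.encodedname." with a trailing dot; parse_region_name is a hypothetical helper written for this note, not an HBase API.

```python
def parse_region_name(region_name):
    """Split 'table,startkey,regionid.encodedname.' into its four parts."""
    if not region_name.endswith("."):
        raise ValueError("expected a trailing '.': %r" % region_name)
    # The encoded name is everything after the last '.' once the trailing
    # dot is dropped.
    prefix, encoded = region_name[:-1].rsplit(".", 1)
    # The table name ends at the first comma; the region id follows the
    # last comma. Whatever sits between them is the start key.
    table, rest = prefix.split(",", 1)
    start_key, region_id = rest.rsplit(",", 1)
    return {
        "table": table,
        "start_key": start_key,
        "region_id": region_id,
        "encoded": encoded,
    }


# Usage: the split parent from the logs versus the key in the hbck error.
parent = parse_region_name(
    "usertable,user136857679,1295149082811.9f2822a04028c86813fe71264da5c167."
)
hbck_key = "9f2822a04028c86813fe71264da5c167"
print(parent["encoded"] == hbck_key)
```

The match confirms that the UNKNOWN_REGION in the hbck output is the split parent, not one of the two daughters listed in the region server log line.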





[jira] Updated: (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3447:
-------------------------------

    Attachment: unknown-region.log

Here's a log for one of the regions.



[jira] Commented: (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12982820#action_12982820 ] 

Todd Lipcon commented on HBASE-3447:
------------------------------------

Ah, yes, this area of the logs is where HBASE-3446 happened. So the master didn't finish processing the shutdown.

But even if we fixed HBASE-3446, we'd have this same bug if the master had crashed in the middle of processing the shutdown, right?



[jira] Updated: (HBASE-3447) Split parents ending up deployed along with daughters

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3447:
-------------------------

    Fix Version/s: 0.90.1
         Assignee: stack

Bringing into 0.90.1.  Let me make the fixup also run when a master joins an already-running cluster.
