Posted to issues@hbase.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2012/06/02 02:54:23 UTC

[jira] [Created] (HBASE-6152) Split abort is not handled properly

Devaraj Das created HBASE-6152:
----------------------------------

             Summary: Split abort is not handled properly
                 Key: HBASE-6152
                 URL: https://issues.apache.org/jira/browse/HBASE-6152
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.92.0
            Reporter: Devaraj Das
            Assignee: Devaraj Das


I ran into this:
1. The RegionServer (RS) started to split a region (R), but the split was taking a long time and was therefore aborted.
2. As part of the cleanup, the RS deleted the ZK node that it had initially created for R.
3. The master (AssignmentManager) noticed the node deletion and marked R offline.
4. The RS recovered from the failure and, at some later point, tried to split R again.
5. The master got an RS_ZK_REGION_SPLITTING event, but the server gave an error like "Received SPLIT for region R from server RS but it doesn't exist anymore,..".
6. The RS apparently completed the split successfully this time, but got stuck waiting for the master to delete the znode for the region. It kept logging "org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for R" and stayed stuck there forever.
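
The sequence above can be sketched as a toy model. This is only an illustration of the reported race, not HBase code: ToyMaster, onSplitNodeDeleted, onSplitEvent and regionServerStuck are hypothetical names, and the real AssignmentManager and SplitTransaction logic is considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitAbortSketch {

    /** Toy stand-in for the master: tracks the regions it believes are online. */
    static class ToyMaster {
        final Set<String> online = new HashSet<>();
        final List<String> log = new ArrayList<>();

        /** Step 3: the master sees the split znode disappear and offlines the region. */
        void onSplitNodeDeleted(String region) {
            online.remove(region);
            log.add("offlined " + region);
        }

        /** Step 5: returns true only if the master accepts and processes the split. */
        boolean onSplitEvent(String region, String server) {
            if (!online.contains(region)) {
                log.add("Received SPLIT for region " + region + " from server "
                        + server + " but it doesn't exist anymore");
                return false;
            }
            online.remove(region); // the parent goes away once the split is processed
            log.add("processed split of " + region);
            return true;
        }
    }

    /** Replays steps 1-6; returns true if the RS would be left waiting on the master. */
    static boolean regionServerStuck(ToyMaster master) {
        String region = "R";
        String server = "RS";
        master.online.add(region);

        // Steps 1-2: the first split attempt is aborted and the RS deletes its znode.
        // Step 3: the master reacts to the deletion.
        master.onSplitNodeDeleted(region);

        // Step 4: the RS still serves R locally and retries the split.
        // Step 5: the master receives the split event for a region it no longer tracks.
        boolean acknowledged = master.onSplitEvent(region, server);

        // Step 6: without an acknowledgement the RS keeps waiting.
        return !acknowledged;
    }

    public static void main(String[] args) {
        ToyMaster master = new ToyMaster();
        boolean stuck = regionServerStuck(master);
        master.log.forEach(System.out::println);
        System.out.println("RS stuck waiting on master: " + stuck);
    }
}
```

The point of the sketch is that the master and the RS disagree about R after step 3: the RS still serves R, while the master has dropped it, so the retry in step 5 can never be acknowledged.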

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-6152) Split abort is not handled properly

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287838#comment-13287838 ] 

Enis Soztutar commented on HBASE-6152:
--------------------------------------

I think the problem is that the master offlines the region at step 3, but the RS recovers the parent region and brings it back online. As a result, all further transitions for that region fail on the master.
                

[jira] [Resolved] (HBASE-6152) Split abort is not handled properly

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HBASE-6152.
--------------------------------

    Resolution: Duplicate

Thanks, Ramkrishna. Yeah that should solve the problem.
                

[jira] [Updated] (HBASE-6152) Split abort is not handled properly

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-6152:
------------------------------------------

    Attachment: HBASE-6152_0.94.patch

Attached is a test case that reproduces this issue. In fact, this happens in 0.92.0 and 0.92.1 but not in the latest 0.92 or 0.94 code.
The current code no longer has this line:
{code}
if (rs.isSplit() || rs.isSplitting()) {
{code}
so it should not cause a problem here. The line was removed as part of HBASE-6070; before that change, this could have happened. The attached test case reproduces it.
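
As a rough illustration of why dropping such a check could matter, here is a minimal sketch. It assumes, purely for illustration, that the check guarded the offlining of a region when its split znode was deleted; the comment above does not say where the check lived. RegionState, staysOnline and legacyGuard are hypothetical names, not HBase APIs.

```java
public class NodeDeletedGuardSketch {

    /** Simplified region states; the real master tracks more than these. */
    enum RegionState { OPEN, SPLITTING, SPLIT }

    /**
     * Hypothetical reaction to a deleted split znode.
     * legacyGuard = true mimics code that still contains the check quoted above;
     * legacyGuard = false mimics code with the check removed.
     */
    static boolean staysOnline(RegionState state, boolean legacyGuard) {
        if (legacyGuard && (state == RegionState.SPLIT || state == RegionState.SPLITTING)) {
            // Assumed legacy path: the deletion is treated as a lost region.
            return false;
        }
        // Assumed current path: the deletion alone does not offline the region.
        return true;
    }

    public static void main(String[] args) {
        System.out.println("with guard:    " + staysOnline(RegionState.SPLITTING, true));
        System.out.println("without guard: " + staysOnline(RegionState.SPLITTING, false));
    }
}
```

Under that assumption, an aborted split that deletes its znode leaves the region online on the master once the check is gone, so a later split attempt would find the region still known to the master.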
                

[jira] [Updated] (HBASE-6152) Split abort is not handled properly

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HBASE-6152:
-------------------------------

    Description: 
I ran into this:
1. RegionServer started to split a region(R), but the split was taking a long time, and hence the split was aborted
2. As part of cleanup, the RS deleted the ZK node that it created initially for R
3. The master (AssignmentManager) noticed the node deletion, and made R offline
4. The RS recovered from the failure, and at some point of time, tried to do the split again.
5. The master got an event RS_ZK_REGION_SPLIT but the server gave an error like - "Received SPLIT for region R from server RS but it doesn't exist anymore,.."
6. The RS apparently did the split successfully this time, but is stuck on the master to delete the znode for the region. It kept on saying - "org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for R" and it was stuck there forever. 

  was:
I ran into this:
1. RegionServer started to split a region(R), but the split was taking a long time, and hence the split was aborted
2. As part of cleanup, the RS deleted the ZK node that it created initially for R
3. The master (AssignmentManager) noticed the node deletion, and made R offline
4. The RS recovered from the failure, and at some point of time, tried to do the split again.
5. The master got an event RS_ZK_REGION_SPLITTING but the server gave an error like - "Received SPLIT for region R from server RS but it doesn't exist anymore,.."
6. The RS apparently did the split successfully this time, but is stuck on the master to delete the znode for the region. It kept on saying - "org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for R" and it was stuck there forever. 

    