Posted to issues@hbase.apache.org by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org> on 2012/05/23 15:17:41 UTC
[jira] [Created] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
ramkrishna.s.vasudevan created HBASE-6070:
---------------------------------------------
Summary: AM.nodeDeleted and SSH races creating problems for regions under SPLIT
Key: HBASE-6070
URL: https://issues.apache.org/jira/browse/HBASE-6070
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.0, 0.92.1
Reporter: ramkrishna.s.vasudevan
Fix For: 0.92.2, 0.96.0, 0.94.1
We tried to address the problems seen on Master restart and RS restart while a region SPLIT is in progress as part of HBASE-5806.
While looking further we found that one race condition still remains.
-> Split has just started and the znode is in RS_SPLIT state.
-> RS goes down.
-> First callback for SSH comes.
-> As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
-> But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the region from RIT.
-> After this, SSH checks whether any region is in RIT. As we don't find the region in RIT, the region is never assigned.
When we fixed HBASE-5806, step 6 happened first and then step 5, so we missed this case. We have now found it and will come up with a patch shortly.
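The interleaving in the steps above can be sketched with a small, self-contained Java model. This is only an illustration under stated assumptions, not the HBase code and not necessarily what the attached patches do: the class SplitRaceSketch, its map, and all method names are hypothetical stand-ins for the AssignmentManager (AM) regions-in-transition (RIT) bookkeeping and the ServerShutdownHandler (SSH) lookup. The steps are run sequentially in the problematic order so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the race. None of these names are real HBase APIs. */
final class SplitRaceSketch {
    // Hypothetical stand-in for the master's RIT map: region name -> state.
    private final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();

    /** Step 1: the split starts and the region is tracked as RS_SPLIT. */
    void startSplit(String region) {
        regionsInTransition.put(region, "RS_SPLIT");
    }

    /** Step 5: models the nodeDeleted callback, which drops the region from RIT. */
    void nodeDeleted(String region) {
        regionsInTransition.remove(region);
    }

    /** Step 6: SSH consults RIT only when it is about to assign regions. */
    List<String> regionsToAssignLateLookup(Set<String> regionsOnDeadServer) {
        List<String> toAssign = new ArrayList<>();
        for (String region : regionsOnDeadServer) {
            if (regionsInTransition.containsKey(region)) {
                toAssign.add(region);
            }
        }
        return toAssign;
    }

    /** Copies the RIT keys as they were when the first SSH callback arrived. */
    Set<String> snapshotRit() {
        return new HashSet<>(regionsInTransition.keySet());
    }

    /** Assumed alternative: decide from the earlier snapshot instead of the live map. */
    static List<String> regionsToAssignFromSnapshot(Set<String> regionsOnDeadServer,
                                                    Set<String> snapshot) {
        List<String> toAssign = new ArrayList<>();
        for (String region : regionsOnDeadServer) {
            if (snapshot.contains(region)) {
                toAssign.add(region);
            }
        }
        return toAssign;
    }

    public static void main(String[] args) {
        SplitRaceSketch am = new SplitRaceSketch();
        Set<String> onDeadServer = new HashSet<>();
        onDeadServer.add("region-1");

        am.startSplit("region-1");                // split has just started
        Set<String> snapshot = am.snapshotRit();  // first SSH callback sees the region in RIT
        am.nodeDeleted("region-1");               // nodeDeleted removes it before SSH looks again

        System.out.println("late lookup assigns:   "
                + am.regionsToAssignLateLookup(onDeadServer));
        System.out.println("snapshot path assigns: "
                + regionsToAssignFromSnapshot(onDeadServer, snapshot));
    }
}
```

In this model the late lookup finds nothing to assign because nodeDeleted ran in between, which mirrors the "region is never assigned" outcome described above. Deciding from state captured at the first callback is just one way to make the toy model robust; the real fix is in the patches attached to this issue.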
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Status: Open (was: Patch Available)
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0, 0.92.1
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
>
>
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "stack (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485262#comment-13485262 ]
stack commented on HBASE-6070:
------------------------------
[~tychang] Would you mind making a new issue to remove the dead code? Thank you.
[jira] [Closed] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl closed HBASE-6070.
--------------------------------
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_0.94_1.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283652#comment-13283652 ]
Hudson commented on HBASE-6070:
-------------------------------
Integrated in HBase-0.94 #217 (See [https://builds.apache.org/job/HBase-0.94/217/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342725)
Result = FAILURE
ramkrishna :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: (was: HBASE-6070_trunk_1.patch)
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282483#comment-13282483 ]
Hadoop QA commented on HBASE-6070:
----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12528962/HBASE-6070_trunk.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 hadoop23. The patch compiles against the hadoop 0.23.x profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFromClientSide
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1981//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1981//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1981//console
This message is automatically generated.
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_0.92.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283121#comment-13283121 ]
ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------
@Ted
TestServerCustomProtocol.testSingleMethod() passes with the patch. I saw that the same test has failed in another precommit build as well.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.regionserver.TestServerCustomProtocol
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1993//testReport/
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Hadoop Flags: Reviewed
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283658#comment-13283658 ]
Hudson commented on HBASE-6070:
-------------------------------
Integrated in HBase-TRUNK #2922 (See [https://builds.apache.org/job/HBase-TRUNK/2922/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342724)
Result = FAILURE
ramkrishna :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283989#comment-13283989 ]
Hudson commented on HBASE-6070:
-------------------------------
Integrated in HBase-0.94-security #32 (See [https://builds.apache.org/job/HBase-0.94-security/32/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342725)
Result = SUCCESS
ramkrishna :
Files :
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_0.92_1.patch
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_trunk_1.patch
Just reattaching the patch.
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283752#comment-13283752 ]
Hudson commented on HBASE-6070:
-------------------------------
Integrated in HBase-0.92 #421 (See [https://builds.apache.org/job/HBase-0.92/421/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342727)
Result = FAILURE
ramkrishna :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283599#comment-13283599 ]
ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------
Committed to trunk, 0.94 and 0.92.
Thanks for the review, Ted.
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_trunk_1.patch
Updated the patches to address the review comments. I reran the failed test case, and it passed every time.
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282818#comment-13282818 ]
Zhihong Yu commented on HBASE-6070:
-----------------------------------
+1 on patch v2.
You may want to verify that the failed test below wasn't related to this change:
https://builds.apache.org/job/PreCommit-HBASE-Build/1987/console
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Assigned] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan reassigned HBASE-6070:
---------------------------------------------
Assignee: ramkrishna.s.vasudevan
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Status: Patch Available (was: Open)
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.0, 0.92.1
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13281678#comment-13281678 ]
ramkrishna.s.vasudevan commented on HBASE-6070:
-----------------------------------------------
I plan to make the following change in AM.nodeDeleted. SSH already handles regions that are in RIT in the SPLITTING state, so handling them in AM.nodeDeleted as well leads to a race.
{code}
- if (rs.isSplitting() || rs.isSplit()) {
+ if (rs.isSplit()) {
LOG.debug("Ephemeral node deleted, regionserver crashed?, " +
"clearing from RIT; rs=" + rs);
regionOffline(rs.getRegion());
{code}
Please provide your suggestions.
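To illustrate why narrowing the check matters, here is a minimal sketch in plain Java, assuming a heavily simplified in-memory model of the master's RIT map. The class SplitRaceSketch, its methods, and the region name are hypothetical and are not HBase API; the real AssignmentManager and ServerShutdownHandler do considerably more. The sketch replays the ordering from the description (nodeDeleted first, then the SSH check) under the old and the proposed condition.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the master's regions-in-transition (RIT) bookkeeping. */
final class SplitRaceSketch {
    enum State { SPLITTING, SPLIT }

    // Stand-in for the RIT map: region name -> current transition state.
    private final Map<String, State> regionsInTransition = new HashMap<>();

    // true models the old check (isSplitting() || isSplit()), false the proposed one (isSplit() only).
    private final boolean clearSplittingOnDelete;

    SplitRaceSketch(boolean clearSplittingOnDelete) {
        this.clearSplittingOnDelete = clearSplittingOnDelete;
    }

    /** A split has just started: the region enters RIT as SPLITTING. */
    void startSplit(String region) {
        regionsInTransition.put(region, State.SPLITTING);
    }

    /** Models AM.nodeDeleted: runs when the split znode disappears. */
    void nodeDeleted(String region) {
        State state = regionsInTransition.get(region);
        if (state == null) {
            return;
        }
        boolean clear = state == State.SPLIT
            || (clearSplittingOnDelete && state == State.SPLITTING);
        if (clear) {
            regionsInTransition.remove(region);
        }
    }

    /** Models the SSH check: reassign only if the region is still in RIT as SPLITTING. */
    boolean serverShutdownHandler(String region) {
        if (regionsInTransition.get(region) == State.SPLITTING) {
            regionsInTransition.remove(region);
            return true; // region would be handed over for reassignment
        }
        return false; // nothing found in RIT, so nothing is reassigned
    }

    /** Replays the problematic ordering: nodeDeleted fires before SSH inspects RIT. */
    static boolean regionReassigned(boolean clearSplittingOnDelete) {
        SplitRaceSketch am = new SplitRaceSketch(clearSplittingOnDelete);
        am.startSplit("region-a");
        am.nodeDeleted("region-a");
        return am.serverShutdownHandler("region-a");
    }

    public static void main(String[] args) {
        System.out.println("old check reassigns: " + regionReassigned(true));  // false
        System.out.println("new check reassigns: " + regionReassigned(false)); // true
    }
}
```

In this model, the old condition clears the SPLITTING entry before SSH looks at it, so the region is never reassigned. With the proposed condition the entry survives until SSH handles it.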
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287233#comment-13287233 ]
Hudson commented on HBASE-6070:
-------------------------------
Integrated in HBase-0.92-security #109 (See [https://builds.apache.org/job/HBase-0.92-security/109/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342727)
Result = SUCCESS
ramkrishna :
Files :
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13282516#comment-13282516 ]
Zhihong Yu commented on HBASE-6070:
-----------------------------------
{code}
+ // but the RS had went down before completing the split process then will not try to
{code}
'had went down' -> 'had gone down'
{code}
+ if(response == null) return null;
{code}
Space after 'if'
{code}
+ static Result getMetaTableRowResultAsSplittedRegion(final HRegionInfo hri, final ServerName sn)
{code}
The method should be called getMetaTableRowResultAsSplitRegion().
The test failure in TestFromClientSide should be investigated.
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283830#comment-13283830 ]
Hudson commented on HBASE-6070:
-------------------------------
Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #16 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/16/])
HBASE-6070 AM.nodeDeleted and SSH races creating problems for regions under SPLIT (Ramkrishna) (Revision 1342724)
Result = FAILURE
ramkrishna :
Files :
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/protobuf/ProtobufUtil.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/Mocking.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestAssignmentManager.java
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.92_1.patch, HBASE-6070_0.94.patch, HBASE-6070_0.94_1.patch, HBASE-6070_trunk.patch, HBASE-6070_trunk_1.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_trunk.patch
Uploaded patches for all branches. Tested on a cluster, including the scenarios from HBASE-5806. Please review and provide your comments.
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ramkrishna.s.vasudevan updated HBASE-6070:
------------------------------------------
Attachment: HBASE-6070_0.94.patch
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6070_0.92.patch, HBASE-6070_0.94.patch
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "Tianying Chang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tianying Chang updated HBASE-6070:
----------------------------------
@stack
Thanks. I would like a second opinion from others, and I think that is better done in a separate JIRA, so I have created HBASE-7058 for this purpose. If nobody finds any other potential problem, I can provide a patch.
> AM.nodeDeleted and SSH races creating problems for regions under SPLIT
> ----------------------------------------------------------------------
>
> Key: HBASE-6070
> URL: https://issues.apache.org/jira/browse/HBASE-6070
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1, 0.94.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.94.1, 0.96.0
>
> Attachments: HBASE-6070_0.92_1.patch, HBASE-6070_0.92.patch, HBASE-6070_0.94_1.patch, HBASE-6070_0.94.patch, HBASE-6070_trunk_1.patch, HBASE-6070_trunk.patch
>
>
> We tried to address the problems in Master restart and RS restart while a region SPLIT is in progress as part of HBASE-5806.
> While looking further we found there is still one race condition.
> -> Split has just started and the znode is in RS_SPLIT state.
> -> RS goes down.
> -> First callback for SSH comes.
> -> As part of the fix for HBASE-5806, SSH knows that some region is in RIT.
> -> But now the nodeDeleted event comes for the SPLIT node, and there we try to delete the RIT.
> -> After this we check in the SSH whether any node is in RIT. As we don't find the region in RIT, the region is never assigned.
> When we fixed HBASE-5806, step 6 happened first and then step 5 happened, so we missed it. Now we have found it. Will come up with a patch shortly.
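The ordering problem in the quoted steps can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not HBase code and not the attached patch: the class `SplitRaceSketch`, the map `regionsInTransition`, and the methods `nodeDeleted` and `sshCheckAndAssign` are hypothetical stand-ins for the master's RIT bookkeeping, the AssignmentManager callback, and the ServerShutdownHandler check. The two events are run sequentially in both possible orders so the outcome is deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SplitRaceSketch {
    // Hypothetical stand-in for the master's regions-in-transition (RIT) map.
    static final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    // Regions that the simulated SSH decided to assign.
    static final List<String> assigned = new ArrayList<>();

    // Stand-in for the nodeDeleted callback: the split znode is gone,
    // so the region is cleared from RIT.
    static void nodeDeleted(String region) {
        regionsInTransition.remove(region);
    }

    // Stand-in for the SSH check: the region is assigned only if it is
    // still found in RIT at the moment of the check.
    static void sshCheckAndAssign(String region) {
        if (regionsInTransition.remove(region) != null) {
            assigned.add(region);
        }
    }

    // Runs the two events in the requested order and reports whether
    // the region ended up assigned.
    static boolean run(boolean nodeDeletedFirst) {
        regionsInTransition.clear();
        assigned.clear();
        String region = "parentRegion";               // hypothetical region name
        regionsInTransition.put(region, "SPLITTING"); // split had just started
        if (nodeDeletedFirst) {
            nodeDeleted(region);        // step 5 in the description
            sshCheckAndAssign(region);  // step 6 finds nothing in RIT
        } else {
            sshCheckAndAssign(region);  // step 6 sees the region in RIT
            nodeDeleted(region);        // step 5 arrives too late to matter
        }
        return assigned.contains(region);
    }

    public static void main(String[] args) {
        System.out.println("SSH check first, region assigned: " + run(false));
        System.out.println("nodeDeleted first, region assigned: " + run(true));
    }
}
```

With the SSH check first the region is assigned; with nodeDeleted first the RIT entry is already gone and the region is never assigned, which matches the behaviour described in the report.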
[jira] [Commented] (HBASE-6070) AM.nodeDeleted and SSH races
creating problems for regions under SPLIT
Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13283123#comment-13283123 ]
Zhihong Yu commented on HBASE-6070:
-----------------------------------
All right.
[jira] [Updated] (HBASE-6070) AM.nodeDeleted and SSH races creating
problems for regions under SPLIT
Posted by "Tianying Chang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HBASE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tianying Chang updated HBASE-6070:
----------------------------------
@ram,
I am reading the code related to region split, and the code below in AssignmentManager looks like dead code to me, because 1) I don't see any place that updates the regionState to State.SPLIT, and 2) for the scenario where the region has already been split and the RS crashed, ServerShutdownHandler should already have taken care of it. Am I missing something here? Thanks.
    if (rs.isSplit()) {
      LOG.debug("Ephemeral node deleted, regionserver crashed?, " +
        "clearing from RIT; rs=" + rs);
      regionOffline(rs.getRegion());
    }