You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "nkeywal (JIRA)" <ji...@apache.org> on 2012/05/03 16:24:49 UTC

[jira] [Created] (HBASE-5926) Delete the master znode after a znode crash

nkeywal created HBASE-5926:
------------------------------

             Summary: Delete the master znode after a znode crash
                 Key: HBASE-5926
                 URL: https://issues.apache.org/jira/browse/HBASE-5926
             Project: HBase
          Issue Type: Improvement
          Components: master, scripts
    Affects Versions: 0.96.0
            Reporter: nkeywal
            Assignee: nkeywal
            Priority: Minor


This is the continuation of the work done in HBASE-5844.
But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.

So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and become the main master
7) Previous master script continues
8) Previous master script delete the master node in ZK
9) => issue: we deleted the node just created by the new master

This should not happen often (usually the znode will be delete soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278682#comment-13278682 ] 

Hadoop QA commented on HBASE-5926:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528018/5926.v13.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1927//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1927//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1927//console

This message is automatically generated.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279042#comment-13279042 ] 

Hadoop QA commented on HBASE-5926:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12528104/5926.v14.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1933//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1933//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1933//console

This message is automatically generated.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277180#comment-13277180 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

I cannot find CleanZnode class in the patch.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Open  (was: Patch Available)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277971#comment-13277971 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

Please add javadoc for the new class:
{code}
+public class CleanZnode {
{code}

readMyEphemeralNodeOnDisk() throws IOException but writeMyEphemeralNodeOnDisk() doesn't. What was the reason ?
{code}
+   * Get the name of the file used to store the znode
{code}
Please add ' contents' at the end of the above.
{code}
+  public static int cleanZNode(Configuration conf) {
+    conf.setInt("zookeeper.recovery.retry", 0);
{code}
Should the setting be restored before exiting the above method ?
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277702#comment-13277702 ] 

nkeywal commented on HBASE-5926:
--------------------------------

These tests run ok locally.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278967#comment-13278967 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

Minor comments:
{code}
+ * servers. It allows to delete immediately the znode when the master or the regions server crash.
{code}
'regions server crash' -> 'region server crashes'
{code}
+ * The region server / master write a specific file when they start / become main master. When they
{code}
'write a specific' -> 'writes a specific'. 'they start / become' -> 'it starts / becomes'
I think using 'they' is confusing because region server and master have different roles.
{code}
+    " clear  Delete the master znode in ZooKeeper after a master crash\n "+
{code}
'master crash' -> 'master crashes'

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v11.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278654#comment-13278654 ] 

nkeywal commented on HBASE-5926:
--------------------------------

v13 should do it...
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279040#comment-13279040 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

ZNodeClearer.java isn't in source repo.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278065#comment-13278065 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

@Stack:
Can you comment on the latest patch ?
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278305#comment-13278305 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

@N:
Did you forget to include ZNodeClearer class in the latest patch ?
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278342#comment-13278342 ] 

stack commented on HBASE-5926:
------------------------------

Its fine putting it in file.  And what Ted said.  Thanks N.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278259#comment-13278259 ] 

nkeywal commented on HBASE-5926:
--------------------------------

Yes, it could be the node name only. Does it make a difference? To me, they can both be safely written in the fs.
I check the version in v11, so there is no race condition at all now.

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277190#comment-13277190 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

{code}
+    "Usage: Master [opts] start|stop|cleanZNode\n" +
{code}
Please add document for cleanZNode command.
{code}
+   * delete the znode master if its content is same to the parameter
{code}
'znode master' -> 'master znode', 'same to' -> 'same as'
{code}
+    } catch (KeeperException ignore) {
+    } catch (IOException ignore) {
+    }
{code}
I would expect some logging for above cases.

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279048#comment-13279048 ] 

stack commented on HBASE-5926:
------------------------------

Fixed.  Thanks Ted.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277684#comment-13277684 ] 

nkeywal commented on HBASE-5926:
--------------------------------

v8. with Ted's comments taken into account.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Open  (was: Patch Available)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277997#comment-13277997 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

bq. I now clone the conf.
That is safer, avoiding race condition w.r.t. the original conf.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v10.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277696#comment-13277696 ] 

Hadoop QA commented on HBASE-5926:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527816/5926.v8.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1909//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1909//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1909//console

This message is automatically generated.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v14.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277996#comment-13277996 ] 

nkeywal commented on HBASE-5926:
--------------------------------

bq. javadoc
done.

bq. readMyEphemeralNodeOnDisk() throws IOException but writeMyEphemeralNodeOnDisk() doesn't. What was the reason ?
When we write we ignore the results (i.e. we don't stop the master or the region server if we can't store the znode, we just continue). When we read, we're interested in the exception: the pattern in HMasterCommandLine is to return -1 on error.

bq. Please add ' contents' at the end of the above.
ok.

bq. Should the setting be restored before exiting the above method ?
I now clone the conf.

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278919#comment-13278919 ] 

nkeywal commented on HBASE-5926:
--------------------------------

I think it's ok, I don't have this locally...
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Fix Version/s: 0.96.0
           Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Open  (was: Patch Available)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277172#comment-13277172 ] 

nkeywal commented on HBASE-5926:
--------------------------------

the race condition is decreased to a production-acceptable minimum imho. We do a compare & delete in the java code, so the race condition is now: between the comparison and the delete, we fail if, and only if: the session expires and the master node is deleted and the master backup recreates the node. That's unlikely. 
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Summary: Delete the master znode after a master crash  (was: Delete the master znode after a znode crash)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279000#comment-13279000 ] 

stack commented on HBASE-5926:
------------------------------

+1 on patch v14
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279005#comment-13279005 ] 

stack commented on HBASE-5926:
------------------------------

I'll fix Ted comments on commit.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Open  (was: Patch Available)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v6.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>         Attachments: 5926.v6.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278052#comment-13278052 ] 

Hadoop QA commented on HBASE-5926:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12527860/5926.v9.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 33 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1912//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1912//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1912//console

This message is automatically generated.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v13.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v8.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278180#comment-13278180 ] 

stack commented on HBASE-5926:
------------------------------

bq. We need the content. For the regionserver, the content is the znode path. For the master it's the full ServerName (stringified).

Does it have to be full path?  Can it not just be the node name?  Thats safe to put in the fs?  Master servername stringified should be safe in the fs too.

Yeah, version.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Open  (was: Patch Available)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278078#comment-13278078 ] 

stack commented on HBASE-5926:
------------------------------

CleanZNode should be in the tool package or in zookeeper (probably the latter since thats its dependencies)  ... hmmm.. but hangon, I see why you have it here... because its used by master and regionserver packages.  Thats fine.  Its in the right place I'd say.

You should look at the javadoc that is created from your src.  Its going to be a jumble.  Check it out.  You need a little bit of html in there at least for your list of strategy dependencies.

What is the filecontent?  We don't need any, right?  The name of the file is enough?

This should be boolean rather than int?  Or is it returned to shell?  If so, should say so in the comment: "+   * @return if done returns 0 else -1."

Is CleanZNode a good name?  How about ZNodeCleaner or ZNodeClearer or CrashZNodeCleaner?

I think in HMasterCommandLine, should be start|stop|clear so it fits format of the other commands.

In MasterAddressTracker, can you get the znode sequence id and only delete if the sequence id  matches?

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279031#comment-13279031 ] 

Hudson commented on HBASE-5926:
-------------------------------

Integrated in HBase-TRUNK #2899 (See [https://builds.apache.org/job/HBase-TRUNK/2899/])
    HBASE-5926 Delete the master znode after a master crash (Revision 1340185)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/bin/hbase-daemon.sh
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Patch Available  (was: Open)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-5926:
-------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Looks like N already addressed Ted's review comments.  Great.

Applied to trunk.  Thanks for the patch N.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279122#comment-13279122 ] 

Hudson commented on HBASE-5926:
-------------------------------

Integrated in HBase-TRUNK #2900 (See [https://builds.apache.org/job/HBase-TRUNK/2900/])
    HBASE-5926 Delete the master znode after a master crash (Revision 1340200)

     Result = SUCCESS
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278312#comment-13278312 ] 

Zhihong Yu commented on HBASE-5926:
-----------------------------------

{code}
+    } catch (KeeperException e) {
+      LOG.info("Can't get or delete the master znode", e);
+    } catch (DeserializationException e) {
+      LOG.info("Can't get or delete the master znode", e);
+    }
{code}
I think the above log should be at WARN level.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278147#comment-13278147 ] 

nkeywal commented on HBASE-5926:
--------------------------------

bq. You should look at the javadoc that is created from your src. Its going to be a jumble. Check it out. You need a little bit of html in there at least for your list of strategy dependencies.
Done.

bq. What is the filecontent? We don't need any, right? The name of the file is enough?
We need the content. For the regionserver, the content is the znode path. For the master it's the full ServerName (stringified).

bq. This should be boolean rather than int? Or is it returned to shell? If so, should say so in the comment: "+ * @return if done returns 0 else -1."
Done.

bq. Is CleanZNode a good name? How about ZNodeCleaner or ZNodeClearer or CrashZNodeCleaner?
Renamed to ZNodeClearer 

bq. I think in HMasterCommandLine, should be start|stop|clear so it fits format of the other commands.
Done.

bq. In MasterAddressTracker, can you get the znode sequence id and only delete if the sequence id matches?
We store the full ServerName so if there is a restart we will see it. But maybe you're speaking about the znode version? Because I looked at the zk api, and with the version we could remove totally the race condition...

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Status: Open  (was: Patch Available)
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13279333#comment-13279333 ] 

Hudson commented on HBASE-5926:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #10 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/10/])
    HBASE-5926 Delete the master znode after a master crash (Revision 1340200)
HBASE-5926 Delete the master znode after a master crash (Revision 1340185)

     Result = FAILURE
stack : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java

stack : 
Files : 
* /hbase/trunk/bin/hbase-daemon.sh
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMasterCommandLine.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterAddressTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZooKeeperNodeTracker.java

                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a master crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Attachment: 5926.v9.patch
    
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5926) Delete the master znode after a master crash

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506801#comment-13506801 ] 

Jean-Daniel Cryans commented on HBASE-5926:
-------------------------------------------

This jira has the odd side-effect of printing out a lot of garbage when running in standalone and killing it with -9, gist of it being:

{noformat}
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
2012-11-29 13:08:27,227 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 0 retries
2012-11-29 13:08:27,227 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: clean znode for master Unable to get data of znode /hbase/master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
        at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:291)
        at org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:562)
        at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:168)
        at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:150)
        at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:110)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:78)
        at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2298)
{noformat}

Basically the znode cleaner fails hard because ZK is offline.

I was confused to see more logs being printed out after running the kill.
                
> Delete the master znode after a master crash
> --------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>             Fix For: 0.96.0
>
>         Attachments: 5926.v10.patch, 5926.v11.patch, 5926.v13.patch, 5926.v14.patch, 5926.v6.patch, 5926.v8.patch, 5926.v9.patch
>
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-5926) Delete the master znode after a znode crash

Posted by "nkeywal (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nkeywal updated HBASE-5926:
---------------------------

    Description: 
This is the continuation of the work done in HBASE-5844.
But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.

So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and become the main master
7) Previous master script continues
8) Previous master script deletes the master node in ZK
9) => issue: we deleted the node just created by the new master

This should not happen often (usually the znode will be deleted soon enough), but it can happen.

  was:
This is the continuation of the work done in HBASE-5844.
But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.

So if we apply the same strategy as for a regionserver, we may have this scenario:
1) Master starts
2) Backup master starts
3) Master dies
4) ZK detects it
5) Backup master receives the update from ZK
6) Backup master creates the new master node and become the main master
7) Previous master script continues
8) Previous master script delete the master node in ZK
9) => issue: we deleted the node just created by the new master

This should not happen often (usually the znode will be delete soon enough), but it can happen.

    
> Delete the master znode after a znode crash
> -------------------------------------------
>
>                 Key: HBASE-5926
>                 URL: https://issues.apache.org/jira/browse/HBASE-5926
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>            Priority: Minor
>
> This is the continuation of the work done in HBASE-5844.
> But we can't apply exactly the same strategy: for the region server, there is a znode per region server, while for the master & backup master there is a single znode for both.
> So if we apply the same strategy as for a regionserver, we may have this scenario:
> 1) Master starts
> 2) Backup master starts
> 3) Master dies
> 4) ZK detects it
> 5) Backup master receives the update from ZK
> 6) Backup master creates the new master node and become the main master
> 7) Previous master script continues
> 8) Previous master script deletes the master node in ZK
> 9) => issue: we deleted the node just created by the new master
> This should not happen often (usually the znode will be deleted soon enough), but it can happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira