Posted to issues@hbase.apache.org by "Matteo Bertozzi (Created) (JIRA)" <ji...@apache.org> on 2012/03/28 22:39:27 UTC

[jira] [Created] (HBASE-5666) RegionServer doesn't retry to check if base node is available

RegionServer doesn't retry to check if base node is available
-------------------------------------------------------------

                 Key: HBASE-5666
                 URL: https://issues.apache.org/jira/browse/HBASE-5666
             Project: HBase
          Issue Type: Bug
          Components: regionserver, zookeeper
            Reporter: Matteo Bertozzi
         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

I have a script that starts HBase and a couple of region servers in distributed mode (hbase.cluster.distributed = true):
{code}
$HBASE_HOME/bin/start-hbase.sh
$HBASE_HOME/bin/local-regionservers.sh start 1 2 3
{code}

but the region servers are not able to start...
It seems that during RS startup the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
{code}
2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
	at java.lang.Thread.run(Thread.java:662)
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246835#comment-13246835 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521401/HBASE-5666-v5.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper
                       org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
                       org.apache.hadoop.hbase.mapreduce.TestImportTsv
                       org.apache.hadoop.hbase.mapred.TestTableMapReduce
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                       org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1395//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1395//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1395//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242734#comment-13242734 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

The exception seems to be present in 0.92 and trunk, but I've only looked at trunk code
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment:     (was: HBASE-5666-v0.patch)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250915#comment-13250915 ] 

stack commented on HBASE-5666:
------------------------------

@Matteo Mind trying what the client does when there is no HBase root node?  If all is well, I'll commit this latest patch.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246804#comment-13246804 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521398/HBASE-5666-v5.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.coprocessor.TestMasterObserver

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1393//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1393//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1393//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248646#comment-13248646 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521711/HBASE-5666-v6.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1433//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1433//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1433//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v3.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250941#comment-13250941 ] 

stack commented on HBASE-5666:
------------------------------

Should we amend the message to include 'Check the master is running'? Otherwise I don't think the message is bad.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248666#comment-13248666 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

Since the logical flow is:
 * HMaster starts and creates the base node
 * Region servers wait for the base node to be available

I prefer HBASE-5666-v6.patch, which adds the retry logic to ZKUtil.checkExists(); this way we have the "wait for base node" step.

In the other case (@stack: fixing the ZookeeperWatcher), whichever of master or RS arrives first creates the base node... but that doesn't sound right.
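
For readers following along, the "wait for base node" step can be sketched roughly as below. This is only an illustration and not the code from HBASE-5666-v6.patch: the class name BaseNodeWaiter, the method waitForBaseNode and its parameters are hypothetical, and the BooleanSupplier stands in for whatever existence check the region server performs against ZooKeeper.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not taken from the HBASE-5666 patches.
final class BaseNodeWaiter {
    private BaseNodeWaiter() {}

    /**
     * Polls the supplied existence check until it reports true or until
     * maxAttempts checks have been made, sleeping sleepMillis between checks.
     *
     * @return true if the base node was seen, false if the attempts ran out
     *         or the waiting thread was interrupted
     */
    static boolean waitForBaseNode(BooleanSupplier baseNodeExists, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (baseNodeExists.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

With something like this the region server would abort only after the retries are exhausted, instead of on the first failed check, which keeps the master as the only creator of the base node.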
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v5.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244560#comment-13244560 ] 

stack commented on HBASE-5666:
------------------------------

Thanks for digging.  Seems like the RecoverableZK is failing silently (smile).  Seriously, it may be retrying on any ConnectionLossException, but if there is no base dir up in zk, there's nothing for ZKW to 'watch'... it should fail construction (or this test needs to be moved out to an init method or something ...).  What do you reckon?
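
One way to read the comment above: a wrapper that retries on connection loss does not help when the call succeeds but reports that the node is absent. The sketch below illustrates that distinction under stated assumptions. `RetryVsAbsent`, `existsWithRetry`, and `TransientConnectionException` are hypothetical names used for illustration only; they are not the RecoverableZooKeeper or ZooKeeperWatcher APIs.

```java
import java.util.function.BooleanSupplier;

public class RetryVsAbsent {
    /** Hypothetical stand-in for a transient connection failure. */
    public static class TransientConnectionException extends RuntimeException {
    }

    /**
     * Retries op only when it throws TransientConnectionException.
     * A returned value, including false ("node absent"), is final:
     * the wrapper never re-checks a node that is simply not there yet.
     */
    public static boolean existsWithRetry(BooleanSupplier op, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.getAsBoolean();
            } catch (TransientConnectionException e) {
                if (attempt >= maxRetries) {
                    throw e;                 // retries exhausted, surface the failure
                }
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Two simulated connection losses, then a successful call that says "absent".
        boolean exists = existsWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientConnectionException();
            }
            return false;
        }, 5);
        System.out.println("exists=" + exists + " after " + calls[0] + " calls");
    }
}
```

The connection losses are absorbed by the retry layer, yet the caller still gets "absent" with no further attempt, which is why a separate wait or a fail-fast check for the base node would be needed on top of it.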
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243871#comment-13243871 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12520825/HBASE-5666-v1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                        org.apache.hadoop.hbase.util.TestHBaseFsck
                        org.apache.hadoop.hbase.TestZooKeeper

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1360//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1360//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1360//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment:     (was: HBASE-5666-v5.patch)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v6.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Affects Version/s: 0.96.0
                       0.94.0
                       0.92.1
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v5.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251377#comment-13251377 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522213/HBASE-5666-v8.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                        org.apache.hadoop.hbase.master.TestSplitLogManager
                        org.apache.hadoop.hbase.client.TestFromClientSide

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1475//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1475//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1475//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5666:
------------------------------

    Comment: was deleted

(was: -1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521401/HBASE-5666-v5.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                        org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper
                        org.apache.hadoop.hbase.client.TestInstantSchemaChangeSplit
                        org.apache.hadoop.hbase.mapreduce.TestImportTsv
                        org.apache.hadoop.hbase.mapred.TestTableMapReduce
                        org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                        org.apache.hadoop.hbase.mapreduce.TestTableMapReduce

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1395//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1395//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1395//console

This message is automatically generated.)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245795#comment-13245795 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

Interesting idea above.
+1 on removing special case of timeout == 0.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during the RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: hbase-zookeeper.log
                hbase-regionserver.log
                hbase-master.log
                hbase-3-regionserver.log
                hbase-2-regionserver.log
                hbase-1-regionserver.log
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250927#comment-13250927 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522134/HBASE-5666-v7.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1466//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1466//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1466//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250889#comment-13250889 ] 

stack commented on HBASE-5666:
------------------------------

bq. If the client comes up during this time I think that should crash anyway because the HRegion is still in the initialize() method...

You might try it?

bq. but recoverableZookeeper.exists() retries in case of CONNECTIONLOSS, SESSIONEXPIRED and OPERATIONTIMEOUT.

That's fine, I'd say.  We want that.  We want it to actually get through the above and reach zk to check whether the base node exists.

Otherwise I think the patch is good.  Does this need to be public?  +  public boolean checkIfBaseNodeAvailable(int timeout) {

                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250868#comment-13250868 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

The SOCKET_RETRY_WAIT_MS is 200ms, but yes, it is better to sleep with interrupt since the code can accept an interrupt. The only real difference is that you have to wait for the timeout if you want to kill the initialization.

The retry loop is tricky to understand since RecoverableZookeeper is used...
So if you give 0 as the timeout, you're supposed to try once...
but recoverableZookeeper.exists() retries in case of CONNECTIONLOSS, SESSIONEXPIRED and OPERATIONTIMEOUT.
The idea here is to retry for x milliseconds until the znode becomes available: while (recoverableZookeeper.exists() == null)

If the client comes up during this time I think that should crash anyway because the HRegion is still in the initialize() method...
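
The retry idea above can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class BaseNodeWait, the method waitForBaseNode and the BooleanSupplier standing in for recoverableZookeeper.exists() != null are hypothetical names. The check always runs at least once, so a timeout of 0 needs no special case, and the wait between attempts is an interruptible sleep.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class BaseNodeWait {
  /**
   * Polls nodeExists until it returns true or timeoutMs has elapsed.
   * nodeExists is a hypothetical stand-in for a ZooKeeper exists() check.
   * The check runs at least once, so timeoutMs == 0 means "try once".
   */
  public static boolean waitForBaseNode(BooleanSupplier nodeExists, long timeoutMs,
      long retryWaitMs) throws InterruptedException {
    final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (nodeExists.getAsBoolean()) {
        return true;
      }
      long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
      if (remainingMs <= 0) {
        return false;
      }
      // Interruptible sleep: an interrupt aborts the wait right away
      // instead of blocking until the timeout expires.
      Thread.sleep(Math.min(retryWaitMs, remainingMs));
    }
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger calls = new AtomicInteger();
    // Simulated znode that becomes visible on the third check.
    boolean found = waitForBaseNode(() -> calls.incrementAndGet() >= 3, 2000, 10);
    System.out.println("found=" + found + " after " + calls.get() + " checks");
  }
}
```

Any retries done inside the exists() call itself (for example on CONNECTIONLOSS) would come on top of this outer loop, which is what makes the real behaviour harder to reason about.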
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251351#comment-13251351 ] 

stack commented on HBASE-5666:
------------------------------

Yes.  That looks good.  Let me retry against hadoopqa.  A test hung above.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Uma Maheswara Rao G (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249141#comment-13249141 ] 

Uma Maheswara Rao G commented on HBASE-5666:
--------------------------------------------

Wrong JIRA...please ignore my previous comment. It was for HBASE-5745
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Resolution: Duplicate
        Status: Resolved  (was: Patch Available)

Fixed with HBASE-5849
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243845#comment-13243845 ] 

Ted Yu commented on HBASE-5666:
-------------------------------

{code}
+    if (keeperEx != null)
+      throw keeperEx;
{code}
Please either lift the throw to the same line as the if or add curly braces.
{code}
+        checkExists(zk, parentZNode, maxTimeMs);
+        LOG.info("Parent znode exists: " + parentZNode);
{code}
If checkExists() returns -1, would the log statement still be true?
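
Both points can be illustrated with a small sketch. It assumes, as the question above implies, that the existence check returns -1 when the znode is not found; the class ParentZNodeCheck, its methods and the ToIntFunction standing in for checkExists() are hypothetical and are not taken from the patch.

```java
import java.util.function.ToIntFunction;

public class ParentZNodeCheck {
  /**
   * Builds the log message for the parent znode check.
   * checkExists is a hypothetical stand-in that returns -1 when the znode is missing.
   * The "exists" message is produced only when the check actually found the node.
   */
  public static String describe(ToIntFunction<String> checkExists, String parentZNode) {
    int version = checkExists.applyAsInt(parentZNode);
    if (version == -1) {
      return "Parent znode does not exist yet: " + parentZNode;
    }
    return "Parent znode exists: " + parentZNode;
  }

  /** Rethrows a saved exception, with curly braces around the single-statement body. */
  public static void rethrowIfSet(Exception keeperEx) throws Exception {
    if (keeperEx != null) {
      throw keeperEx;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(describe(path -> -1, "/hbase"));
    System.out.println(describe(path -> 0, "/hbase"));
    rethrowIfSet(null); // nothing saved, so nothing is thrown
  }
}
```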
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243239#comment-13243239 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

Refactoring ZKUtil.waitForBaseZNode() so that it can be used by both the region server and the test would be good.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment:     (was: HBASE-5666-v5.patch)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242684#comment-13242684 ] 

stack commented on HBASE-5666:
------------------------------

We used to retry our very first zk operation a few times IIRC.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v4.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-0.92.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v1.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245759#comment-13245759 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

The problem I see with patch v4 and its if (timeout == 0) special case is that exists() behaves differently in ZooKeeper and RecoverableZooKeeper.

RecoverableZooKeeper has internal retry logic for CONNECTIONLOSS, SESSIONEXPIRED, and OPERATIONTIMEOUT. To keep the code simple, we can add this logic to ZKUtil.checkExists(); that way we can remove the special case and the corresponding code in RecoverableZooKeeper.
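
As a rough illustration of the idea above, retrying transient failures inside a single checkExists-style helper could look like the sketch below. Everything in it is an assumption made for illustration: RetryingExists, Code, ZkFailure, ExistsCall and the maxRetries parameter are local stand-ins, not the ZooKeeper or HBase API, and the set of retryable codes simply mirrors the three named above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: local stand-ins for ZooKeeper types, not the real client API. */
final class RetryingExists {
    private RetryingExists() {}

    /** Stand-in error codes; the first three are the ones named in the comment. */
    enum Code { CONNECTIONLOSS, SESSIONEXPIRED, OPERATIONTIMEOUT, NOAUTH }

    /** Stand-in for a client failure that carries an error code. */
    static final class ZkFailure extends RuntimeException {
        final Code code;
        ZkFailure(Code code) { super(code.name()); this.code = code; }
    }

    /** Stand-in for a raw exists() call: returns true if the znode exists. */
    interface ExistsCall { boolean exists(String znode); }

    private static final Set<Code> RETRYABLE =
        EnumSet.of(Code.CONNECTIONLOSS, Code.SESSIONEXPIRED, Code.OPERATIONTIMEOUT);

    /**
     * Calls zk.exists(znode), retrying up to maxRetries times on a retryable
     * code. Any other failure, or a retryable one after the last retry, is
     * rethrown to the caller.
     */
    static boolean checkExists(ExistsCall zk, String znode, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            try {
                return zk.exists(znode);
            } catch (ZkFailure e) {
                if (!RETRYABLE.contains(e.code) || attempt >= maxRetries) {
                    throw e;
                }
                // Transient failure with retries left: loop and try again.
            }
        }
    }
}
```

A real implementation would likely also back off between attempts; that is left out here to keep the sketch short.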
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243855#comment-13243855 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

@Ted: good catch.
I just "translated" the method without thinking, and this "simplified" version highlights a problem that was already present in the previous version.
In the original version the LOG.info is under the if, which is fine. But what happens if the method returns and the znode is still not available? No exception is raised. I think the caller of waitForXyz() expects an exception on timeout; otherwise, the value it is waiting for must be present.
(This function is called by just one test.)
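
To make the contract question concrete, here is a minimal sketch of a wait method that raises on timeout instead of returning silently. WaitOrThrow and waitForValue are hypothetical names, and a Supplier returning null stands in for "znode not available yet"; this is not the method from the patch.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper, not HBase code. */
final class WaitOrThrow {
    private WaitOrThrow() {}

    /**
     * Polls read until it returns a non-null value.
     * Throws TimeoutException if the deadline passes first, so a normal
     * return always carries the value the caller was waiting for.
     */
    static <T> T waitForValue(Supplier<T> read, long timeoutMs, long sleepMs)
            throws TimeoutException, InterruptedException {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            T value = read.get();
            if (value != null) {
                return value;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("value not available after " + timeoutMs + " ms");
            }
            Thread.sleep(sleepMs);
        }
    }
}
```

With this shape the caller never has to re-check for a missing value after the call returns.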
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240757#comment-13240757 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

@Zhihong Yu:
A retry loop is probably the simplest solution.
But retry for how long? An infinite loop that just hopes the node becomes available seems wrong; maybe we can add another parameter to define a ZK_RETRY_TIMEOUT.
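
A bounded retry of the kind discussed here could be sketched as follows. BoundedRetry, waitFor and both parameters are made-up names for illustration; a setting such as the ZK_RETRY_TIMEOUT suggested above would supply timeoutMs. The probe is passed in as a BooleanSupplier so the sketch does not depend on any ZooKeeper class.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not HBase code: polls a probe until it holds or a deadline passes. */
final class BoundedRetry {
    private BoundedRetry() {}

    /**
     * @param check     probe to repeat, e.g. "does the base znode exist?"
     * @param timeoutMs total time budget; 0 means probe exactly once
     * @param sleepMs   pause between probes
     * @return true if the probe succeeded before the deadline, false on
     *         timeout or if the waiting thread was interrupted
     */
    static boolean waitFor(BooleanSupplier check, long timeoutMs, long sleepMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return false; // budget used up: give up instead of looping forever
            }
            try {
                Thread.sleep(Math.min(sleepMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

Passing 0 as the timeout gives a single check with no retry; any positive value bounds the wait.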
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246764#comment-13246764 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

For patch v5, please add javadoc for:
{code}
+  public static int checkExists(ZooKeeperWatcher zkw, String znode, int timeout)
{code}
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251056#comment-13251056 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522165/HBASE-5666-0.92.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The patch appears to cause mvn compile goal to fail.

    -1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
     

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1469//testReport/
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1469//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v2.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v7.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v0.patch

Patch attached; it retries only in HRegionServer, using "hbase.basenode.avail.timeout" as the conf key.
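
For readers following the thread, here is a minimal sketch of the "retry until a timeout" idea. It is not the code from the attached patch: the class name, method name, and the sleep interval are hypothetical, and the availability check is passed in as a plain BooleanSupplier instead of a real ZooKeeper call. A timeout of 0 degenerates to a single check.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration; not the code from HBASE-5666-v0.patch. */
final class BaseNodeWaiter {

  /**
   * Polls {@code check} until it returns true or {@code timeoutMs} has elapsed.
   * Returns true if the check succeeded in time, false on timeout or interrupt.
   */
  static boolean waitUntilAvailable(BooleanSupplier check, long timeoutMs, long sleepMs) {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (true) {
      if (check.getAsBoolean()) {
        return true;
      }
      long remaining = deadline - System.currentTimeMillis();
      if (remaining <= 0) {
        return false; // timed out; the caller decides whether to abort
      }
      try {
        // Never sleep past the deadline.
        Thread.sleep(Math.min(sleepMs, remaining));
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status for the caller
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    // Stand-in for "does the base znode exist yet?": becomes true after about 200 ms.
    boolean ok = waitUntilAvailable(
        () -> System.currentTimeMillis() - start >= 200, 10_000, 50);
    System.out.println("base node available: " + ok);
  }
}
```

In a region server the supplier would wrap whatever existence check is used against the base znode, and the timeout would come from the conf key named above.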
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v0.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251352#comment-13251352 ] 

stack commented on HBASE-5666:
------------------------------

I see that you don't have the above change in v7. Do you want to add it in a v8 and retry hadoopqa? Thanks, Matteo.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250933#comment-13250933 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

@Stack The client crashes with this message
(an easy way to test: put a long sleep in ZKUtil.createAndFailSilent() for the /hbase node)
{code}
12/04/10 20:29:36 ERROR client.HConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
{code}
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244120#comment-13244120 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12520963/HBASE-5666-v3.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.util.TestHBaseFsck
                       org.apache.hadoop.hbase.client.TestFromClientSide

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1368//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1368//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1368//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244537#comment-13244537 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

Hmm... maybe I've missed something, but in 0.92 and trunk that code was removed and there's just a call to ZKUtil.createAndFailSilent(), which doesn't retry. Any idea?

https://github.com/apache/hbase/commit/6dc7ccf3779add13188bd73011e0d25bbab77a05
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5666:
------------------------------

    Comment: was deleted

(was: The problem that I see with patch v4 and the if (timeout == 0) special case is that exists() behaves differently for ZooKeeper and RecoverableZookeeper.

RecoverableZookeeper has some internal retry logic for CONNECTIONLOSS, SESSIONEXPIRED, and OPERATIONTIMEOUT. To keep the code simple, we can add this logic in ZKUtil.checkExist(); that way we can remove the special case and remove the code in RecoverableZK.)
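
A rough sketch of the retry-on-recoverable-error idea described in the deleted comment, for illustration only. Everything here is a hypothetical stand-in: the Code enum and ZkError class mimic the three error codes named above (plus one non-recoverable code for contrast) and are not ZooKeeper's KeeperException types, and existsWithRetry is not an HBase method.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-ins; real code would use ZooKeeper's own exception types. */
final class RetryingExists {

  enum Code { CONNECTIONLOSS, SESSIONEXPIRED, OPERATIONTIMEOUT, NOAUTH }

  /** Unchecked stand-in for a ZooKeeper failure carrying an error code. */
  static final class ZkError extends RuntimeException {
    final Code code;

    ZkError(Code code) {
      super(code.name());
      this.code = code;
    }
  }

  /** The codes the comment above treats as worth retrying. */
  static final Set<Code> RECOVERABLE =
      EnumSet.of(Code.CONNECTIONLOSS, Code.SESSIONEXPIRED, Code.OPERATIONTIMEOUT);

  /**
   * Runs {@code exists}, retrying up to {@code maxRetries} times when it fails
   * with a recoverable code. Any other failure is rethrown immediately.
   */
  static boolean existsWithRetry(BooleanSupplier exists, int maxRetries, long sleepMs) {
    for (int attempt = 0; ; attempt++) {
      try {
        return exists.getAsBoolean();
      } catch (ZkError e) {
        if (!RECOVERABLE.contains(e.code) || attempt >= maxRetries) {
          throw e;
        }
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve interrupt status
          throw e; // give up and surface the last ZooKeeper error
        }
      }
    }
  }
}
```

Putting the loop in one utility method means callers see the same behavior whether the underlying client retries internally or not, which is the simplification the comment argues for.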
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244471#comment-13244471 ] 

stack commented on HBASE-5666:
------------------------------

On creation of ZooKeeperWatcher, we do the following.  Why is it not sufficient?

{code}
      // The first call against zk can fail with connection loss.  Seems common.
      // Apparently this is recoverable.  Retry a while.
      // See http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
      // TODO: Generalize out in ZKUtil.
      long wait = conf.getLong(HConstants.ZOOKEEPER_RECOVERABLE_WAITTIME,
          HConstants.DEFAULT_ZOOKEPER_RECOVERABLE_WAITIME);
      long finished = System.currentTimeMillis() + wait;
      KeeperException ke = null;
      do {
        try {
          ZKUtil.createAndFailSilent(this, baseZNode);
          ke = null;
          break;
        } catch (KeeperException.ConnectionLossException e) {
          if (LOG.isDebugEnabled() && (isFinishedRetryingRecoverable(finished))) {
            LOG.debug("Retrying zk create for another " +
              (finished - System.currentTimeMillis()) +
              "ms; set 'hbase.zookeeper.recoverable.waittime' to change " +
              "wait time); " + e.getMessage());
          }
          ke = e;
        }
      } while (isFinishedRetryingRecoverable(finished));
{code}

Is the wait too short?
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250907#comment-13250907 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

{quote}
Does this need to be public? + public boolean checkIfBaseNodeAvailable(int timeout) {?
{quote}
ZooKeeperNodeTracker.checkIfBaseNodeAvailable(timeout) is used by HRegionServer, so it needs to be public.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242687#comment-13242687 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

I think we can introduce something similar to the following:
{code}
      conf.getInt("hbase.catalogtracker.default.timeout", 1000));
{code}
We can call the new config parameter "hbase.basenode.avail.timeout".
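
To make the suggestion concrete, a small sketch of a "key with default" lookup. It uses java.util.Properties as a stand-in for the Hadoop Configuration object so that it runs on its own; the class name, the helper, and the 1000 default (copied from the catalog tracker line above, unit assumed to be milliseconds) are assumptions, not what any patch on this issue uses.

```java
import java.util.Properties;

/** Hypothetical illustration; Properties stands in for Hadoop's Configuration. */
final class BaseNodeTimeoutConfig {

  static final String KEY = "hbase.basenode.avail.timeout";

  /** Assumed default, mirroring the catalog tracker example; not confirmed by the thread. */
  static final int DEFAULT_TIMEOUT = 1000;

  /** Returns the integer value stored under {@code key}, or {@code defaultValue} if unset. */
  static int getInt(Properties conf, String key, int defaultValue) {
    String raw = conf.getProperty(key);
    if (raw == null) {
      return defaultValue;
    }
    return Integer.parseInt(raw.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(getInt(conf, KEY, DEFAULT_TIMEOUT)); // key unset, default is used
    conf.setProperty(KEY, "5000");
    System.out.println(getInt(conf, KEY, DEFAULT_TIMEOUT)); // explicit override
  }
}
```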
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment:     (was: zk-exists-refactor-v0.patch)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243296#comment-13243296 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

Patch makes sense.
Can you integrate it into HRegionServer?

Thanks
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245720#comment-13245720 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

Still looking at the 0.90 code...
In 0.92 and later, ZooKeeperWatcher calls ZKUtil.createAndFailSilent() to create the base node and the other znodes only when it is constructed by HMaster (canCreateBaseZNode = true). Before that, the code path was the same for every caller.

So now, if the HRegionServer checks for the base node before HMaster has reached the "create base node" point, the region server crashes.

If we want to keep the previous logic, where whoever arrives first creates the base node & co., we can remove the canCreateBaseZNode flag. Otherwise, we can use HBASE-5666-v4.patch to wait and retry on checkExists().

What do you think?
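
For illustration only, the wait-and-retry option could look roughly like the sketch below. This is not the code from HBASE-5666-v4.patch or any other attachment: BaseNodeWaiter, waitForBaseNode and their parameters are hypothetical names, and the two BooleanSupplier arguments merely stand in for a check such as ZKUtil.checkExists(zkw, zkw.baseZNode) != -1 and for the server's "stop requested" flag.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HBase): polls an existence check until it passes. */
public final class BaseNodeWaiter {
    private BaseNodeWaiter() {}

    /**
     * Re-checks for the base node instead of checking only once.
     *
     * @param exists    assumed stand-in for "the base znode is present in ZooKeeper"
     * @param stopped   assumed stand-in for "this server was asked to stop"
     * @param timeoutMs total time to keep retrying, in milliseconds
     * @param sleepMs   pause between two attempts, in milliseconds
     * @return true if the node showed up; false on timeout, stop request or interrupt
     */
    public static boolean waitForBaseNode(BooleanSupplier exists, BooleanSupplier stopped,
                                          long timeoutMs, long sleepMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!stopped.getAsBoolean()) {
            if (exists.getAsBoolean()) {
                return true;               // the master has created the base node
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;              // gave up: report the mismatch as before
            }
            try {
                Thread.sleep(sleepMs);     // give the master time to reach "create base node"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                      // stop requested while waiting
    }
}
```

With something like this, a region server that starts before the master would only abort once the timeout expires, so the existing "Check the value configured in 'zookeeper.znode.parent'" message would still be reported for a real misconfiguration.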
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v5.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v8.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243310#comment-13243310 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

HConnectionImplementation.checkIfBaseNodeAvailable() doesn't take Abortable.
We can limit the scope of change in this JIRA.

Is that okay?
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Status: Patch Available  (was: Open)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v0.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251100#comment-13251100 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

@stack
Do you mean changing the log message in HConnectionManager?
{code}
Index: src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
===================================================================
--- src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java	(revision 1311874)
+++ src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java	(working copy)
@@ -776,6 +776,7 @@
         if (ZKUtil.checkExists(zkw, zkw.baseZNode) == -1) {
           errorMsg = "The node " + zkw.baseZNode+" is not in ZooKeeper. "
             + "It should have been written by the master. "
+            + "Check the master is running? "
             + "Check the value configured in 'zookeeper.znode.parent'. "
             + "There could be a mismatch with the one configured in the master.";
           LOG.error(errorMsg);
{code}
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "nkeywal (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242796#comment-13242796 ] 

nkeywal commented on HBASE-5666:
--------------------------------

I confirm I didn't modify this part in trunk. But who knows. I will have a look at it this weekend.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: HBASE-5666-v8.patch
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251462#comment-13251462 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522220/HBASE-5666-v8.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.wal.TestHLog

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1477//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1477//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1477//console

This message is automatically generated.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243302#comment-13243302 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

I was thinking of patching ZooKeeperNodeTracker.checkIfBaseNodeAvailable() and HConnectionImplementation.checkIfBaseNodeAvailable() instead of only the HRegionServer...

What do you think?

                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log, zk-exists-refactor-v0.patch
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during RS startup the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment: zk-exists-refactor-v0.patch

I don't know... I've tried to refactor the method into something useful and shared.
The problem is that checkExists(), called by checkIfBaseNodeAvailable(), uses a ZooKeeperWatcher and calls exists() on a RecoverableZooKeeper object, while waitForBaseZNode() has a plain ZooKeeper instance.
So the checkExists(ZooKeeperWatcher) implementation relies on the fact that RecoverableZooKeeper.exists() is implemented as RZK.getZooKeeper().exists(), which I don't like.
                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243236#comment-13243236 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

Diving into the code, I've also noticed that there's a ZKUtil.waitForBaseZNode() that already has the retry logic, but:
 - It takes a Configuration object
 - The timeout is set internally to 10000ms
 - It's only used by test/.../hbase/util/ProcessBasedLocalHBaseCluster.java
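
The retry idea mentioned above can be sketched roughly as follows. This is only an illustration, not the code of ZKUtil.waitForBaseZNode(): the class BaseNodeWaiter, the method waitForBaseNode() and its parameters are hypothetical names, and the BooleanSupplier stands in for the real ZooKeeper exists() call on the base znode.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of HBase: polls a check until it succeeds or a deadline passes. */
public final class BaseNodeWaiter {
  private BaseNodeWaiter() {}

  /**
   * @param baseNodeExists hypothetical stand-in for a ZooKeeper exists() call on the base znode
   * @param timeoutMs total time to keep retrying, in milliseconds (caller-supplied, not hard-coded)
   * @param sleepMs pause between two attempts, in milliseconds
   * @return true as soon as the check succeeds, false if the deadline passes or the thread is interrupted
   */
  public static boolean waitForBaseNode(BooleanSupplier baseNodeExists, long timeoutMs, long sleepMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (baseNodeExists.getAsBoolean()) {
        return true; // the base znode showed up
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // deadline reached without seeing the znode
      }
      try {
        Thread.sleep(sleepMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        return false;
      }
    }
  }
}
```

Taking the timeout as a parameter, rather than fixing it internally, is what would let a region server and a test harness share the same loop with different limits.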

                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243902#comment-13243902 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12520850/HBASE-5666-v2.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.TestZooKeeper

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1363//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1363//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1363//console

This message is automatically generated.
                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242801#comment-13242801 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

Yep, just checked, and the code seems unchanged between trunk and 0.92.
Does anyone have a different idea on that?
Or should I try to implement a retry loop?
                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245734#comment-13245734 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

Patch v4 looks good. Minor comments below:

Since timeout of 0 is treated specially in this method:
{code}
+  public static int checkExists(ZooKeeperWatcher zkw, String znode, int timeout)
{code}
the javadoc should mention the special value of 0.
{code}
+   * @return true if baseznode exists.
+   *         false if doesnot exists.
+   */
+  public boolean checkIfBaseNodeAvailable(int timeout) {
{code}
The false return doesn't have to be mentioned. If you want to keep it, it should read 'if baseznode does not exist'
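
As an illustration of the kind of javadoc being asked for, here is a minimal sketch. It is not the patch under review: the class CheckExistsSketch and the IntSupplier parameter are hypothetical, the meaning given to a timeout of 0 ("check exactly once") is an assumption, and so is the convention of returning the znode version or -1.

```java
import java.util.function.IntSupplier;

/** Hypothetical sketch, not the patch under review. */
public final class CheckExistsSketch {
  private CheckExistsSketch() {}

  /**
   * Checks whether a znode exists, retrying until the timeout expires.
   *
   * @param lookup hypothetical stand-in for the ZooKeeper call; returns the znode
   *        version, or -1 if the znode is absent
   * @param timeoutMs how long to keep retrying, in milliseconds. The special value 0
   *        means "check exactly once and do not retry" (assumed semantics).
   * @return the znode version if it exists, -1 if it does not exist
   */
  public static int checkExists(IntSupplier lookup, int timeoutMs) {
    int version = lookup.getAsInt();
    if (timeoutMs == 0) {
      return version; // special case: single attempt, no retry
    }
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (version == -1 && System.nanoTime() - deadline < 0) {
      try {
        Thread.sleep(1);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and give up
        return -1;
      }
      version = lookup.getAsInt();
    }
    return version;
  }
}
```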
                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244564#comment-13244564 ] 

Hadoop QA commented on HBASE-5666:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12521011/HBASE-5666-v4.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1370//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1370//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1370//console

This message is automatically generated.
                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13245758#comment-13245758 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

The problem that I see with patch v4 and the if (timeout == 0) special case is that exists() behaves differently for ZooKeeper and RecoverableZooKeeper.

RecoverableZooKeeper has some internal retry logic for CONNECTIONLOSS, SESSIONEXPIRED, and OPERATIONTIMEOUT. To keep the code simple, we can add this logic to ZKUtil.checkExists(); that way we can remove the special case and remove the code in RecoverableZooKeeper.
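
A rough sketch of what "retry only on those three codes" could look like is below. Everything in it is hypothetical: TransientRetry, its Code enum and ZkFailure are stand-ins defined here so the example is self-contained, and they are not the ZooKeeper or HBase classes. Real code would also sleep or back off between attempts.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of retrying an operation only on transient error codes. */
public final class TransientRetry {
  private TransientRetry() {}

  /** Stand-ins for the error codes named in the comment; NONODE is added as a non-transient example. */
  public enum Code { CONNECTIONLOSS, SESSIONEXPIRED, OPERATIONTIMEOUT, NONODE }

  /** Stand-in for a ZooKeeper failure that carries an error code. */
  public static final class ZkFailure extends RuntimeException {
    public final Code code;

    public ZkFailure(Code code) {
      super(code.name());
      this.code = code;
    }
  }

  /** Only the three codes listed in the comment are treated as retryable. */
  public static boolean isTransient(Code code) {
    return code == Code.CONNECTIONLOSS
        || code == Code.SESSIONEXPIRED
        || code == Code.OPERATIONTIMEOUT;
  }

  /** Runs op, retrying up to maxRetries times on transient failures; other failures propagate at once. */
  public static <T> T withRetries(Supplier<T> op, int maxRetries) {
    for (int attempt = 0; ; attempt++) {
      try {
        return op.get();
      } catch (ZkFailure e) {
        if (!isTransient(e.code) || attempt >= maxRetries) {
          throw e; // not retryable, or retries exhausted
        }
      }
    }
  }
}
```

With the retry policy in one place like this, the existence check itself would not need to care whether it is talking to a plain client or a wrapping one.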
                

        

[jira] [Updated] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi updated HBASE-5666:
-----------------------------------

    Attachment:     (was: HBASE-5666-v8.patch)
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-0.92.patch, HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, HBASE-5666-v7.patch, HBASE-5666-v8.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during RS startup the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13242696#comment-13242696 ] 

stack commented on HBASE-5666:
------------------------------

Please be careful here.

What version of HBase are we talking about?

We used to retry the first ZK operation, and RecoverableZooKeeper took over thereafter. The code in trunk was recently refactored as part of 'HBASE-5399 Cut the link between the client and the zookeeper ensemble'. Is this a teething issue that comes from that commit?
                

        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Ted Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243863#comment-13243863 ] 

Ted Yu commented on HBASE-5666:
-------------------------------

Patch v2 looks good.
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
>
>
> I've a script that starts hbase and a couple of region servers in distributed mode (hbase.cluster.distributed = true)
> {code}
> $HBASE_HOME/bin/start-hbase.sh
> $HBASE_HOME/bin/local-regionservers.sh start 1 2 3
> {code}
> but the region servers are not able to start...
> It seems that during RS start the znode is still not available, and HRegionServer.initializeZooKeeper() checks just once whether the base node is available.
> {code}
> 2012-03-28 21:54:05,013 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.
> 2012-03-28 21:54:08,598 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server localhost,60202,1332964444824: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:626)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:596)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:558)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:672)
> 	at java.lang.Thread.run(Thread.java:662)
> {code}


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250269#comment-13250269 ] 

stack commented on HBASE-5666:
------------------------------

Patch looks good.

The patch logs:

+            LOG.warn(zkw.prefix("Unable to set watcher on znode (" + znode + ")"), keeperEx);

... but the method says it's checkExists w/o setting a watch.

I think this is a bad idea; i.e. sleeping w/o interrupt.  How long is SOCKET_RETRY_WAIT_MS?  What if we try to stop the hosting server in the meantime?  Do we have to wait for this to come out of the loop?

+        Threads.sleepWithoutInterrupt(HConstants.SOCKET_RETRY_WAIT_MS);

Passing 0, are we supposed to try once only?  My guess is that we could try more than once given how the loop runs; i.e. we may loop multiple times in the same millisecond.  You might want to exit the loop if the timeout is zero.

What happens if a client comes in during this time?  Will it crash out immediately because there is no base node?

Thanks Matteo.
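
For illustration only, here is a minimal sketch of a wait loop that addresses the three concerns above: a zero timeout means exactly one check, the sleep is interruptible, and a stop request ends the wait. This is not the code from the attached patches. The class BaseNodeWaiter, the method waitForBaseNode and its parameters are hypothetical names, and the ZooKeeper lookup is stood in for by a plain BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

public class BaseNodeWaiter {
    /**
     * Polls nodeExists until it returns true, the timeout elapses,
     * or stopped reports that the hosting server is shutting down.
     * A timeoutMs of 0 means exactly one check and no sleeping.
     *
     * All names here are hypothetical; nodeExists stands in for the
     * real ZooKeeper existence check.
     */
    public static boolean waitForBaseNode(BooleanSupplier nodeExists,
                                          BooleanSupplier stopped,
                                          long timeoutMs,
                                          long retryWaitMs) {
        final long deadlineNs = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!stopped.getAsBoolean()) {
            if (nodeExists.getAsBoolean()) {
                return true;
            }
            long remainingMs = (deadlineNs - System.nanoTime()) / 1_000_000L;
            // Explicit zero-timeout exit, so the loop cannot spin several
            // times within the same millisecond.
            if (timeoutMs == 0 || remainingMs <= 0) {
                return false;
            }
            try {
                // Interruptible sleep: interrupting this thread ends the wait.
                Thread.sleep(Math.min(retryWaitMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return false;
    }
}
```

A caller would pass something like the server's stop flag as the second argument, so a shutdown is noticed at the next loop iteration at the latest.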
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>    Affects Versions: 0.92.1, 0.94.0, 0.96.0
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, HBASE-5666-v5.patch, HBASE-5666-v6.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Assigned] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matteo Bertozzi reassigned HBASE-5666:
--------------------------------------

    Assignee: Matteo Bertozzi
    
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240751#comment-13240751 ] 

Zhihong Yu commented on HBASE-5666:
-----------------------------------

@Matteo:
You're suggesting adding a loop to wait for the base node to become available (in place of what we have now, below)?
{code}
    if (false == tracker.checkIfBaseNodeAvailable()) {
{code}

Ramkrishna added the check.
@Ram:
What do you think?
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>         Attachments: hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "Matteo Bertozzi (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244488#comment-13244488 ] 

Matteo Bertozzi commented on HBASE-5666:
----------------------------------------

The problem here is that there's no ConnectionLossException... if you take a look at the log you can see that there's no KeeperException; ZooKeeper responds that the base node doesn't exist.
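
To make the distinction concrete, here is a small sketch that separates "the server answered and the node is missing" from "the server could not be reached". It is an illustration, not HBase code: BaseNodeProbe, Result and probe are hypothetical names, and the existence call is modelled as a Callable so the example needs only the JDK. In the real ZooKeeper client, as far as I know, exists() returns a null Stat for a missing node and throws a KeeperException on connection trouble, which is the same split.

```java
import java.util.concurrent.Callable;

public class BaseNodeProbe {
    /** Three possible outcomes of asking whether the base node is there. */
    public enum Result { EXISTS, MISSING, UNREACHABLE }

    /**
     * existsCall is a hypothetical stand-in for the ZooKeeper lookup:
     * it returns true or false when the server answers, and throws
     * when the server cannot be reached.
     */
    public static Result probe(Callable<Boolean> existsCall) {
        try {
            // A normal answer: only MISSING is worth retrying while the
            // master may still be creating the base node.
            return existsCall.call() ? Result.EXISTS : Result.MISSING;
        } catch (Exception e) {
            // No answer at all: a connection-level failure.
            return Result.UNREACHABLE;
        }
    }
}
```

Retrying on exceptions alone never triggers in the case described here, because the lookup completes normally and the outcome is MISSING.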
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log


        

[jira] [Commented] (HBASE-5666) RegionServer doesn't retry to check if base node is available

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244492#comment-13244492 ] 

stack commented on HBASE-5666:
------------------------------

This is the code that is supposed to create the base node, right?  If we come out of here and there is no base node, then that's a problem?  Should the fix be down here in ZKW rather than up in the regionserver?
                
> RegionServer doesn't retry to check if base node is available
> -------------------------------------------------------------
>
>                 Key: HBASE-5666
>                 URL: https://issues.apache.org/jira/browse/HBASE-5666
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver, zookeeper
>            Reporter: Matteo Bertozzi
>            Assignee: Matteo Bertozzi
>         Attachments: HBASE-5666-v1.patch, HBASE-5666-v2.patch, HBASE-5666-v3.patch, HBASE-5666-v4.patch, hbase-1-regionserver.log, hbase-2-regionserver.log, hbase-3-regionserver.log, hbase-master.log, hbase-regionserver.log, hbase-zookeeper.log
