Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2012/11/29 22:36:59 UTC

[jira] [Commented] (HBASE-5844) Delete the region servers znode after a regions server crash

    [ https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506832#comment-13506832 ] 

Jean-Daniel Cryans commented on HBASE-5844:
-------------------------------------------

I encountered another problem that I think is linked to this jira. I was trying to run HBase from trunk without internet access and, as in my Sept 25th comment, I get an empty line after start-hbase.sh, but this time nothing is running. The .log file doesn't show anything after logging ulimit, and nothing is in the .out file. After running the scripts with bash -x, I was able to figure out that the nohup output was being suppressed. See:

{noformat}
jdcryans-MBPr:hbase-github jdcryans$ ./bin/start-hbase.sh 
jdcryans-MBPr:hbase-github jdcryans$
jdcryans-MBPr:hbase-github jdcryans$ bash -x ./bin/start-hbase.sh 
... some stuff then
+ /Users/jdcryans/git/hbase-github/bin/hbase-daemon.sh start master
jdcryans-MBPr:hbase-github jdcryans$ bash -x /Users/jdcryans/git/hbase-github/bin/hbase-daemon.sh start master
... more stuff
+ nohup /Users/jdcryans/git/hbase-github/bin/hbase-daemon.sh --config /Users/jdcryans/git/hbase-github/bin/../conf internal_start master
jdcryans-MBPr:hbase-github jdcryans$ nohup /Users/jdcryans/git/hbase-github/bin/hbase-daemon.sh --config /Users/jdcryans/git/hbase-github/bin/../conf internal_start master
appending output to nohup.out
{noformat}

So now I see that it's writing to nohup.out, which in turn tells me what really happened:

{noformat}
Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
{noformat}

This can be reproduced by deleting from disk any jar listed in target/cached_classpath.txt. In my case I think the jar wasn't available because I had no internet connection.
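
A quick way to check for this situation is to test every classpath entry for existence before starting anything. The sketch below assumes target/cached_classpath.txt holds a ':'-separated classpath; the function name missing_classpath_entries is made up for illustration and is not part of the HBase scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the HBase scripts.
# Prints every classpath entry in the given file that does not exist on disk.
# Assumption: the file contains ':'-separated paths (one or more lines).
missing_classpath_entries() {
  local cp_file="$1"
  local entry
  # Turn ':' separators into newlines so each entry is read on its own line.
  tr ':' '\n' < "$cp_file" | while IFS= read -r entry; do
    # Skip empty entries (e.g. from a trailing ':').
    [ -z "$entry" ] && continue
    # Report the entry only if nothing exists at that path.
    [ -e "$entry" ] || printf '%s\n' "$entry"
  done
}
```

Calling it as "missing_classpath_entries target/cached_classpath.txt" should print nothing when all jars are present, and one line per missing jar otherwise.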

I wonder what other errors it could hide like this.
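
One way to make such failures visible is to give nohup an explicit output file, so that it never falls back to nohup.out in the current directory. This is a minimal sketch and not the actual hbase-daemon.sh code; start_detached and the output path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from hbase-daemon.sh.
# Starts a command detached from the terminal and sends both stdout and
# stderr to a known file, so startup errors are not lost.
start_detached() {
  local out_file="$1"
  shift
  # Redirecting stdout/stderr explicitly keeps nohup from writing nohup.out;
  # stdin comes from /dev/null so the process never waits on the terminal.
  nohup "$@" > "$out_file" 2>&1 < /dev/null &
  # Print the PID of the background process for the caller.
  echo $!
}
```

With something like "start_detached /tmp/master.out ./bin/hbase-daemon.sh ... internal_start master" (path chosen only for illustration), a ClassNotFoundException at startup would end up in /tmp/master.out instead of a nohup.out file that nobody looks at.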
                
> Delete the region servers znode after a regions server crash
> ------------------------------------------------------------
>
>                 Key: HBASE-5844
>                 URL: https://issues.apache.org/jira/browse/HBASE-5844
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>             Fix For: 0.96.0
>
>         Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 5844.v3.patch, 5844.v4.patch
>
>
> Today, if the region server crashes, its znode is not deleted in ZooKeeper, so the recovery process starts only after a timeout, usually 30s.
> By deleting the znode in the start script, we remove this delay and the recovery starts immediately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira